3

本地启动RocketMQ未映射主机名产生的超时问题 - nuccch

 6 months ago
source link: https://www.cnblogs.com/nuccch/p/18033047
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

本地启动RocketMQ未映射主机名产生的超时问题

参考RocketMQ官方文档在本地启动一个验证环境的时候遇到超时报错问题。

本地环境OS:CentOS Linux release 8.5.2111

首先,进入到RocketMQ安装目录,如:~/opt/rocketmq-all-5.2.0-bin-release

执行如下命令启动NameServer:

$ sh bin/mqnamesrv

该命令执行很慢,但是最终还是显示启动NameServer成功了,输出日志如下:

Java HotSpot(TM) 64-Bit Server VM warning: Using the DefNew young collector with the CMS collector is deprecated and will likely be removed in a future release
Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
The Name Server boot success. serializeType=JSON, address 0.0.0.0:9876

执行jps命令也能看到相应进程:

$ jps
13730 NamesrvStartup

执行如下命令启动Broker + Proxy:

$ sh bin/mqbroker -n localhost:9876 --enable-proxy

该命令执行非常漫长,差不多要90s左右才会输出如下日志:

Sat Feb 24 19:48:03 CST 2024 rocketmq-proxy startup successfully

~/logs/rocketmqlogs/proxy.log日志中也能看到broker启动成功的日志:

2024-02-24 19:47:53 INFO main - The broker[broker-a, 192.168.88.135:10911] boot success. serializeType=JSON and name server is localhost:9876

注意:日志中的broker-a是在broker.conf文件中配置的brokerName参数,如下所示:

brokerClusterName = DefaultCluster
brokerName = broker-a # 配置的默认brokerName参数
brokerId = 0
deleteWhen = 04
fileReservedTime = 48
brokerRole = ASYNC_MASTER
flushDiskType = ASYNC_FLUSH

再次执行jps命令确认相应进程是否已经启动:

$ jps
jps
13730 NamesrvStartup
14410 ProxyStartup

一切似乎看起来都正常,从~/logs/rocketmqlogs/namesrv.log~/logs/rocketmqlogs/proxy.log日志中也看不出明显的异常。

但是在创建Topic时就会报错:

$ sh bin/mqadmin updatetopic -n localhost:9876 -t TestTopic -c DefaultCluster

该命令在执行大约40s左右就会输出如下报错日志:

org.apache.rocketmq.tools.command.SubCommandException: UpdateTopicSubCommand command failed
        at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:198)
        at org.apache.rocketmq.tools.command.MQAdminStartup.main0(MQAdminStartup.java:164)
        at org.apache.rocketmq.tools.command.MQAdminStartup.main(MQAdminStartup.java:114)
Caused by: org.apache.rocketmq.remoting.exception.RemotingTimeoutException: invokeSync call the addr[127.0.0.1:9876] timeout
        at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:549)
        at org.apache.rocketmq.client.impl.MQClientAPIImpl.getBrokerClusterInfo(MQClientAPIImpl.java:1961)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.examineBrokerClusterInfo(DefaultMQAdminExtImpl.java:577)
        at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.examineBrokerClusterInfo(DefaultMQAdminExt.java:318)
        at org.apache.rocketmq.tools.command.CommandUtil.fetchMasterAddrByClusterName(CommandUtil.java:94)
        at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:171)
        ... 2 more

从报错信息看似乎是无法连接127.0.0.1:9876,但是经过验证发现该地址是一定可以连通的,再几经尝试之后依然报错。

于是换了一台Windows机器继续验证,奇怪的是在Windows机器上一切正常,而且我注意到在Windows环境启动RocketMQ的时候brokerName使用是主机名,如下日志:

# zhangsan是主机名
The broker[zhangsan, 20.5.133.188:10911] boot success. serializeType=JSON and name server is localhost:9876

于是脑袋中突然闪现一个疑问,是不是因为没有在CentOS的/etc/hosts文件中映射主机名与127.0.0.1地址导致的。

验证后果然就正常的。

根据相关报错日志梳理RocketMQ的源代码,报错是因为在NettyRemotingClient.invokeSync()方法中做了超时判断。

@Override
public RemotingCommand invokeSync(String addr, final RemotingCommand request, long timeoutMillis)
    throws InterruptedException, RemotingConnectException, RemotingSendRequestException, RemotingTimeoutException {
    long beginStartTime = System.currentTimeMillis();
    final Channel channel = this.getAndCreateChannel(addr);
    String channelRemoteAddr = RemotingHelper.parseChannelRemoteAddr(channel);
    if (channel != null && channel.isActive()) {
        long left = timeoutMillis; // 默认超时时长是5000ms
        try {
            long costTime = System.currentTimeMillis() - beginStartTime;
            left -= costTime;
            if (left <= 0) { // 当执行时长超过5s时直接抛出异常
                throw new RemotingTimeoutException("invokeSync call the addr[" + channelRemoteAddr + "] timeout");
            }
            RemotingCommand response = this.invokeSyncImpl(channel, request, left);
            updateChannelLastResponseTime(addr);
            return response;
        }
        //其他代码省略...
    }
    //其他代码省略...
}

由于是做了超时检查抛出的异常,所以单纯从日志信息看就会认为是无法连接127.0.0.1:9876,实际上该地址是可以连通的。

进一步追踪发现,是在执行Netty的ReflectiveChannelFactory.newChannel()方法耗时较长,约10s左右。

@Override
public T newChannel() {
    try {
        // constructor是NioSocketChannel.class
        // 所以本质上这里是要通过反射的方式实例化一个NioSocketChannel对象
        T t = constructor.newInstance();
        return t;
    } catch (Throwable t) {
        throw new ChannelException("Unable to create Channel from class " + constructor.getDeclaringClass(), t);
    }
}

验证代码如下:

long start = System.currentTimeMillis();
Constructor constructor = NioSocketChannel.class.getConstructor();
constructor.newInstance();
System.out.println(String.format("%s ms", System.currentTimeMillis() - start));

执行后输出日志:

10144 ms

奇怪的是,当在/etc/hosts文件中明确指定主机名与127.0.0.1的映射关系后,执行就非常快。

暂时还不清楚这个地方的深层次原因是什么,为什么通过反射方式实例化NioSocketChannel对象会跟主机名与127.0.0.1的映射有关系呢?

【参考】
Windows 启动RocketMQ


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK