• 本地启动RocketMQ未映射主机名产生的超时问题


    问题描述

    参考RocketMQ官方文档在本地启动一个验证环境的时候遇到超时报错问题。

    本地环境OS:CentOS Linux release 8.5.2111

    首先,进入到RocketMQ安装目录,如:~/opt/rocketmq-all-5.2.0-bin-release

    执行如下命令启动NameServer:

    $ sh bin/mqnamesrv
    

    该命令执行很慢,但是最终还是显示启动NameServer成功了,输出日志如下:

    Java HotSpot(TM) 64-Bit Server VM warning: Using the DefNew young collector with the CMS collector is deprecated and will likely be removed in a future release
    Java HotSpot(TM) 64-Bit Server VM warning: UseCMSCompactAtFullCollection is deprecated and will likely be removed in a future release.
    The Name Server boot success. serializeType=JSON, address 0.0.0.0:9876
    

    执行jps命令也能看到相应进程:

    $ jps
    13730 NamesrvStartup
    

    执行如下命令启动Broker + Proxy:

    $ sh bin/mqbroker -n localhost:9876 --enable-proxy
    

    该命令执行非常漫长,差不多要90s左右才会输出如下日志:

    Sat Feb 24 19:48:03 CST 2024 rocketmq-proxy startup successfully
    

    ~/logs/rocketmqlogs/proxy.log日志中也能看到broker启动成功的日志:

    2024-02-24 19:47:53 INFO main - The broker[broker-a, 192.168.88.135:10911] boot success. serializeType=JSON and name server is localhost:9876
    

    注意:日志中的broker-a是在broker.conf文件中配置的brokerName参数,如下所示:

    brokerClusterName = DefaultCluster
    brokerName = broker-a # 配置的默认brokerName参数
    brokerId = 0
    deleteWhen = 04
    fileReservedTime = 48
    brokerRole = ASYNC_MASTER
    flushDiskType = ASYNC_FLUSH
    

    再次执行jps命令确认相应进程是否已经启动:

    $ jps
    jps
    13730 NamesrvStartup
    14410 ProxyStartup
    

    一切似乎看起来都正常,从~/logs/rocketmqlogs/namesrv.log~/logs/rocketmqlogs/proxy.log日志中也看不出明显的异常。

    但是在创建Topic时就会报错:

    $ sh bin/mqadmin updatetopic -n localhost:9876 -t TestTopic -c DefaultCluster
    

    该命令在执行大约40s左右就会输出如下报错日志:

    org.apache.rocketmq.tools.command.SubCommandException: UpdateTopicSubCommand command failed
            at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:198)
            at org.apache.rocketmq.tools.command.MQAdminStartup.main0(MQAdminStartup.java:164)
            at org.apache.rocketmq.tools.command.MQAdminStartup.main(MQAdminStartup.java:114)
    Caused by: org.apache.rocketmq.remoting.exception.RemotingTimeoutException: invokeSync call the addr[127.0.0.1:9876] timeout
            at org.apache.rocketmq.remoting.netty.NettyRemotingClient.invokeSync(NettyRemotingClient.java:549)
            at org.apache.rocketmq.client.impl.MQClientAPIImpl.getBrokerClusterInfo(MQClientAPIImpl.java:1961)
            at org.apache.rocketmq.tools.admin.DefaultMQAdminExtImpl.examineBrokerClusterInfo(DefaultMQAdminExtImpl.java:577)
            at org.apache.rocketmq.tools.admin.DefaultMQAdminExt.examineBrokerClusterInfo(DefaultMQAdminExt.java:318)
            at org.apache.rocketmq.tools.command.CommandUtil.fetchMasterAddrByClusterName(CommandUtil.java:94)
            at org.apache.rocketmq.tools.command.topic.UpdateTopicSubCommand.execute(UpdateTopicSubCommand.java:171)
            ... 2 more
    

    从报错信息看似乎是无法连接127.0.0.1:9876,但是经过验证发现该地址是一定可以连通的,再几经尝试之后依然报错。

    于是换了一台Windows机器继续验证,奇怪的是在Windows机器上一切正常,而且我注意到在Windows环境启动RocketMQ的时候brokerName使用是主机名,如下日志:

    # zhangsan是主机名
    The broker[zhangsan, 20.5.133.188:10911] boot success. serializeType=JSON and name server is localhost:9876
    

    于是脑袋中突然闪现一个疑问,是不是因为没有在CentOS的/etc/hosts文件中映射主机名与127.0.0.1地址导致的。

    验证后果然就正常的。

    原因追踪

    根据相关报错日志梳理RocketMQ的源代码,报错是因为在NettyRemotingClient.invokeSync()方法中做了超时判断。

    @Override
    public RemotingCommand invokeSync(String addr, final RemotingCommand request, long timeoutMillis)
        throws InterruptedException, RemotingConnectException, RemotingSendRequestException, RemotingTimeoutException {
        long beginStartTime = System.currentTimeMillis();
        final Channel channel = this.getAndCreateChannel(addr);
        String channelRemoteAddr = RemotingHelper.parseChannelRemoteAddr(channel);
        if (channel != null && channel.isActive()) {
            long left = timeoutMillis; // 默认超时时长是5000ms
            try {
                long costTime = System.currentTimeMillis() - beginStartTime;
                left -= costTime;
                if (left <= 0) { // 当执行时长超过5s时直接抛出异常
                    throw new RemotingTimeoutException("invokeSync call the addr[" + channelRemoteAddr + "] timeout");
                }
                RemotingCommand response = this.invokeSyncImpl(channel, request, left);
                updateChannelLastResponseTime(addr);
                return response;
            }
            //其他代码省略...
        }
        //其他代码省略...
    }
    

    由于是做了超时检查抛出的异常,所以单纯从日志信息看就会认为是无法连接127.0.0.1:9876,实际上该地址是可以连通的。

    进一步追踪发现,是在执行Netty的ReflectiveChannelFactory.newChannel()方法耗时较长,约10s左右。

    @Override
    public T newChannel() {
        try {
            // constructor是NioSocketChannel.class
            // 所以本质上这里是要通过反射的方式实例化一个NioSocketChannel对象
            T t = constructor.newInstance();
            return t;
        } catch (Throwable t) {
            throw new ChannelException("Unable to create Channel from class " + constructor.getDeclaringClass(), t);
        }
    }
    

    验证代码如下:

    long start = System.currentTimeMillis();
    Constructor constructor = NioSocketChannel.class.getConstructor();
    constructor.newInstance();
    System.out.println(String.format("%s ms", System.currentTimeMillis() - start));
    

    执行后输出日志:

    10144 ms
    

    奇怪的是,当在/etc/hosts文件中明确指定主机名与127.0.0.1的映射关系后,执行就非常快。

    暂时还不清楚这个地方的深层次原因是什么,为什么通过反射方式实例化NioSocketChannel对象会跟主机名与127.0.0.1的映射有关系呢?

    【参考】
    Windows 启动RocketMQ

  • 相关阅读:
    【一起入门DeepLearning】中科院深度学习_期末总复习
    Fiddler Orchestra用户指南:打造高效协同调试利器
    HTTP(0)-并发服务器
    复现MySQL的索引选择失误以及通过OPTIMIZER_TRACE分析过程
    洛谷 P5682 [CSP-J2019 江西] 次大值
    力扣LeetCode算法题 第6题-Z 字形变换
    学编程的第十八天
    可信执行环境(Tee)入门综述
    MySQL - 创建新用户账户,控制权限 - 学习/实践
    图片如何去雾?这些方法值得收藏
  • 原文地址:https://www.cnblogs.com/nuccch/p/18033047