• Changing Server IPs in a Big Data Cluster


    【Background】

    Next week the servers of the big data open platform are being relocated to a new machine room. The platform has 90 physical machines; 24 of them were added later during a capacity expansion and sit in the 19.126.66.* segment, which is shared with another cluster. The machine-room deployment plan calls for moving one network segment at a time, so the IPs of these 24 servers have to be changed before the move.

    The IP change goes live this Thursday, so today the plan is validated in the test environment by changing the IP of a single compute node. Source IP: 146.32.19.25; target IP: 146.32.18.100.

    【0. Stop the node's roles in CM】

    【1. Modify the IP configuration file】

    Inspect the old IP configuration file, then move it to /tmp:

    d0305001:/etc/sysconfig/network # cat ifcfg-vlan119

    BOOTPROTO='static'

    BROADCAST=''

    ETHERDEVICE='bond0'

    ETHTOOL_OPTIONS=''

    IPADDR='146.32.19.25/24'

    MTU=''

    NAME=''

    NETWORK=''

    REMOTE_IPADDR=''

    STARTMODE='auto'

    USERCONTROL='no'

    VLAN_ID='119'

    d0305001:/etc/sysconfig/network # mv ifcfg-vlan119 /tmp/

    Create the new IP configuration file, ifcfg-vlan118, with the new address and VLAN ID:

    d0305001:/etc/sysconfig/network # cat ifcfg-vlan118

    BOOTPROTO='static'

    BROADCAST=''

    ETHERDEVICE='bond0'

    ETHTOOL_OPTIONS=''

    IPADDR='146.32.18.100/24'

    MTU=''

    NAME=''

    NETWORK=''

    REMOTE_IPADDR=''

    STARTMODE='auto'

    USERCONTROL='no'

    VLAN_ID='118'
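
    Rather than retyping the whole file, the new config can also be derived from the old one. A minimal sketch, assuming the old file sits in /tmp as above and only the IPADDR and VLAN_ID lines change:

    # derive ifcfg-vlan118 from the saved ifcfg-vlan119
    sed -e "s|^IPADDR=.*|IPADDR='146.32.18.100/24'|" \
        -e "s|^VLAN_ID=.*|VLAN_ID='118'|" \
        /tmp/ifcfg-vlan119 > /etc/sysconfig/network/ifcfg-vlan118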

    【2. Modify the route configuration file】

    d0305001:/etc/sysconfig/network # vi routes

    default 146.32.19.254 - -

    Change it to:

    default 146.32.18.254 - -
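
    The same edit can be scripted instead of done in vi. A sketch, assuming the default route is the only entry that mentions the old gateway:

    cp /etc/sysconfig/network/routes /tmp/routes.bak

    sed -i 's|^default 146\.32\.19\.254|default 146.32.18.254|' /etc/sysconfig/network/routes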

    【3. Restart the network service】

    service network restart

    d0305001:/etc/sysconfig/network # ip a|grep global

        inet 146.32.18.100/24 brd 146.32.18.255 scope global vlan118

        inet 146.33.18.100/24 brd 146.33.18.255 scope global vlan218
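
    Before moving on, it is worth confirming that the default route was picked up and that the new gateway actually answers:

    ip route show | grep default    # should now point at 146.32.18.254

    ping -c 3 146.32.18.254         # the new gateway must be reachable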

    【4. Modify the NTP configuration】

    In /etc/ntp.conf, replace the old gateway address 146.32.19.254 with the new IP's gateway, 146.32.18.254, then restart the NTP service.
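
    A sketch of the edit itself, assuming the old gateway address appears only in the server lines of /etc/ntp.conf:

    cp /etc/ntp.conf /tmp/ntp.conf.bak

    sed -i 's/146\.32\.19\.254/146.32.18.254/g' /etc/ntp.conf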

    d0305001:/etc/sysconfig/network # service ntp restart

    Shutting down network time protocol daemon (NTPD)                                                                        done

    Starting network time protocol daemon (NTPD)                                                                            done
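
    After the restart, ntpq -p should list the new upstream and, once a few polls have gone through, a non-zero reach value:

    ntpq -p    # the remote column should show 146.32.18.254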

    【5. Update /etc/hosts across the whole cluster and on the clients】

    cp /etc/hosts /etc/hosts.0107

    sed -i 's/\b19\.25\b/18.100/g' /etc/hosts    # dots escaped and word-anchored so that strings like 119.250 are not rewritten by accident
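
    This edit has to land on every node in the cluster and on the clients, not just on the changed machine. A loop sketch, assuming passwordless ssh from an admin host and a hypothetical host list in /tmp/cluster_hosts (one hostname per line):

    for h in $(cat /tmp/cluster_hosts); do
        ssh "$h" "cp /etc/hosts /etc/hosts.0107 && sed -i 's/\b19\.25\b/18.100/g' /etc/hosts"
    done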

    【6. Restart the agent service on the node】

    service cloudera-scm-agent restart
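
    Once the agent is back up, it helps to confirm it is running and heartbeating to CM before starting the roles:

    service cloudera-scm-agent status

    tail -20 /var/log/cloudera-scm-agent/cloudera-scm-agent.log    # look for successful heartbeats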

    Then start the node's roles in CM again.

    【7. Verification】

    After the node's roles were started, they failed with errors about losing the connection to the namenode.

    The log shows: Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(146.32.18.100……

    d0305001:/var/log/hadoop-hdfs # tail -100 hadoop-cmf-hdfs-DATANODE-d0305001.log.out

            at javax.security.auth.Subject.doAs(Subject.java:415)

            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)

            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)

    2019-01-07 10:50:21,141 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1060838331-146.249.31.13-1489136106065 (Datanode Uuid 5538360a-f138-42f2-b219-2b4993c6de2a) service to d0305004/146.32.19.28:8022 beginning handshake with NN

    2019-01-07 10:50:21,143 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1060838331-146.249.31.13-1489136106065 (Datanode Uuid 5538360a-f138-42f2-b219-2b4993c6de2a) service to d0305004/146.32.19.28:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(146.32.18.100, datanodeUuid=5538360a-f138-42f2-b219-2b4993c6de2a, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=cluster14;nsid=314642609;c=0)

            at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)

            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5143)

            at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1162)

            at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:100)

            at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:29184)

            at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)

            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)

            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)

            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)

            at java.security.AccessController.doPrivileged(Native Method)

            at javax.security.auth.Subject.doAs(Subject.java:415)

            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)

            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)

    2019-01-07 10:50:21,151 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Block pool BP-1060838331-146.249.31.13-1489136106065 (Datanode Uuid 5538360a-f138-42f2-b219-2b4993c6de2a) service to d0305005/146.32.19.29:8022 beginning handshake with NN

    2019-01-07 10:50:21,152 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool BP-1060838331-146.249.31.13-1489136106065 (Datanode Uuid 5538360a-f138-42f2-b219-2b4993c6de2a) service to d0305005/146.32.19.29:8022 Datanode denied communication with namenode because the host is not in the include-list: DatanodeRegistration(146.32.18.100, datanodeUuid=5538360a-f138-42f2-b219-2b4993c6de2a, infoPort=50075, infoSecurePort=0, ipcPort=50020, storageInfo=lv=-56;cid=cluster14;nsid=314642609;c=0)

            at org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.registerDatanode(DatanodeManager.java:915)

            at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.registerDatanode(FSNamesystem.java:5143)

            at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.registerDatanode(NameNodeRpcServer.java:1162)

            at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolServerSideTranslatorPB.registerDatanode(DatanodeProtocolServerSideTranslatorPB.java:100)

            at org.apache.hadoop.hdfs.protocol.proto.DatanodeProtocolProtos$DatanodeProtocolService$2.callBlockingMethod(DatanodeProtocolProtos.java:29184)

            at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)

            at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)

            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2141)

            at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2137)

            at java.security.AccessController.doPrivileged(Native Method)

            at javax.security.auth.Subject.doAs(Subject.java:415)

            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1714)

            at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2135)

    Root cause: the error shows the active namenode refusing the registration. The datanode now reports its new IP, 146.32.18.100, but the namenode's include list (dfs_hosts_allow.txt) still contains the old address 146.32.19.25, so the handshake is denied.

    Log in to the namenode host and locate the allow-list files:

    d0305004:~ # find / -name "*allow.txt"

    find: `/proc/29004': No such file or directory

    /var/run/cloudera-scm-agent/process/6938-yarn-RESOURCEMANAGER-refresh/nodes_allow.txt

    /var/run/cloudera-scm-agent/process/6930-namenodes-failover/dfs_hosts_allow.txt

    /var/run/cloudera-scm-agent/process/6929-hdfs-NAMENODE-safemode-wait/dfs_hosts_allow.txt

    /var/run/cloudera-scm-agent/process/6927-hdfs-NAMENODE-nnRpcWait/dfs_hosts_allow.txt

    /var/run/cloudera-scm-agent/process/6926-hdfs-NAMENODE/dfs_hosts_allow.txt

    /var/run/cloudera-scm-agent/process/6924-hdfs-NAMENODE-jnSyncWait/dfs_hosts_allow.txt

    /var/run/cloudera-scm-agent/process/6920-hdfs-NAMENODE-jnSyncWait/dfs_hosts_allow.txt

    /var/run/cloudera-scm-agent/process/6916-hdfs-NAMENODE-jnSyncWait/dfs_hosts_allow.txt

    /var/run/cloudera-scm-agent/process/6416-yarn-RESOURCEMANAGER/nodes_allow.txt

    /var/run/cloudera-scm-agent/process/6368-hdfs-NAMENODE/dfs_hosts_allow.txt

    d0305004:~ # cat /var/run/cloudera-scm-agent/process/6368-hdfs-NAMENODE/dfs_hosts_allow.txt

    146.33.19.13

    146.32.19.14

    146.32.19.15

    146.32.19.16

    146.32.19.17

    146.32.19.18

    146.32.19.20

    146.32.19.22

    146.32.19.23

    146.32.19.24

    146.32.19.25

    146.32.19.26

    146.32.19.27

    146.32.19.28

    146.32.19.30

    Since the error indicates the active namenode refused the connection, manually refresh the host list on the namenode:

    hadoop dfsadmin -fs hdfs://146.32.19.28:8020 -refreshNodes    # 146.32.19.28 is the active namenode's IP
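
    This is an HA pair (the datanode log shows handshakes with both d0305004 and d0305005), so the same refresh should arguably be issued against the other namenode as well; a sketch, reusing the command above with the second namenode's address:

    hadoop dfsadmin -fs hdfs://146.32.19.29:8020 -refreshNodes    # 146.32.19.29 is the other namenode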

    After the refresh, the newly added address is visible on the active namenode as well:

    d0305004:~ # cd /var/run/cloudera-scm-agent/process/

    d0305004:/var/run/cloudera-scm-agent/process # ls -ltr

    total 0

    drwxr-x--x 3 zookeeper zookeeper 280 Nov  1 15:58 6346-zookeeper-server

    drwxr-x--x 3 hdfs      hdfs      360 Nov  1 15:59 6353-hdfs-DATANODE

    drwxr-x--x 3 hbase     hbase     360 Nov  1 16:00 6372-hbase-MASTER

    drwxr-x--x 3 yarn      hadoop    440 Nov  1 16:00 6407-yarn-NODEMANAGER

    drwxr-x--x 3 yarn      hadoop    500 Nov  1 16:00 6416-yarn-RESOURCEMANAGER

    drwxr-x--x 5 solr      solr      280 Nov  1 16:00 6400-solr-SOLR_SERVER

    drwxr-x--x 4 hive      hive      340 Nov  1 16:00 6417-hive-HIVESERVER2

    drwxr-x--x 4 hive      hive      300 Nov  1 16:00 6418-hive-HIVEMETASTORE

    drwxr-xr-x 4 root      root      100 Nov 12 11:01 ccdeploy_hadoop-conf_etchadoopconf.cloudera.yarn_6266238222486433408

    drwxr-xr-x 4 root      root      100 Nov 12 11:02 ccdeploy_hive-conf_etchiveconf.cloudera.hive_-1465732137655581486

    drwxr-x--x 3 root      root      140 Nov 14 12:11 6533-host-inspector

    drwxr-x--x 4 root      root      140 Nov 14 12:11 6511-collect-host-statistics

    drwxr-x--x 3 root      root      140 Nov 21 12:12 6581-host-inspector

    drwxr-x--x 4 root      root      140 Nov 21 12:12 6559-collect-host-statistics

    drwxr-x--x 3 root      root      140 Nov 28 12:13 6629-host-inspector

    drwxr-x--x 4 root      root      140 Nov 28 12:13 6607-collect-host-statistics

    drwxr-x--x 3 root      root      140 Dec  5 12:14 6677-host-inspector

    drwxr-x--x 4 root      root      140 Dec  5 12:14 6655-collect-host-statistics

    drwxr-x--x 3 root      root      140 Dec 12 12:15 6726-host-inspector

    drwxr-x--x 4 root      root      140 Dec 12 12:15 6704-collect-host-statistics

    drwxr-x--x 3 root      root      140 Dec 19 12:16 6774-host-inspector

    drwxr-x--x 4 root      root      140 Dec 19 12:16 6752-collect-host-statistics

    drwxr-x--x 3 root      root      140 Dec 26 12:17 6822-host-inspector

    drwxr-x--x 4 root      root      140 Dec 26 12:17 6800-collect-host-statistics

    drwxr-x--x 3 root      root      140 Jan  2 12:18 6870-host-inspector

    drwxr-x--x 4 root      root      140 Jan  2 12:18 6848-collect-host-statistics

    drwxr-x--x 3 hdfs      hdfs      340 Jan  7 10:54 6355-hdfs-JOURNALNODE

    drwxr-x--x 3 hdfs      hdfs      320 Jan  7 10:54 6917-hdfs-JOURNALNODE

    drwxr-x--x 3 hdfs      hdfs      500 Jan  7 10:54 6916-hdfs-NAMENODE-jnSyncWait

    drwxr-x--x 3 hdfs      hdfs      500 Jan  7 10:55 6920-hdfs-NAMENODE-jnSyncWait

    drwxr-x--x 3 hdfs      hdfs      500 Jan  7 10:55 6924-hdfs-NAMENODE-jnSyncWait

    drwxr-x--x 3 hdfs      hdfs      500 Jan  7 10:55 6368-hdfs-NAMENODE

    drwxr-x--x 3 hdfs      hdfs      480 Jan  7 10:55 6926-hdfs-NAMENODE

    drwxr-x--x 3 hdfs      hdfs      480 Jan  7 10:56 6927-hdfs-NAMENODE-nnRpcWait

    drwxr-x--x 3 hdfs      hdfs      380 Jan  7 10:56 6362-hdfs-FAILOVERCONTROLLER

    drwxr-x--x 3 hdfs      hdfs      360 Jan  7 10:56 6928-hdfs-FAILOVERCONTROLLER

    drwxr-x--x 3 hdfs      hdfs      480 Jan  7 10:56 6929-hdfs-NAMENODE-safemode-wait

    drwxr-x--x 3 hdfs      hdfs      480 Jan  7 10:57 6930-namenodes-failover

    drwxr-xr-x 4 root      root      100 Jan  7 10:57 ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_1239954674294922633

    drwxr-xr-x 4 root      root      120 Jan  7 10:57 ccdeploy_hadoop-conf_etchadoopconf.cloudera.hdfs_2490906708984413108

    drwxr-x--x 3 yarn      hadoop    500 Jan  7 11:00 6938-yarn-RESOURCEMANAGER-refresh

    d0305004:/var/run/cloudera-scm-agent/process # cd 6926-hdfs-NAMENODE

    d0305004:/var/run/cloudera-scm-agent/process/6926-hdfs-NAMENODE # ls

    cloudera-monitor.properties                  dfs_hosts_exclude.txt      http-auth-signature-secret  ssl-server.xml

    cloudera-stack-monitor.properties            event-filter-rules.json    log4j.properties            supervisor.conf

    cloudera_manager_agent_fencer.py              hadoop-metrics2.properties  logs                        topology.map

    cloudera_manager_agent_fencer_secret_key.txt  hadoop-policy.xml          navigator.client.properties  topology.py

    core-site.xml                                hdfs-site.xml              redaction-rules.json

    dfs_hosts_allow.txt                          hdfs.keytab                ssl-client.xml

    d0305004:/var/run/cloudera-scm-agent/process/6926-hdfs-NAMENODE # cat dfs_hosts_allow.txt

    146.33.19.13

    146.32.19.14

    146.32.19.15

    146.32.19.16

    146.32.19.17

    146.32.19.18

    146.32.19.20

    146.32.19.22

    146.32.19.23

    146.32.19.24

    146.32.18.100

    146.32.19.26

    146.32.19.27

    146.32.19.28

    146.32.19.30

    d0305004:/var/run/cloudera-scm-agent/process/6926-hdfs-NAMENODE #

    For the last step, run the cluster refresh operation from the CM page, and the change is complete.

    Of course, there is a simpler option: just do a rolling restart of the HDFS service, as long as no business traffic is running on the cluster.

  • Original article: https://blog.csdn.net/javastart/article/details/127630044