Hadoop Big Data Applications: HDFS Cluster Node Expansion


    Contents

    I. Experiment

    1. Environment

    2. HDFS cluster node expansion

    II. Problems

    1. rsync sync error


    I. Experiment

    1. Environment

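    (1) Hosts

    Table 1  Hosts

    | Host   | Architecture (services)                                                       | Software | Version | IP             | Notes        |
    |--------|-------------------------------------------------------------------------------|----------|---------|----------------|--------------|
    | hadoop | NameNode (deployed), SecondaryNameNode (deployed), ResourceManager (deployed) | hadoop   | 2.7.7   | 192.168.204.50 |              |
    | node01 | DataNode (deployed), NodeManager (deployed)                                   | hadoop   | 2.7.7   | 192.168.204.51 |              |
    | node02 | DataNode (deployed), NodeManager (deployed)                                   | hadoop   | 2.7.7   | 192.168.204.52 |              |
    | node03 | DataNode (deployed), NodeManager (deployed)                                   | hadoop   | 2.7.7   | 192.168.204.53 |              |
    | node04 | DataNode                                                                      | hadoop   | 2.7.7   | 192.168.204.54 | New node     |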

    (2) Check the running processes with jps

    hadoop node

    [root@hadoop hadoop]# jps


    node01 node

    node02 node

    node03 node

    (3) Check the nodes

    [root@hadoop hadoop]# ./bin/yarn node -list
    24/03/14 13:40:21 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.204.50:8032
    Total Nodes:3
    Node-Id Node-State Node-Http-Address Number-of-Running-Containers
    node01:40551 RUNNING node01:8042 0
    node02:46073 RUNNING node02:8042 0
    node03:40601 RUNNING node03:8042 0
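To check the node count from a script rather than by eye, the listing above can be piped through a small filter. This is a minimal sketch, assuming a POSIX shell with awk; the function name `count_running_nodes` is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper (not part of Hadoop): count NodeManagers that
# `yarn node -list` reports as RUNNING. The listing is read on stdin;
# only rows whose second column is RUNNING are counted, so the
# "Total Nodes" and column-header lines are ignored.
count_running_nodes() {
    awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}
```

For example, `./bin/yarn node -list | count_running_nodes` should print 3 for the cluster shown above.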

    2. HDFS cluster node expansion

    (1) Check the IP address

    The new node's address is 192.168.204.54.

    [root@localhost ~]# ip addr
    

    (2) Security mechanism (SELinux)

    Check the current status:

    [root@localhost ~]# sestatus

    Disable SELinux:

    [root@localhost ~]# vim /etc/selinux/config
    ...
    SELINUX=disabled
    ...

    Check again (the change takes effect only after a reboot):

    [root@localhost ~]# sestatus
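The same edit can be made without opening vim. This is a minimal sketch assuming GNU sed's in-place mode; `disable_selinux_config` is a hypothetical helper, and the file path is a parameter so it can be tried on a copy first.

```shell
# Hypothetical helper: set SELINUX=disabled in an SELinux config file
# without opening an editor. The path defaults to /etc/selinux/config.
# Assumes GNU sed (the `-i` in-place flag without a backup suffix).
disable_selinux_config() {
    local cfg=${1:-/etc/selinux/config}
    # Only the line that starts with "SELINUX=" is rewritten;
    # "SELINUXTYPE=..." does not match because "=" must follow "SELINUX".
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$cfg"
}
```

Because the file change only applies after a reboot, `setenforce 0` can be run in the meantime to switch the running system to permissive mode; it does not fully disable SELinux.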
    

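    (3) Firewall

    Disable it (stop halts the running service; mask prevents it from being started again, including at boot):

    [root@localhost ~]# systemctl stop firewalld
    [root@localhost ~]# systemctl mask firewalld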

    (4) Install Java

    [root@localhost ~]# yum install -y java-1.8.0-openjdk-devel.x86_64


    Verify (the jps tool is included in the -devel package):

    [root@localhost ~]# jps
    

    (5) Change the hostname

    [root@localhost ~]# hostnamectl set-hostname node04
    [root@localhost ~]# bash

    Starting a new bash shell makes the prompt show the new hostname.

    (6) Set up passwordless SSH login (on the hadoop node)

    [root@hadoop ~]# cd /root/.ssh/
    [root@hadoop .ssh]# ls
    authorized_keys id_rsa id_rsa.pub known_hosts
    [root@hadoop .ssh]# ssh-copy-id -i id_rsa.pub 192.168.204.54

    Verify:

    [root@hadoop .ssh]# ssh 192.168.204.54
    

    (7) Hostname resolution (hadoop node)

    [root@hadoop ~]# vim /etc/hosts
    ...
    192.168.204.50 hadoop
    192.168.204.51 node01
    192.168.204.52 node02
    192.168.204.53 node03
    192.168.204.54 node04
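Since a wrong subnet in /etc/hosts is easy to miss, the file can be checked against the addresses in Table 1 before it is distributed. This is a sketch under that assumption; `check_cluster_hosts` and its hard-coded host list are illustrative, not part of Hadoop.

```shell
# Hypothetical helper: verify that every cluster hostname maps to the IP
# listed in Table 1. The hosts file defaults to /etc/hosts. Prints one line
# per wrong or missing entry and returns non-zero if there is any.
check_cluster_hosts() {
    local hosts_file=${1:-/etc/hosts} rc=0 pair ip name actual
    for pair in 192.168.204.50:hadoop 192.168.204.51:node01 \
                192.168.204.52:node02 192.168.204.53:node03 \
                192.168.204.54:node04; do
        ip=${pair%%:*}     # part before the colon
        name=${pair#*:}    # part after the colon
        # First address listed for this name, skipping comment lines.
        actual=$(awk -v h="$name" '$1 !~ /^#/ {
            for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit }
        }' "$hosts_file")
        if [ "$actual" != "$ip" ]; then
            echo "$name: expected $ip, found ${actual:-nothing}"
            rc=1
        fi
    done
    return "$rc"
}
```

Running `check_cluster_hosts` with no argument checks /etc/hosts on the current machine; no output and exit status 0 mean every entry matches.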

    (8) Sync the hosts file to all nodes

    [root@hadoop ~]# rsync -av /etc/hosts node01:/etc/
    sending incremental file list
    hosts
    sent 359 bytes received 41 bytes 266.67 bytes/sec
    total size is 269 speedup is 0.67
    [root@hadoop ~]# rsync -av /etc/hosts node02:/etc/
    sending incremental file list
    hosts
    sent 359 bytes received 41 bytes 800.00 bytes/sec
    total size is 269 speedup is 0.67
    [root@hadoop ~]# rsync -av /etc/hosts node03:/etc/
    sending incremental file list
    hosts
    sent 359 bytes received 41 bytes 800.00 bytes/sec
    total size is 269 speedup is 0.67
    [root@hadoop ~]# rsync -av /etc/hosts node04:/etc/
    Warning: Permanently added 'node04' (ECDSA) to the list of known hosts.
    sending incremental file list
    hosts
    sent 359 bytes received 41 bytes 266.67 bytes/sec
    total size is 269 speedup is 0.67
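The four rsync calls above follow one pattern, so they can be written as a loop. This is a sketch assuming the node names from Table 1 and the passwordless SSH set up in step (6); `sync_hosts_to_workers` and the `RSYNC` override variable are hypothetical additions that make a dry run possible.

```shell
# Hypothetical helper: copy /etc/hosts from the hadoop node to every worker.
# Node names come from Table 1; passwordless SSH (step 6) is assumed.
# Setting RSYNC to another command (for example a stub that only echoes
# its arguments) turns this into a dry run.
sync_hosts_to_workers() {
    local node
    for node in node01 node02 node03 node04; do
        ${RSYNC:-rsync} -av /etc/hosts "$node:/etc/" || return 1
    done
}
```

The loop stops at the first node that fails, so a mistyped or unreachable host is reported immediately instead of being buried in later output.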

    (9) Sync the Hadoop directory to node04

    [root@hadoop ~]# rsync -aXSH --delete /usr/local/hadoop node04:/usr/local/
    

    (10) Clear the logs (node04)

    The logs directory was copied from the hadoop node and still holds its NameNode, SecondaryNameNode and ResourceManager logs, so empty it:

    [root@node04 ~]# cd /usr/local/hadoop/
    [root@node04 hadoop]# ls
    bin etc include lib libexec LICENSE.txt logs NOTICE.txt README.txt sbin share
    [root@node04 hadoop]# cd logs/
    [root@node04 logs]# ls
    hadoop-root-namenode-hadoop.log hadoop-root-secondarynamenode-hadoop.log SecurityAuth-root.audit
    hadoop-root-namenode-hadoop.out hadoop-root-secondarynamenode-hadoop.out yarn-root-resourcemanager-hadoop.log
    hadoop-root-namenode-hadoop.out.1 hadoop-root-secondarynamenode-hadoop.out.1 yarn-root-resourcemanager-hadoop.out
    [root@node04 logs]# rm -f *
    [root@node04 logs]# ls
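Instead of deleting the logs afterwards, rsync's `--exclude` option could keep them out of the copy made in step (9). This is a sketch assuming the same source and target paths; `copy_hadoop_without_logs` and the `RSYNC` override are hypothetical.

```shell
# Hypothetical helper: copy the Hadoop directory to a new node while
# skipping the contents of the top-level logs directory.
# With the source given as /usr/local/hadoop (no trailing slash), paths in
# the transfer start with "hadoop/", so the leading "/" anchors the pattern
# to hadoop/logs/ only. The empty logs/ directory is still created.
copy_hadoop_without_logs() {
    local target=${1:-node04}
    ${RSYNC:-rsync} -aXSH --delete --exclude '/hadoop/logs/*' \
        /usr/local/hadoop "$target:/usr/local/"
}
```

Note that `--delete` does not remove excluded files that already exist on the target (that needs `--delete-excluded`), so on a node that already received the logs the cleanup above is still required.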

    (11) View the slaves file (hadoop node)

    [root@hadoop ~]# cd /usr/local/hadoop/etc/hadoop/
    [root@hadoop hadoop]# cat slaves

    (12) Add node04 to slaves

    [root@hadoop hadoop]# vim slaves
    node01
    node02
    node03
    node04
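Editing slaves by hand works once; if the step may be repeated, a guard against duplicate entries helps. This is a sketch; `add_worker` is a hypothetical helper that takes the file path, so it can be pointed at /usr/local/hadoop/etc/hadoop/slaves or at a test copy.

```shell
# Hypothetical helper: append a hostname to a slaves file only if it is
# not already listed, so the step can be re-run safely.
add_worker() {
    local slaves_file=$1 host=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if grep -qxF "$host" "$slaves_file" 2>/dev/null; then
        return 0
    fi
    # If the last line has no trailing newline, add one first so the new
    # name does not get glued onto the previous hostname.
    if [ -s "$slaves_file" ] && [ -n "$(tail -c 1 "$slaves_file")" ]; then
        echo >> "$slaves_file"
    fi
    echo "$host" >> "$slaves_file"
}
```

For example: `add_worker /usr/local/hadoop/etc/hadoop/slaves node04`.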

    (13) Sync the configuration to all hosts

    [root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node01:/usr/local/hadoop/
    [root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node02:/usr/local/hadoop/
    [root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node03:/usr/local/hadoop/
    [root@hadoop hadoop]# rsync -aXSH --delete /usr/local/hadoop/etc node04:/usr/local/hadoop/
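The four commands in step (13) can likewise be looped. This is a sketch with the rsync flags annotated; the node list is taken from Table 1, and `sync_conf_to_workers` and the `RSYNC` override are hypothetical names.

```shell
# Hypothetical helper: distribute the updated etc/ directory to every worker.
#   -a        archive mode (recursive; keeps permissions, times, owners, symlinks)
#   -X        keep extended attributes
#   -S        handle sparse files efficiently
#   -H        preserve hard links
#   --delete  remove files on the target that no longer exist on the source
# RSYNC can be set to another command (e.g. an echo stub) for a dry run.
sync_conf_to_workers() {
    local node
    for node in node01 node02 node03 node04; do
        ${RSYNC:-rsync} -aXSH --delete /usr/local/hadoop/etc "$node:/usr/local/hadoop/" || return 1
    done
}
```

Because `--delete` is used, any file removed from etc/ on the hadoop node is also removed on the workers, which keeps all configurations identical.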

    (14) Start the DataNode (node04)

    [root@node04 hadoop]# ./sbin/hadoop-daemon.sh start datanode


    Check with jps.
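The jps check can be scripted as well. This is a sketch assuming jps prints `<pid> <class name>` per line, as it does by default; `has_datanode` is a hypothetical name.

```shell
# Hypothetical helper: succeed if a jps listing read on stdin contains a
# DataNode process. The second field of each line is the main class name.
has_datanode() {
    awk '$2 == "DataNode" { found = 1 } END { exit !found }'
}
```

On node04: `jps | has_datanode && echo "DataNode is running"`.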

    (15) Verify (hadoop node)

    In the report, Live datanodes now shows 4 nodes.

    [root@hadoop hadoop]# ./bin/hdfs dfsadmin -report
    Configured Capacity: 822126559232 (765.67 GB)
    Present Capacity: 798787727360 (743.93 GB)
    DFS Remaining: 798786990080 (743.93 GB)
    DFS Used: 737280 (720 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    -------------------------------------------------
    Live datanodes (4):
    Name: 192.168.204.54:50010 (node04)
    Hostname: node04
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 4096 (4 KB)
    Non DFS Used: 5658746880 (5.27 GB)
    DFS Remaining: 199872888832 (186.15 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.25%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:00:23 CST 2024
    Name: 192.168.204.53:50010 (node03)
    Hostname: node03
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 266240 (260 KB)
    Non DFS Used: 5621547008 (5.24 GB)
    DFS Remaining: 199909826560 (186.18 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.26%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:00:24 CST 2024
    Name: 192.168.204.51:50010 (node01)
    Hostname: node01
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 180224 (176 KB)
    Non DFS Used: 6029209600 (5.62 GB)
    DFS Remaining: 199502249984 (185.80 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.07%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:00:22 CST 2024
    Name: 192.168.204.52:50010 (node02)
    Hostname: node02
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 286720 (280 KB)
    Non DFS Used: 6029328384 (5.62 GB)
    DFS Remaining: 199502024704 (185.80 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.07%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:00:25 CST 2024
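The Live datanodes count can also be read from the report in a script, for example to confirm that it reached 4. This is a sketch assuming the report format shown above; `live_datanodes` is a hypothetical helper.

```shell
# Hypothetical helper: print N from the "Live datanodes (N):" line of
# `hdfs dfsadmin -report` output read on stdin; prints 0 if it is absent.
# Splitting on "(" and ")" makes the number the second field.
live_datanodes() {
    awk -F'[()]' '/^Live datanodes/ { n = $2 } END { print n + 0 }'
}
```

For example: `[ "$(./bin/hdfs dfsadmin -report | live_datanodes)" -eq 4 ] && echo "all 4 DataNodes are live"`.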

    (16) View the available commands

    The command for setting the balancer bandwidth is -setBalancerBandwidth.

    [root@hadoop hadoop]# ./bin/hdfs dfsadmin
    Usage: hdfs dfsadmin
    Note: Administrative commands can only be run as the HDFS superuser.
    [-report [-live] [-dead] [-decommissioning]]
    [-safemode <enter | leave | get | wait>]
    [-saveNamespace]
    [-rollEdits]
    [-restoreFailedStorage true|false|check]
    [-refreshNodes]
    [-setQuota <dirname>...<dirname>]
    [-clrQuota <dirname>...<dirname>]
    [-setSpaceQuota [-storageType ] <dirname>...<dirname>]
    [-clrSpaceQuota [-storageType ] <dirname>...<dirname>]
    [-finalizeUpgrade]
    [-rollingUpgrade []]
    [-refreshServiceAcl]
    [-refreshUserToGroupsMappings]
    [-refreshSuperUserGroupsConfiguration]
    [-refreshCallQueue]
    [-refresh [arg1..argn]
    [-reconfig ]
    [-printTopology]
    [-refreshNamenodes datanode_host:ipc_port]
    [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
    [-setBalancerBandwidth <bandwidth in bytes per second>]
    [-fetchImage <local directory>]
    [-allowSnapshot ]
    [-disallowSnapshot ]
    [-shutdownDatanode [upgrade]]
    [-getDatanodeInfo ]
    [-metasave filename]
    [-triggerBlockReport [-incremental] ]
    [-help [cmd]]
    Generic options supported are
    -conf specify an application configuration file
    -D use value for given property
    -fs <local|namenode:port> specify a namenode
    -jt <local|resourcemanager:port> specify a ResourceManager
    -files specify comma separated files to be copied to the map reduce cluster
    -libjars specify comma separated jar files to include in the classpath.
    -archives specify comma separated archives to be unarchived on the compute machines.
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]

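    (17) Set the bandwidth and balance the data

    The value is given in bytes per second. Appending 000 to a number multiplies it by one thousand (KB), and appending 000000 multiplies it by one million (MB), so 500 followed by 000000, i.e. 500000000, means 500 MB/s.

    [root@hadoop hadoop]# ./bin/hdfs dfsadmin -setBalancerBandwidth 500000000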
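The zero-counting rule above is just multiplication by 10^6. This is a sketch of that arithmetic using decimal megabytes, as the step above does; `mb_to_bytes` is a hypothetical helper. With binary units (MiB), 500 would instead be 500 × 1048576 = 524288000 bytes.

```shell
# Hypothetical helper: convert whole decimal megabytes per second
# (1 MB = 1,000,000 bytes) into the bytes-per-second value that
# -setBalancerBandwidth expects.
mb_to_bytes() {
    case $1 in
        ''|*[!0-9]*) echo "usage: mb_to_bytes <whole MB>" >&2; return 1 ;;
    esac
    echo $(( $1 * 1000000 ))
}
```

For example: `./bin/hdfs dfsadmin -setBalancerBandwidth "$(mb_to_bytes 500)"`.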
    

    Run the balancer script:

    [root@hadoop hadoop]# ./sbin/start-balancer.sh
    

    (18) Check the status

    DFS Used shows how much HDFS storage each DataNode uses; compare it with the report from step (15).

    [root@hadoop hadoop]# ./bin/hdfs dfsadmin -report
    Configured Capacity: 822126559232 (765.67 GB)
    Present Capacity: 798788423680 (743.93 GB)
    DFS Remaining: 798787682304 (743.93 GB)
    DFS Used: 741376 (724 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    -------------------------------------------------
    Live datanodes (4):
    Name: 192.168.204.54:50010 (node04)
    Hostname: node04
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 8192 (8 KB)
    Non DFS Used: 5658730496 (5.27 GB)
    DFS Remaining: 199872901120 (186.15 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.25%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:16:33 CST 2024
    Name: 192.168.204.53:50010 (node03)
    Hostname: node03
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 266240 (260 KB)
    Non DFS Used: 5620936704 (5.23 GB)
    DFS Remaining: 199910436864 (186.18 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.27%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:16:33 CST 2024
    Name: 192.168.204.51:50010 (node01)
    Hostname: node01
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 180224 (176 KB)
    Non DFS Used: 6029176832 (5.62 GB)
    DFS Remaining: 199502282752 (185.80 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.07%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:16:34 CST 2024
    Name: 192.168.204.52:50010 (node02)
    Hostname: node02
    Decommission Status : Normal
    Configured Capacity: 205531639808 (191.42 GB)
    DFS Used: 286720 (280 KB)
    Non DFS Used: 6029291520 (5.62 GB)
    DFS Remaining: 199502061568 (185.80 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 97.07%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 14 15:16:34 CST 2024
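To compare DFS Used per node before and after balancing without reading the full report, the relevant lines can be extracted. This is a sketch assuming the report layout shown above; `dfs_used_per_node` is a hypothetical helper.

```shell
# Hypothetical helper: print "hostname bytes" for each DataNode in
# `hdfs dfsadmin -report` output read on stdin, so two reports (for
# example before and after the balancer) can be compared at a glance.
dfs_used_per_node() {
    awk '
        # Remember the node that the following lines belong to.
        /^Hostname:/ { host = $2 }
        # "DFS Used: 8192 (8 KB)" -> field 3 is the byte count. The
        # cluster-wide DFS Used line comes before any Hostname line, so
        # host is still empty there and it is skipped. "DFS Used%:" lines
        # do not match because of the percent sign.
        /^DFS Used:/ && host != "" { print host, $3 }
    '
}
```

For the report above this would list node04 with 8192 bytes, against 4096 bytes in step (15).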

    II. Problems

    1. rsync sync error

    (1) Error

    (2) Cause

    The hostname given as the sync target was wrong.

    (3) Solution

    Correct the target hostname:

    [root@hadoop ~]# rsync -av /etc/hosts node01:/etc/
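Because the cause was a wrong target hostname, a quick lookup before running rsync would flag the mistake with a clear message. This is a sketch that checks a hosts file only (it ignores DNS); `host_known` is a hypothetical helper.

```shell
# Hypothetical helper: succeed if a hostname appears in a hosts file
# (default /etc/hosts), ignoring comment lines. Only exact name matches
# count, so a typo such as "node1" for "node01" is rejected.
host_known() {
    awk -v h="$1" '$1 !~ /^#/ {
        for (i = 2; i <= NF; i++) if ($i == h) found = 1
    } END { exit !found }' "${2:-/etc/hosts}"
}
```

For example:

    if host_known node01; then rsync -av /etc/hosts node01:/etc/; else echo "unknown host: node01"; fi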
    

Original article: https://blog.csdn.net/cronaldo91/article/details/136709547