Setting up a fully-distributed HBase cluster


    0. Prerequisite
    There are three VMs - hadoop3/hadoop4/hadoop5 - for the fully-distributed HBase cluster; the deployment plan looks like:

                  hadoop3                  hadoop4                hadoop5
    Hadoop hdfs   NameNode:8020            DataNode:50010         SecondaryNameNode:50090
                  DataNode:50010                                  DataNode:50010
                  JobHistoryServer:19888
    Hadoop yarn   NodeManager:8040         ResourceManager:8088   NodeManager:8040
                                           NodeManager:8040
    Zookeeper     QuorumPeerMain:2181      QuorumPeerMain:2181    QuorumPeerMain:2181
    HBase         HMaster:16000            HRegionServer:16020    HRegionServer:16020
                  HRegionServer:16020

    And JDK/Zookeeper/Hadoop/HBase have been installed under /opt on all three VMs as user sunxo:

    $ ls /opt
    hadoop-2.10.2  hbase-2.4.16  jdk  zookeeper-3.8.1

    1) configure passwordless SSH access
    hadoop3, which runs the NameNode, needs to access all VMs as sunxo:

    $ ssh-keygen -t rsa
    $ ssh-copy-id hadoop3
    $ ssh-copy-id hadoop4
    $ ssh-copy-id hadoop5

    and as root as well:

    # ssh-keygen -t rsa
    # ssh-copy-id hadoop3
    # ssh-copy-id hadoop4
    # ssh-copy-id hadoop5

    hadoop4, which runs the ResourceManager, needs to access all VMs as sunxo:

    $ ssh-keygen -t rsa
    $ ssh-copy-id hadoop3
    $ ssh-copy-id hadoop4
    $ ssh-copy-id hadoop5
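
    A quick way to confirm the keys took effect is to run a harmless command against every host; if any hop still prompts for a password, the corresponding ssh-copy-id did not land:

    $ for host in hadoop3 hadoop4 hadoop5; do ssh $host hostname; done
    hadoop3
    hadoop4
    hadoop5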

    2) on hadoop3, add environment variables to $HOME/.bashrc (not .bash_profile)

    export JAVA_HOME=/opt/jdk
    export ZOOKEEPER_HOME=/opt/zookeeper-3.8.1
    export KAFKA_HOME=/opt/kafka-3.3.1
    export HADOOP_HOME=/opt/hadoop-2.10.2
    export HBASE_HOME=/opt/hbase-2.4.16
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$JAVA_HOME/bin:$HOME/bin:.:$PATH

    And distribute .bashrc to hadoop4 and hadoop5:

    $ rsync.sh .bashrc hadoop4 hadoop5
    rsync -rvl /home/sunxo/.bashrc sunxo@hadoop4:/home/sunxo
    sending incremental file list
    .bashrc
    sent 614 bytes  received 37 bytes  1302.00 bytes/sec
    total size is 541  speedup is 0.83
    rsync -rvl /home/sunxo/.bashrc sunxo@hadoop5:/home/sunxo
    sending incremental file list
    .bashrc
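
    rsync.sh is a helper script of the author's, not a stock command; judging from the echoed output it wraps rsync -rvl over a list of target hosts. A minimal sketch of such a script (the exact argument handling is an assumption):

    #!/bin/sh
    # rsync.sh - push a file or directory under the current directory
    # to the same location on each listed host (sketch)
    src=$1; shift
    for host in "$@"; do
        echo "rsync -rvl $PWD/$src $USER@$host:$PWD"
        rsync -rvl "$PWD/$src" "$USER@$host:$PWD"
    done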

    1. Zookeeper on hadoop3
    1) configure and distribute zoo.cfg

    $ cd $ZOOKEEPER_HOME/conf
    $ diff -u zoo_sample.cfg zoo.cfg
    --- zoo_sample.cfg	2023-01-26 00:31:05.000000000 +0800
    +++ zoo.cfg	2023-10-17 14:30:06.598229298 +0800
    @@ -9,7 +9,7 @@
     # the directory where the snapshot is stored.
     # do not use /tmp for storage, /tmp here is just
     # example sakes.
    -dataDir=/tmp/zookeeper
    +dataDir=/opt/zookeeper-3.8.1/tmp
     # the port at which the clients will connect
     clientPort=2181
     # the maximum number of client connections.
    @@ -25,7 +25,7 @@
     #autopurge.snapRetainCount=3
     # Purge task interval in hours
     # Set to "0" to disable auto purge feature
    -#autopurge.purgeInterval=1
    +autopurge.purgeInterval=1
     ## Metrics Providers
     #
    @@ -35,3 +35,7 @@
     #metricsProvider.httpPort=7000
     #metricsProvider.exportJvmInfo=true
    +# cluster
    +server.3=hadoop3:2888:3888
    +server.4=hadoop4:2888:3888
    +server.5=hadoop5:2888:3888
    $ rsync.sh zoo.cfg hadoop4 hadoop5

    2) create and distribute the data dir assigned in zoo.cfg

    $ cd $ZOOKEEPER_HOME
    $ mkdir -p tmp
    $ rsync.sh tmp hadoop4 hadoop5
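
    Note that each ZooKeeper server also expects a myid file inside dataDir whose content matches its server.N id in zoo.cfg (3, 4 and 5 here), so the id must be written per host after the directory is distributed, for example:

    $ echo 3 > $ZOOKEEPER_HOME/tmp/myid                # on hadoop3
    $ ssh hadoop4 "echo 4 > $ZOOKEEPER_HOME/tmp/myid"
    $ ssh hadoop5 "echo 5 > $ZOOKEEPER_HOME/tmp/myid"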

    3) start zookeeper cluster

    $ hosts="hadoop3 hadoop4 hadoop5"
    $ for host in $hosts
    > do
    >     echo "============= zk start on $host ============="
    >     ssh $host $ZOOKEEPER_HOME/bin/zkServer.sh start
    > done
    ============= zk start on hadoop3 =============
    ZooKeeper JMX enabled by default
    Using config: /opt/zookeeper-3.8.1/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ============= zk start on hadoop4 =============
    ZooKeeper JMX enabled by default
    Using config: /opt/zookeeper-3.8.1/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ============= zk start on hadoop5 =============
    ZooKeeper JMX enabled by default
    Using config: /opt/zookeeper-3.8.1/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    $ jps.sh hadoop3 hadoop4 hadoop5
    ============= hadoop3 =============
    30495 QuorumPeerMain
    ============= hadoop4 =============
    313 QuorumPeerMain
    ============= hadoop5 =============
    4264 QuorumPeerMain
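
    Like rsync.sh, jps.sh is a helper script of the author's rather than a stock command; from its output it apparently runs jps on each host given. A plausible sketch, assuming the JDK lives at /opt/jdk on every host as set up above:

    #!/bin/sh
    # jps.sh - list Java processes on each listed host (sketch)
    for host in "$@"; do
        echo "============= $host ============="
        ssh $host /opt/jdk/bin/jps
    done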

    2. Hadoop on hadoop3
    1) configure

    $ cd $HADOOP_HOME/etc/hadoop
    $ diff -u hadoop-env.sh.orig hadoop-env.sh
    ...
    -export JAVA_HOME=${JAVA_HOME}
    +export JAVA_HOME=/opt/jdk
    $ cat core-site.xml
    ...
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://hadoop3:8020</value>
    </property>
    <property>
      <name>hadoop.tmp.dir</name>
      <value>/opt/hadoop-2.10.2/data/tmp</value>
    </property>
    $ cat hdfs-site.xml
    ...
    <property>
      <name>dfs.namenode.secondary.http-address</name>
      <value>hadoop5:50090</value>
    </property>
    $ cat mapred-site.xml
    ...
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hadoop3:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop3:19888</value>
    </property>
    $ cat yarn-site.xml
    ...
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop4</value>
    </property>
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <property>
      <name>yarn.log-aggregation-enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.log-aggregation.retain-seconds</name>
      <value>604800</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>
    $ cat slaves
    hadoop3
    hadoop4
    hadoop5
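
    A quick way to verify that a value is actually being picked up from these files is hdfs getconf, for example:

    $ hdfs getconf -confKey fs.defaultFS
    hdfs://hadoop3:8020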

    2) distribute configuration to hadoop4, hadoop5

    $ cd $HADOOP_HOME/etc
    $ rsync.sh hadoop/ hadoop4 hadoop5

    3) format filesystem

    $ cd $HADOOP_HOME
    $ rm.sh data/ hadoop3 hadoop4 hadoop5    # remove old data if needed
    $ rm.sh log/ hadoop3 hadoop4 hadoop5     # remove old logs if needed
    $ bin/hdfs namenode -format
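
    rm.sh is another of the author's helpers; by analogy with rsync.sh it presumably removes the given path on every listed host. A sketch under that assumption:

    #!/bin/sh
    # rm.sh - remove a path (relative to the current directory) on each listed host (sketch)
    path=$1; shift
    for host in "$@"; do
        echo "ssh $host rm -rf $PWD/$path"
        ssh $host rm -rf "$PWD/$path"
    done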

    4) start hdfs/yarn/historyserver

    echo "============= dfs start from hadoop3 ============="
    ssh hadoop3 $HADOOP_HOME/sbin/start-dfs.sh
    echo "============= yarn start from hadoop4 ============="
    ssh hadoop4 $HADOOP_HOME/sbin/start-yarn.sh
    echo "============= history start on hadoop3 ============="
    ssh hadoop3 $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
    ============= dfs start from hadoop3 =============
    Starting namenodes on [hadoop3]
    hadoop3: starting namenode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-namenode-hadoop3.out
    hadoop4: starting datanode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-datanode-hadoop4.out
    hadoop3: starting datanode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-datanode-hadoop3.out
    hadoop5: starting datanode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-datanode-hadoop5.out
    Starting secondary namenodes [hadoop5]
    ============= yarn start from hadoop4 =============
    starting yarn daemons
    starting resourcemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-resourcemanager-hadoop4.out
    hadoop3: starting nodemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-nodemanager-hadoop3.out
    hadoop4: starting nodemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-nodemanager-hadoop4.out
    hadoop5: starting nodemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-nodemanager-hadoop5.out
    ============= history start on hadoop3 =============
    starting historyserver, logging to /opt/hadoop-2.10.2/logs/mapred-sunxo-historyserver-hadoop3.out
    $ jps.sh hadoop3 hadoop4 hadoop5
    ============= hadoop3 =============
    816 DataNode
    616 NameNode
    1385 JobHistoryServer
    1166 NodeManager
    30495 QuorumPeerMain
    ============= hadoop4 =============
    2065 DataNode
    2354 NodeManager
    313 QuorumPeerMain
    2222 ResourceManager
    ============= hadoop5 =============
    5892 DataNode
    6023 SecondaryNameNode
    4264 QuorumPeerMain
    6120 NodeManager
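
    Besides jps, the NameNode itself can confirm that all three DataNodes registered (a suggested check):

    $ hdfs dfsadmin -report | grep "Live datanodes"    # expect: Live datanodes (3):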

    3. HBase on hadoop3
    1) configure

    $ diff -u hbase-env.sh.orig hbase-env.sh
    --- hbase-env.sh.orig	2020-01-22 23:10:15.000000000 +0800
    +++ hbase-env.sh	2023-10-19 18:21:33.098131203 +0800
    @@ -25,7 +25,7 @@
     # into the startup scripts (bin/hbase, etc.)
     # The java implementation to use. Java 1.8+ required.
    -# export JAVA_HOME=/usr/java/jdk1.8.0/
    +export JAVA_HOME=/opt/jdk
     # Extra Java CLASSPATH elements. Optional.
     # export HBASE_CLASSPATH=
    @@ -123,7 +123,7 @@
     # export HBASE_SLAVE_SLEEP=0.1
     # Tell HBase whether it should manage it's own instance of ZooKeeper or not.
    -# export HBASE_MANAGES_ZK=true
    +export HBASE_MANAGES_ZK=false
     # The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
    $ cat hbase-site.xml
    ...
    <property>
      <name>hbase.cluster.distributed</name>
      <value>true</value>
    </property>
    <property>
      <name>hbase.zookeeper.quorum</name>
      <value>hadoop3,hadoop4,hadoop5</value>
    </property>
    <property>
      <name>hbase.rootdir</name>
      <value>hdfs://hadoop3:8020/hbase</value>
    </property>
    $ cat regionservers
    hadoop3
    hadoop4
    hadoop5
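
    If hbase-env.sh, hbase-site.xml and regionservers were edited on hadoop3 only, the conf directory also has to reach the other hosts before startup; with the same helper as before:

    $ cd $HBASE_HOME
    $ rsync.sh conf/ hadoop4 hadoop5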

    2) start hbase

    echo "============= hbase start from hadoop3 ============="
    $HBASE_HOME/bin/start-hbase.sh
    ============= hbase start from hadoop3 =============
    running master, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-master-hadoop3.out
    hadoop3: running regionserver, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-regionserver-hadoop3.out
    hadoop4: running regionserver, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-regionserver-hadoop4.out
    hadoop5: running regionserver, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-regionserver-hadoop5.out
    $ jps.sh hadoop3 hadoop4 hadoop5
    ============= hadoop3 =============
    816 DataNode
    2064 HMaster
    616 NameNode
    2280 HRegionServer
    1385 JobHistoryServer
    1166 NodeManager
    30495 QuorumPeerMain
    ============= hadoop4 =============
    2065 DataNode
    2354 NodeManager
    2995 HRegionServer
    313 QuorumPeerMain
    2222 ResourceManager
    ============= hadoop5 =============
    5892 DataNode
    6023 SecondaryNameNode
    4264 QuorumPeerMain
    6120 NodeManager
    6616 HRegionServer
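
    As a final smoke test, the HBase shell's status command should report one active master and three region servers:

    $ echo "status" | hbase shell    # expect: 1 active master, 0 backup masters, 3 servers, 0 dead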

    Note: check the related web UIs
    Hdfs - http://hadoop3:50070/explorer.html#/
    Yarn - http://hadoop4:8088/cluster
    HBase - http://hadoop3:16010/master-status

     

Original article: https://blog.csdn.net/sun_xo/article/details/133925336