0. Prerequisites
Three VMs (hadoop3, hadoop4, hadoop5) form a fully distributed HBase cluster. The setup plan is:
| | hadoop3 | hadoop4 | hadoop5 |
| Hadoop hdfs | NameNode:8020 DataNode:50010 JobHistoryServer:19888 | DataNode:50010 | SecondaryNameNode:50090 DataNode:50010 |
| Hadoop yarn | NodeManager:8040 | ResourceManager:8088 NodeManager:8040 | NodeManager:8040 |
| Zookeeper | QuorumPeerMain:2181 | QuorumPeerMain:2181 | QuorumPeerMain:2181 |
| HBase | HMaster:16000 HRegionServer:16020 | HRegionServer:16020 | HRegionServer:16020 |
JDK, Zookeeper, Hadoop and HBase are already installed under /opt on all 3 VMs as user sunxo:
- $ ls /opt
- hadoop-2.10.2 hbase-2.4.16 jdk zookeeper-3.8.1
1) configure passwordless SSH access
hadoop3, which runs the NameNode, needs to access all VMs as sunxo
- $ ssh-keygen -t rsa
- $ ssh-copy-id hadoop3
- $ ssh-copy-id hadoop4
- $ ssh-copy-id hadoop5
and as root as well
- # ssh-keygen -t rsa
- # ssh-copy-id hadoop3
- # ssh-copy-id hadoop4
- # ssh-copy-id hadoop5
hadoop4, which runs the ResourceManager, needs to access all VMs as sunxo
- $ ssh-keygen -t rsa
- $ ssh-copy-id hadoop3
- $ ssh-copy-id hadoop4
- $ ssh-copy-id hadoop5
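Before moving on, it may help to confirm that key-based login really works without a password prompt. The sketch below is not part of the original setup: `check_ssh` is a hypothetical helper name, and it relies on OpenSSH's `BatchMode` option, which makes ssh fail instead of asking for a password.

```shell
# check_ssh: hypothetical helper, not from the original notes.
# Runs a no-op command on each host. BatchMode=yes forbids password prompts,
# so a host without a working key is reported as FAILED.
check_ssh() {
  local host failed
  failed=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
      echo "$host: ok"
    else
      echo "$host: FAILED"
      failed=1
    fi
  done
  return "$failed"
}
```

Running `check_ssh hadoop3 hadoop4 hadoop5` on hadoop3 (as sunxo and as root) and on hadoop4 (as sunxo) should report every host as ok.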
2) on hadoop3, add environment variables to $HOME/.bashrc (not .bash_profile)
- export JAVA_HOME=/opt/jdk
- export ZOOKEEPER_HOME=/opt/zookeeper-3.8.1
- export KAFKA_HOME=/opt/kafka-3.3.1
- export HADOOP_HOME=/opt/hadoop-2.10.2
- export HBASE_HOME=/opt/hbase-2.4.16
- export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$JAVA_HOME/bin:$HOME/bin:.:$PATH
Then distribute .bashrc to hadoop4 and hadoop5
- $ rsync.sh .bashrc hadoop4 hadoop5
- rsync -rvl /home/sunxo/.bashrc sunxo@hadoop4:/home/sunxo
- sending incremental file list
- .bashrc
-
- sent 614 bytes received 37 bytes 1302.00 bytes/sec
- total size is 541 speedup is 0.83
- rsync -rvl /home/sunxo/.bashrc sunxo@hadoop5:/home/sunxo
- sending incremental file list
- .bashrc
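The rsync.sh helper used here is the author's own script and is not shown in these notes. Judging only from the logged commands above, it copies a path to the same parent directory on each host. A minimal sketch under that assumption follows; the function name `sync_to_hosts` and the `SYNC_USER` and `DRY_RUN` variables are hypothetical.

```shell
# sync_to_hosts: hypothetical stand-in for the author's rsync.sh.
# Usage: sync_to_hosts <file-or-dir> <host>...
# SYNC_USER and DRY_RUN are illustrative variables, not part of the original.
sync_to_hosts() {
  local src dir name user host
  src=$1
  shift
  # Resolve the absolute parent directory so the remote path mirrors the local one.
  dir=$(cd "$(dirname "$src")" && pwd) || return 1
  name=$(basename "$src")
  user=${SYNC_USER:-$(id -un)}
  for host in "$@"; do
    # Echo the command first, matching the log lines shown above.
    echo "rsync -rvl $dir/$name $user@$host:$dir"
    if [ -z "${DRY_RUN:-}" ]; then
      rsync -rvl "$dir/$name" "$user@$host:$dir"
    fi
  done
}
```

With `DRY_RUN=1` set, the function only prints the rsync commands, which is a cheap way to check the target paths before copying anything.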
1. Zookeeper on hadoop3
1) configure and distribute zoo.cfg
- $ cd $ZOOKEEPER_HOME/conf
- $ diff -u zoo_sample.cfg zoo.cfg
- --- zoo_sample.cfg 2023-01-26 00:31:05.000000000 +0800
- +++ zoo.cfg 2023-10-17 14:30:06.598229298 +0800
- @@ -9,7 +9,7 @@
- # the directory where the snapshot is stored.
- # do not use /tmp for storage, /tmp here is just
- # example sakes.
- -dataDir=/tmp/zookeeper
- +dataDir=/opt/zookeeper-3.8.1/tmp
- # the port at which the clients will connect
- clientPort=2181
- # the maximum number of client connections.
- @@ -25,7 +25,7 @@
- #autopurge.snapRetainCount=3
- # Purge task interval in hours
- # Set to "0" to disable auto purge feature
- -#autopurge.purgeInterval=1
- +autopurge.purgeInterval=1
-
- ## Metrics Providers
- #
- @@ -35,3 +35,7 @@
- #metricsProvider.httpPort=7000
- #metricsProvider.exportJvmInfo=true
-
- +# cluster
- +server.3=hadoop3:2888:3888
- +server.4=hadoop4:2888:3888
- +server.5=hadoop5:2888:3888
-
- $ rsync.sh zoo.cfg hadoop4 hadoop5
2) create and distribute the data dir assigned in zoo.cfg
- $ cd $ZOOKEEPER_HOME
- $ mkdir -p tmp
- $ rsync.sh tmp hadoop4 hadoop5
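One step these notes do not show: in replicated mode each ZooKeeper server reads its own id from a file named myid inside dataDir, and that id must match the N of its server.N line in zoo.cfg. Because the data dir is copied identically to every host, the file has to be written per host. A minimal sketch, assuming the short hostnames used in zoo.cfg above; `myid_from_cfg`, `write_myids` and `DATA_DIR` are hypothetical names.

```shell
# myid_from_cfg: print the N of the "server.N=<host>:..." line for a host.
# Assumes short hostnames without dots, as in the zoo.cfg above.
myid_from_cfg() {
  awk -F'[.=:]' -v h="$2" '$1 == "server" && $3 == h { print $2 }' "$1"
}

# write_myids: write the matching id into <dataDir>/myid on every host.
# DATA_DIR (hypothetical variable) mirrors the dataDir value from zoo.cfg.
write_myids() {
  local cfg host id
  cfg=$1
  shift
  for host in "$@"; do
    id=$(myid_from_cfg "$cfg" "$host")
    [ -n "$id" ] || { echo "no server.N entry for $host" >&2; return 1; }
    ssh "$host" "echo $id > ${DATA_DIR:-/opt/zookeeper-3.8.1/tmp}/myid"
  done
}
```

For example, `write_myids $ZOOKEEPER_HOME/conf/zoo.cfg hadoop3 hadoop4 hadoop5` would write 3, 4 and 5 respectively.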
3) start zookeeper cluster
- $ hosts="hadoop3 hadoop4 hadoop5"
- $ for host in $hosts
- > do
- > echo "============= zk start on $host ============="
- > ssh $host $ZOOKEEPER_HOME/bin/zkServer.sh start
- > done
- ============= zk start on hadoop3 =============
- ZooKeeper JMX enabled by default
- Using config: /opt/zookeeper-3.8.1/bin/../conf/zoo.cfg
- Starting zookeeper ... STARTED
- ============= zk start on hadoop4 =============
- ZooKeeper JMX enabled by default
- Using config: /opt/zookeeper-3.8.1/bin/../conf/zoo.cfg
- Starting zookeeper ... STARTED
- ============= zk start on hadoop5 =============
- ZooKeeper JMX enabled by default
- Using config: /opt/zookeeper-3.8.1/bin/../conf/zoo.cfg
- Starting zookeeper ... STARTED
-
- $ jps.sh hadoop3 hadoop4 hadoop5
- ============= hadoop3 =============
- 30495 QuorumPeerMain
- ============= hadoop4 =============
- 313 QuorumPeerMain
- ============= hadoop5 =============
- 4264 QuorumPeerMain
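The jps.sh helper is also the author's own script and is not shown. Based on the output format above, it presumably runs jps on each host and prints a banner per host. A minimal sketch under that assumption; `jps_on_hosts` is a hypothetical name, and /opt/jdk is taken from the JAVA_HOME setting earlier.

```shell
# jps_on_hosts: hypothetical stand-in for the author's jps.sh.
# Runs jps on each host over ssh and hides the jps process itself.
jps_on_hosts() {
  local host
  for host in "$@"; do
    echo "============= $host ============="
    # grep exits non-zero when nothing is left to print; ignore that.
    ssh "$host" "${JAVA_HOME:-/opt/jdk}/bin/jps" | grep -v ' Jps$' || true
  done
}
```

A running QuorumPeerMain process does not by itself prove that the quorum has formed. Running `$ZOOKEEPER_HOME/bin/zkServer.sh status` on each host reports whether that server is a leader or a follower.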
2. Hadoop on hadoop3
1) configure
- $ cd $HADOOP_HOME/etc/hadoop
- $ diff -u hadoop-env.sh.orig hadoop-env.sh
- ...
- -export JAVA_HOME=${JAVA_HOME}
- +export JAVA_HOME=/opt/jdk
- $ cat core-site.xml
- ...
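- <configuration>
-   <property>
-     <name>fs.defaultFS</name>
-     <value>hdfs://hadoop3:8020</value>
-   </property>
-   <property>
-     <name>hadoop.tmp.dir</name>
-     <value>/opt/hadoop-2.10.2/data/tmp</value>
-   </property>
- </configuration>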
- $ cat hdfs-site.xml
- ...
- <configuration>
-   <property>
-     <name>dfs.namenode.secondary.http-address</name>
-     <value>hadoop5:50090</value>
-   </property>
- </configuration>
- $ cat mapred-site.xml
- ...
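- <configuration>
-   <property>
-     <name>mapreduce.framework.name</name>
-     <value>yarn</value>
-   </property>
-   <property>
-     <name>mapreduce.jobhistory.address</name>
-     <value>hadoop3:10020</value>
-   </property>
-   <property>
-     <name>mapreduce.jobhistory.webapp.address</name>
-     <value>hadoop3:19888</value>
-   </property>
- </configuration>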
- $ cat yarn-site.xml
- ...
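- <configuration>
-   <property>
-     <name>yarn.resourcemanager.hostname</name>
-     <value>hadoop4</value>
-   </property>
-   <property>
-     <name>yarn.nodemanager.aux-services</name>
-     <value>mapreduce_shuffle</value>
-   </property>
-   <property>
-     <name>yarn.log-aggregation-enable</name>
-     <value>true</value>
-   </property>
-   <property>
-     <name>yarn.log-aggregation.retain-seconds</name>
-     <value>604800</value>
-   </property>
-   <property>
-     <name>yarn.resourcemanager.scheduler.class</name>
-     <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
-   </property>
- </configuration>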
- $ cat slaves
- hadoop3
- hadoop4
- hadoop5
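A typo in a property name is silently ignored by Hadoop, so it can be worth checking the effective values before distributing the files. The sketch below uses `hdfs getconf -confKey`, which prints the resolved value of a key; `expect_conf` is a hypothetical helper, and it is only meant for core-site.xml and hdfs-site.xml keys here.

```shell
# expect_conf: hypothetical helper that compares an effective Hadoop setting
# with the value planned above. Relies on "hdfs getconf -confKey <key>".
expect_conf() {
  local key want got
  key=$1
  want=$2
  got=$(hdfs getconf -confKey "$key" 2>/dev/null)
  if [ "$got" = "$want" ]; then
    echo "ok   $key=$got"
  else
    echo "FAIL $key: expected '$want', got '$got'"
    return 1
  fi
}
```

For this cluster the calls would be `expect_conf fs.defaultFS hdfs://hadoop3:8020` and `expect_conf dfs.namenode.secondary.http-address hadoop5:50090`.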
2) distribute configuration to hadoop4, hadoop5
- $ cd $HADOOP_HOME/etc
- $ rsync.sh hadoop/ hadoop4 hadoop5
3) format filesystem
- $ cd $HADOOP_HOME
- $ rm.sh data/ hadoop3 hadoop4 hadoop5 # remove old data if needed
- $ rm.sh logs/ hadoop3 hadoop4 hadoop5 # remove old logs if needed
- $ bin/hdfs namenode -format
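Formatting discards existing NameNode metadata, so repeating this step on a cluster that already holds data is destructive. A minimal guard is sketched below, assuming the default dfs.namenode.name.dir, which resolves to dfs/name under hadoop.tmp.dir (here /opt/hadoop-2.10.2/data/tmp/dfs/name). Both function names are hypothetical.

```shell
# namenode_formatted: succeeds if the name directory already holds metadata.
# A formatted NameNode keeps a VERSION file under <name dir>/current.
namenode_formatted() {
  [ -f "$1/current/VERSION" ]
}

# format_if_needed: hypothetical wrapper; formats only a fresh name directory.
format_if_needed() {
  local name_dir
  name_dir=${1:-/opt/hadoop-2.10.2/data/tmp/dfs/name}
  if namenode_formatted "$name_dir"; then
    echo "already formatted: $name_dir (skipping)"
  else
    "${HADOOP_HOME:-/opt/hadoop-2.10.2}/bin/hdfs" namenode -format
  fi
}
```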
4) start hdfs/yarn/historyserver
- $ echo "============= dfs start from hadoop3 ============="
- $ ssh hadoop3 $HADOOP_HOME/sbin/start-dfs.sh
- $ echo "============= yarn start from hadoop4 ============="
- $ ssh hadoop4 $HADOOP_HOME/sbin/start-yarn.sh
- $ echo "============= history start on hadoop3 ============="
- $ ssh hadoop3 $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver
- ============= dfs start from hadoop3 =============
- Starting namenodes on [hadoop3]
- hadoop3: starting namenode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-namenode-hadoop3.out
- hadoop4: starting datanode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-datanode-hadoop4.out
- hadoop3: starting datanode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-datanode-hadoop3.out
- hadoop5: starting datanode, logging to /opt/hadoop-2.10.2/logs/hadoop-sunxo-datanode-hadoop5.out
- Starting secondary namenodes [hadoop5]
- ============= yarn start from hadoop4 =============
- starting yarn daemons
- starting resourcemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-resourcemanager-hadoop4.out
- hadoop3: starting nodemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-nodemanager-hadoop3.out
- hadoop4: starting nodemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-nodemanager-hadoop4.out
- hadoop5: starting nodemanager, logging to /opt/hadoop-2.10.2/logs/yarn-sunxo-nodemanager-hadoop5.out
- ============= history start on hadoop3 =============
- starting historyserver, logging to /opt/hadoop-2.10.2/logs/mapred-sunxo-historyserver-hadoop3.out
-
- $ jps.sh hadoop3 hadoop4 hadoop5
- ============= hadoop3 =============
- 816 DataNode
- 616 NameNode
- 1385 JobHistoryServer
- 1166 NodeManager
- 30495 QuorumPeerMain
- ============= hadoop4 =============
- 2065 DataNode
- 2354 NodeManager
- 313 QuorumPeerMain
- 2222 ResourceManager
- ============= hadoop5 =============
- 5892 DataNode
- 6023 SecondaryNameNode
- 4264 QuorumPeerMain
- 6120 NodeManager
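Since HBase keeps its root directory on HDFS, it may be worth confirming that HDFS accepts writes before starting HBase. The following sketch waits for the NameNode to leave safe mode and then round-trips a small file. `hdfs_smoke_test` and the /tmp/smoke-<pid> path are hypothetical choices; the hdfs subcommands themselves are standard.

```shell
# hdfs_smoke_test: hypothetical helper; writes, reads back and removes a
# small file in HDFS. Stops at the first failing step.
hdfs_smoke_test() {
  local dir
  dir=/tmp/smoke-$$
  hdfs dfsadmin -safemode wait >/dev/null &&
    hdfs dfs -mkdir -p "$dir" &&
    echo hello | hdfs dfs -put - "$dir/hello.txt" &&
    [ "$(hdfs dfs -cat "$dir/hello.txt")" = "hello" ] &&
    hdfs dfs -rm -r -skipTrash "$dir" >/dev/null &&
    echo "hdfs smoke test passed"
}
```

If any step fails, the function returns non-zero and prints nothing, which points at a DataNode or permission problem to resolve first.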
3. HBase on hadoop3
1) configure
- $ cd $HBASE_HOME/conf
- $ diff -u hbase-env.sh.orig hbase-env.sh
- --- hbase-env.sh.orig 2020-01-22 23:10:15.000000000 +0800
- +++ hbase-env.sh 2023-10-19 18:21:33.098131203 +0800
- @@ -25,7 +25,7 @@
- # into the startup scripts (bin/hbase, etc.)
-
- # The java implementation to use. Java 1.8+ required.
- -# export JAVA_HOME=/usr/java/jdk1.8.0/
- +export JAVA_HOME=/opt/jdk
-
- # Extra Java CLASSPATH elements. Optional.
- # export HBASE_CLASSPATH=
- @@ -123,7 +123,7 @@
- # export HBASE_SLAVE_SLEEP=0.1
-
- # Tell HBase whether it should manage it's own instance of ZooKeeper or not.
- -# export HBASE_MANAGES_ZK=true
- +export HBASE_MANAGES_ZK=false
-
- # The default log rolling policy is RFA, where the log file is rolled as per the size defined for the
- $ cat hbase-site.xml
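- <configuration>
-   <property>
-     <name>hbase.cluster.distributed</name>
-     <value>true</value>
-   </property>
-   <property>
-     <name>hbase.zookeeper.quorum</name>
-     <value>hadoop3,hadoop4,hadoop5</value>
-   </property>
-   <property>
-     <name>hbase.rootdir</name>
-     <value>hdfs://hadoop3:8020/hbase</value>
-   </property>
- </configuration>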
- $ cat regionservers
- hadoop3
- hadoop4
- hadoop5
2) start hbase
- $ echo "============= hbase start from hadoop3 ============="
- $ $HBASE_HOME/bin/start-hbase.sh
- ============= hbase start from hadoop3 =============
- running master, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-master-hadoop3.out
- hadoop3: running regionserver, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-regionserver-hadoop3.out
- hadoop4: running regionserver, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-regionserver-hadoop4.out
- hadoop5: running regionserver, logging to /opt/hbase-2.4.16/logs/hbase-sunxo-regionserver-hadoop5.out
-
- $ jps.sh hadoop3 hadoop4 hadoop5
- ============= hadoop3 =============
- 816 DataNode
- 2064 HMaster
- 616 NameNode
- 2280 HRegionServer
- 1385 JobHistoryServer
- 1166 NodeManager
- 30495 QuorumPeerMain
- ============= hadoop4 =============
- 2065 DataNode
- 2354 NodeManager
- 2995 HRegionServer
- 313 QuorumPeerMain
- 2222 ResourceManager
- ============= hadoop5 =============
- 5892 DataNode
- 6023 SecondaryNameNode
- 4264 QuorumPeerMain
- 6120 NodeManager
- 6616 HRegionServer
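Comparing the jps listing with the plan table by eye gets tedious with seven daemons on one host. A small filter can report what is missing instead. This is a sketch, and `missing_daemons` is a hypothetical helper: it reads jps output on stdin and prints each expected name that does not appear.

```shell
# missing_daemons: hypothetical helper. Reads jps output on stdin and prints
# every expected daemon name (given as arguments) that is not listed.
missing_daemons() {
  local listing name
  listing=$(cat)
  for name in "$@"; do
    # The leading space keeps "NameNode" from matching "SecondaryNameNode".
    printf '%s\n' "$listing" | grep -q " $name\$" || echo "$name"
  done
}
```

For hadoop3, for instance: `ssh hadoop3 /opt/jdk/bin/jps | missing_daemons NameNode DataNode JobHistoryServer NodeManager QuorumPeerMain HMaster HRegionServer`. Empty output means every planned daemon is running.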
Note: check the related web UIs
HDFS - http://hadoop3:50070/explorer.html#/
YARN - http://hadoop4:8088/cluster
HBase - http://hadoop3:16010/master-status
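The same check can be done from a terminal. The sketch below prints the HTTP status code for each URL using curl's `-w '%{http_code}'` write-out, where 000 means no connection could be made. `check_ui` is a hypothetical helper, and the hostnames must resolve on the machine where it runs.

```shell
# check_ui: hypothetical helper; prints the HTTP status code for each URL.
# -m 5 caps each request at five seconds; 000 means no connection.
check_ui() {
  local url code
  for url in "$@"; do
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$url")
    echo "$code $url"
  done
}
```

Example: `check_ui http://hadoop3:50070/ http://hadoop4:8088/cluster http://hadoop3:16010/master-status`.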