Apache Hadoop 3.3.4 – HDFS High Availability Using the Quorum Journal Manager
Cluster planning:

| linux121 | linux122 | linux123 |
| --- | --- | --- |
| NameNode | NameNode | |
| JournalNode | JournalNode | JournalNode |
| DataNode | DataNode | DataNode |
| ZK | ZK | ZK |
| | ResourceManager | ResourceManager |
| NodeManager | NodeManager | NodeManager |
Start the ZooKeeper cluster:
zk.sh start
Check its status:
zk.sh status
Note: zk.sh here is a custom script I wrote to start/stop ZooKeeper on all nodes at once; it is not part of Hadoop or ZooKeeper.
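The cluster-wide script itself is not shown in the original. A minimal sketch of what such a zk.sh might look like, assuming passwordless SSH and the same ZooKeeper install path on every node (the host list and ZK_HOME path are assumptions):

```shell
#!/bin/bash
# zk_cluster: run the same zkServer.sh action on every ZooKeeper node via SSH.
# ZK_HOSTS and ZK_HOME are assumed values, not taken from the original text.
ZK_HOSTS="linux121 linux122 linux123"
ZK_HOME="/opt/lagou/servers/zookeeper"

zk_cluster() {
  local action="$1"
  case "$action" in
    start|stop|status) ;;
    *) echo "Usage: zk.sh {start|stop|status}" >&2; return 1 ;;
  esac
  local host
  for host in $ZK_HOSTS; do
    echo "---- zookeeper $action on $host ----"
    ssh "$host" "$ZK_HOME/bin/zkServer.sh $action"
  done
}
```

The script's entry point would call `zk_cluster "$1"`, so that `zk.sh start` starts ZooKeeper on all three nodes in one command.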
(1) Stop the existing HDFS cluster:
stop-dfs.sh
(2) On every node, create an ha directory under /opt/lagou/servers:
mkdir /opt/lagou/servers/ha
(3) Copy hadoop-2.9.2 from /opt/lagou/servers/ into the ha directory:
cp -r hadoop-2.9.2 ha
(4) Delete the old cluster's data directory:
rm -rf /opt/lagou/servers/ha/hadoop-2.9.2/data
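Steps (2) through (4) have to be repeated on every node. A sketch that drives them over SSH from one machine (the host list and passwordless SSH are assumptions; the original just runs the commands by hand):

```shell
#!/bin/bash
# prepare_ha_dirs: replay steps (2)-(4) on each node over SSH.
# Assumes hadoop-2.9.2 already exists under /opt/lagou/servers on every node.
prepare_ha_dirs() {
  local host
  for host in linux121 linux122 linux123; do
    echo "preparing HA directory on $host"
    ssh "$host" "
      mkdir -p /opt/lagou/servers/ha &&
      cp -r /opt/lagou/servers/hadoop-2.9.2 /opt/lagou/servers/ha &&
      rm -rf /opt/lagou/servers/ha/hadoop-2.9.2/data
    "
  done
}
```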
(5) Configure hdfs-site.xml (for this and the following files, clear out the original configuration first):
<property>
  <name>dfs.nameservices</name>
  <value>lagoucluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.lagoucluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.lagoucluster.nn1</name>
  <value>linux121:9000</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.lagoucluster.nn2</name>
  <value>linux122:9000</value>
</property>
<property>
  <name>dfs.namenode.http-address.lagoucluster.nn1</name>
  <value>linux121:50070</value>
</property>
<property>
  <name>dfs.namenode.http-address.lagoucluster.nn2</name>
  <value>linux122:50070</value>
</property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://linux121:8485;linux122:8485;linux123:8485/lagou</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.lagoucluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/root/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/opt/journalnode</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
(6) Configure core-site.xml:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://lagoucluster</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/opt/lagou/servers/ha/hadoop-2.9.2/data/tmp</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>linux121:2181,linux122:2181,linux123:2181</value>
</property>
(7) Copy the configured Hadoop environment to the other nodes.
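The original does not show how the copy is done. One way, assuming rsync is installed and passwordless SSH is set up (both are assumptions), pushing from linux121 to the other two nodes:

```shell
#!/bin/bash
# distribute_ha: push the configured HA installation from this node to the others.
# Target hosts and path follow the tutorial's layout; rsync usage is an assumption.
HA_DIR=/opt/lagou/servers/ha/hadoop-2.9.2

distribute_ha() {
  local host
  for host in linux122 linux123; do
    echo "syncing $HA_DIR to $host"
    rsync -a "$HA_DIR/" "$host:$HA_DIR/"
  done
}
```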
(1) On each JournalNode host, start the journalnode service with the command below (use the binaries under the HA install directory, not the ones on the PATH, which still belong to the old installation):
/opt/lagou/servers/ha/hadoop-2.9.2/sbin/hadoop-daemon.sh start journalnode
(2) On [nn1], format the NameNode and start it:
/opt/lagou/servers/ha/hadoop-2.9.2/bin/hdfs namenode -format
/opt/lagou/servers/ha/hadoop-2.9.2/sbin/hadoop-daemon.sh start namenode
(3) On [nn2], sync the metadata from nn1:
/opt/lagou/servers/ha/hadoop-2.9.2/bin/hdfs namenode -bootstrapStandby
(4) On [nn1], initialize the ZKFC state in ZooKeeper:
/opt/lagou/servers/ha/hadoop-2.9.2/bin/hdfs zkfc -formatZK
(5) On [nn1], start the cluster:
/opt/lagou/servers/ha/hadoop-2.9.2/sbin/start-dfs.sh
(6) Verify.
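The original leaves the verification step empty. A typical manual check uses the haadmin tool with the nn1/nn2 service IDs configured above; the kill-based failover test is a common practice, not something the original prescribes:

```shell
# Which NameNode is currently active? (requires the running HA cluster)
/opt/lagou/servers/ha/hadoop-2.9.2/bin/hdfs haadmin -getServiceState nn1
/opt/lagou/servers/ha/hadoop-2.9.2/bin/hdfs haadmin -getServiceState nn2
# On the active node, kill the NameNode process and confirm the standby
# transitions to active:
jps                      # find the NameNode pid
kill -9 <namenode-pid>   # <namenode-pid> is a placeholder
```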
Official documentation: Apache Hadoop 3.3.4 – ResourceManager High Availability
How YARN HA works
(1) Configure the YARN HA cluster
(2) Specific configuration
(3) yarn-site.xml (clear the original contents first):
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>

  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn</value>
  </property>
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>linux122</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>linux123</value>
  </property>

  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>linux121:2181,linux122:2181,linux123:2181</value>
  </property>

  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>

  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
(4) Sync the updated configuration to the other nodes.
(5) Start YARN:
sbin/start-yarn.sh
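To confirm ResourceManager HA is working, a common check (not shown in the original) uses the rmadmin tool with the rm1/rm2 IDs configured above:

```shell
# Requires the running YARN HA cluster; one RM should report "active",
# the other "standby".
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2
```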