Hadoop source code: https://gitee.com/CHNnoodle/hadoop.git
git clone errors:
For a "Filename too long" error, run git config --global core.longpaths true
git clone https://gitee.com/CHNnoodle/hadoop.git -b rel/release-3.2.2 clones the specified tag
Hadoop binary packages: https://mirrors.cloud.tencent.com/apache/hadoop/common/
On Windows, download https://github.com/cdarlint/winutils and replace the files in Hadoop's bin directory with the ones for the matching Hadoop version
core-site.xml:
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/D:/ruanjian/hadoop/hadoop-3.3.5/data</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/D:/ruanjian/hadoop/hadoop-3.3.5/data/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/D:/ruanjian/hadoop/hadoop-3.3.5/data/datanode</value>
  </property>
</configuration>
mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
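A malformed tag or a mistyped property name in these site files is easy to miss, so it can help to parse a file and list what Hadoop would actually see. The following is a minimal sketch using only the Python standard library; the function name `read_site_properties` and the `SAMPLE` text are illustrative assumptions, not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed,
    for example when a closing tag is missing.
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


# Hypothetical sample; in practice, read the text of a real *-site.xml file.
SAMPLE = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for key, val in read_site_properties(SAMPLE).items():
        print(key, "=", val)
```

To check a real file, pass its contents instead of `SAMPLE`, for example `read_site_properties(open(path, encoding="utf-8").read())`, where `path` points at one of the site files in your own installation.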
Start Hadoop
hdfs namenode -format //format the NameNode
sbin/start-all.cmd //start Hadoop
Access from a browser
Cluster nodes (YARN web UI): http://localhost:8088/

HDFS (NameNode web UI): http://localhost:9870/
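If the pages do not load, a quick scripted check can tell whether the daemons are listening at all. This is a minimal sketch using only the Python standard library; the helper name `is_reachable` and the 3-second timeout are assumptions made for illustration.

```python
import urllib.error
import urllib.request


def is_reachable(url, timeout=3.0):
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status code.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False


if __name__ == "__main__":
    # The two web UI addresses listed in these notes.
    for ui in ("http://localhost:8088/", "http://localhost:9870/"):
        print(ui, "up" if is_reachable(ui) else "down")
```

A "down" result only means nothing answered on that port; the daemon console windows opened by the start script are the place to look for the actual cause.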

Hadoop MapReduce is suited to offline (batch) data; for real-time data, use the Flink or Spark models (Spark's real-time capability is more limited).
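One way to see why MapReduce fits batch work is to look at its three phases: the reduce step cannot start for a key until every map output for that key has been grouped, so the whole input has to be available first. The following is a single-process, in-memory sketch of that model in plain Python. It does not use the Hadoop API, and the function names are illustrative assumptions.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Group all emitted values by key; needs the complete map output."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["a b a", "B c"]))
```

A streaming engine, by contrast, updates results as each record arrives instead of waiting for the full input to be grouped.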