    Installing a Hadoop Cluster on Debian

    Installing dependencies

    JDK 8

    sudo apt-get update && sudo apt-get install -y wget apt-transport-https
    sudo mkdir -p /etc/apt/keyrings
    wget -O - https://packages.adoptium.net/artifactory/api/gpg/key/public | sudo tee /etc/apt/keyrings/adoptium.asc
    echo "deb [signed-by=/etc/apt/keyrings/adoptium.asc] https://mirrors.tuna.tsinghua.edu.cn/Adoptium/deb $(awk -F= '/^VERSION_CODENAME/{print $2}' /etc/os-release) main" | sudo tee /etc/apt/sources.list.d/adoptium.list
    sudo apt-get update
    sudo apt-get install -y temurin-8-jdk
    

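    The codename substitution inside the `echo` line above can be sketched in isolation. The sample os-release content below is illustrative; on a real host the command reads /etc/os-release directly:

    ```shell
    # Derive the Adoptium repo line from VERSION_CODENAME, exactly as the
    # install command above does. Sample data stands in for /etc/os-release.
    os_release='PRETTY_NAME="Debian GNU/Linux 12 (bookworm)"
    VERSION_CODENAME=bookworm'
    codename=$(printf '%s\n' "$os_release" | awk -F= '/^VERSION_CODENAME/{print $2}')
    echo "deb [signed-by=/etc/apt/keyrings/adoptium.asc] https://mirrors.tuna.tsinghua.edu.cn/Adoptium/deb $codename main"
    ```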
    Hadoop

    mkdir -p /root/packages
    wget -P /root/packages https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6-aarch64.tar.gz
    tar -zxvf /root/packages/hadoop-3.3.6-aarch64.tar.gz -C /usr/local
    

    Configure environment variables

    # Note: the JVM directory name depends on the CPU architecture; on an
    # aarch64 host (matching the aarch64 Hadoop tarball above) the path is
    # /usr/lib/jvm/temurin-8-jdk-arm64 instead.
    export JAVA_HOME=/usr/lib/jvm/temurin-8-jdk-amd64
    export HADOOP_HOME="/usr/local/hadoop-3.3.6"
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

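    Exports typed in a shell are lost on logout, so it helps to persist them in a profile script. The sketch below writes to a temporary file for illustration; on a real node the target would typically be /etc/profile.d/hadoop.sh (an assumed location):

    ```shell
    # Persist the variables so they survive re-login. Writing to a temp file
    # here for illustration; use /etc/profile.d/hadoop.sh on a real node.
    profile=$(mktemp)
    cat > "$profile" <<'EOF'
    export JAVA_HOME=/usr/lib/jvm/temurin-8-jdk-amd64
    export HADOOP_HOME=/usr/local/hadoop-3.3.6
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    EOF
    . "$profile"
    echo "HADOOP_HOME=$HADOOP_HOME"
    ```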
    System configuration

    Configure /etc/hosts

    127.0.0.1       localhost
    192.168.50.201  node1.node1.com node1
    192.168.50.202  node2.node2.com node2
    192.168.50.203  node3.node3.com node3
    
    # The following lines are desirable for IPv6 capable hosts
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    

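    A quick sanity check that every node name appears in the hosts data can be sketched as follows. The sample string mirrors the entries above; on a real node you would read /etc/hosts instead:

    ```shell
    # Verify that every cluster node is resolvable via the hosts data.
    hosts='192.168.50.201  node1.node1.com node1
    192.168.50.202  node2.node2.com node2
    192.168.50.203  node3.node3.com node3'
    missing=""
    for n in node1 node2 node3; do
      printf '%s\n' "$hosts" | grep -qw "$n" || missing="$missing $n"
    done
    echo "missing hosts:${missing:- none}"
    ```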
    Generate an SSH RSA key pair

    ssh-keygen -t rsa -C "node1@example.com"
    

    Allow root login over SSH

    vim /etc/ssh/sshd_config
    

    Set PermitRootLogin yes, then restart the SSH service (sudo systemctl restart ssh) for the change to take effect.

    Copy the SSH key to the other hosts

    ssh-copy-id node1
    ssh-copy-id node2
    ssh-copy-id node3
    

    At a minimum, passwordless login must work from node1 to node1, node2, and node3.
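    The three ssh-copy-id calls can also be written as a loop. Shown as a dry run that only prints the commands; remove the `echo` to execute (this assumes root logins are permitted, per the sshd_config change above):

    ```shell
    # Distribute the public key to every node (dry run).
    cmds=""
    for n in node1 node2 node3; do
      echo ssh-copy-id "root@$n"
      cmds="$cmds root@$n"
    done
    ```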

    Hadoop configuration

    Edit the Hadoop configuration files

    hadoop-env.sh

    vim $HADOOP_HOME/etc/hadoop/hadoop-env.sh
    

    Append the following at the end of the file:

    # Set JAVA_HOME explicitly: daemons launched over SSH do not inherit the
    # login shell's environment, so an absolute path is safer than $JAVA_HOME
    export JAVA_HOME=/usr/lib/jvm/temurin-8-jdk-amd64

    # Run each daemon role as root
    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root
    

    core-site.xml

    vim $HADOOP_HOME/etc/hadoop/core-site.xml
    

    Add the following inside the <configuration> tag:

    <!-- Default file system (NameNode RPC endpoint) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:8020</value>
    </property>

    <!-- Base directory for Hadoop's local data -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/data/hadoop</value>
    </property>

    <!-- User shown as owner in the HDFS web UI -->
    <property>
        <name>hadoop.http.staticuser.user</name>
        <value>root</value>
    </property>
    

    mapred-site.xml

    vim $HADOOP_HOME/etc/hadoop/mapred-site.xml
    

    Add the following inside the <configuration> tag:

    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- JobHistory server RPC address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node1:10020</value>
    </property>

    <!-- JobHistory server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node1:19888</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>

    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>

    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    

    yarn-site.xml

    vim $HADOOP_HOME/etc/hadoop/yarn-site.xml
    

    Add the following inside the <configuration> tag:

    <!-- Host running the ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>

    <!-- Auxiliary shuffle service for MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Minimum container allocation (MB) -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
    </property>

    <!-- Maximum container allocation (MB) -->
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>512</value>
    </property>
    

    workers

    vim $HADOOP_HOME/etc/hadoop/workers
    

    Add the hostnames (or IP addresses) of the worker nodes to the workers file:

    node1
    node2
    node3
    

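    All nodes must agree on the configuration, so after editing the files on node1 it is common to push the whole config directory to the other nodes. A dry-run sketch (remove the `echo` to execute; assumes HADOOP_HOME is set as above and root SSH access works):

    ```shell
    # Sync the Hadoop config directory from node1 to the other nodes (dry run).
    HADOOP_HOME=${HADOOP_HOME:-/usr/local/hadoop-3.3.6}
    for n in node2 node3; do
      echo scp -r "$HADOOP_HOME/etc/hadoop" "root@$n:$HADOOP_HOME/etc/"
      last_target="root@$n:$HADOOP_HOME/etc/"
    done
    ```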
    Starting Hadoop

    NameNode format

    The first time HDFS is started, it must be formatted.

    Formatting is essentially an initialization step: it cleans and prepares the HDFS metadata directories.

    hdfs namenode -format
    

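    Formatting a second time generates a new cluster ID and orphans existing DataNodes, so it is worth guarding the call. A sketch, assuming the dfs/name path that follows from hadoop.tmp.dir=/export/data/hadoop in core-site.xml above:

    ```shell
    # Only format if the NameNode metadata directory does not exist yet.
    # Dry run: the command to run is printed rather than executed.
    name_dir=/export/data/hadoop/dfs/name
    if [ -d "$name_dir/current" ]; then
      action="skip: already formatted"
    else
      action="hdfs namenode -format"
    fi
    echo "$action"
    ```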
    The format succeeded when the log contains output like the following (look for the "successfully formatted" line):

    STARTUP_MSG:   build = https://github.com/apache/hadoop.git -r 1be78238728da9266a4f88195058f08fd012bf9c; compiled by 'ubuntu' on 2023-06-18T23:15Z
    STARTUP_MSG:   java = 1.8.0_382
    ************************************************************/
    2023-09-12 21:18:41,575 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
    2023-09-12 21:18:41,758 INFO namenode.NameNode: createNameNode [-format]
    2023-09-12 21:18:42,039 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    2023-09-12 21:18:43,011 INFO namenode.NameNode: Formatting using clusterid: CID-fcf657e4-d0df-4a9a-8f7d-a2a8f0a910df
    2023-09-12 21:18:43,082 INFO namenode.FSEditLog: Edit logging is async:true
    2023-09-12 21:18:43,167 INFO namenode.FSNamesystem: KeyProvider: null
    2023-09-12 21:18:43,173 INFO namenode.FSNamesystem: fsLock is fair: true
    2023-09-12 21:18:43,179 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
    2023-09-12 21:18:43,229 INFO namenode.FSNamesystem: fsOwner                = root (auth:SIMPLE)
    2023-09-12 21:18:43,233 INFO namenode.FSNamesystem: supergroup             = supergroup
    2023-09-12 21:18:43,237 INFO namenode.FSNamesystem: isPermissionEnabled    = true
    2023-09-12 21:18:43,238 INFO namenode.FSNamesystem: isStoragePolicyEnabled = true
    2023-09-12 21:18:43,240 INFO namenode.FSNamesystem: HA Enabled: false
    2023-09-12 21:18:43,321 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
    2023-09-12 21:18:43,593 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit : configured=1000, counted=60, effected=1000
    2023-09-12 21:18:43,598 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
    2023-09-12 21:18:43,613 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
    2023-09-12 21:18:43,618 INFO blockmanagement.BlockManager: The block deletion will start around 2023 Sep 12 21:18:43
    2023-09-12 21:18:43,625 INFO util.GSet: Computing capacity for map BlocksMap
    2023-09-12 21:18:43,628 INFO util.GSet: VM type       = 64-bit
    2023-09-12 21:18:43,637 INFO util.GSet: 2.0% max memory 475.6 MB = 9.5 MB
    2023-09-12 21:18:43,638 INFO util.GSet: capacity      = 2^20 = 1048576 entries
    2023-09-12 21:18:43,665 INFO blockmanagement.BlockManager: Storage policy satisfier is disabled
    2023-09-12 21:18:43,668 INFO blockmanagement.BlockManager: dfs.block.access.token.enable = false
    2023-09-12 21:18:43,702 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.999
    2023-09-12 21:18:43,709 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
    2023-09-12 21:18:43,711 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
    2023-09-12 21:18:43,717 INFO blockmanagement.BlockManager: defaultReplication         = 3
    2023-09-12 21:18:43,722 INFO blockmanagement.BlockManager: maxReplication             = 512
    2023-09-12 21:18:43,723 INFO blockmanagement.BlockManager: minReplication             = 1
    2023-09-12 21:18:43,724 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
    2023-09-12 21:18:43,726 INFO blockmanagement.BlockManager: redundancyRecheckInterval  = 3000ms
    2023-09-12 21:18:43,727 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
    2023-09-12 21:18:43,730 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
    2023-09-12 21:18:43,923 INFO namenode.FSDirectory: GLOBAL serial map: bits=29 maxEntries=536870911
    2023-09-12 21:18:43,927 INFO namenode.FSDirectory: USER serial map: bits=24 maxEntries=16777215
    2023-09-12 21:18:43,928 INFO namenode.FSDirectory: GROUP serial map: bits=24 maxEntries=16777215
    2023-09-12 21:18:43,931 INFO namenode.FSDirectory: XATTR serial map: bits=24 maxEntries=16777215
    2023-09-12 21:18:43,964 INFO util.GSet: Computing capacity for map INodeMap
    2023-09-12 21:18:43,967 INFO util.GSet: VM type       = 64-bit
    2023-09-12 21:18:43,973 INFO util.GSet: 1.0% max memory 475.6 MB = 4.8 MB
    2023-09-12 21:18:43,974 INFO util.GSet: capacity      = 2^19 = 524288 entries
    2023-09-12 21:18:43,979 INFO namenode.FSDirectory: ACLs enabled? true
    2023-09-12 21:18:43,980 INFO namenode.FSDirectory: POSIX ACL inheritance enabled? true
    2023-09-12 21:18:43,980 INFO namenode.FSDirectory: XAttrs enabled? true
    2023-09-12 21:18:43,987 INFO namenode.NameNode: Caching file names occurring more than 10 times
    2023-09-12 21:18:43,996 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: false, skipCaptureAccessTimeOnlyChange: false, snapshotDiffAllowSnapRootDescendant: true, maxSnapshotLimit: 65536
    2023-09-12 21:18:44,006 INFO snapshot.SnapshotManager: SkipList is disabled
    2023-09-12 21:18:44,016 INFO util.GSet: Computing capacity for map cachedBlocks
    2023-09-12 21:18:44,018 INFO util.GSet: VM type       = 64-bit
    2023-09-12 21:18:44,022 INFO util.GSet: 0.25% max memory 475.6 MB = 1.2 MB
    2023-09-12 21:18:44,034 INFO util.GSet: capacity      = 2^17 = 131072 entries
    2023-09-12 21:18:44,070 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
    2023-09-12 21:18:44,076 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
    2023-09-12 21:18:44,082 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
    2023-09-12 21:18:44,141 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
    2023-09-12 21:18:44,143 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
    2023-09-12 21:18:44,156 INFO util.GSet: Computing capacity for map NameNodeRetryCache
    2023-09-12 21:18:44,157 INFO util.GSet: VM type       = 64-bit
    2023-09-12 21:18:44,159 INFO util.GSet: 0.029999999329447746% max memory 475.6 MB = 146.1 KB
    2023-09-12 21:18:44,159 INFO util.GSet: capacity      = 2^14 = 16384 entries
    2023-09-12 21:18:44,234 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1199298961-192.168.50.201-1694524724216
    2023-09-12 21:18:44,315 INFO common.Storage: Storage directory /export/data/hadoop/dfs/name has been successfully formatted.
    2023-09-12 21:18:44,450 INFO namenode.FSImageFormatProtobuf: Saving image file /export/data/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
    2023-09-12 21:18:44,588 INFO namenode.FSImageFormatProtobuf: Image file /export/data/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 396 bytes saved in 0 seconds .
    2023-09-12 21:18:44,615 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
    2023-09-12 21:18:44,656 INFO namenode.FSNamesystem: Stopping services started for active state
    2023-09-12 21:18:44,660 INFO namenode.FSNamesystem: Stopping services started for standby state
    2023-09-12 21:18:44,672 INFO namenode.FSImage: FSImageSaver clean checkpoint: txid=0 when meet shutdown.
    2023-09-12 21:18:44,677 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at node1.node1.com/192.168.50.201
    ************************************************************/
    
    

    Starting and stopping the cluster - one daemon at a time

    On each machine, start or stop a single daemon role at a time:

    • HDFS cluster
    hdfs --daemon start namenode|datanode|secondarynamenode
    hdfs --daemon stop namenode|datanode|secondarynamenode
    
    • YARN cluster
    yarn --daemon start resourcemanager|nodemanager
    yarn --daemon stop resourcemanager|nodemanager
    

    Start on node1:

    hdfs --daemon start namenode
    hdfs --daemon start datanode
    yarn --daemon start resourcemanager
    yarn --daemon start nodemanager
    jps
    

    Start on node2:

    hdfs --daemon start datanode
    hdfs --daemon start secondarynamenode
    yarn --daemon start nodemanager
    jps
    

    Start on node3:

    hdfs --daemon start datanode
    yarn --daemon start nodemanager
    jps
    

    Starting and stopping the cluster - bundled shell scripts

    On node1, use the shell scripts bundled with Hadoop to start everything at once.

    Prerequisite: passwordless SSH between the machines and the workers file must already be configured.

    • HDFS cluster

    start-dfs.sh

    stop-dfs.sh

    • YARN cluster

    start-yarn.sh

    stop-yarn.sh

    • Whole Hadoop cluster

    start-all.sh
    stop-all.sh
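    One possible bring-up order on node1 also includes the JobHistory server configured in mapred-site.xml, which the start-all.sh script does not launch. Shown as a dry run that only prints each command:

    ```shell
    # Bring-up sequence sketch (dry run): HDFS, then YARN, then JobHistory.
    started=0
    for cmd in "start-dfs.sh" "start-yarn.sh" "mapred --daemon start historyserver"; do
      echo "$cmd"
      started=$((started + 1))
    done
    ```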

    Hadoop web UI - HDFS cluster

    URL: http://node1:9870

    Hadoop web UI - YARN cluster

    URL: http://node1:8088
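    Once the cluster is up, a quick reachability probe against both UIs can be sketched as follows. Dry run; remove the `echo` to execute (curl -sf exits non-zero when a UI is unreachable):

    ```shell
    # Probe both web UIs (dry run).
    urls="http://node1:9870 http://node1:8088"
    for u in $urls; do
      echo curl -sf -o /dev/null "$u"
    done
    ```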

  • Original article: https://blog.csdn.net/Star_SDK/article/details/132841667