08 Spark Cluster Setup


    Preface

    Recently I have had a series of environment-setup tasks, so I am recording the steps here.

    Spark runs on three nodes: 192.168.110.150, 192.168.110.151, 192.168.110.152.

    150 is the master, 151 is slave01, and 152 is slave02.

    Passwordless SSH (trusted shell) has been set up between all three machines; a sketch follows.
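
    A minimal sketch of that setup, assuming the root user; the /etc/hosts entries map the hostnames used throughout this post to the IPs above:

    # append to /etc/hosts on all three nodes
    192.168.110.150 master
    192.168.110.151 slave01
    192.168.110.152 slave02

    # on master: generate a key pair once, then push the public key to every node
    ssh-keygen -t rsa
    ssh-copy-id root@master
    ssh-copy-id root@slave01
    ssh-copy-id root@slave02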

    The Spark version is spark-3.2.1-bin-hadoop2.7.

    Spark Cluster Setup


    1. Prepare the base environment

    Install a JDK on 192.168.110.150, 192.168.110.151 and 192.168.110.152, and upload the Spark package to each node.

    The package comes from Downloads | Apache Spark (https://spark.apache.org/downloads.html).
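
    A minimal sketch of fetching and unpacking it; the archive URL and the /usr/local/ProgramFiles install path are assumptions that match the paths used later in this post:

    # run on each of the three nodes
    wget https://archive.apache.org/dist/spark/spark-3.2.1/spark-3.2.1-bin-hadoop2.7.tgz
    tar -zxf spark-3.2.1-bin-hadoop2.7.tgz -C /usr/local/ProgramFiles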

    2. Adjust the Spark configuration

    Copy the following three configuration file templates and adjust them; once adjusted, scp them to slave01 and slave02 (a distribution sketch follows the three listings below).

    root@master:/usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7# cp conf/spark-defaults.conf.template conf/spark-defaults.conf
    root@master:/usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7# cp conf/spark-env.sh.template conf/spark-env.sh
    root@master:/usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7# cp conf/workers.template conf/workers

    Update workers:

    # A Spark Worker will be started on each of the machines listed below.
    slave01
    slave02

    Update spark-defaults.conf:

    spark.master spark://master:7077
    # spark.eventLog.enabled true
    # spark.eventLog.dir hdfs://namenode:8021/directory
    spark.serializer org.apache.spark.serializer.KryoSerializer
    spark.driver.memory 1g
    # spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
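
    Because spark.master is set here, spark-submit defaults to the standalone cluster at spark://master:7077 with no explicit --master flag, which is why the test command further below omits it.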

    Update spark-env.sh:

    export JAVA_HOME=/usr/local/ProgramFiles/jdk1.8.0_291
    export HADOOP_HOME=/usr/local/ProgramFiles/hadoop-2.10.1
    export HADOOP_CONF_DIR=/usr/local/ProgramFiles/hadoop-2.10.1/etc/hadoop
    export SPARK_DIST_CLASSPATH=$(/usr/local/ProgramFiles/hadoop-2.10.1/bin/hadoop classpath)
    export SPARK_MASTER_HOST=master
    export SPARK_MASTER_PORT=7077
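
    With the three files adjusted, distribute them to the slaves. A minimal sketch, assuming the same Spark install path already exists on both machines:

    scp conf/spark-defaults.conf conf/spark-env.sh conf/workers root@slave01:/usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7/conf/
    scp conf/spark-defaults.conf conf/spark-env.sh conf/workers root@slave02:/usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7/conf/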

    3. Start the cluster

    On the machine hosting the master, run start-all.sh:

    root@master:/usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7# ./sbin/start-all.sh
    starting org.apache.spark.deploy.master.Master, logging to /usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
    slave01: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave01.out
    slave02: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-slave02.out
    root@master:/usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7#
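
    To verify the daemons came up, jps can be run on each node; the master should show a Master process and each slave a Worker process. A quick check, assuming the trusted-shell setup above:

    jps                # on master: expect a Master entry
    ssh slave01 jps    # expect a Worker entry
    ssh slave02 jps    # expect a Worker entry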

    Test the cluster

    Use spark-submit to run SparkPi with the argument 1000 (the number of slices to compute):

    spark-submit --class org.apache.spark.examples.SparkPi /usr/local/ProgramFiles/spark-3.2.1-bin-hadoop2.7/examples/jars/spark-examples_2.12-3.2.1.jar 1000
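
    If the cluster is healthy, the driver output should end with a line of the following form (the exact digits vary from run to run):

    Pi is roughly 3.14...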

    Submitting a Spark job from a Java driver
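
    A minimal sketch of such a driver, connecting to the standalone master started above; the class name SubmitDemo and having spark-core_2.12 (3.2.1) on the classpath are assumptions, not from the original post:

    // SubmitDemo.java -- hypothetical example driver
    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class SubmitDemo {
        public static void main(String[] args) {
            // point the driver at the standalone master configured above
            SparkConf conf = new SparkConf()
                    .setAppName("SubmitDemo")
                    .setMaster("spark://master:7077");
            JavaSparkContext sc = new JavaSparkContext(conf);

            // trivial job: sum a handful of numbers on the cluster
            JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            int sum = rdd.reduce(Integer::sum);
            System.out.println("sum = " + sum);

            sc.stop();
        }
    }

    The driver host must be reachable from the workers; while the job runs it also appears on the web UI described next.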

    Spark web UI monitoring page
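
    By default the standalone Master serves its web UI at http://master:8080, each Worker serves one on port 8081, and a running application exposes its own UI on port 4040 of the driver host; the Master page lists the two workers and any running or completed applications.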
