Install Spark 2.1.1


    Contents

    Installing Spark

    1. Installation

    1. Upload the installation package

    2. Extract the archive

    2. Configuration

    1. Configure slaves

    2. Edit the spark-env.sh file

    3. Configure the spark-config.sh file

    3. Sync to the other nodes

    4. Start the cluster

    5. Check the web UI, port 8080

    6. Configure the JobHistoryServer

    1. Rename spark-defaults.conf.template

    2. Edit spark-defaults.conf to enable event logging

    3. Edit spark-env.sh and add the following settings

    4. Sync to the other nodes

    5. Start the history server

    6. Run a job again

    7. Check the history server, port 18080


    Installing Spark

    1. Installation

    1. Upload the installation package

    spark-2.1.1-bin-hadoop2.7.tgz, downloaded from the official website

    2. Extract the archive

    [root@hadoop01 software]# tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz -C /opt/module/
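
    A quick, optional check that the archive unpacked where expected (the directory name comes straight from the tarball):

    [root@hadoop01 software]# ls /opt/module/ | grep spark
    spark-2.1.1-bin-hadoop2.7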
    

    2. Configuration

    1. Configure slaves

    [root@hadoop01 module]# mv spark-2.1.1-bin-hadoop2.7 spark
    [root@hadoop01 conf]# mv slaves.template slaves
    [root@hadoop01 conf]# vim slaves
    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    # A Spark Worker will be started on each of the machines listed below.
    hadoop01
    hadoop02
    hadoop03
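
    Only the non-comment lines matter here; to double-check the effective worker list:

    [root@hadoop01 conf]# grep -v '^#' slaves
    hadoop01
    hadoop02
    hadoop03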

    2. Edit the spark-env.sh file
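
    Spark ships only a spark-env.sh.template out of the box; if conf/spark-env.sh does not exist yet, create it from the template first:

    [root@hadoop01 conf]# mv spark-env.sh.template spark-env.sh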

    [root@hadoop01 conf]# vim spark-env.sh
    SPARK_MASTER_HOST=hadoop01
    SPARK_MASTER_PORT=7077

    3. Configure the spark-config.sh file

    The start scripts launch the daemons over non-interactive SSH sessions, which may not load JAVA_HOME from the login profile, so export it explicitly in sbin/spark-config.sh:

    [root@hadoop01 sbin]# vim spark-config.sh
    export JAVA_HOME=/opt/module/jdk1.8.0_144
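
    You can reproduce the environment those SSH sessions actually see, to confirm whether the export is needed on your machines:

    [root@hadoop01 sbin]# ssh hadoop02 'echo JAVA_HOME=$JAVA_HOME; java -version'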

    3. Sync to the other nodes

    [root@hadoop01 module]# xsync spark/
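
    xsync here is a custom distribution script, not a Spark or Linux built-in; if you don't have it, a plain rsync loop over the worker hosts does the same job:

    [root@hadoop01 module]# for host in hadoop02 hadoop03; do rsync -av /opt/module/spark/ root@$host:/opt/module/spark/; done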

    4. Start the cluster

    start-all.sh launches the Master on this host and a Worker on every host listed in slaves; util.sh is another custom helper that runs jps across the cluster, so hadoop01 should show a Master plus a Worker, and the other nodes a Worker each:

    [root@hadoop01 spark]# sbin/start-all.sh
    [root@hadoop01 spark]# util.sh
    ================root@hadoop01================
    3330 Jps
    3238 Worker
    3163 Master
    ================root@hadoop02================
    2966 Jps
    2908 Worker
    ================root@hadoop03================
    2978 Worker
    3036 Jps
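
    As an optional smoke test, attach a spark-shell to the new master; while it is open, the application should show up as running on the master's web UI:

    [root@hadoop01 spark]# bin/spark-shell --master spark://hadoop01:7077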

    5. Check the web UI at http://hadoop01:8080 (8080 is the standalone Master's default web UI port)

    6. Configure the JobHistoryServer

    1. Rename spark-defaults.conf.template

    [root@hadoop01 conf]# mv spark-defaults.conf.template spark-defaults.conf

    2. Edit the spark-defaults.conf file to enable event logging:

    Note: the log directory must already exist on HDFS, and its address must match the NameNode (fs.defaultFS, here hdfs://hadoop01:9000). Create it if it is missing:

    [root@hadoop01 conf]# hdfs dfs -mkdir /directory
    
    [root@hadoop01 conf]# vim spark-defaults.conf
    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements. See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License. You may obtain a copy of the License at
    #
    # http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    # Default system properties included when running spark-submit.
    # This is useful for setting default environmental settings.
    # Example:
    # spark.master                     spark://master:7077
    # spark.eventLog.enabled           true
    # spark.eventLog.dir               hdfs://namenode:8021/directory
    # spark.serializer                 org.apache.spark.serializer.KryoSerializer
    # spark.driver.memory              5g
    # spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
    spark.eventLog.enabled           true
    spark.eventLog.dir               hdfs://hadoop01:9000/directory

    3. Edit the spark-env.sh file and add the following settings:

    [root@hadoop01 conf]# vim spark-env.sh
    export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=30 -Dspark.history.fs.logDirectory=hdfs://hadoop01:9000/directory"

    Here 18080 is the history server's web UI port, retainedApplications caps how many applications' UI data the server keeps in memory (evicted entries are reloaded from disk when accessed), and logDirectory must point at the same HDFS path as spark.eventLog.dir above.

    4. Sync to the other nodes

    [root@hadoop01 conf]# xsync /opt/module/spark/conf
    

    5. Start the history server

    [root@hadoop01 spark]# sbin/start-history-server.sh
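
    If it came up cleanly, jps on hadoop01 should now also list a HistoryServer process (the PIDs below are illustrative):

    [root@hadoop01 spark]# jps
    3163 Master
    3238 Worker
    3412 HistoryServer
    3468 Jps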

    6. Run a job again

    Submit the bundled SparkPi example (estimating pi with 100 partitions) to the standalone master; once it finishes, the run is written to the event log:

    [root@hadoop01 spark]# bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://hadoop01:7077 \
    --executor-memory 1G \
    --total-executor-cores 2 \
    ./examples/jars/spark-examples_2.11-2.1.1.jar \
    100
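
    To confirm event logging works end to end, list the log directory after the job finishes; an entry named after the application ID (of the form app-...-0000) should appear:

    [root@hadoop01 spark]# hdfs dfs -ls /directory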

    7. Check the history server at http://hadoop01:18080; the SparkPi run just submitted should appear in the list

    Original article: https://blog.csdn.net/m0_55834564/article/details/126898643