• A step-by-step guide to setting up a SparkSQL development environment in IDEA


     


    1. Create a Maven project, install the Scala plugin in IDEA, and add the Scala SDK

    https://www.cnblogs.com/bajiaotai/p/15381309.html

    2. Add the required dependency JARs by configuring pom.xml

    2.1 Example pom.xml (Spark version: 3.0.0, Scala version: 2.12)

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <groupId>com.dxm.sparksql</groupId>
        <artifactId>sparksql</artifactId>
        <version>1.0-SNAPSHOT</version>

        <properties>
            <spark.version>3.0.0</spark.version>
            <scala.version>2.12</scala.version>
        </properties>

        <dependencies>

            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>

            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-yarn_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>

            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>

            <dependency>
                <groupId>mysql</groupId>
                <artifactId>mysql-connector-java</artifactId>
                <version>5.1.27</version>
            </dependency>

            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-hive_${scala.version}</artifactId>
                <version>${spark.version}</version>
            </dependency>

            <dependency>
                <groupId>org.apache.hive</groupId>
                <artifactId>hive-exec</artifactId>
                <version>1.2.1</version>
            </dependency>

        </dependencies>

    </project>
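    Note: this pom only declares dependencies. When you run the code from IDEA, the Scala plugin and SDK configured in step 1 handle compilation; if you also want Maven itself to compile the Scala sources (for example in a CI build), you will additionally need a Scala compiler plugin such as scala-maven-plugin in the <build> section.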

    2.2 Matching the Spark version to the Scala version

    Spark artifacts are published per Scala binary version: the _2.12 suffix in an artifactId such as spark-core_2.12 means "built against Scala 2.12.x", so the scala.version property in the pom must match the Scala SDK configured in IDEA. The link below lists the Spark/Scala version pairings and the corresponding dependency configuration:
    https://www.cnblogs.com/bajiaotai/p/16270971.html

    2.3 Checking the Scala version at runtime from Scala code

    println(util.Properties.versionString)
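    When debugging version mismatches it can also help to print the Spark version alongside. A minimal sketch, assuming a SparkSession named spark has already been created as in section 3:

    // Scala version the application is actually running on
    println(util.Properties.versionString) // e.g. "version 2.12.10"
    // Spark version of the artifacts on the classpath
    println(spark.version)                 // e.g. "3.0.0"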

    2.4 FAQ: errors caused by a mismatch between the Spark and Scala versions

    To be added

    3. Testing the code

    import org.apache.spark.SparkContext
    import org.apache.spark.sql.{DataFrame, SparkSession}

    // Case class used to build a DataFrame from an RDD; define it outside
    // the App object so the implicit encoders can find it
    case class Person(name: String, action: String)

    object TestSparkSQLEnv extends App {

      //1. Initialize the SparkSession object
      val spark = SparkSession
        .builder
        .master("local")
        //.appName("SparkSql Entrance Class SparkSession")
        //.config("spark.some.config.option", "some-value")
        .getOrCreate()

      //2. Get the SparkContext from the SparkSession
      private val sc: SparkContext = spark.sparkContext

      //3. Set the log level
      // Valid log levels include: ALL, DEBUG, ERROR, FATAL, INFO, OFF, TRACE, WARN
      // This overrides any user-defined log settings, e.g. log4j.properties
      sc.setLogLevel("ERROR")

      import spark.implicits._

      //4. Create a DataFrame from an RDD of case-class instances
      private val rdd2DfByCaseClass: DataFrame = spark.sparkContext
        .makeRDD(Array(Person("疫情", "何时"), Person("结束", "呢")))
        .toDF("名称", "行动")
      rdd2DfByCaseClass.show()
      //  +----+----+
      //  |名称|行动|
      //  +----+----+
      //  |疫情|何时|
      //  |结束|  呢|
      //  +----+----+

      //5. Release resources
      spark.stop()

    }
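    If the DataFrame prints correctly, you can go one step further and exercise the SQL engine itself. A minimal sketch under the same setup (the object name TestSparkSQLQuery and the view name person are arbitrary):

    import org.apache.spark.sql.SparkSession

    object TestSparkSQLQuery extends App {

      val spark = SparkSession.builder.master("local").getOrCreate()
      spark.sparkContext.setLogLevel("ERROR")
      import spark.implicits._

      // Build a small DataFrame and expose it to SQL as a temporary view
      val df = Seq(("疫情", "何时"), ("结束", "呢")).toDF("名称", "行动")
      df.createOrReplaceTempView("person")

      // If this prints the single matching row, the SQL engine is working
      spark.sql("SELECT * FROM person WHERE `名称` = '疫情'").show()

      spark.stop()
    }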

    4. Setting the log level

    4.1 Runtime log level (highest priority)

    //Set the log level at runtime (effective only within the submitted application)
    spark.sparkContext.setLogLevel("INFO")
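    A minimal placement sketch (the object name LogLevelDemo is arbitrary): call setLogLevel immediately after the session is created, so subsequent output is filtered regardless of what configuration file is on the classpath.

    import org.apache.spark.sql.SparkSession

    object LogLevelDemo extends App {
      val spark = SparkSession.builder.master("local").getOrCreate()
      // From here on, this application logs at INFO and above,
      // overriding the root level from any log4j.properties
      spark.sparkContext.setLogLevel("INFO")
      spark.stop()
    }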

    4.2 Adding a resources/log4j.properties configuration file

    If no configuration file is provided, Spark falls back to its built-in defaults and prints: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties

    #
    # Licensed to the Apache Software Foundation (ASF) under one or more
    # contributor license agreements.  See the NOTICE file distributed with
    # this work for additional information regarding copyright ownership.
    # The ASF licenses this file to You under the Apache License, Version 2.0
    # (the "License"); you may not use this file except in compliance with
    # the License.  You may obtain a copy of the License at
    #
    #    http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    #
    
    # Set everything to be logged to the console
    log4j.rootCategory=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
    
    # Set the default spark-shell log level to WARN. When running the spark-shell, the
    # log level for this class is used to overwrite the root logger's log level, so that
    # the user can have different defaults for the shell and regular Spark apps.
    log4j.logger.org.apache.spark.repl.Main=WARN
    
    # Settings to quiet third party logs that are too verbose
    log4j.logger.org.sparkproject.jetty=WARN
    log4j.logger.org.sparkproject.jetty.util.component.AbstractLifeCycle=ERROR
    log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
    log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
    
    # SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs
    # in SparkSQL with Hive support
    log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
    log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
    
    # Parquet related logging
    log4j.logger.org.apache.parquet.CorruptStatistics=ERROR
    log4j.logger.parquet.CorruptStatistics=ERROR
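    Place this file under src/main/resources so Maven puts it on the classpath. Note that this log4j 1.x format applies to the Spark 3.0.x used here; from Spark 3.3 onward Spark switched to Log4j 2, which reads log4j2.properties instead.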

    5. Closing remarks

    If the code runs successfully, congratulations: your environment is set up correctly. If you run into problems, please leave a comment and we can work through them together. If this post helped you, a like and a comment would be much appreciated.

     

  • Original post: https://www.cnblogs.com/bajiaotai/p/16270916.html