• Running Spark SQL in IDEA


    This is a WordCount example.

    1. Prepare the WordCount input text
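    The job below splits each line on commas, so the input is lines of comma-separated words. A minimal sketch that writes such a sample file (the file name matches the one uploaded to HDFS later; the words themselves are made up):

    import java.io.PrintWriter

    // Write a small comma-separated sample file for the WordCount job.
    // The words are arbitrary; only the comma delimiter matters, because
    // the job splits lines with split(",").
    object MakeSampleInput {
      def main(args: Array[String]): Unit = {
        val writer = new PrintWriter("wordcount.txt")
        try {
          writer.println("hadoop,spark,scala")
          writer.println("spark,spark,hadoop")
          writer.println("scala,hadoop,spark")
        } finally {
          writer.close()
        }
      }
    }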

    2. Set up the Hadoop environment

    Download Hadoop.

    Unzip Hadoop to a directory.

    Copy hadoop.dll and winutils.exe into C:\Windows\System32.

    Copy hadoop.dll and winutils.exe into the unzipped hadoop\bin directory.

    Configure the environment variables (set HADOOP_HOME to the unzip directory and add its bin folder to PATH).

    Open cmd and run hadoop version.

    If the version is displayed, the setup succeeded.

    Restart IDEA.
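    If you would rather not change the system-wide environment variables, a common alternative is to point at the Hadoop directory from code before the SparkContext is created. A minimal sketch, assuming Hadoop was unzipped to C:\hadoop (use your own path; it must contain bin\winutils.exe):

    import org.apache.spark.{SparkConf, SparkContext}

    object LocalSparkBootstrap {
      def main(args: Array[String]): Unit = {
        // Assumed unzip location; the directory must contain bin\winutils.exe
        System.setProperty("hadoop.home.dir", "C:\\hadoop")

        // Any SparkContext created after this point picks up the setting
        val conf = new SparkConf().setMaster("local[3]").setAppName("bootstrapTest")
        val sc = new SparkContext(conf)
        println(sc.version) // sanity check: prints the Spark version
        sc.stop()
      }
    }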

    3. Set up the Scala environment

    Install the Scala plugin in IDEA (it can be installed offline; see the CSDN post “离线在idea中安装scala插件” for the steps).

    4. Spark dependencies

    The versions must line up exactly (Scala 2.11.12, Spark 2.4.0, Java 1.8 here, and the _2.11 suffix on the Spark artifacts must match the Scala version); otherwise the build will fail with errors. A quick runtime check follows the pom below.

    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>

        <properties>
            <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
            <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
            <maven.compiler.source>1.8</maven.compiler.source>
            <maven.compiler.target>1.8</maven.compiler.target>
            <scala.version>2.11.12</scala.version>
            <spark.version>2.4.0</spark.version>
            <java.version>1.8</java.version>
        </properties>

        <groupId>org.example</groupId>
        <artifactId>DataPrepare</artifactId>
        <version>1.0-SNAPSHOT</version>

        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-core_2.11</artifactId>
                <version>${spark.version}</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-log4j12</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-simple</artifactId>
                <version>1.7.25</version>
                <scope>compile</scope>
            </dependency>
            <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-sql_2.11</artifactId>
                <version>${spark.version}</version>
            </dependency>
        </dependencies>

        <build>
            <sourceDirectory>src/main/scala</sourceDirectory>
            <plugins>
                <plugin>
                    <groupId>org.scala-tools</groupId>
                    <artifactId>maven-scala-plugin</artifactId>
                    <version>2.15.0</version>
                    <executions>
                        <execution>
                            <goals>
                                <goal>compile</goal>
                                <goal>testCompile</goal>
                            </goals>
                            <configuration>
                                <args>
                                    <arg>-dependencyfile</arg>
                                    <arg>${project.build.directory}/.scala_dependencies</arg>
                                </args>
                            </configuration>
                        </execution>
                    </executions>
                </plugin>
                <plugin>
                    <groupId>org.apache.maven.plugins</groupId>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.6</version>
                    <configuration>
                        <useFile>false</useFile>
                        <disableXmlReport>true</disableXmlReport>
                        <includes>
                            <include>**/*Test.*</include>
                            <include>**/*Suite.*</include>
                        </includes>
                    </configuration>
                </plugin>
                <plugin>
                    <artifactId>maven-assembly-plugin</artifactId>
                    <configuration>
                        <archive>
                            <manifest>
                                <mainClass></mainClass>
                            </manifest>
                        </archive>
                        <descriptorRefs>
                            <descriptorRef>jar-with-dependencies</descriptorRef>
                        </descriptorRefs>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>
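    To confirm at runtime that the versions really line up, a small sketch (both values come from APIs that ship with Scala and Spark):

    object VersionCheck {
      def main(args: Array[String]): Unit = {
        // Should report 2.11.x, matching the _2.11 suffix on the Spark artifacts
        println(scala.util.Properties.versionString)
        // Should report 2.4.0, matching ${spark.version} in the pom
        println(org.apache.spark.SPARK_VERSION)
      }
    }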

    5. Problems you are likely to run into

    (I forgot to take screenshots of these.)

    6. The code (the simplest version)

    In the Maven project, create a scala directory, then right-click it -> Mark Directory as -> Sources Root.

    The directory turns blue; once it is blue, you can create Scala classes in it.

     

    Upload wordcount.txt to HDFS:

    first copy it to the server, then run hadoop fs -put /wordcount.txt /
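    If you want to verify the upload from code, a sketch using the Hadoop FileSystem API (the NameNode address is the same one the job below uses; the hadoop-client classes come in transitively through spark-core):

    import java.net.URI
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object HdfsUploadCheck {
      def main(args: Array[String]): Unit = {
        // Connect to the NameNode used by the WordCount job
        val fs = FileSystem.get(new URI("hdfs://192.168.30.101:8020"), new Configuration())
        // Prints true if wordcount.txt landed in the HDFS root
        println(fs.exists(new Path("/wordcount.txt")))
        fs.close()
      }
    }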

    import org.apache.spark.rdd.RDD
    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        // Run locally with 3 threads
        val conf: SparkConf = new SparkConf().setMaster("local[3]").setAppName("hdfsTest")
        val sc = new SparkContext(conf)

        // Read the input file from HDFS
        val value: RDD[String] = sc.textFile("hdfs://192.168.30.101:8020/wordcount.txt")

        // Split each line on commas, pair every word with 1, sum the counts per word,
        // then collect the results to the driver and print them
        value.flatMap(line => line.split(","))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
          .collect()
          .foreach(r => println(r))

        sc.stop()
      }
    }
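    For a quick test without HDFS, you can also point textFile at a local file instead (the path here is an assumption): sc.textFile("file:///C:/data/wordcount.txt").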

    7. Package the tested code into a jar
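    With the pom above, one way to do this (a sketch of the workflow, not the only option): fill in <mainClass> in the assembly plugin (e.g. WordCount), then run mvn clean package assembly:single; the runnable jar ends up in target/ as DataPrepare-1.0-SNAPSHOT-jar-with-dependencies.jar.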

    8. After packaging, run it on Linux
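    A typical invocation on the server would look like this (a sketch; adjust the class name, master URL, and jar path to your setup):

    spark-submit --class WordCount --master local[3] DataPrepare-1.0-SNAPSHOT-jar-with-dependencies.jar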

  • Original article: https://blog.csdn.net/qq_38403590/article/details/126118004