• Compiling Hudi


    Versions

    • CentOS: CentOS 8

    • Hudi: 0.10.1

    • Spark: 3.1.3

    • Scala: 2.12

    1. Installing Maven

    1.1 Manual installation

    (1) Download Maven

    https://maven.apache.org/download.cgi

    (2) Upload the archive to the server and extract it

    tar -zxvf apache-maven-3.6.1-bin.tar.gz -C /bigdata/
    

    (3) Add the environment variables to /etc/profile

    # MAVEN_HOME (append the two export lines to /etc/profile)
    export MAVEN_HOME=/bigdata/apache-maven-3.6.1
    export PATH=$PATH:$MAVEN_HOME/bin
    # then reload the profile in the current shell
    source /etc/profile
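
    To confirm that the PATH change took effect before moving on, a small check like the one below can help. It is only a sketch: `path_contains` is a hypothetical helper name, not something Maven or the system provides.

    ```shell
    # Succeed if directory $1 is an entry of the colon-separated list $2.
    # path_contains is a hypothetical helper written for this walkthrough.
    path_contains() {
        case ":$2:" in
            *":$1:"*) return 0 ;;
            *) return 1 ;;
        esac
    }

    # Usage:
    #   path_contains "$MAVEN_HOME/bin" "$PATH" || echo "Maven is not on PATH yet"
    ```

    Wrapping both strings in colons lets the match work for the first and last PATH entries as well as the ones in the middle.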
    

    (4) Verify the installation

    duo@bigdata100:~$ mvn -v
    Apache Maven 3.6.3
    Maven home: /bigdata/apache-maven-3.6.1
    Java version: 1.8.0_321, vendor: Oracle Corporation, runtime: /bigdata/module/jdk1.8.0_321/jre
    Default locale: en_US, platform encoding: UTF-8
    OS name: "linux", version: "5.13.0-44-generic", arch: "aarch64", family: "unix"
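
    If this check needs to run in a script, the version can be pulled out of the first line of the `mvn -v` output. The sketch below assumes GNU `sort` (for `-V`); the helper names and the 3.6.0 threshold in the usage comment are illustrative only, not a requirement stated by Hudi.

    ```shell
    # Print the version number found on the "Apache Maven x.y.z" line read from stdin.
    maven_version() {
        sed -n 's/^Apache Maven \([0-9][0-9.]*\).*/\1/p' | head -n 1
    }

    # Succeed if version $1 is greater than or equal to version $2.
    # Relies on GNU sort -V (version sort), which is an assumption about the host.
    version_at_least() {
        [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
    }

    # Usage (needs mvn on PATH; 3.6.0 is an illustrative threshold):
    #   v=$(mvn -v | maven_version)
    #   version_at_least "$v" 3.6.0 && echo "Maven $v looks new enough"
    ```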
    
    (5) Edit settings.xml to use the Aliyun mirror

    <mirror>
        <id>nexus-aliyun</id>
        <mirrorOf>central</mirrorOf>
        <name>Nexus aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public</url>
    </mirror>
    
    ![image](https://mmbiz.qpic.cn/mmbiz_png/3xUEcImibkwI6fll2pI0aDLzCnu6MicRDL4wWHicmDAcibsa7tVWW47ibPdxkqeD7ibwFPVWcQfUWPld5VhRcdgBHoxw/640?wx_fmt=png)
    
    ### 1.2 Installing with apt or yum
    
    

    apt install maven    # Debian/Ubuntu
    yum install maven    # CentOS/RHEL

    
    2. Installing Git
    -------
    
    

    yum install git
    duo@bigdata100:~$ git --version
    git version 2.25.1

    
    3. Building Hudi
    --------
    
    ### 3.1 Clone the source from a mirror in China
    
    

    git clone --branch release-0.10.1 https://gitee.com/apache/Hudi.git

    
    

    ### 3.2 Edit pom.xml

    
    

    root@bigdata100:~# vim Hudi/pom.xml

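    <repository>
        <id>nexus-aliyun</id>
        <name>nexus-aliyun</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <releases>
            <enabled>true</enabled>
        </releases>
        <snapshots>
            <enabled>false</enabled>
        </snapshots>
    </repository>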

    
    
    
    ### 3.3 Build
    
    Build options for different Spark versions:
    
      
    
    | Maven build options | Expected Spark bundle jar name | Notes |
    | :-- | :-- | :-- |
    | (empty) | hudi-spark-bundle\_2.11 (legacy bundle name) | For Spark 2.4.4 and Scala 2.11 (default options) |
    | `-Dspark2.4` | hudi-spark2.4-bundle\_2.11 | For Spark 2.4.4 and Scala 2.11 (same as default) |
    | `-Dspark2.4 -Dscala-2.12` | hudi-spark2.4-bundle\_2.12 | For Spark 2.4.4 and Scala 2.12 |
    | `-Dspark3.1 -Dscala-2.12` | hudi-spark3.1-bundle\_2.12 | For Spark 3.1.x and Scala 2.12 |
    | `-Dspark3.2 -Dscala-2.12` | hudi-spark3.2-bundle\_2.12 | For Spark 3.2.x and Scala 2.12 |
    | `-Dspark3` | hudi-spark3-bundle\_2.12 (legacy bundle name) | For Spark 3.2.x and Scala 2.12 |
    | `-Dscala-2.12` | hudi-spark-bundle\_2.12 (legacy bundle name) | For Spark 2.4.4 and Scala 2.12 |
    
    

    mvn clean package -DskipTests -Dspark3 -Dscala-2.12

    
    It took a full weekend day, but the build finally succeeded.
    
    ![image](https://mmbiz.qpic.cn/mmbiz_png/3xUEcImibkwI6fll2pI0aDLzCnu6MicRDLuicgvz13fjBlATeErJ5CRWpC3hBapF9iarib4ISLhpM48XZcTt7gnDBtw/640?wx_fmt=png)
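
    After a successful build, the bundle jars can be located with `find`. This is a sketch that assumes Maven's standard `target/` output directories and bundle file names starting with `hudi-` and containing `bundle`, as in the table above; `find_hudi_bundles` is a hypothetical helper name.

    ```shell
    # List bundle jars under a source tree ($1, default: current directory).
    # Assumes jars land in Maven's usual target/ directories.
    find_hudi_bundles() {
        find "${1:-.}" -type f -path '*/target/*' -name 'hudi-*bundle*.jar' | sort
    }

    # Usage:
    #   find_hudi_bundles Hudi
    ```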
    
    ### 4. Troubleshooting
    
    #### **Q1: Failed to collect dependencies at io.confluent:kafka-avro-serializer:jar**
    
    

    [ERROR] Failed to execute goal on project hudi-utilities_2.12: Could not resolve dependencies for project org.apache.hudi:hudi-utilities_2.12:jar:0.10.1: Failed to collect dependencies at io.confluent:kafka-avro-serializer:jar:5.3.4: Failed to read artifact descriptor for io.confluent:kafka-avro-serializer:jar:5.3.4: Could not transfer artifact io.confluent:kafka-avro-serializer:pom:5.3.4 from/to maven-default-http-blocker (http://0.0.0.0/): Blocked mirror for repositories: [nexus-aliyun (http://maven.aliyun.com/nexus/content/groups/public/, default, releases)] -> [Help 1]

    
    Solution: re-enable the original mirror as well, because the Aliyun repository does not host this artifact.
    
    

    Starting from versions 0.11, Hudi no longer requires spark-avro to be specified using --packages

    
    ![image](https://img-blog.csdnimg.cn/img_convert/affd70b2b0cdbcff892968516699ce75.png)
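
    The mirror id `maven-default-http-blocker` in the error message is, as far as I know, the default mirror that Maven 3.8.1 and later use to block external plain-HTTP repositories. If that is the Maven actually running the build, it may help to list every `http://` repository URL in the POM or settings file. The sketch below is a plain `grep`, and `find_http_repo_urls` is a hypothetical helper name.

    ```shell
    # Print, with line numbers, every line of file $1 that declares a plain-HTTP <url>.
    find_http_repo_urls() {
        grep -n '<url>http://' "$1"
    }

    # Usage:
    #   find_http_repo_urls Hudi/pom.xml
    ```

    Any URL it reports is a candidate for the block; whether the repository also serves HTTPS has to be checked case by case.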
    
    #### **Q2: The goal you specified requires a project to execute but there is no POM in this directory (/root). Please verify you invoked Maven from the correct directory**
    
    Solution: change into the directory that contains pom.xml before running Maven.
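
    A small guard avoids this mistake in scripts. The sketch below refuses to run a command unless the target directory holds a pom.xml; `run_in_maven_project` is a hypothetical helper name.

    ```shell
    # Run a command ($2 and later arguments) inside directory $1,
    # but only if that directory contains a pom.xml.
    run_in_maven_project() {
        dir=$1
        shift
        if [ ! -f "$dir/pom.xml" ]; then
            echo "no pom.xml in $dir" >&2
            return 1
        fi
        # The subshell keeps the caller's working directory unchanged.
        (cd "$dir" && "$@")
    }

    # Usage:
    #   run_in_maven_project Hudi mvn clean package -DskipTests -Dspark3 -Dscala-2.12
    ```

    Maven's `-f` option (for example `mvn -f Hudi/pom.xml clean package`) is another way to point at a POM without changing directory.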
    
      
    
    
  • Original article: https://blog.csdn.net/hyunbar/article/details/126068957