• Oozie integration with Hive


    1) Prepare the hive2 example
    $ cd ~/work/oozie-5.2.1
    $ tree oozie/apps/hive2

    oozie/apps/hive2
    ├── job.properties
    ├── script.q
    └── workflow.xml

    $ cat oozie/apps/hive2/script.q

    --
    DROP TABLE IF EXISTS test;
    CREATE EXTERNAL TABLE test (a INT) STORED AS TEXTFILE LOCATION '${INPUT}';
    insert into test values(10);
    insert into test values(20);
    insert into test values(30);
    -- INSERT OVERWRITE DIRECTORY '${OUTPUT}' SELECT * FROM test;

    $ cat oozie/apps/hive2/job.properties

    nameNode=hdfs://localhost:9000
    resourceManager=localhost:8032
    queueName=default
    jdbcURL=jdbc:hive2://localhost:10000/default
    oozieRoot=user/${user.name}/oozie
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/${oozieRoot}/apps/hive2
    inputDir=data/hive2/table
    outputDir=data/hive2/output
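
To see how these properties combine, the sketch below expands `oozie.wf.application.path` and the `INPUT` path by hand. The function names are hypothetical helpers, and `${user.name}` is passed in as an argument instead of being resolved by Oozie.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the substitutions in job.properties.
# Values are copied from the file above; only the user name varies.
NAME_NODE="hdfs://localhost:9000"

# oozieRoot=user/${user.name}/oozie
oozie_root() {
    printf 'user/%s/oozie\n' "$1"
}

# oozie.wf.application.path=${nameNode}/${oozieRoot}/apps/hive2
app_path() {
    printf '%s/%s/apps/hive2\n' "$NAME_NODE" "$(oozie_root "$1")"
}

# INPUT=/${oozieRoot}/${inputDir}, as composed in workflow.xml
input_path() {
    printf '/%s/data/hive2/table\n' "$(oozie_root "$1")"
}
```

For the user `sun_xo`, `app_path sun_xo` prints `hdfs://localhost:9000/user/sun_xo/oozie/apps/hive2`, which is the App Path reported by `oozie job -info` in step 3.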

    $ cat oozie/apps/hive2/workflow.xml

    <workflow-app xmlns="uri:oozie:workflow:1.0" name="hive2-wf">
        <start to="hive2-node"/>
        <action name="hive2-node">
            <hive2 xmlns="uri:oozie:hive2-action:1.0">
                <resource-manager>${resourceManager}</resource-manager>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="/${oozieRoot}/${outputDir}"/>
                    <mkdir path="/${oozieRoot}/${outputDir}"/>
                </prepare>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <jdbc-url>${jdbcURL}</jdbc-url>
                <script>script.q</script>
                <param>INPUT=/${oozieRoot}/${inputDir}</param>
                <param>OUTPUT=/${oozieRoot}/${outputDir}</param>
            </hive2>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Hive2 (Beeline) action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
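
When the action fails, it can help to run script.q directly through Beeline, outside Oozie. The sketch below only prints a command line that approximates what the hive2 action does. It assumes each `<param>` reaches the script as a `--hivevar` definition, and `beeline_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print a Beeline command that approximates the
# hive2 action. It only builds the command line; nothing is executed.
# Assumption: each <param> is handed to Beeline as --hivevar NAME=VALUE.
beeline_cmd() {
    jdbc_url=$1   # e.g. jdbc:hive2://localhost:10000/default
    script=$2     # e.g. oozie/apps/hive2/script.q
    input=$3      # value substituted for ${INPUT}
    output=$4     # value substituted for ${OUTPUT}
    printf 'beeline -u %s --hivevar INPUT=%s --hivevar OUTPUT=%s -f %s\n' \
        "$jdbc_url" "$input" "$output" "$script"
}
```

Printing the command first makes it easy to review the substituted paths before running it against HiveServer2.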

    2) Upload to HDFS
    $ hdfs dfs -put oozie/apps/hive2 oozie/apps/
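
A plain `-put` stops with an error if the target files already exist, so re-uploading after an edit needs `-f`. The sketch below wraps the upload so it can be repeated. `upload_app` and the `HDFS_CMD` override are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# HDFS_CMD is a hypothetical override so the steps can be dry-run,
# e.g. HDFS_CMD="echo hdfs"; it defaults to the real hdfs client.
upload_app() {
    src=$1    # local application directory, e.g. oozie/apps/hive2
    dest=$2   # HDFS parent directory, e.g. oozie/apps
    # Create the parent directory if it is missing.
    ${HDFS_CMD:-hdfs} dfs -mkdir -p "$dest" || return 1
    # -f overwrites files left over from an earlier upload.
    ${HDFS_CMD:-hdfs} dfs -put -f "$src" "$dest/"
}
```

Setting `HDFS_CMD="echo hdfs"` before calling `upload_app oozie/apps/hive2 oozie/apps` prints the two commands instead of running them.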

    3) Run and check
    First, make sure the Hive Metastore and HiveServer2 are running.
    $ bin/oozie job -config oozie/apps/hive2/job.properties -run  
    job: 0000000-220630151459153-oozie-sun_-W
    $ bin/oozie job -info 0000000-220630151459153-oozie-sun_-W

    Job ID : 0000000-220630151459153-oozie-sun_-W
    ------------------------------------------------------------------------------------------------------------------------------------
    Workflow Name : hive2-wf
    App Path      : hdfs://localhost:9000/user/sun_xo/oozie/apps/hive2
    Status        : SUCCEEDED
    Run           : 0
    User          : sun_xo
    Group         : -
    Created       : 2022-06-30 09:58 GMT
    Started       : 2022-06-30 09:58 GMT
    Last Modified : 2022-06-30 09:59 GMT
    Ended         : 2022-06-30 09:59 GMT
    CoordAction ID: -

    Actions
    ------------------------------------------------------------------------------------------------------------------------------------
    ID                                               Status  Ext ID                          Ext Status  Err Code
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-220630151459153-oozie-sun_-W@:start:     OK      -                               OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-220630151459153-oozie-sun_-W@hive2-node  OK      application_1656559415643_0018  SUCCEEDED   -
    ------------------------------------------------------------------------------------------------------------------------------------
    0000000-220630151459153-oozie-sun_-W@end         OK      -                               OK          -
    ------------------------------------------------------------------------------------------------------------------------------------
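
To script this step instead of copying the job ID by hand, the two outputs above can be parsed with `sed`. This is a minimal sketch; `extract_job_id` and `extract_status` are hypothetical helper names, and the patterns assume the output layout shown in this walkthrough.

```shell
#!/bin/sh
# Hypothetical helpers for scripting the run-and-check step.

# Read `oozie job -run` output on stdin and print the ID from "job: <id>".
extract_job_id() {
    sed -n 's/^job: *//p'
}

# Read `oozie job -info` output on stdin and print the Status field.
extract_status() {
    sed -n 's/^Status *: *//p' | head -n 1
}

# Intended use (not run here):
#   job_id=$(bin/oozie job -config oozie/apps/hive2/job.properties -run | extract_job_id)
#   bin/oozie job -info "$job_id" | extract_status
```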

    $ hdfs dfs -text "oozie/data/hive2/table/*"

    10
    20
    30

    You can fetch the relevant job log as follows:
    $ hdfs dfs -get /tmp/logs/sun_xo/logs/application_1656559415643_0018 logs/
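
The path in that command follows the pattern `/tmp/logs/<user>/logs/<application id>`. The sketch below builds it from the user name and the Ext ID shown in the job info. `app_log_dir` is a hypothetical helper, and the layout is an assumption taken from this walkthrough; clusters with a different log aggregation directory or a different Hadoop version may use another path.

```shell
#!/bin/sh
# Hypothetical helper: build the aggregated-log directory for an application.
# Assumes the layout used in this walkthrough: /tmp/logs/<user>/logs/<app id>.
app_log_dir() {
    user=$1     # e.g. sun_xo
    app_id=$2   # e.g. application_1656559415643_0018
    printf '/tmp/logs/%s/logs/%s\n' "$user" "$app_id"
}
```

If log aggregation is enabled, `yarn logs -applicationId application_1656559415643_0018` should print the same logs without needing to know the directory.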

  • Original article: https://blog.csdn.net/sun_xo/article/details/125544121