/worker-server/conf/dolphinscheduler_env.sh

org.apache.hadoop.security.AccessControlException: Permission denied: user=dolphin, access=WRITE
/worker-server/conf/common.properties
dolphinscheduler-data-quality-3.0.1-SNAPSHOT.jar: keep the jar file name consistent with the name produced by the build; the default is dolphinscheduler-data-quality-dev-SNAPSHOT.jar.
sudo usermod -a -G bigdata dolphin
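A quick way to verify the group fix (a sketch; the HDFS path below is illustrative, use the directory reported in the AccessControlException above):

# confirm dolphin now belongs to the bigdata group (takes effect in a new login session / after service restart)
id dolphin
# inspect owner/group/permissions of the offending HDFS directory (path is an assumption)
hdfs dfs -ls -d /dolphinscheduler
# if the group lacks write access, grant it as the HDFS superuser
hdfs dfs -chmod g+w /dolphinscheduler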
Just follow the official docs directly. When I started, the official material did not yet cover 3.0, so I stopped consulting it partway through; it seems I took a few detours as a result.

By rights this should have succeeded; the check result cannot be less than 0.

Error message:
[INFO] 2022-10-26 08:20:10.909 +0000 [taskAppId=TASK-20221026-7339496693088_2-471-499] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[205] - process has exited, execute path:/tmp/dolphinscheduler3/exec/process/7338799615584/7339496693088_2/471/499, processId:260177 ,exitStatusCode:1 ,processWaitForStatus:true ,processExitValue:1
[INFO] 2022-10-26 08:20:11.582 +0000 [taskAppId=TASK-20221026-7339496693088_2-471-499] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[63] - -> 22/10/26 16:20:10 INFO Client: Application report for application_1658989604268_0102 (state: FAILED)
22/10/26 16:20:10 INFO Client:
client token: N/A
diagnostics: Application application_1658989604268_0102 failed 1 times (global limit =2; local limit is =1) due to ApplicationMaster for attempt appattempt_1658989604268_0102_000001 timed out. Failing the application.
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1666771787933
final status: FAILED
tracking URL: http://bigdata02:8088/cluster/app/application_1658989604268_0102
user: hadoop
22/10/26 16:20:10 ERROR Client: Application diagnostics message: Application application_1658989604268_0102 failed 1 times (global limit =2; local limit is =1) due to ApplicationMaster for attempt appattempt_1658989604268_0102_000001 timed out. Failing the application.
Exception in thread "main" org.apache.spark.SparkException: Application application_1658989604268_0102 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1283)
at org.apache.spark.deploy.yarn.YarnClusterApplication.start(Client.scala:1677)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
22/10/26 16:20:10 INFO ShutdownHookManager: Shutdown hook called
22/10/26 16:20:10 INFO ShutdownHookManager: Deleting directory /tmp/spark-869c7fa1-4136-4458-8626-27961ce772e1
22/10/26 16:20:10 INFO ShutdownHookManager: Deleting directory /tmp/spark-873afb0a-8352-48c7-981a-7be426e1c145
[INFO] 2022-10-26 08:20:11.582 +0000 [taskAppId=TASK-20221026-7339496693088_2-471-499] TaskLogLogger-class org.apache.dolphinscheduler.plugin.task.dq.DataQualityTask:[57] - FINALIZE_SESSION

The Spark used for testing is 3.2; I'm not sure whether that is the cause of this error. I don't have time to dig into it for now; if anyone knows, please share.
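For anyone who does dig in: the aggregated AM container logs usually show why the ApplicationMaster timed out. A minimal starting point, using the application id from the log above:

# pull the aggregated YARN logs for the failed application (requires YARN log aggregation to be enabled)
yarn logs -applicationId application_1658989604268_0102 | less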

I only skimmed this part and won't dig deeper here; I'll study the backend core code separately later.
The assembled command has no --class option: the data-quality jar already provides a main entry point, so spark-submit invokes the manifest's Main-Class: org.apache.dolphinscheduler.data.quality.DataQualityApplication directly.
${SPARK_HOME2}/bin/spark-submit --master yarn --deploy-mode cluster --driver-cores 1 --driver-memory 512M --num-executors 2 --executor-cores 2 --executor-memory 2G --queue default --conf spark.yarn.maxAppAttempts=1 /home/dolphinscheduler/app/dolphinscheduler/worker-server/libs/dolphinscheduler-data-quality-3.0.1-SNAPSHOT.jar


The jar's manifest sets Main-Class.
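You can confirm the manifest entry without unpacking the jar; a minimal check:

# print the jar's manifest and look for the Main-Class entry
unzip -p dolphinscheduler-data-quality-3.0.1-SNAPSHOT.jar META-INF/MANIFEST.MF | grep Main-Class
# expected output: Main-Class: org.apache.dolphinscheduler.data.quality.DataQualityApplication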
