Hostname: cmcc01 (used as the example throughout)
Operating system: CentOS 7
| Software | Version | Deployment mode |
| --- | --- | --- |
| centos | 7 | |
| zookeeper | zookeeper-3.4.10 | Pseudo-distributed |
| hadoop | hadoop-3.1.3 | Pseudo-distributed |
| hive | hive-3.1.3-bin | Pseudo-distributed |
| clickhouse | 21.11.10.1-2 | Single node, multiple instances |
| dolphinscheduler | 3.0.0 | Single node |
| kettle | pdi-ce-9.3.0.0 | Single node |
| sqoop | sqoop-1.4.7 | Single node |
| seatunnel | seatunnel-incubating-2.1.2 | Single node |
| spark | spark-2.4.8 | Single node |
Integrating Kettle with MySQL and Hive
Official download page: https://sourceforge.net/projects/pentaho/files/
unzip /opt/package/pdi-ce-9.3.0.0-428.zip -d /opt/software/
- vim ~/.bash_profile
- # Add the following lines
-
- # JDK
- export JAVA_HOME=/opt/software/jdk1.8.0_321
- export PATH=$PATH:${JAVA_HOME}/bin
Apply the configuration (reload the file that was just edited):
source ~/.bash_profile
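Before launching Kettle it can help to confirm that JAVA_HOME really points at a JDK, since kitchen.sh and pan.sh need a Java runtime. The following is a minimal sketch; `check_java_home` is a hypothetical helper name, not something shipped with the JDK or Kettle.

```shell
#!/bin/sh
# Hypothetical helper (not part of the JDK or Kettle): check a candidate JAVA_HOME.
check_java_home() {
    if [ -z "$1" ]; then
        echo "JAVA_HOME is not set"
        return 1
    fi
    if [ ! -x "$1/bin/java" ]; then
        echo "no executable java under $1/bin"
        return 1
    fi
    echo "JAVA_HOME looks valid: $1"
}

# Check the current shell; "|| true" keeps the script going when the check fails.
check_java_home "${JAVA_HOME:-}" || true
```

With the settings above, a successful check would report /opt/software/jdk1.8.0_321 as the JDK home.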
chmod g+x /opt/software/data-integration/kitchen.sh
- [root@cmcc01 data-integration]#
- [root@cmcc01 data-integration]#
- [root@cmcc01 data-integration]# ./kitchen.sh
- #######################################################################
- WARNING: no libwebkitgtk-1.0 detected, some features will be unavailable
- Consider installing the package with apt-get or yum.
- e.g. 'sudo apt-get install libwebkitgtk-1.0-0'
- #######################################################################
-
- Options:
- -rep = Repository name
- -user = Repository username
- -trustuser = !Kitchen.ComdLine.RepUsername!
- -pass = Repository password
- -job = The name of the job to launch
- -dir = The directory (dont forget the leading /)
- -file = The filename (Job XML) to launch
- -level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
- -logfile = The logging file to write to
- -listdir = List the directories in the repository
- -listjobs = List the jobs in the specified directory
- -listrep = List the available repositories
- -norep = Do not log into the repository
- -version = show the version, revision and build date
- -param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
- -listparam = List information concerning the defined parameters in the specified job.
- -export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
- -custom = Set a custom plugin specific option as a String value in the job using <NAME>=<VALUE>, for example: -custom:COLOR=Red
- -maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
- -maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
-
- [root@cmcc01 data-integration]#
- [root@cmcc01 data-integration]#
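The output above contains a warning: libwebkitgtk-1.0 was not detected. Install the missing package to clear it: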
- wget ftp://ftp.pbone.net/mirror/ftp5.gwdg.de/pub/opensuse/repositories/home:/matthewdva:/build:/EPEL:/el7/RHEL_7/x86_64/webkitgtk-2.4.9-1.el7.x86_64.rpm
- yum -y install webkitgtk-2.4.9-1.el7.x86_64.rpm
-
- # Run the command again; the warning is gone
- [root@cmcc01 package]#
- [root@cmcc01 package]# /opt/software/data-integration/kitchen.sh
- Options:
- -rep = Repository name
- -user = Repository username
- -trustuser = !Kitchen.ComdLine.RepUsername!
- -pass = Repository password
- -job = The name of the job to launch
- -dir = The directory (dont forget the leading /)
- -file = The filename (Job XML) to launch
- -level = The logging level (Basic, Detailed, Debug, Rowlevel, Error, Minimal, Nothing)
- -logfile = The logging file to write to
- -listdir = List the directories in the repository
- -listjobs = List the jobs in the specified directory
- -listrep = List the available repositories
- -norep = Do not log into the repository
- -version = show the version, revision and build date
- -param = Set a named parameter <NAME>=<VALUE>. For example -param:FILE=customers.csv
- -listparam = List information concerning the defined parameters in the specified job.
- -export = Exports all linked resources of the specified job. The argument is the name of a ZIP file.
- -custom = Set a custom plugin specific option as a String value in the job using <NAME>=<VALUE>, for example: -custom:COLOR=Red
- -maxloglines = The maximum number of log lines that are kept internally by Kettle. Set to 0 to keep all rows (default)
- -maxlogtimeout = The maximum age (in minutes) of a log line while being kept internally by Kettle. Set to 0 to keep all rows indefinitely (default)
-
- [root@cmcc01 package]#
- [root@cmcc01 package]#
-
- # Run a transformation
- # After writing a test transformation, run it with the following command
- /opt/software/data-integration/pan.sh -file=/opt/kettle-spoon/ktr/test/test1.ktr -logfile=test1.log
-
- # Run a job
- /opt/software/data-integration/kitchen.sh -file=/opt/kettle-spoon/ktr/test/SechuldUpdate.kjb -logfile=timeLogUpdate.log
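When a job is started from a scheduler or a script, the exit status of kitchen.sh is usually the only signal of success or failure. The sketch below wraps the call and prints a readable message. The meanings of the status codes follow the Pentaho documentation for Kitchen as far as I recall it, so treat them as an assumption to verify. `describe_kitchen_exit`, `run_job`, and the `KITCHEN` variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: translate a Kitchen exit status into a short message.
# The code meanings are an assumption based on the Pentaho documentation.
describe_kitchen_exit() {
    case "$1" in
        0) echo "job ran without a problem" ;;
        1) echo "errors occurred during processing" ;;
        2) echo "unexpected error while loading or running the job" ;;
        7) echo "job could not be loaded from XML or the repository" ;;
        8) echo "error loading steps or plugins" ;;
        9) echo "command line usage was printed" ;;
        *) echo "unknown exit status $1" ;;
    esac
}

# Hypothetical wrapper: run a .kjb file ($1) with a log file ($2) and report the outcome.
# KITCHEN may be set to override the path of kitchen.sh.
run_job() {
    kitchen="${KITCHEN:-/opt/software/data-integration/kitchen.sh}"
    status=0
    "$kitchen" -file="$1" -level=Basic -logfile="$2" || status=$?
    echo "kitchen exit $status: $(describe_kitchen_exit "$status")"
    return "$status"
}

# Example call (commented out so the sketch can be sourced without running a job):
# run_job /opt/kettle-spoon/ktr/test/SechuldUpdate.kjb timeLogUpdate.log
```

Returning the original status from `run_job` lets a scheduler such as DolphinScheduler mark the task as failed when the job fails.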
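At this point a new file appears under the current user's home directory: ~/.kettle/kettle.properties
If it does not exist, create it manually.
- vim ~/.kettle/kettle.properties
- Add the following content: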
-
- ##MYSQL
- MYSQL_HOST=localhost
- MYSQL_DB_PORT=3306
- MYSQL_DB_USER=root
- MYSQL_DB_PASSWORD=123qwe
- MYSQL_DB_NAME=flinkcdc
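Inside Kettle these entries are referenced as variables, for example ${MYSQL_HOST} in a database connection. To double-check the values outside Kettle, the sketch below reads them back from the file and prints the JDBC URL they describe. `get_prop` and `mysql_jdbc_url` are hypothetical helper names, and the parser assumes simple KEY=VALUE lines without spaces around the equals sign.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key ($2) from a .properties file ($1).
# The key is anchored at the start of the line, so "#" comment lines never match;
# if a key is defined more than once, the last definition is printed.
get_prop() {
    sed -n "s/^$2=//p" "$1" | tail -n 1
}

# Hypothetical helper: assemble the JDBC URL that the MySQL settings describe.
mysql_jdbc_url() {
    echo "jdbc:mysql://$(get_prop "$1" MYSQL_HOST):$(get_prop "$1" MYSQL_DB_PORT)/$(get_prop "$1" MYSQL_DB_NAME)"
}

# Print the URL for the current user's settings, if the file exists.
props="$HOME/.kettle/kettle.properties"
if [ -f "$props" ]; then
    mysql_jdbc_url "$props"
fi
```

With the values listed above, the script prints jdbc:mysql://localhost:3306/flinkcdc.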

cp /opt/package/mysql-connector-java-8.0.20.jar /opt/software/data-integration/lib




- # Run the job
- /opt/software/data-integration/kitchen.sh -file=/opt/package/kettle_job_test.kjb

- # Create symbolic links to the Hive jars
- ln -s /opt/software/hive-3.1.3-bin/lib/*.jar /opt/software/data-integration/lib
This may report "File exists" errors for jars whose names already exist in the target directory; these can be ignored.
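If the "File exists" noise is unwanted, the links can be created one by one, skipping names that are already present. This is a minimal sketch of that idea; `link_missing_jars` is a hypothetical helper name. The result is the same as ignoring the errors: Kettle's own copy of a jar is kept whenever the names collide.

```shell
#!/bin/sh
# Hypothetical helper: symlink every .jar from directory $1 into directory $2,
# skipping names that already exist in $2 so ln never reports "File exists".
link_missing_jars() {
    src="$1"
    dest="$2"
    for jar in "$src"/*.jar; do
        [ -e "$jar" ] || continue        # the glob matched nothing
        name=$(basename "$jar")
        if [ -e "$dest/$name" ] || [ -L "$dest/$name" ]; then
            echo "skip $name"
        else
            ln -s "$jar" "$dest/$name"
            echo "link $name"
        fi
    done
}

# Example call (commented out so the sketch can be sourced safely):
# link_missing_jars /opt/software/hive-3.1.3-bin/lib /opt/software/data-integration/lib
```

Running the function a second time only prints "skip" lines, so it is safe to repeat after a Hive upgrade adds new jars.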



/opt/software/data-integration/kitchen.sh -file=/opt/package/kettle_job_hive_test.kjb
