Official site: https://sqoop.apache.org/
There are two major versions: Sqoop1 and Sqoop2.
Note: the latest stable release is 1.4.7 (download, docs); the latest Sqoop2 release is 1.99.7 (download, docs). Be aware that 1.99.7 is not compatible with 1.4.7, is feature-incomplete, and is not intended for production deployment. In short, the 2.x line should not be used in production.
Sqoop ==> SQL to Hadoop. Sqoop is an open-source tool mainly used to move data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL): it can import data from a traditional database into HDFS, and export data from HDFS back into a relational database.
History: the project started in 2009 as a third-party module for Hadoop; later, to simplify deployment and speed up iteration, Sqoop was spun out as a standalone Apache project. Apache has since retired the Sqoop project.
Sqoop translates import and export commands into MapReduce jobs for execution, mainly by customizing MapReduce's InputFormat and OutputFormat; the jobs are map-only (see the sketch below).
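For example (a minimal sketch; the host, database, table, and column names here are placeholders), an import runs as a map-only job whose parallelism is set by --num-mappers, with --split-by choosing the column used to partition the table between mappers:
sqoop import \
--connect jdbc:mysql://dbhost:3306/mydb \
--username user -P \
--table mytable \
--num-mappers 4 \
--split-by id
Each mapper reads one slice of the table and writes its own part file to HDFS, which is why the job logs further down show "map 100% reduce 0%".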
Download: https://archive.apache.org/dist/sqoop/

After uploading the archive to the server and extracting it, you can see the following layout:
[root@node1 sqoop]# pwd
/soft/sqoop
[root@node1 sqoop]# ls
bin CHANGELOG.txt conf ivy lib NOTICE.txt README.txt sqoop-patch-review.py src
build.xml COMPILING.txt docs ivy.xml LICENSE.txt pom-old.xml sqoop-1.4.6.jar sqoop-test-1.4.6.jar testdata
[root@node1 sqoop]#
// Copy the template configuration file first (keeping the template as a backup), then edit the copy
[root@node1 conf]# cp sqoop-env-template.sh sqoop-env.sh
// Edit the configuration file
[root@node1 conf]# vim sqoop-env.sh
# First, set the Hadoop and Hive locations
export HADOOP_COMMON_HOME=/soft/hadoop-3.3.0
export HADOOP_MAPRED_HOME=/soft/hadoop-3.3.0
export HIVE_HOME=/soft/hive
## ZOOCFGDIR and HBASE_HOME can be left unset for now
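If HBase and ZooKeeper are installed, the template also has entries for them; setting these would silence the warnings that ./sqoop help prints below (the paths here are hypothetical):
export HBASE_HOME=/soft/hbase
export ZOOCFGDIR=/soft/zookeeper/conf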
Make sure the MySQL JDBC driver jar is present in Sqoop's lib directory:
[root@node1 lib]# ls *mysql*
mysql-connector-java-5.1.37.jar
[root@node1 lib]# pwd
/soft/sqoop/lib
[root@node1 lib]#
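If the driver is missing, copy the connector jar into Sqoop's lib directory (the source path below is a placeholder for wherever the connector was downloaded):
[root@node1 lib]# cp /path/to/mysql-connector-java-5.1.37.jar /soft/sqoop/lib/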
Run ./sqoop help to verify the installation:
[root@node1 bin]# ./sqoop help
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2022-11-03 22:37:32,708 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
usage: sqoop COMMAND [ARGS]
Available commands:
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
See 'sqoop help COMMAND' for information on a specific command.
[root@node1 bin]#
Test the MySQL connection:
[root@node1 bin]# ./sqoop list-databases --connect jdbc:mysql://192.168.141.155/sqoop --username root --password hadoop
..............................
information_schema
ahs
cm
hive3
mysql
performance_schema
sqoop
sys
OK, the environment is working.
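Note that Sqoop itself warns in the job logs below that putting the password on the command line is insecure; two safer alternatives (the password-file path here is hypothetical):
# prompt for the password interactively
./sqoop list-databases --connect jdbc:mysql://192.168.141.155/sqoop --username root -P
# or read it from a protected file (the file must not contain a trailing newline, hence echo -n)
echo -n "hadoop" > /root/.mysql.pwd && chmod 400 /root/.mysql.pwd
./sqoop list-databases --connect jdbc:mysql://192.168.141.155/sqoop --username root --password-file file:///root/.mysql.pwd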
Official user guide: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html
Import: move data from a relational database (MySQL, Oracle) into the big-data cluster (HDFS, Hive, HBase); the keyword is import. Commonly used options:
| Option | Meaning |
|---|---|
| --connect | JDBC URL |
| --username | JDBC username |
| --password | JDBC password |
| --table | source table in MySQL |
| --target-dir | target output directory on HDFS |
| --delete-target-dir | delete the target directory before the import |
| --fields-terminated-by | field delimiter |
| --query | free-form SQL query; with this option --table is not needed, and the query must contain $CONDITIONS; wrap the SQL in single quotes ' (if you use double quotes " you must escape \$CONDITIONS manually) |
| --where | row filter; conflicts with the query given in --query |

Prepare a test table in MySQL:
drop table if exists student;
create table student(
id int auto_increment primary key ,
num int,
name varchar(20)
) charset =utf8;
INSERT INTO student(num, name) VALUES (95001, '李勇');
INSERT INTO student(num, name) VALUES (95002, '刘晨');
INSERT INTO student(num, name) VALUES (95003, '王敏');
INSERT INTO student(num, name) VALUES (95004, '张立');
INSERT INTO student(num, name) VALUES (95005, '刘刚');
INSERT INTO student(num, name) VALUES (95006, '孙庆');
INSERT INTO student(num, name) VALUES (95007, '易思玲');
INSERT INTO student(num, name) VALUES (95008, '李娜');
INSERT INTO student(num, name) VALUES (95009, '梦圆圆');
INSERT INTO student(num, name) VALUES (95010, '孔小涛');
.......
Import the whole table into HDFS:
[root@node1 bin]# ./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--table student \
--target-dir /user/sqoop_mysql_to_hdfs \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2022-11-03 23:15:35,237 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
2022-11-03 23:15:35,304 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2022-11-03 23:15:35,389 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2022-11-03 23:15:35,389 INFO tool.CodeGenTool: Beginning code generation
2022-11-03 23:15:35,652 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
2022-11-03 23:15:35,671 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
2022-11-03 23:15:35,676 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /soft/hadoop-3.3.0
Note: /tmp/sqoop-root/compile/14cc91c986370cd735b8a241a21c6874/student.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2022-11-03 23:15:36,758 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/14cc91c986370cd735b8a241a21c6874/student.jar
2022-11-03 23:15:37,458 INFO tool.ImportTool: Destination directory /user/sqoop_mysql_to_hdfs is not present, hence not deleting.
2022-11-03 23:15:37,458 WARN manager.MySQLManager: It looks like you are importing from mysql.
2022-11-03 23:15:37,458 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
2022-11-03 23:15:37,458 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
2022-11-03 23:15:37,458 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
2022-11-03 23:15:37,462 INFO mapreduce.ImportJobBase: Beginning import of student
2022-11-03 23:15:37,463 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2022-11-03 23:15:37,472 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
2022-11-03 23:15:37,488 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
2022-11-03 23:15:37,573 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.141.151:8032
2022-11-03 23:15:37,910 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1663767415605_0024
2022-11-03 23:15:39,401 INFO db.DBInputFormat: Using read commited transaction isolation
2022-11-03 23:15:39,455 INFO mapreduce.JobSubmitter: number of splits:1
2022-11-03 23:15:39,566 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1663767415605_0024
2022-11-03 23:15:39,566 INFO mapreduce.JobSubmitter: Executing with tokens: []
2022-11-03 23:15:39,726 INFO conf.Configuration: resource-types.xml not found
2022-11-03 23:15:39,726 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-11-03 23:15:39,996 INFO impl.YarnClientImpl: Submitted application application_1663767415605_0024
2022-11-03 23:15:40,033 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1663767415605_0024/
2022-11-03 23:15:40,034 INFO mapreduce.Job: Running job: job_1663767415605_0024
2022-11-03 23:15:49,194 INFO mapreduce.Job: Job job_1663767415605_0024 running in uber mode : false
2022-11-03 23:15:49,195 INFO mapreduce.Job: map 0% reduce 0%
2022-11-03 23:16:05,319 INFO mapreduce.Job: map 100% reduce 0%
2022-11-03 23:16:05,331 INFO mapreduce.Job: Job job_1663767415605_0024 completed successfully
2022-11-03 23:16:05,421 INFO mapreduce.Job: Counters: 33
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=273519
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=737
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=13522
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=13522
Total vcore-milliseconds taken by all map tasks=13522
Total megabyte-milliseconds taken by all map tasks=13846528
Map-Reduce Framework
Map input records=44
Map output records=44
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=67
CPU time spent (ms)=810
Physical memory (bytes) snapshot=199610368
Virtual memory (bytes) snapshot=2785349632
Total committed heap usage (bytes)=148897792
Peak Map Physical memory (bytes)=199610368
Peak Map Virtual memory (bytes)=2785349632
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=737
2022-11-03 23:16:05,428 INFO mapreduce.ImportJobBase: Transferred 737 bytes in 27.9275 seconds (26.3898 bytes/sec)
2022-11-03 23:16:05,433 INFO mapreduce.ImportJobBase: Retrieved 44 records.
The logs show the import succeeded.
Verify the imported data on HDFS:
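A quick check (assuming the default map-output file name part-m-00000):
[root@node1 bin]# hdfs dfs -cat /user/sqoop_mysql_to_hdfs/part-m-00000 | head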

The data was imported successfully.
Next, try a free-form query with --query. Without $CONDITIONS in the WHERE clause, Sqoop rejects the job:
[root@node1 bin]# ./sqoop import \
> --connect jdbc:mysql://192.168.141.155/sqoop \
> --username root --password hadoop \
> --target-dir /user/sqoop_mysql_to_hdfs \
> --delete-target-dir \
> --num-mappers 1 \
> --fields-terminated-by "\t" \
> --query 'select id , name from student ;'
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
2022-11-03 23:33:57,756 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
2022-11-03 23:33:57,822 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
2022-11-03 23:33:57,897 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
2022-11-03 23:33:57,897 INFO tool.CodeGenTool: Beginning code generation
2022-11-03 23:33:57,902 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id , name from student ;] must contain '$CONDITIONS' in WHERE clause.
at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
----- Fix: add $CONDITIONS to the query
[root@node1 bin]# ./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--target-dir /user/sqoop_mysql_to_hdfs \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--query 'select id , name from student where 1=1 and $CONDITIONS;'
Note: wrap the SQL in --query in single quotes '; if you use double quotes " you must escape $CONDITIONS manually.
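The same query in the double-quoted form would look like this sketch, with $CONDITIONS escaped so the shell does not expand it:
--query "select id , name from student where 1=1 and \$CONDITIONS;"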
Verification: the import succeeded.

Import only specific columns with --columns:
[root@node1 bin]# ./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--table student \
--columns id,name \
--target-dir /user/sqoop_mysql_to_hdfs \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
Add --where to filter the rows being imported:
[root@node1 bin]# ./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--table student \
--columns id,name \
--where "id=1" \
--target-dir /user/sqoop_mysql_to_hdfs \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t"
Import directly into a Hive table with --hive-import:
[root@node1 bin]# ./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--table student \
--columns id,name \
--hive-import \
--hive-overwrite \
--num-mappers 1 \
--fields-terminated-by "\t" \
--hive-table sqooptest_mysql_tohive
Verification: the import succeeded.
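A quick check from the Hive side (assuming the hive CLI is on the PATH):
hive -e "select * from sqooptest_mysql_tohive limit 5;"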

Export: move data from the big-data cluster (HDFS; Hive tables via their underlying HDFS files) back into a relational database (MySQL, Oracle); the keyword is export.
First create a target table in MySQL:
create table student2 like student;
[root@node1 bin]# ./sqoop export \
--connect "jdbc:mysql://192.168.141.155/sqoop?useUnicode=true&characterEncoding=utf-8" \
--username root --password hadoop \
--table student2 \
--columns id,name \
--num-mappers 1 \
--input-fields-terminated-by "\t" \
--export-dir /user/sqoop_mysql_to_hdfs
Note: without ?useUnicode=true&characterEncoding=utf-8 in the JDBC URL, non-ASCII data written to MySQL may be garbled.
Verification: correct.
Compressed output
Just add the appropriate option to the Sqoop command string:
--compression-codec
Snappy compression:
--compression-codec org.apache.hadoop.io.compress.SnappyCodec
Gzip compression:
--compression-codec org.apache.hadoop.io.compress.GzipCodec
Bzip2 compression:
--compression-codec org.apache.hadoop.io.compress.BZip2Codec
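A sketch of a compressed import, reusing the earlier job (the target directory name is made up for this example; per the Sqoop user guide, --compress/-z turns compression on and --compression-codec selects the Hadoop codec, gzip being the default):
[root@node1 bin]# ./sqoop import \
--connect jdbc:mysql://192.168.141.155/sqoop \
--username root --password hadoop \
--table student \
--target-dir /user/sqoop_mysql_to_hdfs_gz \
--delete-target-dir \
--num-mappers 1 \
--fields-terminated-by "\t" \
--compress \
--compression-codec org.apache.hadoop.io.compress.GzipCodec
The output files then carry a .gz suffix (e.g. part-m-00000.gz). Snappy additionally requires the native Hadoop Snappy libraries to be installed.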