

    Sqoop Learning Notes

    1. Introduction

    Official site: https://sqoop.apache.org/

    Sqoop has two major versions: Sqoop1 and Sqoop2.

    Note: the latest stable release is 1.4.7 (download, documentation). The latest Sqoop2 release is 1.99.7 (download, documentation). 1.99.7 is not compatible with 1.4.7, is feature-incomplete, and is not intended for production deployment. In short, Sqoop2 should not be used in production.

    Sqoop = "SQL to Hadoop". Sqoop is an open-source tool used mainly to move data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL): it can import data from a relational database into HDFS, and export data from HDFS back into a relational database.

    History: the project started in 2009, initially as a third-party module for Hadoop; to simplify deployment and speed up iteration, Sqoop was later spun out as a standalone Apache project. Apache has since retired the Sqoop project (it was moved to the Apache Attic).

    2. How Sqoop Works

    Import and export commands are translated into MapReduce jobs for execution.

    The work is essentially a customization of MapReduce's InputFormat and OutputFormat.
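
    In practice, a table import runs as a map-only job: Sqoop first issues a boundary query on the split column (roughly SELECT MIN(id), MAX(id) FROM student), divides that range across the map tasks, and each mapper reads its own slice over JDBC and writes it to HDFS; there is no reduce phase. A minimal sketch of a parallel import, reusing the connection details from later in these notes (--split-by and -m are standard Sqoop options; the target directory name is just an example):

    # map-only import: the id range found by the boundary query is split across 4 mappers
    ./sqoop import \
    --connect jdbc:mysql://192.168.141.155/sqoop \
    --username root --password hadoop \
    --table student \
    --split-by id \
    -m 4 \
    --target-dir /user/sqoop_mysql_to_hdfs_parallel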

    3. Installation and Deployment

    3.1 Downloading the Package

    Download from https://archive.apache.org/dist/sqoop/


    Upload the archive to the server and unpack it; the directory contains the following:

    [root@node1 sqoop]# pwd
    /soft/sqoop
    [root@node1 sqoop]# ls
    bin        CHANGELOG.txt  conf  ivy      lib          NOTICE.txt   README.txt       sqoop-patch-review.py  src
    build.xml  COMPILING.txt  docs  ivy.xml  LICENSE.txt  pom-old.xml  sqoop-1.4.6.jar  sqoop-test-1.4.6.jar   testdata
    [root@node1 sqoop]# 
    

    3.2 Editing the Configuration File

    # copy the template config file (the template remains as a backup)
    [root@node1 conf]# cp sqoop-env-template.sh sqoop-env.sh
    # edit the config file
    [root@node1 conf]# vim sqoop-env.sh
    
    
    # set the Hadoop and Hive home directories first
    export HADOOP_COMMON_HOME=/soft/hadoop-3.3.0
    export HADOOP_MAPRED_HOME=/soft/hadoop-3.3.0
    
    export HIVE_HOME=/soft/hive
    
    ## ZOOCFGDIR and HBASE_HOME can be left unset for now
    
    

    3.3 Copying the JDBC Driver

    [root@node1 lib]# ls *mysql*
    mysql-connector-java-5.1.37.jar
    [root@node1 lib]# pwd
    /soft/sqoop/lib
    [root@node1 lib]#
    
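    If the connector jar is not in place yet, copying it into Sqoop's lib directory is all that is needed; a minimal sketch (the source path of the jar is just an example):

    # copy the MySQL JDBC driver into Sqoop's lib directory
    cp /root/mysql-connector-java-5.1.37.jar /soft/sqoop/lib/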

    3.4 Verifying Sqoop

    Run ./sqoop help:

    [root@node1 bin]# ./sqoop help
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    2022-11-03 22:37:32,708 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    usage: sqoop COMMAND [ARGS]
    
    Available commands:
      codegen            Generate code to interact with database records
      create-hive-table  Import a table definition into Hive
      eval               Evaluate a SQL statement and display the results
      export             Export an HDFS directory to a database table
      help               List available commands
      import             Import a table from a database to HDFS
      import-all-tables  Import tables from a database to HDFS
      import-mainframe   Import datasets from a mainframe server to HDFS
      job                Work with saved jobs
      list-databases     List available databases on a server
      list-tables        List available tables in a database
      merge              Merge results of incremental imports
      metastore          Run a standalone Sqoop metastore
      version            Display version information
    
    See 'sqoop help COMMAND' for information on a specific command.
    [root@node1 bin]#
    

    Test the connection to MySQL:

    [root@node1 bin]# ./sqoop list-databases --connect jdbc:mysql://192.168.141.155/sqoop --username root --password hadoop
    ..............................
    information_schema
    ahs
    cm
    hive3
    mysql
    performance_schema
    sqoop
    sys
    
    

    OK, the environment works.

    4. Importing Data (import)

    4.1 Key Options

    Official user guide: https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html

    Import: from a relational database (MySQL, Oracle) into the big-data cluster (HDFS, Hive, HBase), using the import command.

    Option                    Meaning
    --connect                 JDBC URL
    --username                JDBC username
    --password                JDBC password
    --table                   source table in MySQL
    --target-dir              target output directory on HDFS
    --delete-target-dir       delete the target directory before the import runs
    --fields-terminated-by    field delimiter
    --query                   SQL query; used instead of --table and must contain $CONDITIONS;
                              wrap the SQL in single quotes, or escape $CONDITIONS manually if you use double quotes
    --where                   WHERE condition; conflicts with the query given in --query

    4.2 Preparing Test Data

    drop table if exists  student;
    create table student(
        id int auto_increment primary key ,
        num int,
        name varchar(20)
    ) charset =utf8;
    INSERT INTO student(num, name) VALUES (95001, '李勇');
    INSERT INTO student(num, name) VALUES (95002, '刘晨');
    INSERT INTO student(num, name) VALUES (95003, '王敏');
    INSERT INTO student(num, name) VALUES (95004, '张立');
    INSERT INTO student(num, name) VALUES (95005, '刘刚');
    INSERT INTO student(num, name) VALUES (95006, '孙庆');
    INSERT INTO student(num, name) VALUES (95007, '易思玲');
    INSERT INTO student(num, name) VALUES (95008, '李娜');
    INSERT INTO student(num, name) VALUES (95009, '梦圆圆');
    INSERT INTO student(num, name) VALUES (95010, '孔小涛');
    .......
    

    4.3 Importing a Whole Table

    [root@node1 bin]# ./sqoop  import \
    --connect jdbc:mysql://192.168.141.155/sqoop  \
    --username root --password hadoop  \
    --table student \
    --target-dir /user/sqoop_mysql_to_hdfs  \
    --delete-target-dir  \
    --num-mappers 1 \
    --fields-terminated-by "\t"
    
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    2022-11-03 23:15:35,237 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    2022-11-03 23:15:35,304 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    2022-11-03 23:15:35,389 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    2022-11-03 23:15:35,389 INFO tool.CodeGenTool: Beginning code generation
    2022-11-03 23:15:35,652 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
    2022-11-03 23:15:35,671 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `student` AS t LIMIT 1
    2022-11-03 23:15:35,676 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /soft/hadoop-3.3.0
    Note: /tmp/sqoop-root/compile/14cc91c986370cd735b8a241a21c6874/student.java uses or overrides a deprecated API.
    Note: Recompile with -Xlint:deprecation for details.
    2022-11-03 23:15:36,758 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-root/compile/14cc91c986370cd735b8a241a21c6874/student.jar
    2022-11-03 23:15:37,458 INFO tool.ImportTool: Destination directory /user/sqoop_mysql_to_hdfs is not present, hence not deleting.
    2022-11-03 23:15:37,458 WARN manager.MySQLManager: It looks like you are importing from mysql.
    2022-11-03 23:15:37,458 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
    2022-11-03 23:15:37,458 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
    2022-11-03 23:15:37,458 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
    2022-11-03 23:15:37,462 INFO mapreduce.ImportJobBase: Beginning import of student
    2022-11-03 23:15:37,463 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
    2022-11-03 23:15:37,472 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
    2022-11-03 23:15:37,488 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
    2022-11-03 23:15:37,573 INFO client.DefaultNoHARMFailoverProxyProvider: Connecting to ResourceManager at node1/192.168.141.151:8032
    2022-11-03 23:15:37,910 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1663767415605_0024
    2022-11-03 23:15:39,401 INFO db.DBInputFormat: Using read commited transaction isolation
    2022-11-03 23:15:39,455 INFO mapreduce.JobSubmitter: number of splits:1
    2022-11-03 23:15:39,566 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1663767415605_0024
    2022-11-03 23:15:39,566 INFO mapreduce.JobSubmitter: Executing with tokens: []
    2022-11-03 23:15:39,726 INFO conf.Configuration: resource-types.xml not found
    2022-11-03 23:15:39,726 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
    2022-11-03 23:15:39,996 INFO impl.YarnClientImpl: Submitted application application_1663767415605_0024
    2022-11-03 23:15:40,033 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1663767415605_0024/
    2022-11-03 23:15:40,034 INFO mapreduce.Job: Running job: job_1663767415605_0024
    2022-11-03 23:15:49,194 INFO mapreduce.Job: Job job_1663767415605_0024 running in uber mode : false
    2022-11-03 23:15:49,195 INFO mapreduce.Job:  map 0% reduce 0%
    2022-11-03 23:16:05,319 INFO mapreduce.Job:  map 100% reduce 0%
    2022-11-03 23:16:05,331 INFO mapreduce.Job: Job job_1663767415605_0024 completed successfully
    2022-11-03 23:16:05,421 INFO mapreduce.Job: Counters: 33
    	File System Counters
    		FILE: Number of bytes read=0
    		FILE: Number of bytes written=273519
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=87
    		HDFS: Number of bytes written=737
    		HDFS: Number of read operations=6
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=2
    		HDFS: Number of bytes read erasure-coded=0
    	Job Counters 
    		Launched map tasks=1
    		Other local map tasks=1
    		Total time spent by all maps in occupied slots (ms)=13522
    		Total time spent by all reduces in occupied slots (ms)=0
    		Total time spent by all map tasks (ms)=13522
    		Total vcore-milliseconds taken by all map tasks=13522
    		Total megabyte-milliseconds taken by all map tasks=13846528
    	Map-Reduce Framework
    		Map input records=44
    		Map output records=44
    		Input split bytes=87
    		Spilled Records=0
    		Failed Shuffles=0
    		Merged Map outputs=0
    		GC time elapsed (ms)=67
    		CPU time spent (ms)=810
    		Physical memory (bytes) snapshot=199610368
    		Virtual memory (bytes) snapshot=2785349632
    		Total committed heap usage (bytes)=148897792
    		Peak Map Physical memory (bytes)=199610368
    		Peak Map Virtual memory (bytes)=2785349632
    	File Input Format Counters 
    		Bytes Read=0
    	File Output Format Counters 
    		Bytes Written=737
    2022-11-03 23:16:05,428 INFO mapreduce.ImportJobBase: Transferred 737 bytes in 27.9275 seconds (26.3898 bytes/sec)
    2022-11-03 23:16:05,433 INFO mapreduce.ImportJobBase: Retrieved 44 records.
    
    

    The log shows that the import completed successfully.

    Verify the imported data on HDFS:

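    The screenshot is omitted here; the same check can be done from the command line (a sketch, assuming the single-mapper import produced the usual part-m-00000 file):

    # list the output directory and peek at the part file
    hdfs dfs -ls /user/sqoop_mysql_to_hdfs
    hdfs dfs -cat /user/sqoop_mysql_to_hdfs/part-m-00000 | head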

    The data was imported successfully.

    4.4 Importing with a Query

    [root@node1 bin]# ./sqoop  import \
    --connect jdbc:mysql://192.168.141.155/sqoop  \
    --username root --password hadoop  \
    --target-dir /user/sqoop_mysql_to_hdfs  \
    --delete-target-dir  \
    --num-mappers 1 \
    --fields-terminated-by "\t"  \
    --query 'select id , name from student ;'
    
    
    [root@node1 bin]# ./sqoop  import \
    > --connect jdbc:mysql://192.168.141.155/sqoop  \
    > --username root --password hadoop  \
    > --target-dir /user/sqoop_mysql_to_hdfs  \
    > --delete-target-dir  \
    > --num-mappers 1 \
    > --fields-terminated-by "\t"  \
    > --query 'select id , name from student ;'
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /soft/sqoop-1.4.6.bin__hadoop-2.0.4-alpha/bin/../../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    2022-11-03 23:33:57,756 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    2022-11-03 23:33:57,822 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
    2022-11-03 23:33:57,897 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
    2022-11-03 23:33:57,897 INFO tool.CodeGenTool: Beginning code generation
    2022-11-03 23:33:57,902 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: Query [select id , name from student ;] must contain '$CONDITIONS' in WHERE clause.
    	at org.apache.sqoop.manager.ConnManager.getColumnTypes(ConnManager.java:300)
    	at org.apache.sqoop.orm.ClassWriter.getColumnTypes(ClassWriter.java:1833)
    	at org.apache.sqoop.orm.ClassWriter.generate(ClassWriter.java:1645)
    	at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:107)
    	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
    	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
    	at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
    	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
    	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:227)
    	at org.apache.sqoop.Sqoop.main(Sqoop.java:236)
    
    ----- corrected command:
    
    
    [root@node1 bin]# ./sqoop  import --connect jdbc:mysql://192.168.141.155/sqoop  --username root --password hadoop  --target-dir /user/sqoop_mysql_to_hdfs  --delete-target-dir  --num-mappers 1 --fields-terminated-by "\t"  --query 'select id , name from student where 1=1 and $CONDITIONS;'
    
    

    Note: wrap the SQL in --query in single quotes; if you use double quotes, $CONDITIONS has to be escaped manually, otherwise the shell will try to expand it.
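
    For reference, a double-quoted variant would look like this (a sketch; with double quotes the shell would otherwise expand $CONDITIONS, so it is escaped as \$CONDITIONS):

    ./sqoop import \
    --connect jdbc:mysql://192.168.141.155/sqoop \
    --username root --password hadoop \
    --target-dir /user/sqoop_mysql_to_hdfs \
    --delete-target-dir \
    --num-mappers 1 \
    --fields-terminated-by "\t" \
    --query "select id, name from student where 1=1 and \$CONDITIONS"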

    Verification: the import succeeded.


    4.5 Importing Specific Columns

    --columns id,name \

    [root@node1 bin]# ./sqoop  import \
    --connect jdbc:mysql://192.168.141.155/sqoop  \
    --username root --password hadoop  \
    --table student \
    --columns id,name  \
    --target-dir /user/sqoop_mysql_to_hdfs  \
    --delete-target-dir  \
    --num-mappers 1 \
    --fields-terminated-by "\t"
    

    4.6 Importing with a WHERE Condition

    [root@node1 bin]# ./sqoop  import \
    --connect jdbc:mysql://192.168.141.155/sqoop  \
    --username root --password hadoop  \
    --table student \
    --columns id,name  \
    --where "id=1"  \
    --target-dir /user/sqoop_mysql_to_hdfs  \
    --delete-target-dir  \
    --num-mappers 1 \
    --fields-terminated-by "\t"
    

    4.7 Importing into a Hive Table

    [root@node1 bin]# ./sqoop  import \
    --connect jdbc:mysql://192.168.141.155/sqoop  \
    --username root --password hadoop  \
    --table student \
    --columns id,name  \
    --hive-import  \
    --hive-overwrite \
    --num-mappers 1 \
    --fields-terminated-by "\t" \
    --hive-table sqooptest_mysql_tohive 
    

    Verification: the import succeeded.

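    The screenshot is omitted here; a quick check from the Hive CLI (a sketch, assuming the table landed in the default database):

    # show a few imported rows
    hive -e "select * from sqooptest_mysql_tohive limit 5;"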

    5. Exporting Data (export)

    Export: from HDFS/Hive/HBase to a relational database (MySQL, Oracle).

    First create a target table in MySQL:

     create table  student2 like student;
    
    [root@node1 bin]# ./sqoop  export \
    --connect "jdbc:mysql://192.168.141.155/sqoop?useUnicode=true&characterEncoding=utf-8"  \
    --username root --password hadoop  \
    --table student2 \
    --columns id,name  \
    --num-mappers 1 \
    --input-fields-terminated-by "\t" \
    --export-dir /user/sqoop_mysql_to_hdfs 
    

    Note: without ?useUnicode=true&characterEncoding=utf-8 in the JDBC URL, non-ASCII data written to MySQL may be garbled.

    Verification: the exported data is correct.
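
    A quick check on the MySQL side (a sketch; host and credentials as above):

    # count the exported rows and show a sample
    mysql -h192.168.141.155 -uroot -phadoop -e "select count(*) from sqoop.student2; select id, name from sqoop.student2 limit 5;"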

    6. Miscellaneous

    Compressed output

    Just add the corresponding option to the Sqoop command line; a full example follows the codec list below.

    --compression-codec 
    

    Snappy compression

    --compression-codec org.apache.hadoop.io.compress.SnappyCodec
    

    Gzip compression

    --compression-codec org.apache.hadoop.io.compress.GzipCodec
    

    Bzip2 compression

    --compression-codec org.apache.hadoop.io.compress.BZip2Codec
    
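    A full example with compressed output (a sketch: in Sqoop 1.4.x, --compress / -z enables compression and --compression-codec selects the codec; both flags are shown here, since the codec option alone may not enable compression on every version):

    ./sqoop import \
    --connect jdbc:mysql://192.168.141.155/sqoop \
    --username root --password hadoop \
    --table student \
    --target-dir /user/sqoop_mysql_to_hdfs_gz \
    --delete-target-dir \
    --num-mappers 1 \
    --compress \
    --compression-codec org.apache.hadoop.io.compress.GzipCodec
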
    Original article: https://blog.csdn.net/weixin_44244088/article/details/127681271