• hive语法之insert overwrite/insert into


    LanguageManual DML - Apache Hive - Apache Software Foundation

    Standard syntax:

    INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1 FROM from_statement;

    INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1 FROM from_statement;

    Hive extension (multiple inserts):

    FROM from_statement

    INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...) [IF NOT EXISTS]] select_statement1

    [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2]

    [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2] ...;

    FROM from_statement

    INSERT INTO TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1

    [INSERT INTO TABLE tablename2 [PARTITION ...] select_statement2]

    [INSERT OVERWRITE TABLE tablename2 [PARTITION ... [IF NOT EXISTS]] select_statement2] ...;

    Hive extension (dynamic partition inserts):

    INSERT OVERWRITE TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

    INSERT INTO TABLE tablename PARTITION (partcol1[=val1], partcol2[=val2] ...) select_statement FROM from_statement;

     语法很简单,说明也很简单, insert into 是追加,insert overwrite 是覆盖不过只会覆盖对应分区或者全表数据

    关键点在下面

    • INSERT OVERWRITE will overwrite any existing data in the table or partition
      • unless IF NOT EXISTS is provided for a partition (as of Hive 0.9.0).
      • As of Hive 2.3.0 (HIVE-15880), if the table has TBLPROPERTIES ("auto.purge"="true") the previous data of the table is not moved to Trash when INSERT OVERWRITE query is run against the table. This functionality is applicable only for managed tables (see managed tables) and is turned off when "auto.purge" property is unset or set to false.

    这里提到了hive2.3.0之后 如果表属性有("auto.purge"="true"),当insert overwrite的时候,之前的数据是不会移动到垃圾箱的(会移动哪里呢,实际就是会直接删除,垃圾箱回收站都不放?),也就是说,哎呀 我不小心删除了,想恢复都不可以的。

    • INSERT INTO will append to the table or partition, keeping the existing data intact. (Note: INSERT INTO syntax is only available starting in version 0.8.)
      • As of Hive 0.13.0, a table can be made immutable by creating it with TBLPROPERTIES ("immutable"="true"). The default is "immutable"="false".
        INSERT INTO behavior into an immutable table is disallowed if any data is already present, although INSERT INTO still works if the immutable table is empty. The behavior of INSERT OVERWRITE is not affected by the "immutable" table property.
        An immutable table is protected against accidental updates due to a script loading data into it being run multiple times by mistake. The first insert into an immutable table succeeds and successive inserts fail, resulting in only one set of data in the table, instead of silently succeeding with multiple copies of the data in the table. 

    这里提到了一个比较有意思的参数("immutable"="true") 这个单词是不可变的意思,insert into本身是追加的意思,加了这个参数后, 就代表数据/分区只能被插入一次。(当然overwrite 这个表 也可以)

    • Inserts can be done to a table or a partition. If the table is partitioned, then one must specify a specific partition of the table by specifying values for all of the partitioning columns. If hive.typecheck.on.insert is set to true, these values are validated, converted and normalized to conform to their column types (Hive 0.12.0 onward). 

    类型检查和自动转换

    其余的都没啥说的。。

    动态分区插入参数

    hive.exec.dynamic.partition

    true

    Needs to be set to true to enable dynamic partition inserts

    hive.exec.dynamic.partition.mode

    strict

    In strict mode, the user must specify at least one static partition in case the user accidentally overwrites all partitions, in nonstrict mode all partitions are allowed to be dynamic

    hive.exec.max.dynamic.partitions.pernode

    100

    Maximum number of dynamic partitions allowed to be created in each mapper/reducer node

    hive.exec.max.dynamic.partitions

    1000

    Maximum number of dynamic partitions allowed to be created in total

    hive.exec.max.created.files

    100000

    Maximum number of HDFS files created by all mappers/reducers in a MapReduce job

    hive.error.on.empty.partition

    false

    Whether to throw an exception if dynamic partition insert generates empty results

     ---其实这个没啥说的主要是我在hive3.1看到表创建自动带purge=ture ,以为会有详细的解释的。

    --后续待补充

  • 相关阅读:
    c++练习题(4)
    知识图谱应用---智慧金融
    【蓝桥杯】C语言常见高级算法
    Selenium 浏览器坐标转桌面坐标
    重生之 SpringBoot3 入门保姆级学习(19、场景整合 CentOS7 Docker 的安装)
    FPGA:实现快速傅里叶变换(FFT)算法
    CentOS7 安装 Nacos过程
    用html做一个漂亮的网站【茶文化12页】期末网页制作 HTML+CSS网页设计实例 企业文化网站制作
    C语言试题124之给一个不多于 5 位的正整数,要求:一、求它是几位数,二、逆序打印出各位数字
    LuatOS-SOC接口文档(air780E)-- iconv - iconv操作
  • 原文地址:https://blog.csdn.net/cclovezbf/article/details/126708506