• KingbaseES例程之拥有大量索引的表导入数据


    概述

    如何快速插入大量数据比如几千万上亿的带索引的数据表。

    数据准备

    准备一个拥有二十个索引的数据表。

    1. kingbase=# \d+ bigtab
    2. Table "kingbase.bigtab"
    3. Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
    4. --------+---------+-----------+----------+---------+----------+--------------+-------------
    5. id | integer | | | | plain | |
    6. c01 | integer | | | | plain | |
    7. c02 | integer | | | | plain | |
    8. c03 | integer | | | | plain | |
    9. c04 | integer | | | | plain | |
    10. c05 | integer | | | | plain | |
    11. c06 | integer | | | | plain | |
    12. c07 | integer | | | | plain | |
    13. c08 | integer | | | | plain | |
    14. c09 | integer | | | | plain | |
    15. c10 | integer | | | | plain | |
    16. c11 | integer | | | | plain | |
    17. c12 | integer | | | | plain | |
    18. c13 | integer | | | | plain | |
    19. c14 | integer | | | | plain | |
    20. c15 | integer | | | | plain | |
    21. c16 | integer | | | | plain | |
    22. c17 | integer | | | | plain | |
    23. c18 | integer | | | | plain | |
    24. c19 | integer | | | | plain | |
    25. c20 | integer | | | | plain | |
    26. c21 | integer | | | | plain | |
    27. c22 | integer | | | | plain | |
    28. c23 | integer | | | | plain | |
    29. c24 | integer | | | | plain | |
    30. c25 | integer | | | | plain | |
    31. c26 | integer | | | | plain | |
    32. c27 | integer | | | | plain | |
    33. c28 | integer | | | | plain | |
    34. c29 | integer | | | | plain | |
    35. t01 | text | | | | extended | |
    36. t02 | text | | | | extended | |
    37. t03 | text | | | | extended | |
    38. t04 | text | | | | extended | |
    39. t05 | text | | | | extended | |
    40. t06 | text | | | | extended | |
    41. t07 | text | | | | extended | |
    42. t08 | text | | | | extended | |
    43. t09 | text | | | | extended | |
    44. t10 | text | | | | extended | |
    45. t11 | text | | | | extended | |
    46. t12 | text | | | | extended | |
    47. t13 | text | | | | extended | |
    48. t14 | text | | | | extended | |
    49. t15 | text | | | | extended | |
    50. t16 | text | | | | extended | |
    51. t17 | text | | | | extended | |
    52. t18 | text | | | | extended | |
    53. t19 | text | | | | extended | |
    54. t20 | text | | | | extended | |
    55. Indexes:
    56. "bigtab_i01" btree (c01)
    57. "bigtab_i02" btree (c02)
    58. "bigtab_i03" btree (c03)
    59. "bigtab_i04" btree (c04)
    60. "bigtab_i05" btree (c05)
    61. "bigtab_i06" btree (c06)
    62. "bigtab_i07" btree (c07)
    63. "bigtab_i08" btree (c08)
    64. "bigtab_i09" btree (c09)
    65. "bigtab_i10" btree (c10)
    66. "bigtab_i11" btree (c11)
    67. "bigtab_i12" btree (c12)
    68. "bigtab_i13" btree (c13)
    69. "bigtab_i14" btree (c14)
    70. "bigtab_i15" btree (c15)
    71. "bigtab_i16" btree (c16)
    72. "bigtab_i17" btree (c17)
    73. "bigtab_i18" btree (c18)
    74. "bigtab_i19" btree (c19)
    75. "bigtab_i20" btree (c20)
    76. Access method: heap
    77. kingbase=#

    方法一:直接插入海量数据,自动维护索引

    1. kingbase=#
    2. kingbase=# insert into bigtab
    3. kingbase-# select id
    4. kingbase-# , (random() * 100)::int + 1000 c01
    5. kingbase-# , (random() * 200)::int + 1000 c02
    6. kingbase-# , (random() * 300)::int + 10000 c03
    7. kingbase-# , (random() * 400)::int + 10000 c04
    8. kingbase-# , (random() * 500)::int + 10000 c05
    9. kingbase-# , (random() * 600)::int + 10000 c06
    10. kingbase-# , (random() * 700)::int + 10000 c07
    11. kingbase-# , (random() * 800)::int + 10000 c08
    12. kingbase-# , (random() * 900)::int + 10000 c09
    13. kingbase-# , (random() * 1000)::int + 10000 c10
    14. kingbase-# , (random() * 2000)::int + 10000 c11
    15. kingbase-# , (random() * 3000)::int + 10000 c12
    16. kingbase-# , (random() * 4000)::int + 10000 c13
    17. kingbase-# , (random() * 5000)::int + 10000 c14
    18. kingbase-# , (random() * 6000)::int + 10000 c15
    19. kingbase-# , (random() * 7000)::int + 10000 c16
    20. kingbase-# , (random() * 8000)::int + 10000 c17
    21. kingbase-# , (random() * 9000)::int + 10000 c18
    22. kingbase-# , (random() * 10000)::int + 10000 c19
    23. kingbase-# , (random() * 20000)::int + 10000 c20
    24. kingbase-# , (random() * 30000)::int + 10000 c21
    25. kingbase-# , (random() * 40000)::int + 10000 c22
    26. kingbase-# , (random() * 50000)::int + 10000 c23
    27. kingbase-# , (random() * 60000)::int + 10000 c24
    28. kingbase-# , (random() * 70000)::int + 10000 c25
    29. kingbase-# , (random() * 80000)::int + 10000 c26
    30. kingbase-# , (random() * 90000)::int + 10000 c27
    31. kingbase-# , (random() * 10000)::int + 10000 c28
    32. kingbase-# , (random() * 10000)::int + 10000 c29
    33. kingbase-# , md5(random()::text) t01
    34. kingbase-# , md5(random()::text) t02
    35. kingbase-# , md5(random()::text) t03
    36. kingbase-# , md5(random()::text) t04
    37. kingbase-# , md5(random()::text) t05
    38. kingbase-# , md5(random()::text) t06
    39. kingbase-# , md5(random()::text) t07
    40. kingbase-# , md5(random()::text) t08
    41. kingbase-# , md5(random()::text) t09
    42. kingbase-# , md5(random()::text) t10
    43. kingbase-# , md5(random()::text) t11
    44. kingbase-# , md5(random()::text) t12
    45. kingbase-# , md5(random()::text) t13
    46. kingbase-# , md5(random()::text) t14
    47. kingbase-# , md5(random()::text) t15
    48. kingbase-# , md5(random()::text) t16
    49. kingbase-# , md5(random()::text) t17
    50. kingbase-# , md5(random()::text) t18
    51. kingbase-# , md5(random()::text) t19
    52. kingbase-# , md5(random()::text) t20
    53. kingbase-# from generate_series(1, 2000000) id;
    54. INSERT 0 2000000
    55. Time: 299331.143 ms (04:59.331)

    优点: 语句单一;自动维护索引;自动支持之后的索引。

    缺点: 逐行维护索引,造成用时较长。

    方法二:删除索引,插入海量数据,再创建索引

    1. kingbase=#
    2. kingbase=# do
    3. kingbase-# $$
    4. kingbase$# begin
    5. kingbase$# drop index bigtab_i01;
    6. kingbase$# drop index bigtab_i02;
    7. kingbase$# drop index bigtab_i03;
    8. kingbase$# drop index bigtab_i04;
    9. kingbase$# drop index bigtab_i05;
    10. kingbase$# drop index bigtab_i06;
    11. kingbase$# drop index bigtab_i07;
    12. kingbase$# drop index bigtab_i08;
    13. kingbase$# drop index bigtab_i09;
    14. kingbase$# drop index bigtab_i10;
    15. kingbase$# drop index bigtab_i11;
    16. kingbase$# drop index bigtab_i12;
    17. kingbase$# drop index bigtab_i13;
    18. kingbase$# drop index bigtab_i14;
    19. kingbase$# drop index bigtab_i15;
    20. kingbase$# drop index bigtab_i16;
    21. kingbase$# drop index bigtab_i17;
    22. kingbase$# drop index bigtab_i18;
    23. kingbase$# drop index bigtab_i19;
    24. kingbase$# drop index bigtab_i20;
    25. kingbase$#
    26. kingbase$# insert into bigtab
    27. kingbase$# select id
    28. kingbase$# , (random() * 100)::int + 1000 c01
    29. kingbase$# , (random() * 200)::int + 1000 c02
    30. kingbase$# , (random() * 300)::int + 10000 c03
    31. kingbase$# , (random() * 400)::int + 10000 c04
    32. kingbase$# , (random() * 500)::int + 10000 c05
    33. kingbase$# , (random() * 600)::int + 10000 c06
    34. kingbase$# , (random() * 700)::int + 10000 c07
    35. kingbase$# , (random() * 800)::int + 10000 c08
    36. kingbase$# , (random() * 900)::int + 10000 c09
    37. kingbase$# , (random() * 1000)::int + 10000 c10
    38. kingbase$# , (random() * 2000)::int + 10000 c11
    39. kingbase$# , (random() * 3000)::int + 10000 c12
    40. kingbase$# , (random() * 4000)::int + 10000 c13
    41. kingbase$# , (random() * 5000)::int + 10000 c14
    42. kingbase$# , (random() * 6000)::int + 10000 c15
    43. kingbase$# , (random() * 7000)::int + 10000 c16
    44. kingbase$# , (random() * 8000)::int + 10000 c17
    45. kingbase$# , (random() * 9000)::int + 10000 c18
    46. kingbase$# , (random() * 10000)::int + 10000 c19
    47. kingbase$# , (random() * 20000)::int + 10000 c20
    48. kingbase$# , (random() * 30000)::int + 10000 c21
    49. kingbase$# , (random() * 40000)::int + 10000 c22
    50. kingbase$# , (random() * 50000)::int + 10000 c23
    51. kingbase$# , (random() * 60000)::int + 10000 c24
    52. kingbase$# , (random() * 70000)::int + 10000 c25
    53. kingbase$# , (random() * 80000)::int + 10000 c26
    54. kingbase$# , (random() * 90000)::int + 10000 c27
    55. kingbase$# , (random() * 10000)::int + 10000 c28
    56. kingbase$# , (random() * 10000)::int + 10000 c29
    57. kingbase$# , md5(random()::text) t01
    58. kingbase$# , md5(random()::text) t02
    59. kingbase$# , md5(random()::text) t03
    60. kingbase$# , md5(random()::text) t04
    61. kingbase$# , md5(random()::text) t05
    62. kingbase$# , md5(random()::text) t06
    63. kingbase$# , md5(random()::text) t07
    64. kingbase$# , md5(random()::text) t08
    65. kingbase$# , md5(random()::text) t09
    66. kingbase$# , md5(random()::text) t10
    67. kingbase$# , md5(random()::text) t11
    68. kingbase$# , md5(random()::text) t12
    69. kingbase$# , md5(random()::text) t13
    70. kingbase$# , md5(random()::text) t14
    71. kingbase$# , md5(random()::text) t15
    72. kingbase$# , md5(random()::text) t16
    73. kingbase$# , md5(random()::text) t17
    74. kingbase$# , md5(random()::text) t18
    75. kingbase$# , md5(random()::text) t19
    76. kingbase$# , md5(random()::text) t20
    77. kingbase$# from generate_series(1, 2000000) id;
    78. kingbase$#
    79. kingbase$# create index bigtab_i01 on bigtab (c01);
    80. kingbase$# create index bigtab_i02 on bigtab (c02);
    81. kingbase$# create index bigtab_i03 on bigtab (c03);
    82. kingbase$# create index bigtab_i04 on bigtab (c04);
    83. kingbase$# create index bigtab_i05 on bigtab (c05);
    84. kingbase$# create index bigtab_i06 on bigtab (c06);
    85. kingbase$# create index bigtab_i07 on bigtab (c07);
    86. kingbase$# create index bigtab_i08 on bigtab (c08);
    87. kingbase$# create index bigtab_i09 on bigtab (c09);
    88. kingbase$# create index bigtab_i10 on bigtab (c10);
    89. kingbase$# create index bigtab_i11 on bigtab (c11);
    90. kingbase$# create index bigtab_i12 on bigtab (c12);
    91. kingbase$# create index bigtab_i13 on bigtab (c13);
    92. kingbase$# create index bigtab_i14 on bigtab (c14);
    93. kingbase$# create index bigtab_i15 on bigtab (c15);
    94. kingbase$# create index bigtab_i16 on bigtab (c16);
    95. kingbase$# create index bigtab_i17 on bigtab (c17);
    96. kingbase$# create index bigtab_i18 on bigtab (c18);
    97. kingbase$# create index bigtab_i19 on bigtab (c19);
    98. kingbase$# create index bigtab_i20 on bigtab (c20);
    99. kingbase$#
    100. kingbase$# end;
    101. kingbase$# $$;
    102. ANONYMOUS BLOCK
    103. Time: 83069.170 ms (01:23.069)

    优点: 批量维护索引,用时最短。

    缺点: 语句复杂且固化;手动维护删建索引语句;不支持之后的索引。

    方法三:禁止索引更改,插入海量数据,重建表的全部索引

    1. kingbase=# do
    2. kingbase-# $$
    3. kingbase$# begin
    4. kingbase$#
    5. kingbase$# update pg_index
    6. kingbase$# set indislive= false
    7. kingbase$# where indrelid = 'bigtab'::regclass;
    8. kingbase$#
    9. kingbase$# insert into bigtab
    10. kingbase$# select id
    11. kingbase$# , (random() * 100)::int + 1000 c01
    12. kingbase$# , (random() * 200)::int + 1000 c02
    13. kingbase$# , (random() * 300)::int + 10000 c03
    14. kingbase$# , (random() * 400)::int + 10000 c04
    15. kingbase$# , (random() * 500)::int + 10000 c05
    16. kingbase$# , (random() * 600)::int + 10000 c06
    17. kingbase$# , (random() * 700)::int + 10000 c07
    18. kingbase$# , (random() * 800)::int + 10000 c08
    19. kingbase$# , (random() * 900)::int + 10000 c09
    20. kingbase$# , (random() * 1000)::int + 10000 c10
    21. kingbase$# , (random() * 2000)::int + 10000 c11
    22. kingbase$# , (random() * 3000)::int + 10000 c12
    23. kingbase$# , (random() * 4000)::int + 10000 c13
    24. kingbase$# , (random() * 5000)::int + 10000 c14
    25. kingbase$# , (random() * 6000)::int + 10000 c15
    26. kingbase$# , (random() * 7000)::int + 10000 c16
    27. kingbase$# , (random() * 8000)::int + 10000 c17
    28. kingbase$# , (random() * 9000)::int + 10000 c18
    29. kingbase$# , (random() * 10000)::int + 10000 c19
    30. kingbase$# , (random() * 20000)::int + 10000 c20
    31. kingbase$# , (random() * 30000)::int + 10000 c21
    32. kingbase$# , (random() * 40000)::int + 10000 c22
    33. kingbase$# , (random() * 50000)::int + 10000 c23
    34. kingbase$# , (random() * 60000)::int + 10000 c24
    35. kingbase$# , (random() * 70000)::int + 10000 c25
    36. kingbase$# , (random() * 80000)::int + 10000 c26
    37. kingbase$# , (random() * 90000)::int + 10000 c27
    38. kingbase$# , (random() * 10000)::int + 10000 c28
    39. kingbase$# , (random() * 10000)::int + 10000 c29
    40. kingbase$# , md5(random()::text) t01
    41. kingbase$# , md5(random()::text) t02
    42. kingbase$# , md5(random()::text) t03
    43. kingbase$# , md5(random()::text) t04
    44. kingbase$# , md5(random()::text) t05
    45. kingbase$# , md5(random()::text) t06
    46. kingbase$# , md5(random()::text) t07
    47. kingbase$# , md5(random()::text) t08
    48. kingbase$# , md5(random()::text) t09
    49. kingbase$# , md5(random()::text) t10
    50. kingbase$# , md5(random()::text) t11
    51. kingbase$# , md5(random()::text) t12
    52. kingbase$# , md5(random()::text) t13
    53. kingbase$# , md5(random()::text) t14
    54. kingbase$# , md5(random()::text) t15
    55. kingbase$# , md5(random()::text) t16
    56. kingbase$# , md5(random()::text) t17
    57. kingbase$# , md5(random()::text) t18
    58. kingbase$# , md5(random()::text) t19
    59. kingbase$# , md5(random()::text) t20
    60. kingbase$# from generate_series(1, 2000000) id;
    61. kingbase$#
    62. kingbase$# update pg_index
    63. kingbase$# set indislive= true
    64. kingbase$# where indrelid = 'bigtab'::regclass;
    65. kingbase$#
    66. kingbase$# analyse bigtab;
    67. kingbase$# reindex table bigtab;
    68. kingbase$#
    69. kingbase$# end;
    70. kingbase$# $$;
    71. ANONYMOUS BLOCK
    72. Time: 87110.126 ms (01:27.110)

    优点: 批量维护索引,用时短;语句固定模式;自动维护索引;支持之后的索引。

    缺点: 多个SQL语句,不易嵌入语句块。

    最后的话

    reindex table 的执行依赖统计信息,所以需要执行 analyse table ,才能成功重建表的全部可更新的索引。

    reindex index 不受上述因素的影响,可以强制重建不更新的索引,并自动修改 indislive= true。

    如果在REINDEX期间出现异常,那么所有需要rebuild的索引的状态都是invalid,意味着这些索引仍然占用空间,定义仍在但不能使用。

    避免REINDEX期间出现异常,可以在索引更新操作时,跳过唯一索引和外键依赖索引等。

  • 相关阅读:
    Python利用PIL将png图像转jpg图像
    Redis之Lua脚本
    新加坡大带宽服务器托管优势
    SpringBoot系列之搭建WebSocket应用
    Java练习题——抽象类、方法以及接口
    Java 腾讯邮箱发送邮件工具类(单人+多人)
    linux 设置开机启动
    STM32CubeMX教程12 DMA 直接内存读取
    深入理解重写equals()方法
    22年BATJ大厂必问面试题(复盘):JVM+微服务+多线程+锁+高并发
  • 原文地址:https://blog.csdn.net/lyu1026/article/details/126562053