• PostgreSQL 与 Oracle 访问分区表执行计划差异


    熟悉Oracle 的DBA都知道,Oracle 访问分区表时,对于没有提供分区条件的,也就是在无法使用分区剪枝情况下,优化器会根据全局的统计信息制定执行计划,该执行计划针对所有分区适用。在分析该方法利弊之前,我们先来看个例子,以确保对分区表的执行计划有所了解:

    一、Oracle 

    构建数据:

    复制代码
    create table part_tab01(part_key char(1),state char(1),desc_content varchar(4000))
    partition by range(part_key)
    (
      partition part_0 values less than(1),
      partition part_1 values less than(2)
    );
    
    insert into part_tab01 select '0','0',rpad('a',1000,'a') from dba_objects where rownum<10001;
    insert into part_tab01 select '1','1',rpad('a',1000,'a') from dba_objects where rownum<10001;
    insert into part_tab01 select * from part_tab01;
    insert into part_tab01 select * from part_tab01;
    insert into part_tab01 select * from part_tab01;
    insert into part_tab01 select * from part_tab01;
    insert into part_tab01 select * from part_tab01;
    insert into part_tab01 select * from part_tab01;
    insert into part_tab01 select '1','0',rpad('a',1000,'a') from dba_objects where rownum<11;
    insert into part_tab01 select '0','1',rpad('a',1000,'a') from dba_objects where rownum<11;
    
    create index idx_part_tab01_state on part_tab01(state) local;
    复制代码

    从数据的分布可以得出结论,最优的访问方法应该是:对于不同的分区、访问不同的state 值,应采用不同的表访问方法。

    实际Oracle 执行计划如下:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    SQL> select * from part_tab01 where state='1';
     
    640010 rows selected.
     
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 4116343635
     
    --------------------------------------------------------------------------------------------------
    | Id  | Operation           | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
    --------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT    |            |   640K|   613M| 49576   (1)| 00:00:02 |       |       |
    |   1 |  PARTITION RANGE ALL|            |   640K|   613M| 49576   (1)| 00:00:02 |     1 |     2 |
    |*  2 |   TABLE ACCESS FULL | PART_TAB01 |   640K|   613M| 49576   (1)| 00:00:02 |     1 |     2 |
    --------------------------------------------------------------------------------------------------

    结论:在没有指定分区键的情况下,根据全局的统计信息,采用全表访问。
    SQL> select * from part_tab01 where state='1' and part_key='0';
     
    10 rows selected.
     
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 1952449058
     
    -----------------------------------------------------------------------------------------------------------------------------------
    | Id  | Operation                                  | Name                 | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                           |                      |    10 | 10050 |     5   (0)| 00:00:01 |       |       |
    |   1 |  PARTITION RANGE SINGLE                    |                      |    10 | 10050 |     5   (0)| 00:00:01 |     1 |     1 |
    |*  2 |   TABLE ACCESS BY LOCAL INDEX ROWID BATCHED| PART_TAB01           |    10 | 10050 |     5   (0)| 00:00:01 |     1 |     1 |
    |*  3 |    INDEX RANGE SCAN                        | IDX_PART_TAB01_STATE |    10 |       |     3   (0)| 00:00:01 |     1 |     1 |
    -----------------------------------------------------------------------------------------------------------------------------------
     
    SQL> select * from part_tab01 where state='1' and part_key='1';
     
    640000 rows selected.
     
     
    Execution Plan
    ----------------------------------------------------------
    Plan hash value: 4278184147
     
    -----------------------------------------------------------------------------------------------------
    | Id  | Operation              | Name       | Rows  | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
    -----------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT       |            |   640K|   613M| 24793   (1)| 00:00:01 |       |       |
    |   1 |  PARTITION RANGE SINGLE|            |   640K|   613M| 24793   (1)| 00:00:01 |     2 |     2 |
    |*  2 |   TABLE ACCESS FULL    | PART_TAB01 |   640K|   613M| 24793   (1)| 00:00:01 |     2 |     2 |
    -----------------------------------------------------------------------------------------------------

    结论:在提供分区条件的情况下,优化器可以根据不同的分区统计信息给出不同的执行计划。

    可以看到,在没有分区条件的情况下,Oracle 是针对全表采用统一的执行。实际针对该SQL,最好的访问方法应该是:part_0 全表,part_1 索引

    二、PostgreSQL 执行计划

    构建数据:

    复制代码
    create table part_tab01(part_key char(1),state char(1),desc_content text)
    partition by range(part_key)
    (
      partition part_0 values less than(1),
      partition part_1 values less than(2)
    );
    
    insert into part_tab01 select '0','0',repeat('a',1000) from generate_series(1,1000000);
    insert into part_tab01 select '0','1',repeat('b',1000) from generate_series(1,10);
    insert into part_tab01 select '1','1',repeat('a',1000) from generate_series(1,1000000);
    insert into part_tab01 select '1','0',repeat('b',1000) from generate_series(1,10);
    
    create index idx_part_tab01_state on part_tab01(state);
    复制代码

     执行计划:即使没有提供分区条,针对不同分区,有不同的执行计划。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    test=# explain analyze select * from part_tab01 where state='1';
                                                                           QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------------------
     Append  (cost=0.42..160363.43 rows=1000000 width=1008) (actual time=0.022..484.005 rows=1000010 loops=1)
       ->  Index Scan using part_tab01_part_0_state_idx on part_tab01_part_0  (cost=0.42..4.44 rows=1 width=1008) (actual time=0.022..0.024 rows=10 loops=1)
             Index Cond: (state = '1'::bpchar)
       ->  Seq Scan on part_tab01_part_1  (cost=0.00..155358.99 rows=999999 width=1008) (actual time=0.011..424.713 rows=1000000 loops=1)
             Filter: (state = '1'::bpchar)
             Rows Removed by Filter: 10
     Planning Time: 0.293 ms
     Execution Time: 515.549 ms
    (8 rows)
     
    test=# explain analyze select * from part_tab01 where state='0';
                                                                           QUERY PLAN
    ---------------------------------------------------------------------------------------------------------------------------------------------------------
     Append  (cost=0.00..160363.68 rows=1000014 width=1008) (actual time=0.022..517.127 rows=1000010 loops=1)
       ->  Seq Scan on part_tab01_part_0  (cost=0.00..155359.16 rows=1000013 width=1008) (actual time=0.022..451.523 rows=1000000 loops=1)
             Filter: (state = '0'::bpchar)
             Rows Removed by Filter: 10
       ->  Index Scan using part_tab01_part_1_state_idx on part_tab01_part_1  (cost=0.42..4.44 rows=1 width=1008) (actual time=0.032..0.035 rows=10 loops=1)
             Index Cond: (state = '0'::bpchar)
     Planning Time: 0.090 ms
     Execution Time: 547.486 ms
    (8 rows)

    三、结论

    从本例可以看出,在不同分区数据分布不同的场景下,PostgreSQL针对不同分区有独立的执行计划是更优方法。现实中典型的场景,如:按时间分区的工单表,历史分区可能大部分工单是结束状态,而当前分区工单可能大部分是非结束状态,因此,针对历史与当前分区,可能需要不同的执行计划。

    结论:PostgreSQL 会针对不同的分区制定不同的执行计划,执行计划可能更合理,但是,由于需要读取每个分区的统计数据,不可避免对于执行计划的生成有影响,特别是在分区数量非常多的情况下,生成执行计划的效率就非常低。

  • 相关阅读:
    “Life Long Learning”(终身学习)和“灾难性遗忘”(catastrophic forgetting)
    高质量的子程序
    最强辅助!阿里最新总结22年大厂Java面试真题+核心知识点全面覆盖+答案详解!
    静态搜索iOS动态链接函数的调用位置
    SuperViT:Super Vision Transformer
    【kali-权限提升】(4.2.4)社会工程学工具包:远控木马使用、设置、利用
    Vue框架中的各种指令(续)
    Vue组件知识点详解(组件创建、组件通信)
    算法与数据结构(第二周)——排序基础:插入排序法
    腾讯事务处理技术验证系统3TS-Coo模板安装文档&&说明文档(小白向,简单轻松就能上手)
  • 原文地址:https://www.cnblogs.com/kingbase/p/16661948.html