Greenplum: Table Storage Models

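While learning Greenplum, I found that its tables support several storage models and options: heap tables, AO (append-optimized) tables, row orientation, column orientation, compression, and encryption. Which scenarios does each one suit, and how should we choose among them in practice? This article walks through these table storage models in Greenplum (GP).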

Configuration Parameters

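Since GP tables support so many storage models, one of them has to be the default. That default is controlled by the parameter gp_default_storage_options.
The following command shows the default storage options in effect: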
gpconfig --show gp_default_storage_options
Sample output:

Values on all segments are consistent
GUC          : gp_default_storage_options
Master  value: appendonly=false,blocksize=32768,compresstype=none,checksum=true,orientation=row
Segment value: appendonly=false,blocksize=32768,compresstype=none,checksum=true,orientation=row

The output above shows that the default storage model is a row-oriented, uncompressed heap table.
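The default can also be inspected and overridden from inside a session. The following is a minimal sketch, assuming Greenplum 6, where gp_default_storage_options can be set at session level; the table name baz is hypothetical and not part of the original examples.

```sql
-- Show the default storage options in effect for this session.
SHOW gp_default_storage_options;

-- Override the default for the current session only (assumption: Greenplum 6).
SET gp_default_storage_options = 'appendonly=true, orientation=column';

-- With the override in place, a table created without a WITH clause
-- should pick up the session default. "baz" is a hypothetical name.
CREATE TABLE baz (a int, b text) DISTRIBUTED BY (a);

-- Return to the system-wide default.
RESET gp_default_storage_options;
```

Running \d baz afterwards shows which storage model the table actually received.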

Heap Tables

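Heap tables are the default storage model.
Use cases: OLTP-style workloads in which data is updated frequently after it is loaded. They suit small tables such as dimension tables.
Version 6 introduced global deadlock detection. When it is enabled, update operations on heap tables can run concurrently. The feature is controlled by the following parameter, which defaults to off.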
gpconfig --show gp_enable_global_deadlock_detector

Values on all segments are consistent
GUC          : gp_enable_global_deadlock_detector
Master  value: off
Segment value: off

Example of creating a heap table:

=# create table foo (a int, b text) distributed by (a);
CREATE TABLE
=# \d foo
      Table "public.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | text    | 
Distributed by: (a)

AO Tables (Append-Optimized)

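Use cases: OLAP-style workloads in which data is loaded in batches and rarely updated. They suit large tables such as fact tables. AO tables do not keep per-row MVCC information, which saves some space. Combined with compression options, they can save a great deal of space.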

Creation example:

=# create table bar (a int, b text) with (appendoptimized=true) distributed by (a);
CREATE TABLE
=# \d bar
Append-Only Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | text    | 
Compression Type: None
Compression Level: 0
Block Size: 32768
Checksum: t
Distributed by: (a)

The header Append-Only Table in the output above indicates that the table was created as an AO table.

Row Orientation

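A row-oriented table can be either a heap table or an AO table.
Use cases: frequent updates, frequent inserts, and queries whose SELECT list or WHERE clause often touches many columns.
Note: a heap table can only be row-oriented, never column-oriented. An AO table is row-oriented by default unless column orientation is specified explicitly.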

Column Orientation

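Only AO tables can be column-oriented.
Use cases: the opposite of the above. Data is inserted in batches and rarely updated, and the SELECT list or WHERE clause touches only a few of the table's columns.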

Creation example:

=# create table bar (a int, b text) with (appendoptimized=true, orientation=column) distributed by (a);
CREATE TABLE
=# \d bar
Append-Only Columnar Table "public.bar"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | text    | 
Checksum: t
Distributed by: (a)

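The header Append-Only Columnar Table in the output above indicates that the table was created as an AO column-oriented table.
Note: in most cases column orientation is not recommended, because each column is stored in its own files on every segment, which can severely inflate the number of files.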
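To check which of the three combinations (heap, AO row-oriented, AO column-oriented) an existing table uses without reading \d output, the catalog can be queried directly. This is a sketch that assumes Greenplum 6 or earlier, where pg_class carries a relstorage column; later major versions changed the catalog, so treat the column name as version-specific.

```sql
-- List ordinary tables in the public schema with their storage model.
-- Assumption: pg_class.relstorage exists (Greenplum 6 and earlier),
-- with 'h' = heap, 'a' = AO row-oriented, 'c' = AO column-oriented.
SELECT c.relname,
       CASE c.relstorage
           WHEN 'h' THEN 'heap (row-oriented)'
           WHEN 'a' THEN 'AO, row-oriented'
           WHEN 'c' THEN 'AO, column-oriented'
           ELSE 'other'
       END AS storage_model
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname = 'public'
  AND c.relkind = 'r'
ORDER BY c.relname;
```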

Compression

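Only AO tables support compression.
Compression can be applied to the whole table or to specific columns, and different columns can use different compression algorithms.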

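Orientation	Available compression scope	Supported algorithms
Row	Table level	ZLIB, ZSTD, and QUICKLZ (QUICKLZ is not available in the open-source version)
Column	Table level or column level	RLE_TYPE, ZLIB, ZSTD, and QUICKLZ (QUICKLZ is not available in the open-source version)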

Creation example:

=# create table foo (a int, b text) with (appendoptimized=true, compresstype=zlib, compresslevel=5) distributed by (a);
CREATE TABLE
=# \d foo
Append-Only Table "public.foo"
 Column |  Type   | Modifiers 
--------+---------+-----------
 a      | integer | 
 b      | text    | 
Compression Type: zlib
Compression Level: 5
Block Size: 32768
Checksum: t
Distributed by: (a)

The lines Compression Type: zlib and Compression Level: 5 in the output above indicate that the table uses zlib compression at level 5.
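The example above compresses the whole table. Column-level compression, where different columns use different algorithms, might look like the following sketch. It relies on the ENCODING clause of CREATE TABLE for column-oriented AO tables; the table and column names are hypothetical, and the chosen compression levels are illustrative only.

```sql
-- Column-level compression on a column-oriented AO table.
-- "events_col" and its columns are hypothetical names.
CREATE TABLE events_col (
    id     int  ENCODING (compresstype=zlib, compresslevel=5),
    status int  ENCODING (compresstype=rle_type, compresslevel=1),
    note   text -- no ENCODING clause: uses the table-level settings
)
WITH (appendoptimized=true, orientation=column)
DISTRIBUTED BY (id);
```

RLE_TYPE tends to pay off on columns with long runs of repeated values, which is why it is paired here with a low-cardinality status column.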

Checking Compression and Distribution

Greenplum provides functions for querying the compression ratio and the data distribution of a table:

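Function	Return type	Description
get_ao_distribution(name) get_ao_distribution(oid)	Set type (dbid, tuplecount)	Shows how the rows of an AO table are distributed. Each returned row gives a segment ID and the number of rows stored on that segment.
get_ao_compression_ratio(name) get_ao_compression_ratio(oid)	float8	Calculates the compression ratio of an AO table. Returns -1 if the information is not available.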

Examples:

=# select * from get_ao_distribution('bar');
 segmentid | tupcount 
-----------+----------
         2 |     3247
         1 |     3385
         0 |     3368
(3 rows)

=# select * from get_ao_compression_ratio('bar');
 get_ao_compression_ratio 
--------------------------
                        1
(1 row)
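The ratio of 1 above is what an uncompressed table reports. To see the functions on compressed data, a sketch along the following lines can be used. The table name ratio_demo and the generated rows are hypothetical, and the column names segmentid and tupcount are taken from the sample output above.

```sql
-- Hypothetical compressed AO table filled with repetitive data.
CREATE TABLE ratio_demo (a int, b text)
WITH (appendoptimized=true, compresstype=zlib, compresslevel=5)
DISTRIBUTED BY (a);

INSERT INTO ratio_demo
SELECT i, repeat('greenplum', 10)
FROM generate_series(1, 10000) AS i;

-- Compression ratio of the table (-1 if the information is not available).
SELECT get_ao_compression_ratio('ratio_demo');

-- Rows per segment, plus a simple skew indicator:
-- the largest segment's row count divided by the average row count.
SELECT * FROM get_ao_distribution('ratio_demo') ORDER BY segmentid;

SELECT max(tupcount) / avg(tupcount) AS max_to_avg
FROM get_ao_distribution('ratio_demo');
```

A max_to_avg value close to 1 suggests the rows are spread evenly across segments, as in the sample output above.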

Original article: https://blog.csdn.net/Post_Yuan/article/details/126856708