参考博客:
Greenplum6 JDBC insert 性能百倍提升 *之踩坑之路_恋奴娇的博客-CSDN博客
HybridDB for PostgreSQL , Greenplum 写入性能优化实践_weixin_34391854的博客-CSDN博客
Greenplum6 JDBC insert性能媲美MySQL_Greenplum中文社区的博客-CSDN博客_greenplum写入性能
通过copyIn向PostgreSQL或Greenplum写入数据_陈序猿张的博客-CSDN博客
Greenplum基于pgbench的性能测试 - 腾讯云开发者社区-腾讯云
基于pgbench的GreenPlum压力测试_zyset的博客-CSDN博客_pgbench 官方压测结果
pgbench10安装记录(yum)_mingjie73的博客-CSDN博客_pgbench 安装
Greenplum数据加载的三种方式:
理论上来说,性能 gdfdist>copyin>insert(batch)>insert
yum安装pgbench
- yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7.9-x86_64/
-
- yum install postgresql10
- yum install postgresql10-server
- yum install -y postgresql10-contrib
配置环境变量,默认安装是/usr/pgsql-10/bin,所以需要配置path 路径
- vim /etc/profile
-
- # 添加这行
- export PATH=$PATH:/usr/pgsql-10/bin
-
- source /etc/profile
在gp上面创建测试库
create database pgbenchdb;
执行测试指令
pgbench -i -s 5 pgbenchdb --初始化,将在pgbench_accounts表中创建 500,000行。
执行后查询pgbench_accounts的条数,如果50w条说明插入成功
1、在pgbenchdb数据库下创建测试表tbl
- CREATE TABLE public.tbl (
- id bigserial NOT NULL,
- crt_time timestamp NULL,
- sensorid int4 NULL,
- sensorloc point NULL,
- info text NULL
- )
- WITH (
- autovacuum_enabled=on,
- autovacuum_vacuum_threshold=1,
- autovacuum_vacuum_cost_delay=0
- )
- DISTRIBUTED BY (id);
2、在/home/gpadmin下面创建insert.sql文件,向表中插入一条随机数据
insert into public.tbl (crt_time,sensorid,info) values ( clock_timestamp(),trunc(random()*500000),substring(md5(random()::text),1,8) );
3、在/home/gpadmin下面创建read.sql文件, 从表中读取一条随机数据
select * from public.tbl where sensorid = trunc(random()*500000);
测试前
truncate table public.tbl
1、链接数测试,模拟224个客户端连接,8个线程,每个客户端8个事务
pgbench -f /home/gpadmin/insert.sql -c 224 -C -j 8 -t 8 pgbenchdb
执行结果
- starting vacuum...end.
- transaction type: /home/gpadmin/insert.sql
- scaling factor: 1
- query mode: simple
- number of clients: 224
- number of threads: 8
- number of transactions per client: 8
- number of transactions actually processed: 1792/1792
- latency average = 2308.419 ms
- tps = 97.036123 (including connections establishing)
- tps = 97.929852 (excluding connections establishing)
2、测试客户端64 10w写入
pgbench -f /home/gpadmin/insert.sql -c 64 -C -j 8 -t 1563 pgbenchdb
执行结果
- transaction type: /home/gpadmin/insert.sql
- scaling factor: 1
- query mode: simple
- number of clients: 64
- number of threads: 8
- number of transactions per client: 1563
- number of transactions actually processed: 98733/100032
- latency average = 649.590 ms
- tps = 98.523657 (including connections establishing)
- tps = 100.359377 (excluding connections establishing)
3、测试客户端128 10w写入
pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 782 pgbenchdb
执行结果
- [gpadmin@pnode3 ~]$ pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 782 pgbenchdb
- starting vacuum...end.
- transaction type: /home/gpadmin/insert.sql
- scaling factor: 1
- query mode: simple
- number of clients: 128
- number of threads: 8
- number of transactions per client: 782
- number of transactions actually processed: 100096/100096
- latency average = 1316.215 ms
- tps = 97.248556 (including connections establishing)
- tps = 98.091034 (excluding connections establishing)
4、测试客户端164 100w写入
pgbench -f /home/gpadmin/insert.sql -c 64 -C -j 8 -t 15625 pgbenchdb
执行结果
- transaction type: /home/gpadmin/insert.sql
- scaling factor: 1
- query mode: simple
- number of clients: 64
- number of threads: 8
- number of transactions per client: 15625
- number of transactions actually processed: 1000000/1000000
- latency average = 769.468 ms
- tps = 83.174383 (including connections establishing)
- tps = 84.614156 (excluding connections establishing)
5、客户端128 100w写入
pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 7813 pgbenchdb
执行结果
- transaction type: /home/gpadmin/insert.sql
- scaling factor: 1
- query mode: simple
- number of clients: 128
- number of threads: 8
- number of transactions per client: 7813
- number of transactions actually processed: 1000064/1000064
- latency average = 1389.130 ms
- tps = 92.144034 (including connections establishing)
- tps = 92.931354 (excluding connections establishing)
1、客户端64 10w查询
pgbench -f /home/gpadmin/read.sql -c 64 -C -j 8 -t 1563 pgbenchdb
执行结果
- transaction type: /home/gpadmin/read.sql
- scaling factor: 1
- query mode: simple
- number of clients: 64
- number of threads: 8
- number of transactions per client: 1563
- number of transactions actually processed: 100032/100032
- latency average = 4188.179 ms
- tps = 15.281105 (including connections establishing)
- tps = 15.371656 (excluding connections establishing)
性能太差了 就不往下测试了
1 copy in方式
采用CopyManager对Greenplum进行数据导出导入的Java Demo:Greenplum copy in的Java工具类_CarsonBigData的博客-CSDN博客
2 gpfdist方式
方案描述:增量数据和目标表数据合并到临时表,然后覆盖目标表。
-- 业务库insert和update的数据:tmp_incr_data
-- 今天过来的最新数据:tmp_update_data
-- 目标表里不需要更新的数据:tmp_not_update_data
-- 目标表:dest_table,按天分区
-- 主键字段:ID
1、取出当天未更新的数据,放到临时表里
- CREATE TABLE tmp_not_update_data
- SELECT *
- FROM dest_table a
- LEFT JOIN tmp_incr_data b
- ON a.id = b.id
- WHERE b.id is null;
2、从目标表删除当天数据
TRUNCATE TABLE dest_table PARTITION(DATA_DATE = '2022-07-28');
3、把当天未更新的数据插回目标表
- INSERT INTO dest_table
- SELECT * FROM tmp_not_update_data;
4、把当天更新的数据查到目标表
- INSERT INTO dest_table
- SELECT * FROM tmp_update_data;
持续更新。。。