The Road to Greenplum Performance Optimization


    Reference blogs:

    Greenplum6 JDBC insert 性能百倍提升 之踩坑之路_恋奴娇的博客-CSDN博客
    HybridDB for PostgreSQL , Greenplum 写入性能优化实践_weixin_34391854的博客-CSDN博客
    Greenplum6 JDBC insert性能媲美MySQL_Greenplum中文社区的博客-CSDN博客
    通过copyIn向PostgreSQL或Greenplum写入数据_陈序猿张的博客-CSDN博客
    Greenplum基于pgbench的性能测试 - 腾讯云开发者社区-腾讯云
    pgbench 使用介绍及示例 - 墨天轮
    基于pgbench的GreenPlum压力测试_zyset的博客-CSDN博客
    pgbench10安装记录(yum)_mingjie73的博客-CSDN博客

    Background

    Greenplum offers three ways to load data:

    1. insert
    2. copy in
    3. external tables via gpfdist

    In theory the performance ranking is gpfdist > copy in > insert (batched) > insert; a sketch of the single-row vs. batched insert difference follows below.
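    To make the gap between insert and insert (batched) concrete, the sketch below contrasts one-row-per-statement inserts with a single multi-row VALUES statement, using the test table public.tbl created in section 2.1 (the literal values are placeholders). Client-side JDBC batching gives a similar effect; for example, the PostgreSQL JDBC driver's reWriteBatchedInserts=true option rewrites a batch of single-row INSERTs into this multi-row form.

    -- Single-row inserts: every statement pays its own parse, plan, commit and network round trip.
    INSERT INTO public.tbl (crt_time, sensorid, info) VALUES (clock_timestamp(), 1, 'a1');
    INSERT INTO public.tbl (crt_time, sensorid, info) VALUES (clock_timestamp(), 2, 'a2');

    -- Batched insert: one statement carries many rows, so that overhead is paid once per batch.
    INSERT INTO public.tbl (crt_time, sensorid, info) VALUES
        (clock_timestamp(), 1, 'a1'),
        (clock_timestamp(), 2, 'a2'),
        (clock_timestamp(), 3, 'a3');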

    1 Install the required tools

    1.1 pgbench

    Install pgbench with yum:

    # add the PostgreSQL 10 yum repository, then install the client, server and contrib packages
    yum install -y https://download.postgresql.org/pub/repos/yum/10/redhat/rhel-7.9-x86_64/
    yum install postgresql10
    yum install postgresql10-server
    yum install -y postgresql10-contrib

    The packages install into /usr/pgsql-10/bin by default, so add that directory to PATH:

    vim /etc/profile
    # add this line
    export PATH=$PATH:/usr/pgsql-10/bin
    source /etc/profile

    Create a test database on Greenplum:

    create database pgbenchdb;

    Run the initialization command:

    pgbench -i -s 5 pgbenchdb     # initialize; at scale factor 5 this creates 500,000 rows in pgbench_accounts

    After it finishes, check the row count of pgbench_accounts; 500,000 rows means the data load succeeded.
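    A quick check against pgbenchdb:

    select count(*) from pgbench_accounts;   -- expect 500000 at scale factor 5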

    2 Benchmark tests

    2.1 Test preparation

    1. Create the test table tbl in the pgbenchdb database:

    CREATE TABLE public.tbl (
        id bigserial NOT NULL,
        crt_time timestamp NULL,
        sensorid int4 NULL,
        sensorloc point NULL,
        info text NULL
    )
    WITH (
        autovacuum_enabled=on,
        autovacuum_vacuum_threshold=1,
        autovacuum_vacuum_cost_delay=0
    )
    DISTRIBUTED BY (id);

    2. Create an insert.sql file under /home/gpadmin that inserts one row of random data into the table:

    insert into public.tbl (crt_time,sensorid,info) values ( clock_timestamp(),trunc(random()*500000),substring(md5(random()::text),1,8) );
    

    3. Create a read.sql file under /home/gpadmin that reads one random row from the table:

    select * from public.tbl where sensorid = trunc(random()*500000);
    

    2.2 Running the tests

    2.2.1 Write tests

    Before each run, empty the table:

    truncate table public.tbl;

    1. Connection test: simulate 224 client connections, 8 threads, 8 transactions per client

    pgbench -f /home/gpadmin/insert.sql -c 224 -C -j 8 -t 8 pgbenchdb
    

    Result:

    starting vacuum...end.
    transaction type: /home/gpadmin/insert.sql
    scaling factor: 1
    query mode: simple
    number of clients: 224
    number of threads: 8
    number of transactions per client: 8
    number of transactions actually processed: 1792/1792
    latency average = 2308.419 ms
    tps = 97.036123 (including connections establishing)
    tps = 97.929852 (excluding connections establishing)

    2. 64 clients, 100,000 writes

    pgbench -f /home/gpadmin/insert.sql -c 64 -C -j 8 -t 1563 pgbenchdb

    Result:

    transaction type: /home/gpadmin/insert.sql
    scaling factor: 1
    query mode: simple
    number of clients: 64
    number of threads: 8
    number of transactions per client: 1563
    number of transactions actually processed: 98733/100032
    latency average = 649.590 ms
    tps = 98.523657 (including connections establishing)
    tps = 100.359377 (excluding connections establishing)

    3. 128 clients, 100,000 writes

    pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 782 pgbenchdb

    Result:

    [gpadmin@pnode3 ~]$ pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 782 pgbenchdb
    starting vacuum...end.
    transaction type: /home/gpadmin/insert.sql
    scaling factor: 1
    query mode: simple
    number of clients: 128
    number of threads: 8
    number of transactions per client: 782
    number of transactions actually processed: 100096/100096
    latency average = 1316.215 ms
    tps = 97.248556 (including connections establishing)
    tps = 98.091034 (excluding connections establishing)

    4. 64 clients, 1,000,000 writes

    pgbench -f /home/gpadmin/insert.sql -c 64 -C -j 8 -t 15625 pgbenchdb

    Result:

    transaction type: /home/gpadmin/insert.sql
    scaling factor: 1
    query mode: simple
    number of clients: 64
    number of threads: 8
    number of transactions per client: 15625
    number of transactions actually processed: 1000000/1000000
    latency average = 769.468 ms
    tps = 83.174383 (including connections establishing)
    tps = 84.614156 (excluding connections establishing)

    5. 128 clients, 1,000,000 writes

    pgbench -f /home/gpadmin/insert.sql -c 128 -C -j 8 -t 7813 pgbenchdb

    Result:

    transaction type: /home/gpadmin/insert.sql
    scaling factor: 1
    query mode: simple
    number of clients: 128
    number of threads: 8
    number of transactions per client: 7813
    number of transactions actually processed: 1000064/1000064
    latency average = 1389.130 ms
    tps = 92.144034 (including connections establishing)
    tps = 92.931354 (excluding connections establishing)

    2.2.2 Read tests

    1. 64 clients, 100,000 reads

    pgbench -f /home/gpadmin/read.sql -c 64 -C -j 8 -t 1563 pgbenchdb

    Result:

    transaction type: /home/gpadmin/read.sql
    scaling factor: 1
    query mode: simple
    number of clients: 64
    number of threads: 8
    number of transactions per client: 1563
    number of transactions actually processed: 100032/100032
    latency average = 4188.179 ms
    tps = 15.281105 (including connections establishing)
    tps = 15.371656 (excluding connections establishing)

    Read performance is far too poor, so further read tests were abandoned here.

    2.2.3 Update tests

    3 Optimization

    3.1 Write optimization

    1. copy in

    A Java demo that uses CopyManager to export data from and import data into Greenplum: Greenplum copy in的Java工具类_CarsonBigData的博客-CSDN博客. A SQL-level sketch of the same idea follows below.
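    JDBC's CopyManager drives the server-side COPY ... FROM STDIN protocol under the hood, so the load it performs boils down to the COPY statement sketched below. This is a minimal illustration under assumptions of my own: /home/gpadmin/tbl_data.csv is a placeholder file that must exist on the master host and match the listed columns, and server-side COPY FROM a file requires a superuser role (from psql, \copy does the same load with a client-side file instead).

    -- Bulk-load a CSV into the test table; a single COPY moves many rows per statement,
    -- which is why it ranks above INSERT in the Background section.
    COPY public.tbl (crt_time, sensorid, info)
    FROM '/home/gpadmin/tbl_data.csv'
    CSV;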

    2. gpfdist
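    gpfdist serves flat files over HTTP and all segments pull from them in parallel, which is why it tops the ranking in the Background section. Below is a minimal sketch, assuming a gpfdist process is already serving /data/staging on port 8081 of a host named etl-host and that tbl_data.csv matches the listed columns; host, port, directory and file name are all placeholders.

    -- assumes this is already running on etl-host:  gpfdist -d /data/staging -p 8081
    CREATE EXTERNAL TABLE ext_tbl (
        crt_time timestamp,
        sensorid int4,
        info text
    )
    LOCATION ('gpfdist://etl-host:8081/tbl_data.csv')
    FORMAT 'CSV';

    -- Segments read from gpfdist in parallel during this insert.
    INSERT INTO public.tbl (crt_time, sensorid, info)
    SELECT crt_time, sensorid, info FROM ext_tbl;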

    3.2 Update optimization

    Approach: merge the incremental data with the existing target-table data into temporary tables, then overwrite the target table.

    -- Rows inserted or updated in the source business database: tmp_incr_data

    -- The latest data that arrived today: tmp_update_data

    -- Rows in the target table that do not need updating: tmp_not_update_data

    -- Target table: dest_table, partitioned by day

    -- Primary key column: id

    1. Extract the rows that were not updated that day into a temporary table

    CREATE TABLE tmp_not_update_data AS
    SELECT a.*
    FROM dest_table a
    LEFT JOIN tmp_incr_data b
        ON a.id = b.id
    WHERE b.id IS NULL;

    2. Delete that day's data from the target table (truncate the day's partition)

    ALTER TABLE dest_table TRUNCATE PARTITION FOR ('2022-07-28');

    3. Insert the non-updated rows back into the target table

    INSERT INTO dest_table
    SELECT * FROM tmp_not_update_data;

    4. Insert that day's updated rows into the target table

    INSERT INTO dest_table
    SELECT * FROM tmp_update_data;

    To be continued...

Original article: https://blog.csdn.net/Carson073/article/details/125929390