• PostgreSQL ash —— pgsentinel插件


    零、 注意事项

           测试发现,pgsentinel插件在pg_active_session_history视图记录条数较多时,存在严重的内存占用问题,群里的其他朋友反馈还可能存在严重的内存泄漏问题。本文仅用于学习和测试,未用于生产环境。

           设置 pgsentinel_ash.max_entries=10000000,启动DB报错需请求28G内存。对于负载稍高的数据库,例如每秒40个活跃会话,按照每秒收集一次的频率,1000万行也仅够保存不到3天的数据,该插件就需要占掉28G内存,实用性太低。

    -bash-4.2$ pg_ctl start -D $PGDATA
    waiting for server to start....2023-10-07 19:43:09.865 CST [2210] FATAL:  could not map anonymous shared memory: Cannot allocate memory
    2023-10-07 19:43:09.865 CST [2210] HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded available memory, swap space, or huge pages. To reduce the request size (currently 29601538048 bytes), reduce PostgreSQL's shared memory usage, perhaps by reducing shared_buffers or max_connections.
    2023-10-07 19:43:09.865 CST [2210] LOG:  database system is shut down
     stopped waiting
    pg_ctl: could not start server
    Examine the log output

    一、 插件作用

           众所周知,pg是没有像oracle那样的ash视图的,因此要回溯历史问题不太方便。pgsentinel插件会将pg_stat_activity与pg_stat_statements视图内容定期快照,并存入pg_active_session_history和pg_stat_statements_history视图中(重启数据库其中数据会被清空)。

    1. pg_active_session_history视图字段

    重启数据库其中数据会被清空,大部分字段与对应版本的pg_stat_activity视图字段含义相同

    ColumnType备注
    ash_timetimestamp with time zone采样时间
    datidoid
    datnametext
    pidinteger
    leader_pidinteger若有并行,其leader进程的pid
    usesysidoiduser id
    usenametext
    application_nametext
    client_addrtext
    client_hostnametext
    client_portinteger
    backend_starttimestamp with time zone
    xact_starttimestamp with time zone
    query_starttimestamp with time zone
    state_changetimestamp with time zone
    wait_event_typetext
    wait_eventtext
    statetext
    backend_xidxid
    backend_xminxid
    top_level_querytext执行函数、存储过程时的外层SQL(开pg_stat_statements.track = all才会有区别)
    querytext
    cmdtypetext

    queryidbigint
    backend_typetext
    blockersintegerblockers数量
    blockerpidinteger
    blocker_statetext

    2. pg_stat_statements_history视图字段

    重启数据库其中数据会被清空,与对应版本的pg_stat_statements视图字段含义相同

    ColumnType备注
    ash_timetimestamp with time zone
    useridoid
    dbidoid
    queryidbigint
    callsbigint
    total_exec_timedouble precision
    rowsbigint
    shared_blks_hitbigint
    shared_blks_readbigint
    shared_blks_dirtiedbigint
    shared_blks_writtenbigint
    local_blks_hitbigint
    local_blks_readbigint
    local_blks_dirtiedbigint
    local_blks_writtenbigint
    temp_blks_readbigint
    temp_blks_writtenbigint
    blk_read_timedouble precision
    blk_write_timedouble precision
    plansbigint
    total_plan_timedouble precision
    wal_recordsbigint
    wal_fpibigint
    wal_bytesnumeric

    二、 插件安装配置

    1. 下载

    GitHub - pgsentinel/pgsentinel: postgresql extension providing Active session history

    2. 安装

    1. # poatgres用户执行
    2. unzip pgsentinel-master.zip
    3. cd pgsentinel-master/src
    4. make
    5. # root用户执行(要配环境变量,参考下面)
    6. make install

    具体安装过程

    -bash-4.2$ unzip pgsentinel-master.zip 
    -bash-4.2$ cd pgsentinel-master/src
    -bash-4.2$ make
    gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -fPIC -I. -I./ -I/data/postgres/base/14.0/include/server -I/data/postgres/base/14.0/include/internal  -D_GNU_SOURCE   -c -o pgsentinel.o pgsentinel.c
    gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -fPIC -I. -I./ -I/data/postgres/base/14.0/include/server -I/data/postgres/base/14.0/include/internal  -D_GNU_SOURCE   -c -o get_parsedinfo.o get_parsedinfo.c
    gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -fPIC -shared -o pgsentinel.so pgsentinel.o get_parsedinfo.o -L/data/postgres/base/14.0/lib    -Wl,--as-needed -Wl,-rpath,'/data/postgres/base/14.0/lib',--enable-new-dtags -lm  

    [root@linux01 ~]# vi .bash_profile
    # .bash_profile

    # Get the aliases and functions
    if [ -f ~/.bashrc ]; then
            . ~/.bashrc
    fi

    # User specific environment and startup programs

    PATH=$PATH:$HOME/bin

    export PATH
     
    export PGHOME=/data/postgres/base/14.0
    export PGDATA=/data/postgres/pg5432/data
    export PATH=$PGHOME/bin:$PATH:$HOME/bin
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PGHOME/lib
    export LANG=en_US.UTF-8                                                  
    ~                                 
    [root@linux01 ~]# source .bash_profile
    [root@linux01 ~]# 
    [root@linux01 ~]# cd .../pgsentinel-master/src
    [root@linux01 src]# make install 
    /usr/bin/mkdir -p '/data/postgres/base/14.0/lib'
    /usr/bin/mkdir -p '/data/postgres/base/14.0/share/extension'
    /usr/bin/mkdir -p '/data/postgres/base/14.0/share/extension'
    /usr/bin/install -c -m 755  pgsentinel.so '/data/postgres/base/14.0/lib/pgsentinel.so'
    /usr/bin/install -c -m 644 .//pgsentinel.control '/data/postgres/base/14.0/share/extension/'
    /usr/bin/install -c -m 644 .//pgsentinel--1.0.sql  '/data/postgres/base/14.0/share/extension/'

    创建插件

    CREATE EXTENSION pgsentinel;

    3. 插件配置

    • 必须配置
    1. vi postgresql.conf
    2. shared_preload_libraries = 'pg_stat_statements,auto_explain,pgsentinel'

    若未配置,查询会报错

    1. postgres=# select * from pg_active_session_history ; 
    2. ERROR:  pg_active_session_history must be loaded via shared_preload_libraries

    重启db生效

    1. pg_ctl stop -m fast
    2. pg_ctl start -D $PGDATA
    3. postgres=# select * from pg_active_session_history ;
    4. (0 rows)

    • 可选配置

    可以直接在postgresql.conf中修改,也可以alter system设置

     alter system set pgsentinel_pgssh.enable=on;
    参数名参数含义默认值建议值备注
    pgsentinel_ash.sampling_period采样时间(秒)110,视业务负载及需求而定reload生效
    pgsentinel_ash.max_entries

    pg_active_session_history

    最大记录条数(占用ring buffer大小,单位为字节)

    1000视业务负载及需求而定。注意非常耗内存,设置1000万约占内存28G重启生效。设置过大可能内存不足,DB启动失败
    pgsentinel.db_name数据存在哪个db中postgrespgawr重启生效
    pgsentinel_ash.track_idle_trans是否记录 idle in transaction 状态会话offonreload生效
    pgsentinel_pgssh.max_entries

    pg_stat_statements_history

    最大记录条数(占用ring buffer大小,单位为字节)

    10000视业务负载及需求而定重启生效。设置过大可能内存不足,DB启动失败
    pgsentinel_pgssh.enable是否启用 pg_stat_statements_historyoffon重启生效

    这部分对应源码

    从中也可以看到各参数含义、默认值、最小最大值,是否需重启生效等

    1. static void
    2. pgsentinel_load_params(void)
    3. {
    4. DefineCustomIntVariable("pgsentinel_ash.sampling_period",
    5. "Duration between each pull (in seconds).",
    6. NULL,
    7. &ash_sampling_period,
    8. 1,
    9. 1,
    10. INT_MAX,
    11. PGC_SIGHUP,
    12. 0,
    13. NULL,
    14. NULL,
    15. NULL);
    16. DefineCustomBoolVariable("pgsentinel_ash.track_idle_trans",
    17. "Track session in idle transaction state.",
    18. NULL,
    19. &ash_track_idle_trans,
    20. false,
    21. PGC_SIGHUP,
    22. 0,
    23. NULL,
    24. NULL,
    25. NULL);
    26. if (!process_shared_preload_libraries_in_progress)
    27. return;
    28. /* can't define PGC_POSTMASTER variable after startup */
    29. DefineCustomIntVariable("pgsentinel_ash.max_entries",
    30. "Maximum number of ash entries.",
    31. NULL,
    32. &ash_max_entries,
    33. 1000,
    34. 1000,
    35. INT_MAX,
    36. PGC_POSTMASTER,
    37. 0,
    38. NULL,
    39. NULL,
    40. NULL);
    41. EmitWarningsOnPlaceholders("pgsentinel_ash");
    42. DefineCustomIntVariable("pgsentinel_pgssh.max_entries",
    43. "Maximum number of pgssh entries.",
    44. NULL,
    45. &pgssh_max_entries,
    46. 10000,
    47. 10000,
    48. INT_MAX,
    49. PGC_POSTMASTER,
    50. 0,
    51. NULL,
    52. NULL,
    53. NULL);
    54. DefineCustomBoolVariable("pgsentinel_pgssh.enable",
    55. "Enable pg_stat_statements_history.",
    56. NULL,
    57. &pgssh_enable,
    58. false,
    59. PGC_POSTMASTER,
    60. 0,
    61. NULL,
    62. NULL,
    63. NULL);
    64. EmitWarningsOnPlaceholders("pgsentinel_pgssh");
    65. DefineCustomStringVariable("pgsentinel.db_name",
    66. gettext_noop("Database on which the worker connect."),
    67. NULL,
    68. &pgsentinelDbName,
    69. "postgres",
    70. PGC_POSTMASTER,
    71. GUC_SUPERUSER_ONLY,
    72. NULL, NULL, NULL);
    73. }

    • 其他相关参数

    查询语句保留长度

    1. # 为每个活动会话的pg_stat_activity.query字段所保留的内存量(字节,默认1024)
    2. track_activity_query_size = 2048

    跟踪层级

           pgsentinel依赖于pg_stat_statements插件的数据,如果想要更详细,可以调整相应参数(但必须注意对系统的负载)

    1. # 记录函数和存储过程中的子语句
    2. pg_stat_statements.track = all

    四、 实现原理

           插件最核心的就是pg_active_session_history,pg_stat_statements_history两个视图,所以源码中最重要的,也就是这两个视图的创建。

    1. 视图创建

    源码中的 pgsentinel--1.0.sql,可以看到这两个视图内容来自两个函数,并进行授权

    1. CREATE VIEW pg_active_session_history AS
    2. SELECT * FROM pg_active_session_history();
    3. GRANT SELECT ON pg_active_session_history TO PUBLIC;
    4. CREATE VIEW pg_stat_statements_history AS
    5. SELECT * FROM pg_stat_statements_history();
    6. GRANT SELECT ON pg_stat_statements_history TO PUBLIC;

    而这两个函数实际是用c语言编写的

    2. 函数创建

    1. CREATE FUNCTION pg_active_session_history(
    2. OUT ash_time timestamptz,
    3. OUT datid Oid,
    4. OUT datname text,
    5. OUT pid integer,
    6. OUT leader_pid integer,
    7. OUT usesysid Oid,
    8. OUT usename text,
    9. OUT application_name text,
    10. OUT client_addr text,
    11. OUT client_hostname text,
    12. OUT client_port integer,
    13. OUT backend_start timestamptz,
    14. OUT xact_start timestamptz,
    15. OUT query_start timestamptz,
    16. OUT state_change timestamptz,
    17. OUT wait_event_type text,
    18. OUT wait_event text,
    19. OUT state text,
    20. OUT backend_xid xid,
    21. OUT backend_xmin xid,
    22. OUT top_level_query text,
    23. OUT query text,
    24. OUT cmdtype text,
    25. OUT queryid bigint,
    26. OUT backend_type text,
    27. OUT blockers integer,
    28. OUT blockerpid integer,
    29. OUT blocker_state text
    30. )
    31. RETURNS SETOF record
    32. AS 'MODULE_PATHNAME', 'pg_active_session_history'
    33. LANGUAGE C STRICT VOLATILE PARALLEL SAFE;
    34. -- Register a view on the function for ease of use.
    35. CREATE VIEW pg_active_session_history AS
    36. SELECT * FROM pg_active_session_history();
    37. GRANT SELECT ON pg_active_session_history TO PUBLIC;
    38. CREATE FUNCTION pg_stat_statements_history(
    39. OUT ash_time timestamptz,
    40. OUT userid Oid,
    41. OUT dbid Oid,
    42. OUT queryid bigint,
    43. OUT calls bigint,
    44. OUT total_exec_time double precision,
    45. OUT rows bigint,
    46. OUT shared_blks_hit bigint,
    47. OUT shared_blks_read bigint,
    48. OUT shared_blks_dirtied bigint,
    49. OUT shared_blks_written bigint,
    50. OUT local_blks_hit bigint,
    51. OUT local_blks_read bigint,
    52. OUT local_blks_dirtied bigint,
    53. OUT local_blks_written bigint,
    54. OUT temp_blks_read bigint,
    55. OUT temp_blks_written bigint,
    56. OUT blk_read_time double precision,
    57. OUT blk_write_time double precision,
    58. OUT plans bigint,
    59. OUT total_plan_time double precision,
    60. OUT wal_records bigint,
    61. OUT wal_fpi bigint,
    62. OUT wal_bytes numeric
    63. )
    64. RETURNS SETOF record
    65. AS 'MODULE_PATHNAME', 'pg_stat_statements_history'
    66. LANGUAGE C STRICT VOLATILE PARALLEL SAFE;

    既然如此,我们看看源码中究竟是怎么实现的这些函数


    五、 源码学习

    1. pg_active_session_history函数内容

    它有两个分支,另外根据不同pg版本有不同语句(这里只挑了一个版本):

    • 启用pgsa_query_no_track_idle,即只记录active会话
    1. select act.datid, act.datname, act.pid, act.usesysid, act.usename, \
    2. act.application_name, text(act.client_addr), act.client_hostname, \
    3. act.client_port, act.backend_start, act.xact_start, act.query_start, \
    4. act.state_change, case when act.wait_event_type is null then 'CPU' \
    5. else act.wait_event_type end as wait_event_type,case when act.wait_event is null \
    6. then 'CPU' else act.wait_event end as wait_event, act.state, act.backend_xid, \
    7. act.backend_xmin, act.query, act.backend_type,(pg_blocking_pids(act.pid))[1], \
    8. cardinality(pg_blocking_pids(act.pid)),blk.state,gpi.*, act.leader_pid \
    9. from pg_stat_activity act left join pg_stat_activity blk \
    10. on (pg_blocking_pids(act.pid))[1] = blk.pid,get_parsedinfo(act.pid) gpi \
    11. where act.state ='active' and act.pid != pg_backend_pid()";
    • 启用 pgsa_query_track_idle,即记录active和idle in transaction会话
    1. select act.datid, act.datname, act.pid, act.usesysid, act.usename, \
    2. act.application_name, text(act.client_addr), act.client_hostname, \
    3. act.client_port, act.backend_start, act.xact_start, act.query_start, \
    4. act.state_change, case when act.wait_event_type is null then 'CPU' \
    5. else act.wait_event_type end as wait_event_type,case when act.wait_event is null \
    6. then 'CPU' else act.wait_event end as wait_event, act.state, act.backend_xid, \
    7. act.backend_xmin, act.query, act.backend_type,(pg_blocking_pids(act.pid))[1], \
    8. cardinality(pg_blocking_pids(act.pid)),blk.state,gpi.*, act.leader_pid \
    9. from pg_stat_activity act left join pg_stat_activity blk \
    10. on (pg_blocking_pids(act.pid))[1] = blk.pid,get_parsedinfo(act.pid) gpi \
    11. where act.state in ('active', 'idle in transaction') and act.pid != pg_backend_pid()";

    2. pg_stat_statements_query函数内容

    也有版本区分,这里只取其中一版

    1. select userid, dbid, queryid, calls, total_exec_time, rows, shared_blks_hit, \
    2. shared_blks_read, shared_blks_dirtied, shared_blks_written, local_blks_hit, \
    3. local_blks_read, local_blks_dirtied, local_blks_written, temp_blks_read, \
    4. temp_blks_written, blk_read_time, blk_write_time, \
    5. plans, total_plan_time, wal_records, wal_fpi, wal_bytes \
    6. from pg_stat_statements \
    7. where queryid in (select queryid from pg_active_session_history \
    8. where ash_time in (select ash_time from pg_active_session_history \
    9. order by ash_time desc limit 1))";

    3. 记录内容

    每一行记录叫做一个entry

    • pg_active_session_history对应叫ashEntry
    • pg_stat_statements_query对应叫pgsshEntry
    1. /* ash entry */
    2. typedef struct ashEntry
    3. {
    4. int pid;
    5. #if PG_VERSION_NUM >= 130000
    6. int leader_pid;
    7. #endif
    8. int client_port;
    9. uint64 queryid;
    10. TimestampTz ash_time;
    11. Oid datid;
    12. Oid usesysid;
    13. char *usename;
    14. char *datname;
    15. char *application_name;
    16. char *wait_event_type;
    17. char *wait_event;
    18. char *state;
    19. char *blocker_state;
    20. char *client_hostname;
    21. int blockers;
    22. int blockerpid;
    23. char *top_level_query;
    24. char *query;
    25. char *cmdtype;
    26. char *backend_type;
    27. char *client_addr;
    28. TransactionId backend_xmin;
    29. TransactionId backend_xid;
    30. TimestampTz backend_start;
    31. TimestampTz xact_start;
    32. TimestampTz query_start;
    33. TimestampTz state_change;
    34. } ashEntry;
    35. /* pg_stat_statement_history entry */
    36. typedef struct pgsshEntry
    37. {
    38. TimestampTz ash_time;
    39. Oid userid;
    40. Oid dbid;
    41. uint64 queryid;
    42. int64 calls;
    43. double total_time;
    44. int64 rows;
    45. int64 shared_blks_hit;
    46. int64 shared_blks_read;
    47. int64 shared_blks_dirtied;
    48. int64 shared_blks_written;
    49. int64 local_blks_hit;
    50. int64 local_blks_read;
    51. int64 local_blks_dirtied;
    52. int64 local_blks_written;
    53. int64 temp_blks_read;
    54. int64 temp_blks_written;
    55. double blk_read_time;
    56. double blk_write_time;
    57. #if PG_VERSION_NUM >= 130000
    58. int64 plans;
    59. double total_plan_time;
    60. int64 wal_records;
    61. int64 wal_fpi;
    62. uint64 wal_bytes;
    63. #endif
    64. } pgsshEntry;

    每个字段有一个buffer变量,记录共享内存用量,例如

    1. static char *AshEntryUsenameBuffer = NULL;
    2. static char *AshEntryDatnameBuffer = NULL;
    3. static char *AshEntryAppnameBuffer = NULL;

           ash_entry_memsize和pgssh_entry_memsize估算entry所需内存,如果占用量过大,DB启动可能会失败。基本原理是:

    • 每行占用内存 = 各字段占用内存之和
    • 总占用内存 = 每行占用内存 * 最大行数 ash_max_entries
    • 这里几乎都用字段最大长度NAMEDATALEN来估算,导致数值非常大,且数据又不落磁盘,全都使用内存,导致该插件消耗内存过大,几近不可用
    1. /* Estimate amount of shared memory needed for ash entry */
    2. static Size
    3. ash_entry_memsize(void)
    4. {
    5. Size size;
    6. /* AshEntryArray */
    7. size = mul_size(sizeof(ashEntry), ash_max_entries);
    8. /* AshEntryUsenameBuffer */
    9. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    10. /* AshEntryDatnameBuffer */
    11. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    12. /* AshEntryAppnameBuffer */
    13. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    14. /* AshEntryClientaddrBuffer */
    15. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    16. /* AshEntryWaitEventTypeBuffer */
    17. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    18. /* AshEntryWaitEventBuffer */
    19. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    20. /* AshEntryStateBuffer */
    21. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    22. /* AshEntryClientHostnameBuffer */
    23. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    24. /* AshEntryQueryBuffer */
    25. size = add_size(size, mul_size(pgstat_track_activity_query_size,
    26. ash_max_entries));
    27. /* AshEntryCmdTypeBuffer */
    28. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    29. /* AshEntryTopLevelQueryBuffer */
    30. size = add_size(size, mul_size(pgstat_track_activity_query_size,
    31. ash_max_entries));
    32. /* AshEntryBackendTypeBuffer */
    33. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    34. /* AshEntryBlockerStateBuffer */
    35. size = add_size(size, mul_size(NAMEDATALEN, ash_max_entries));
    36. return size;
    37. }

    参考:

    GitHub - pgsentinel/pgsentinel: postgresql extension providing Active session history

    一种PostgreSQL数据库监控和溯源分析的方法和系统与流程

    PostgreSQL 12.2官方手册学习( 第19章 运行时统计数据) - 墨天轮

  • 相关阅读:
    【牛客刷题-算法】NC7 买卖股票的最好时机(一)
    Spring中类内部调用方法AOP失效的原因
    OpenHarmony实战开发-组件复用实践。
    【基于pyAudioKits的Python音频信号处理项目(二)】深度学习语音识别
    HCIP---VRRP
    React UseMemo源码分析
    Oracle存储过程
    MySQL 基础语法(2)
    malicious software
    html实现粘贴excel数据,在页面表格中复制
  • 原文地址:https://blog.csdn.net/Hehuyi_In/article/details/110709344