• 实际并行workers数量不等于postgresql.conf中设置的max_parallel_workers_per_gather数量


    1 前言

    • 本文件的源码来自PostgreSQL 14.5,其它版本略有不同
    • 并行workers并不能显箸提升性能。个人不建议使用并行worker进程,大多数情况下采用postgresql.conf默认配置即可。

    PostgreSQL的并行workers是由compute_parallel_worker函数决定的,compute_parallel_worker是估算扫描所需的并行工作线程数,并不是您在postgresql.conf中设置的max_parallel_workers_per_gather数量,compute_parallel_worker会根据heap_pages、index_pages、max_workers(max_parallel_workers_per_gather)来决定并行工作线程数量。

    2 源码和调用位置

    compute_parallel_worker共有4个地方调用

    src\backend\optimizer\path\allpaths.c(801,21)
    src\backend\optimizer\path\allpaths.c(3724,21)
    src\backend\optimizer\path\costsize.c(707,33)
    src\backend\optimizer\plan\planner.c(5953,21)
    
    • 1
    • 2
    • 3
    • 4

    compute_parallel_worker的声明

    src\include\optimizer\paths.h(59,12)
    
    • 1

    compute_parallel_worker的实现

    src\backend\optimizer\path\allpaths.c(3750,1)
    
    • 1

    compute_parallel_worker的源码

    /*
     * Compute the number of parallel workers that should be used to scan a
     * relation.  We compute the parallel workers based on the size of the heap to
     * be scanned and the size of the index to be scanned, then choose a minimum
     * of those.
     *
     * "heap_pages" is the number of pages from the table that we expect to scan, or
     * -1 if we don't expect to scan any.
     *
     * "index_pages" is the number of pages from the index that we expect to scan, or
     * -1 if we don't expect to scan any.
     *
     * "max_workers" is caller's limit on the number of workers.  This typically
     * comes from a GUC.
     * "max_workers"就是postgresql.conf中max_parallel_workers_per_gather的值
     */
    int
    compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
    						int max_workers)
    {
    	int			parallel_workers = 0;
    
    	/*
    	 * If the user has set the parallel_workers reloption, use that; otherwise
    	 * select a default number of workers.
         * 不需要优化,直接来自表级存储参数parallel_workers
         * 详见第3节直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量
    	 */
    	if (rel->rel_parallel_workers != -1)
    		parallel_workers = rel->rel_parallel_workers;
    	else
    	{
    		/*
    		 * If the number of pages being scanned is insufficient to justify a
    		 * parallel scan, just return zero ... unless it's an inheritance
    		 * child. In that case, we want to generate a parallel path here
    		 * anyway.  It might not be worthwhile just for this relation, but
    		 * when combined with all of its inheritance siblings it may well pay
    		 * off.
    		 */
    		if (rel->reloptkind == RELOPT_BASEREL &&
    			((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
    			 (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
    			return 0;
    
    		if (heap_pages >= 0)
    		{
    			int			heap_parallel_threshold;
    			int			heap_parallel_workers = 1;
    
    			/*
    			 * Select the number of workers based on the log of the size of
    			 * the relation.  This probably needs to be a good deal more
    			 * sophisticated, but we need something here for now.  Note that
    			 * the upper limit of the min_parallel_table_scan_size GUC is
    			 * chosen to prevent overflow here.
    			 */
    			heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
    			while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
    			{
    				heap_parallel_workers++;
    				heap_parallel_threshold *= 3;
    				if (heap_parallel_threshold > INT_MAX / 3)
    					break;		/* avoid overflow */
    			}
    
    			parallel_workers = heap_parallel_workers;
    		}
    
    		if (index_pages >= 0)
    		{
    			int			index_parallel_workers = 1;
    			int			index_parallel_threshold;
    
    			/* same calculation as for heap_pages above */
    			index_parallel_threshold = Max(min_parallel_index_scan_size, 1);
    			while (index_pages >= (BlockNumber) (index_parallel_threshold * 3))
    			{
    				index_parallel_workers++;
    				index_parallel_threshold *= 3;
    				if (index_parallel_threshold > INT_MAX / 3)
    					break;		/* avoid overflow */
    			}
    
    			if (parallel_workers > 0)
    				parallel_workers = Min(parallel_workers, index_parallel_workers);
    			else
    				parallel_workers = index_parallel_workers;
    		}
    	}
    
    	/* In no case use more than caller supplied maximum number of workers */
    	parallel_workers = Min(parallel_workers, max_workers);
    
    	return parallel_workers;
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94
    • 95
    • 96

    3 直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量

    如果要使用指定数量的并行worker数,必须使用存储参数parallel_workers。

    alter table tab set (parallel_workers=8);
    
    • 1

    配置了存储参数后,compute_parallel_worker就不在对并行worker数优化,直接返回配置的parallel_workers数量。

    if (rel->rel_parallel_workers != -1)
    		parallel_workers = rel->rel_parallel_workers;
    
    • 1
    • 2
    • 注意:如果不设置表级存储参数parallel_workers,实际的并行工作线程数由compute_parallel_worker根据heap_pages、index_pages、max_workers来决定并行工作线程数量,因此会出现实际并行工作数量不等于postgresql.conf中设置的max_parallel_workers_per_gather的情况。
      在这里插入图片描述
      在这里插入图片描述

    • 在次强制:并行workers并不能显箸提升性能。个人不建议使用并行worker进程,大多数情况下采用postgresql.conf默认配置即可。

  • 相关阅读:
    分布式事务理论基础之初识Seata
    Vue中如何进行分布式搜索与全文搜索(如Elasticsearch)
    只因说了一句:“反正我有技术,在哪不一样”一位年薪35W自动化测试工程师被某讯开除
    XML文件反序列化读取
    【工程应用九】再谈基于离散夹角余弦相似度指标的形状匹配优化(十六角度量化+指令集加速+目标只有部分在图像内的识别+最小外接矩形识别重叠等)
    [毕业设计源码]基于微信小程序的校园二手交易系统
    基于SSM的点餐平台系统设计与实现
    java计算机毕业设计智慧问诊系统源码+数据库+系统+lw文档
    LoadBalancer负载均衡服务调用
    羽夏看Linux内核——环境搭建
  • 原文地址:https://blog.csdn.net/kmblack1/article/details/132588014