实际并行workers数量不等于postgresql.conf中设置的max_parallel_workers_per_gather数量

1 前言

本文件的源码来自PostgreSQL 14.5，其它版本略有不同
并行workers并不能显箸提升性能。个人不建议使用并行worker进程，大多数情况下采用postgresql.conf默认配置即可。

PostgreSQL的并行workers是由compute_parallel_worker函数决定的，compute_parallel_worker是估算扫描所需的并行工作线程数，并不是您在postgresql.conf中设置的max_parallel_workers_per_gather数量，compute_parallel_worker会根据heap_pages、index_pages、max_workers(max_parallel_workers_per_gather)来决定并行工作线程数量。

2 源码和调用位置

compute_parallel_worker共有4个地方调用

src\backend\optimizer\path\allpaths.c(801,21)
src\backend\optimizer\path\allpaths.c(3724,21)
src\backend\optimizer\path\costsize.c(707,33)
src\backend\optimizer\plan\planner.c(5953,21)
1
2
3
4

compute_parallel_worker的声明

src\include\optimizer\paths.h(59,12)
1

compute_parallel_worker的实现

src\backend\optimizer\path\allpaths.c(3750,1)
1

compute_parallel_worker的源码

/*
 * Compute the number of parallel workers that should be used to scan a
 * relation.  We compute the parallel workers based on the size of the heap to
 * be scanned and the size of the index to be scanned, then choose a minimum
 * of those.
 *
 * "heap_pages" is the number of pages from the table that we expect to scan, or
 * -1 if we don't expect to scan any.
 *
 * "index_pages" is the number of pages from the index that we expect to scan, or
 * -1 if we don't expect to scan any.
 *
 * "max_workers" is caller's limit on the number of workers.  This typically
 * comes from a GUC.
 * "max_workers"就是postgresql.conf中max_parallel_workers_per_gather的值
 */
int
compute_parallel_worker(RelOptInfo *rel, double heap_pages, double index_pages,
						int max_workers)
{
	int			parallel_workers = 0;

	/*
	 * If the user has set the parallel_workers reloption, use that; otherwise
	 * select a default number of workers.
     * 不需要优化，直接来自表级存储参数parallel_workers
     * 详见第3节直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量
	 */
	if (rel->rel_parallel_workers != -1)
		parallel_workers = rel->rel_parallel_workers;
	else
	{
		/*
		 * If the number of pages being scanned is insufficient to justify a
		 * parallel scan, just return zero ... unless it's an inheritance
		 * child. In that case, we want to generate a parallel path here
		 * anyway.  It might not be worthwhile just for this relation, but
		 * when combined with all of its inheritance siblings it may well pay
		 * off.
		 */
		if (rel->reloptkind == RELOPT_BASEREL &&
			((heap_pages >= 0 && heap_pages < min_parallel_table_scan_size) ||
			 (index_pages >= 0 && index_pages < min_parallel_index_scan_size)))
			return 0;

		if (heap_pages >= 0)
		{
			int			heap_parallel_threshold;
			int			heap_parallel_workers = 1;

			/*
			 * Select the number of workers based on the log of the size of
			 * the relation.  This probably needs to be a good deal more
			 * sophisticated, but we need something here for now.  Note that
			 * the upper limit of the min_parallel_table_scan_size GUC is
			 * chosen to prevent overflow here.
			 */
			heap_parallel_threshold = Max(min_parallel_table_scan_size, 1);
			while (heap_pages >= (BlockNumber) (heap_parallel_threshold * 3))
			{
				heap_parallel_workers++;
				heap_parallel_threshold *= 3;
				if (heap_parallel_threshold > INT_MAX / 3)
					break;		/* avoid overflow */
			}

			parallel_workers = heap_parallel_workers;
		}

		if (index_pages >= 0)
		{
			int			index_parallel_workers = 1;
			int			index_parallel_threshold;

			/* same calculation as for heap_pages above */
			index_parallel_threshold = Max(min_parallel_index_scan_size, 1);
			while (index_pages >= (BlockNumber) (index_parallel_threshold * 3))
			{
				index_parallel_workers++;
				index_parallel_threshold *= 3;
				if (index_parallel_threshold > INT_MAX / 3)
					break;		/* avoid overflow */
			}

			if (parallel_workers > 0)
				parallel_workers = Min(parallel_workers, index_parallel_workers);
			else
				parallel_workers = index_parallel_workers;
		}
	}

	/* In no case use more than caller supplied maximum number of workers */
	parallel_workers = Min(parallel_workers, max_workers);

	return parallel_workers;
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96

3 直接使用postgresql.conf中设置的max_parallel_workers_per_gather数量

如果要使用指定数量的并行worker数，必须使用存储参数parallel_workers。

alter table tab set (parallel_workers=8);
1

配置了存储参数后，compute_parallel_worker就不在对并行worker数优化，直接返回配置的parallel_workers数量。

if (rel->rel_parallel_workers != -1)
		parallel_workers = rel->rel_parallel_workers;
1
2

注意：如果不设置表级存储参数parallel_workers，实际的并行工作线程数由compute_parallel_worker根据heap_pages、index_pages、max_workers来决定并行工作线程数量，因此会出现实际并行工作数量不等于postgresql.conf中设置的max_parallel_workers_per_gather的情况。
在次强制：并行workers并不能显箸提升性能。个人不建议使用并行worker进程，大多数情况下采用postgresql.conf默认配置即可。

相关阅读:
分布式事务理论基础之初识Seata
Vue中如何进行分布式搜索与全文搜索（如Elasticsearch）
只因说了一句：“反正我有技术，在哪不一样”一位年薪35W自动化测试工程师被某讯开除
 XML文件反序列化读取
 【工程应用九】再谈基于离散夹角余弦相似度指标的形状匹配优化（十六角度量化+指令集加速+目标只有部分在图像内的识别+最小外接矩形识别重叠等）
[毕业设计源码]基于微信小程序的校园二手交易系统
 基于SSM的点餐平台系统设计与实现
 java计算机毕业设计智慧问诊系统源码+数据库+系统+lw文档
 LoadBalancer负载均衡服务调用
 羽夏看Linux内核——环境搭建
原文地址：https://blog.csdn.net/kmblack1/article/details/132588014