• PostgreSQL source code (58): an analysis of tuple assembly in heap_form_tuple


    Version: 14
    Related:
    "PostgreSQL source code (51): variable-length type implementation (varlena.c)"
    "PostgreSQL source code (56): extensible types, ExpandedObject/ExpandedRecord"

    2 Background

    typedef struct HeapTupleData
    {
    	uint32		t_len;			/* length of *t_data */
    	ItemPointerData t_self;		/* SelfItemPointer */
    	Oid			t_tableOid;		/* table the tuple came from */
    	HeapTupleHeader t_data;		/* -> tuple header and data */
    } HeapTupleData;
    

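    3 heap_form_tuple, the constructor for HeapTuple

    A HeapTuple is assembled in heap_form_tuple; the rest of this article analyzes that function.

    Take inserting a five-column row as an example. At the SQL level three columns look fixed-length and two variable-length, but in storage only int is fixed-length (attlen = 4); varchar, numeric, char(2) and text are all varlena (attlen = -1).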

    drop table t21;
    create table t21(i1 int, v10 varchar(10), n1 numeric, c2 char(2), t1 text);
    insert into t21 values (1, 'mylen=7', 5.5, '22', 'hi12345');
    

    3.1 Arguments of heap_form_tuple

    The constructor heap_form_tuple:

    HeapTuple
    heap_form_tuple(TupleDesc tupleDescriptor, Datum *values, bool *isnull)
    

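    Note that the arguments are a tuple descriptor, a values array and an isnull array. Each entry of the values array holds either the int value itself or a pointer to the datum's data.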

    (gdb) p *tupleDescriptor
    $9 = {natts = 5, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x199ce90}
    (gdb) p values[0]
    $11 = 1            : the int value itself
    (gdb) p values[1]
    $12 = 27157600     : pointer to the datum data
    (gdb) p values[2]
    $13 = 27153160     : pointer to the datum data
    (gdb) p values[3]
    $14 = 27158432     : pointer to the datum data
    (gdb) p values[4]
    $15 = 27154592     : pointer to the datum data
    (gdb) p isnull[0] 
    $17 = false
    (gdb) p isnull[1]
    $18 = false
    (gdb) p isnull[2]
    $19 = false
    (gdb) p isnull[3]
    $20 = false
    (gdb) p isnull[4]
    $21 = false
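
    The difference between "value" and "pointer" entries in the values array can be sketched with a simplified Datum. This is only an illustration: the helper names below are hypothetical and are not PostgreSQL's own macros, and Datum is modelled as a plain pointer-sized integer.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for PostgreSQL's Datum: one pointer-sized integer. */
typedef uintptr_t Datum;

/* Pass-by-value: the int itself travels inside the Datum (hypothetical helper). */
static inline Datum datum_from_int32(int32_t v)
{
    return (Datum) (intptr_t) v;
}

static inline int32_t datum_to_int32(Datum d)
{
    return (int32_t) (intptr_t) d;
}

/* Pass-by-reference: the Datum only carries the address of the data. */
static inline Datum datum_from_pointer(const void *p)
{
    return (Datum) (uintptr_t) p;
}

static inline void *datum_to_pointer(Datum d)
{
    assert(d != 0);             /* a null pointer is not a valid reference here */
    return (void *) d;
}
```

    With these helpers, values[0] for the int column would print as 1 in gdb, while the other four entries would print as addresses, which matches the session above.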
    

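    3.2 Execution flow of heap_form_tuple

    • Note: hoff is the offset from the start of HeapTupleHeaderData to the user data
    • Note: tuple->t_data is the offset from the start of HeapTupleData to the start of HeapTupleHeaderData
    • Memory layout: HeapTupleData + HeapTupleHeaderData + data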
    heap_form_tuple
    ...
        len = offsetof(HeapTupleHeaderData, t_bits)        : size of the fixed header, len = 23; t_bits is a flexible array member
        hoff = len = MAXALIGN(len);                        : after alignment, hoff = len = 24
        data_len = heap_compute_data_size(...)             : length needed for the data (see 3.3), data_len = 30 bytes
        len += data_len;                                   : len = 24 + 30 = 54
    
        tuple = (HeapTuple) palloc0(HEAPTUPLESIZE + len)   : allocate HeapTupleData + HeapTupleHeaderData + 30 bytes of data
        tuple->t_data = td = (HeapTupleHeader) ((char *) tuple + HEAPTUPLESIZE)
                                                           : t_data points just past HeapTupleData, at the start of HeapTupleHeaderData
        ...
        // set the fields of tuple and td
        ...
        heap_fill_tuple                                    : write the data according to each column's type, see 3.4
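
    The size arithmetic and the single-allocation layout can be sketched as follows. This is a toy model, not the PostgreSQL code: ToyTuple, toy_form_tuple and the constant 23 (the header size reported above) are assumptions for illustration, the null bitmap is ignored because the sample row has no NULLs, and 8-byte maximum alignment is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define MAXIMUM_ALIGNOF 8       /* assumption: typical 64-bit build */
#define MAXALIGN(len) \
    (((uintptr_t) (len) + (MAXIMUM_ALIGNOF - 1)) & ~((uintptr_t) (MAXIMUM_ALIGNOF - 1)))

/* offsetof(HeapTupleHeaderData, t_bits) as reported in the text */
#define TOY_HEADER_FIXED_SIZE 23

/* Toy stand-in for HeapTupleData: a length plus a pointer to header+data. */
typedef struct ToyTuple
{
    uint32_t t_len;             /* length of header + data */
    void    *t_data;            /* points into the same allocation */
} ToyTuple;

#define TOY_TUPLESIZE MAXALIGN(sizeof(ToyTuple))

/* Offset from the start of the header to the user data (hoff). */
size_t toy_header_offset(void)
{
    return MAXALIGN(TOY_HEADER_FIXED_SIZE);
}

/* One zeroed allocation: management struct, then header, then data. */
ToyTuple *toy_form_tuple(size_t data_len)
{
    size_t    len = toy_header_offset() + data_len;
    ToyTuple *tuple;

    assert(len <= UINT32_MAX);
    tuple = calloc(1, TOY_TUPLESIZE + len);
    if (tuple == NULL)
        return NULL;
    tuple->t_len = (uint32_t) len;
    tuple->t_data = (char *) tuple + TOY_TUPLESIZE;
    return tuple;
}
```

    Calling toy_form_tuple(30) yields t_len = 54, the same 24 + 30 computed above, and a t_data pointer that sits TOY_TUPLESIZE bytes past the start of the allocation. The caller releases the whole tuple with a single free.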
    

    3.3 heap_compute_data_size

    heap_compute_data_size computes the data length. Take the following SQL as an example:

    drop table t21;
    create table t21(i1 int, v10 varchar(10), n1 numeric, c2 char(2), t1 text);
    insert into t21 values (1, 'mylen=7', 5.5, '22', 'hi12345');
    

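    The function handles each column separately. The main logic has three branches:

    3.3.1 Entry conditions of the three branches

    Branch 1: atti->attlen == -1 and atti->attstorage != 'p', the value currently has a 4B header, and the data is short enough to switch to a 1B header
    Branch 2: atti->attlen == -1, the value currently has a 1B_E header, and its tag marks an expanded object (VARTAG_EXPANDED_RO or VARTAG_EXPANDED_RW)
    Branch 3: everything else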

    		if (ATT_IS_PACKABLE(atti) &&
    			VARATT_CAN_MAKE_SHORT(DatumGetPointer(val)))
    		{
    			/*
    			 * we're anticipating converting to a short varlena header, so
    			 * adjust length and don't count any alignment
    			 */
    			data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val));
    		}
    		else if (atti->attlen == -1 &&
    				 VARATT_IS_EXTERNAL_EXPANDED(DatumGetPointer(val)))
    		{
    			/*
    			 * we want to flatten the expanded value so that the constructed
    			 * tuple doesn't depend on it
    			 */
    			data_length = att_align_nominal(data_length, atti->attalign);
    			data_length += EOH_get_flat_size(DatumGetEOHP(val));
    		}
    		else
    		{
    			data_length = att_align_datum(data_length, atti->attalign,
    										  atti->attlen, val);
    			data_length = att_addlength_datum(data_length, atti->attlen,
    											  val);
    		}
    

    For the five test columns:

    int: takes branch 3 (length 4)

    (gdb) p atti->attlen
    $30 = 4
    (gdb) p atti->attstorage
    $31 = 112 'p'
    

    Calculation:

    // Step 1: align data_length; it is 0 and stays 0 after alignment
    			data_length = att_align_datum(data_length, atti->attalign,
    										  atti->attlen, val);
    // Step 2: add the length atti->attlen, giving data_length = 4
    			data_length = att_addlength_datum(data_length, atti->attlen,
    											  val);
    

    The length grows by 4.
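
    The align-then-add step for a fixed-length column can be sketched as below. TYPEALIGN is written the same way as PostgreSQL's macro in c.h; add_fixed_attr is a hypothetical helper that takes the alignment in bytes, whereas the real code derives it from the attalign character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to the next multiple of alignval (a power of two). */
#define TYPEALIGN(alignval, len) \
    (((uintptr_t) (len) + ((alignval) - 1)) & ~((uintptr_t) ((alignval) - 1)))

/* Hypothetical helper: align the running length, then add a fixed-length attribute. */
size_t add_fixed_attr(size_t data_length, size_t alignval, size_t attlen)
{
    assert(alignval != 0 && (alignval & (alignval - 1)) == 0);
    data_length = TYPEALIGN(alignval, data_length);
    return data_length + attlen;
}
```

    For the int column, add_fixed_attr(0, 4, 4) gives 4. If an int followed 5 bytes of data instead, the running length would first be padded to 8 and then grow to 12.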

    varchar: takes branch 1 (length 8)

    (gdb) p atti->attlen
    $38 = -1
    (gdb) p atti->attstorage
    $39 = 120 'x'
    

    Calculation:

    // The value fits under a 1B header; the 4B header will be converted to 1B later, so count the length with a 1B header here
    data_length += VARATT_CONVERTED_SHORT_SIZE(DatumGetPointer(val))
    

    The length grows by 8.
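
    The short-header size calculation can be sketched as plain arithmetic. The constants match PostgreSQL's VARHDRSZ, VARHDRSZ_SHORT and VARATT_SHORT_MAX; the two functions are hypothetical simplifications that take the total 4B-header size as a number instead of inspecting a varlena pointer, and they skip the check that the value really has an uncompressed 4B header.

```c
#include <assert.h>
#include <stddef.h>

#define VARHDRSZ         4      /* size of the 4B varlena header */
#define VARHDRSZ_SHORT   1      /* size of the 1B varlena header */
#define VARATT_SHORT_MAX 0x7F   /* largest total size a 1B header can encode */

/* Size the value would occupy after swapping the 4B header for a 1B header. */
size_t converted_short_size(size_t varsize_4b)
{
    assert(varsize_4b >= VARHDRSZ);
    return varsize_4b - VARHDRSZ + VARHDRSZ_SHORT;
}

/* Nonzero when the converted value fits under a 1B header. */
int can_make_short(size_t varsize_4b)
{
    return converted_short_size(varsize_4b) <= VARATT_SHORT_MAX;
}
```

    For 'mylen=7' the 4B form is 4 + 7 = 11 bytes, so converted_short_size(11) is 8, the increment seen above.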

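    numeric: takes branch 1 (length 7)

    char: takes branch 1 (length 3)

    A 1B header plus 2 bytes of data, 3 bytes in total.

    text: takes branch 1 (length 8)

    A 1B header plus 7 bytes of data, 8 bytes in total.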
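
    Putting the five columns together reproduces the 30 bytes used in 3.2. The model below is a sketch under stated assumptions: ToyAttr and toy_compute_data_size are hypothetical, every varlena column is treated as packable (attstorage != 'p'), expanded values (branch 2) are left out, and the 4B sizes in t21_row are derived from the lengths listed above.

```c
#include <assert.h>
#include <stddef.h>

#define VARHDRSZ         4
#define VARHDRSZ_SHORT   1
#define VARATT_SHORT_MAX 0x7F

typedef struct ToyAttr
{
    int    attlen;      /* > 0: fixed length; -1: varlena */
    size_t alignval;    /* alignment in bytes, a power of two */
    size_t size_4b;     /* varlena only: total size including the 4B header */
} ToyAttr;

/* The sample row: int, varchar 'mylen=7', numeric 5.5, char(2) '22', text 'hi12345'. */
const ToyAttr t21_row[5] = {
    {4, 4, 0}, {-1, 4, 11}, {-1, 4, 10}, {-1, 4, 6}, {-1, 4, 11}
};

static size_t align_up(size_t len, size_t alignval)
{
    return (len + alignval - 1) & ~(alignval - 1);
}

size_t toy_compute_data_size(const ToyAttr *attrs, int natts)
{
    size_t data_length = 0;

    for (int i = 0; i < natts; i++)
    {
        const ToyAttr *a = &attrs[i];

        assert(a->attlen > 0 || a->size_4b >= VARHDRSZ);
        if (a->attlen == -1 &&
            a->size_4b - VARHDRSZ + VARHDRSZ_SHORT <= VARATT_SHORT_MAX)
        {
            /* branch 1: count the 1B-header size, no alignment padding */
            data_length += a->size_4b - VARHDRSZ + VARHDRSZ_SHORT;
        }
        else
        {
            /* branch 3: align first, then add the full length */
            data_length = align_up(data_length, a->alignval);
            data_length += (a->attlen == -1) ? a->size_4b : (size_t) a->attlen;
        }
    }
    return data_length;
}
```

    toy_compute_data_size(t21_row, 5) walks 4, 12, 19, 22, 30, so the total is 30 bytes. No padding appears after the int column because short varlenas are never aligned.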

    3.4 heap_fill_tuple

    heap_fill_tuple calls fill_val for every column to write its data:

    heap_fill_tuple
      for (i = 0; i < numberOfAttributes; i++)
        fill_val(...)  
    

    fill_val has more branches. Each column is handled by one of the following four:

        if (att->attbyval)
    	{
    		/* pass-by-value */
    		data = (char *) att_align_nominal(data, att->attalign);
    		store_att_byval(data, datum, att->attlen);
    		data_length = att->attlen;
    	}
    	else if (att->attlen == -1)
    	{
    		/* varlena */
    		Pointer		val = DatumGetPointer(datum);
    
    		*infomask |= HEAP_HASVARWIDTH;
    		if (VARATT_IS_EXTERNAL(val))
    		{
    			if (VARATT_IS_EXTERNAL_EXPANDED(val))
    			{
    				/*
    				 * we want to flatten the expanded value so that the
    				 * constructed tuple doesn't depend on it
    				 */
    				ExpandedObjectHeader *eoh = DatumGetEOHP(datum);
    
    				data = (char *) att_align_nominal(data,
    												  att->attalign);
    				data_length = EOH_get_flat_size(eoh);
    				EOH_flatten_into(eoh, data, data_length);
    			}
    			else
    			{
    				*infomask |= HEAP_HASEXTERNAL;
    				/* no alignment, since it's short by definition */
    				data_length = VARSIZE_EXTERNAL(val);
    				memcpy(data, val, data_length);
    			}
    		}
    		else if (VARATT_IS_SHORT(val))
    		{
    			/* no alignment for short varlenas */
    			data_length = VARSIZE_SHORT(val);
    			memcpy(data, val, data_length);
    		}
    		else if (VARLENA_ATT_IS_PACKABLE(att) &&
    				 VARATT_CAN_MAKE_SHORT(val))
    		{
    			/* convert to short varlena -- no alignment */
    			data_length = VARATT_CONVERTED_SHORT_SIZE(val);
    			SET_VARSIZE_SHORT(data, data_length);
    			memcpy(data + 1, VARDATA(val), data_length - 1);
    		}
    		else
    		{
    			/* full 4-byte header varlena */
    			data = (char *) att_align_nominal(data,
    											  att->attalign);
    			data_length = VARSIZE(val);
    			memcpy(data, val, data_length);
    		}
    	}
    	else if (att->attlen == -2)
    	{
    		/* cstring ... never needs alignment */
    		*infomask |= HEAP_HASVARWIDTH;
    		Assert(att->attalign == TYPALIGN_CHAR);
    		data_length = strlen(DatumGetCString(datum)) + 1;
    		memcpy(data, DatumGetPointer(datum), data_length);
    	}
    	else
    	{
    		/* fixed-length pass-by-reference */
    		data = (char *) att_align_nominal(data, att->attalign);
    		Assert(att->attlen > 0);
    		data_length = att->attlen;
    		memcpy(data, DatumGetPointer(datum), data_length);
    	}
    

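    Branches:

    1. att->attbyval == true: the value is passed directly, so it is simply stored
    2. att->attlen == -1: a varlena type, handled separately for the 4B, 1B and 1B_E header cases
    3. att->attlen == -2: a cstring, copied directly
    4. Otherwise: fixed-length pass-by-reference, copied directly

    For the five test columns:

    int: takes branch 1, a value copy

    A pass-by-value datum carries the value in the Datum itself, so store_att_byval writes it straight into the tuple.

    varchar: takes branch 2; the 4B header is converted to 1B, then the data is copied

    The data is small enough that a 4B header is unnecessary, so a 1B header is written and the payload is copied after it (the VARLENA_ATT_IS_PACKABLE && VARATT_CAN_MAKE_SHORT case).

    numeric: takes branch 2; the 4B header is converted to 1B, then the data is copied

    Same as varchar.

    char: takes branch 2; the 4B header is converted to 1B, then the data is copied

    Same as varchar.

    text: takes branch 2; the 4B header is converted to 1B, then the data is copied

    Same as varchar.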
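
    The "convert to a 1B header, then copy" step can be sketched as follows. This is an illustration only: toy_fill_short is a hypothetical function that receives the payload and its length rather than a 4B-header varlena, and the header byte uses the little-endian encoding, (length << 1) | 0x01; big-endian builds of PostgreSQL encode the 1B header differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VARHDRSZ_SHORT   1
#define VARATT_SHORT_MAX 0x7F

/*
 * Write a short varlena at data: one header byte holding the total length
 * (header included), followed by the payload. Returns the bytes written.
 * Assumption: little-endian 1B header encoding.
 */
size_t toy_fill_short(char *data, const char *payload, size_t payload_len)
{
    size_t data_length = payload_len + VARHDRSZ_SHORT;

    assert(data_length <= VARATT_SHORT_MAX);
    data[0] = (char) (uint8_t) ((data_length << 1) | 0x01);
    memcpy(data + 1, payload, payload_len);     /* no alignment for short varlenas */
    return data_length;
}
```

    For the varchar value, toy_fill_short(buf, "mylen=7", 7) writes the header byte 0x11 followed by the seven payload bytes and returns 8, the same length counted in 3.3.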

  • Original article: https://blog.csdn.net/jackgo73/article/details/125522422