• PostgreSQL source code study (49): MVCC ⑤, cmin and cmax, visibility checks within a single transaction


    I. A puzzling scenario

    PostgreSQL source code study (19): MVCC ④, visibility checks and the HeapTupleSatisfiesMVCC function (Hehuyi_In's blog, CSDN)

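    In the previous post on visibility checks, one question kept bothering me: for rows inserted by the current transaction, in what scenario would a query ever encounter rows inserted after its snapshot was taken? In theory, inserts and queries inside one transaction run sequentially, so each query only reads data inserted before its snapshot. I finally found the answer and am happily writing it down.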

    After digging through many articles and other material, I finally found an example: cursors. The example below is adapted from

    Understanding the puzzling cmin and cmax in one article

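    A cursor takes a snapshot when it is declared, and later changes to the table do not affect the rows it returns, so it raises a visibility problem similar to the one caused by interleaved transactions. In the example below, FETCH sees the data as of the DECLARE rather than as of the FETCH; in other words, changes made after the cursor was declared are invisible to that cursor.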

    create table cur_test(a int);
    begin;
    insert into cur_test values(1);
    select xmin,xmax,cmin,cmax,a from cur_test;
    declare mycursor cursor for select xmin,xmax,cmin,cmax,a from cur_test;
    insert into cur_test values(2);
    fetch all from mycursor;
    select xmin,xmax,cmin,cmax,a from cur_test;

    Later I also found this scenario described in the postgresql_internals book:

            a cursor that was opened at a particular point in time must not see any changes that happened later, regardless of the isolation level.
            To address such situations, tuple headers provide a special field (displayed as cmin and cmax pseudocolumns) that shows the sequence number of the operation within the transaction.
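
    The cursor behaviour can be modelled with a small sketch. It assumes the rule that HeapTupleSatisfiesMVCC applies to tuples written by the current transaction: a tuple whose cmin is not smaller than the snapshot's curcid was inserted after the scan started and is therefore invisible. The names ToyTuple, ToySnapshot, TOY_NO_CMAX, and toy_visible_in_own_xact are hypothetical, and the sketch ignores other transactions, aborted subtransactions, and hint bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Hypothetical marker meaning "this tuple has not been deleted". */
#define TOY_NO_CMAX UINT32_MAX

/* Toy tuple: only the command ids of its inserting and deleting commands. */
typedef struct
{
    CommandId cmin;
    CommandId cmax;
} ToyTuple;

/* Toy snapshot: the command id that was current when the snapshot was taken. */
typedef struct
{
    CommandId curcid;
} ToySnapshot;

/* Visibility of a tuple that was written by the current transaction itself. */
static bool
toy_visible_in_own_xact(const ToyTuple *tup, const ToySnapshot *snap)
{
    if (tup->cmin >= snap->curcid)
        return false;               /* inserted after the scan started */
    if (tup->cmax == TOY_NO_CMAX)
        return true;                /* inserted earlier and never deleted */
    return tup->cmax >= snap->curcid;   /* deleted after the scan started */
}
```

    Under the assumption that the two inserts in the example get cmin 0 and 1 and that the cursor's snapshot has curcid 1, the first row passes the check and the second does not, which matches what FETCH returns.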

    II. What is a cid?

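    Across transactions, visibility can be decided from xmin and xmax. But what about within a single transaction? That is where the cid (command id) comes in. The cid records how many DML commands the transaction had already executed before the current one, so a larger cid means a later write. Let's revisit its definition: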

    PostgreSQL source code study (16): MVCC ①, version information on tuples (Hehuyi_In's blog, CSDN)

    /*
     * We store five "virtual" fields Xmin, Cmin, Xmax, Cmax, and Xvac in three
     * physical fields. Xmin and Xmax are always really stored, but Cmin, Cmax
     * and Xvac share a field. This works because we know that Cmin and Cmax
     * are only interesting for the lifetime of the inserting and deleting
     * transaction respectively. If a tuple is inserted and deleted in the same
     * transaction, we store a "combo" command id that can be mapped to the real
     * cmin and cmax, but only by use of local state within the originating
     * backend. See combocid.c for more details. Meanwhile, Xvac is only set by
     * old-style VACUUM FULL, which does not have any command sub-structure and so
     * does not need either Cmin or Cmax. (This requires that old-style VACUUM
     * FULL never try to move a tuple whose Cmin or Cmax is still interesting,
     * ie, an insert-in-progress or delete-in-progress tuple.)
     */
    typedef struct HeapTupleFields
    {
        TransactionId t_xmin;       /* inserting xact ID */
        TransactionId t_xmax;       /* deleting or locking xact ID */

        union
        {
            CommandId     t_cid;    /* inserting or deleting command ID, or both */
            TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
        } t_field3;
    } HeapTupleFields;

    As you can see, although the earlier SELECT showed two values, cmin and cmax, the source code stores only a single t_cid.

    The key points are:

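    • cmin and cmax are the command ids of the inserting and the deleting command, respectively
    • Before 8.3, cmin and cmax were stored separately; because inserting and deleting the same row within one transaction is uncommon, they were later merged into a single field to save space
    • If a row really is both inserted and deleted in the same transaction, the real cmin and cmax are looked up through a combo cid
    • The cmin and cmax pseudocolumns always show the same value, because both read the raw t_cid field
    • Old-style VACUUM FULL sets only t_xvac and needs neither cmin nor cmax, so t_xvac can share the same storage, which saves even more space
    • CommandId is a uint32, so a transaction can issue about 2^32 commands; to conserve ids, read-only queries do not advance the cid (similar to how transaction ids are assigned)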
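
    To make the space saving concrete, the sketch below compares a layout modelled on HeapTupleFields with a hypothetical layout that gives cmin, cmax, and xvac a slot each. ToySharedFields, ToySeparateFields, and toy_bytes_saved are made-up names, and the "separate" struct is only an illustration, not a reconstruction of any historical PostgreSQL tuple header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
typedef uint32_t CommandId;

/* Modelled on HeapTupleFields: t_cid and t_xvac share one 4-byte slot. */
typedef struct
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    union
    {
        CommandId     t_cid;    /* cmin, cmax, or a combo cid */
        TransactionId t_xvac;   /* old-style VACUUM FULL xact ID */
    } t_field3;
} ToySharedFields;

/* Hypothetical alternative: every virtual field gets its own slot. */
typedef struct
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    CommandId     t_cmin;
    CommandId     t_cmax;
    TransactionId t_xvac;
} ToySeparateFields;

/* Bytes saved per tuple by sharing the third field. */
static size_t
toy_bytes_saved(void)
{
    return sizeof(ToySeparateFields) - sizeof(ToySharedFields);
}
```

    With 4-byte ids the shared layout takes 12 bytes against 20 for the hypothetical one, so every tuple in the table is 8 bytes smaller.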

    III. Getting cmin and cmax

    1. HeapTupleHeaderGetCmin and HeapTupleHeaderGetCmax

    Inside HeapTupleSatisfiesMVCC, a tuple's cmin and cmax are obtained mainly through two functions: HeapTupleHeaderGetCmin and HeapTupleHeaderGetCmax.

    /*
     * GetCmin and GetCmax assert that they are only called in situations where
     * they make sense, that is, can deliver a useful answer. If you have
     * reason to examine a tuple's t_cid field from a transaction other than
     * the originating one, use HeapTupleHeaderGetRawCommandId() directly.
     */
    CommandId
    HeapTupleHeaderGetCmin(HeapTupleHeader tup)
    {
        CommandId cid = HeapTupleHeaderGetRawCommandId(tup);

        Assert(!(tup->t_infomask & HEAP_MOVED));
        Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)));

        if (tup->t_infomask & HEAP_COMBOCID)
            return GetRealCmin(cid);
        else
            return cid;
    }

    CommandId
    HeapTupleHeaderGetCmax(HeapTupleHeader tup)
    {
        CommandId cid = HeapTupleHeaderGetRawCommandId(tup);

        Assert(!(tup->t_infomask & HEAP_MOVED));

        /*
         * Because GetUpdateXid() performs memory allocations if xmax is a
         * multixact we can't Assert() if we're inside a critical section. This
         * weakens the check, but not using GetCmax() inside one would complicate
         * things too much.
         */
        Assert(CritSectionCount > 0 ||
               TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetUpdateXid(tup)));

        if (tup->t_infomask & HEAP_COMBOCID)
            return GetRealCmax(cid);
        else
            return cid;
    }

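    As you can see, the two are very similar:

    • Without a combo cid, both simply return the result of HeapTupleHeaderGetRawCommandId
    • Whether the tuple carries a combo cid is decided by the HEAP_COMBOCID bit in t_infomask
    • With a combo cid, they call GetRealCmin and GetRealCmax respectively
    • Without a combo cid, both functions return the same raw t_cid, which is why the cmin and cmax pseudocolumns always match; with a combo cid, they return the distinct real cmin and cmax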

    2. The HeapTupleHeaderGetRawCommandId function

    As you can see, it simply reads the tuple's t_cid.

    https://img-blog.csdnimg.cn/2c622678070f4fbbb0d37fa3cec1cda4.png

    /*
     * HeapTupleHeaderGetRawCommandId will give you what's in the header whether
     * it is useful or not. Most code should use HeapTupleHeaderGetCmin or
     * HeapTupleHeaderGetCmax instead, but note that those Assert that you can
     * get a legitimate result, ie you are in the originating transaction!
     */
    #define HeapTupleHeaderGetRawCommandId(tup) \
    ( \
        (tup)->t_choice.t_heap.t_field3.t_cid \
    )

    IV. combo cid

    1. The HEAP_COMBOCID flag bit

    The code above shows that a combo cid is detected through a flag bit in t_infomask. Let's verify that.

    The t_infomask value is 32, which is 0x0020 in hexadecimal, i.e. HEAP_COMBOCID.
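
    The same check can be written out as a tiny helper. This is only a sketch: the HEAP_COMBOCID value is the one quoted above, and toy_has_combo_cid is a hypothetical name, not a PostgreSQL function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag value as quoted in the text above. */
#define HEAP_COMBOCID 0x0020

/* True if the infomask says that t_cid holds a combo cid. */
static bool
toy_has_combo_cid(uint16_t t_infomask)
{
    return (t_infomask & HEAP_COMBOCID) != 0;
}
```

    Passing the observed value 32 returns true, and the result stays true when other bits of the mask are set as well, because only the 0x0020 bit is tested.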

    2. Related structures

    The following definitions are in combocid.c:

    /*
     * Hash table to lookup combo CIDs by cmin and cmax
     * (the hash table maps a cmin/cmax pair to its combo CID)
     */
    static HTAB *comboHash = NULL;

    /* Key and entry structures for the hash table (this is the key) */
    typedef struct
    {
        CommandId cmin;
        CommandId cmax;
    } ComboCidKeyData;

    typedef ComboCidKeyData *ComboCidKey;

    /* Hash table entry (the value): the key plus its combo CID */
    typedef struct
    {
        ComboCidKeyData key;
        CommandId combocid;
    } ComboCidEntryData;

    typedef ComboCidEntryData *ComboCidEntry;

    3. The GetRealCmin and GetRealCmax functions

    These two functions are very simple; so simple, in fact, that on their own they do not reveal much...

    static CommandId
    GetRealCmin(CommandId combocid)
    {
        Assert(combocid < usedComboCids);
        return comboCids[combocid].cmin;
    }

    static CommandId
    GetRealCmax(CommandId combocid)
    {
        Assert(combocid < usedComboCids);
        return comboCids[combocid].cmax;
    }

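    The usedComboCids counter and the comboCids array they rely on are file-level variables in combocid.c that are filled in by the GetComboCommandId function.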

    4. The GetComboCommandId function

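    Both the function comment and the call stack show that it is not called explicitly; it is an internal routine. This is where the hash table mentioned earlier comes in.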

    /**** Internal routines ****/

    /*
     * Get a combo command id that maps to cmin and cmax.
     * We try to reuse old combo command ids when possible.
     */
    static CommandId
    GetComboCommandId(CommandId cmin, CommandId cmax)
    {
        CommandId       combocid;
        ComboCidKeyData key;    /* hash table key */
        ComboCidEntry   entry;  /* hash table entry (value) */
        bool            found;

        /*
         * Create the hash table and array the first time we need to use combo
         * cids in the transaction.
         */
        if (comboHash == NULL)
        {
            HASHCTL hash_ctl;

            /*
             * Make array first; existence of hash table asserts array exists.
             * Note that comboCids is an array of ComboCidKeyData, i.e. of
             * (cmin, cmax) pairs.
             */
            comboCids = (ComboCidKeyData *)
                MemoryContextAlloc(TopTransactionContext,
                                   sizeof(ComboCidKeyData) * CCID_ARRAY_SIZE);
            sizeComboCids = CCID_ARRAY_SIZE;
            usedComboCids = 0;

            hash_ctl.keysize = sizeof(ComboCidKeyData);
            hash_ctl.entrysize = sizeof(ComboCidEntryData);
            hash_ctl.hcxt = TopTransactionContext;

            /* Create the hash table */
            comboHash = hash_create("Combo CIDs",
                                    CCID_HASH_SIZE,
                                    &hash_ctl,
                                    HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
        }

        /* Grow the array automatically when it is full */
        if (usedComboCids >= sizeComboCids)
        {
            int newsize = sizeComboCids * 2;

            comboCids = (ComboCidKeyData *)
                repalloc(comboCids, sizeof(ComboCidKeyData) * newsize);
            sizeComboCids = newsize;
        }

        /*
         * Lookup or create a hash entry with the desired cmin/cmax, i.e. use
         * the (cmin, cmax) key to find its combo CID in the hash table.
         */
        /* We assume there is no struct padding in ComboCidKeyData! */
        key.cmin = cmin;
        key.cmax = cmax;
        entry = (ComboCidEntry) hash_search(comboHash,
                                            (void *) &key,
                                            HASH_ENTER,
                                            &found);
        if (found)
        {
            /* Reuse an existing combo CID: return the one that was found */
            return entry->combocid;
        }

        /*
         * We have to create a new combo CID; we already made room in the array.
         * Store the given cmin/cmax under the new combo CID and increment the
         * usedComboCids counter.
         */
        combocid = usedComboCids;
        comboCids[combocid].cmin = cmin;
        comboCids[combocid].cmax = cmax;
        usedComboCids++;

        /* Save it in the hash entry and return it */
        entry->combocid = combocid;
        return combocid;
    }

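    This is why GetRealCmin and GetRealCmax need only a single step each, return comboCids[combocid].cmin and return comboCids[combocid].cmax: the combo cid is simply an index into the comboCids array. The hash table serves the opposite direction, finding an existing combo cid for a given (cmin, cmax) pair so that it can be reused.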
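
    The two directions of the mapping can be tried out in a self-contained toy version. It is a sketch under simplifying assumptions: a linear scan stands in for the dynahash lookup, the array has a fixed capacity instead of growing, and every toy_ name and TOY_MAX_COMBO are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* One (cmin, cmax) pair, playing the role of ComboCidKeyData. */
typedef struct
{
    CommandId cmin;
    CommandId cmax;
} ToyComboKey;

#define TOY_MAX_COMBO 64    /* hypothetical fixed capacity */

static ToyComboKey toy_combo_cids[TOY_MAX_COMBO];   /* combo cid -> pair */
static uint32_t    toy_used_combo_cids = 0;

/* Return a combo cid for (cmin, cmax), reusing an existing one if possible. */
static CommandId
toy_get_combo_cid(CommandId cmin, CommandId cmax)
{
    /* Linear scan stands in for the hash table lookup of the real code. */
    for (uint32_t i = 0; i < toy_used_combo_cids; i++)
    {
        if (toy_combo_cids[i].cmin == cmin && toy_combo_cids[i].cmax == cmax)
            return i;
    }

    /* The real code doubles the array here; the toy version just checks. */
    assert(toy_used_combo_cids < TOY_MAX_COMBO);

    toy_combo_cids[toy_used_combo_cids].cmin = cmin;
    toy_combo_cids[toy_used_combo_cids].cmax = cmax;
    return toy_used_combo_cids++;
}

/* Reverse direction: the combo cid is just an index into the array. */
static CommandId
toy_real_cmin(CommandId combocid)
{
    assert(combocid < toy_used_combo_cids);
    return toy_combo_cids[combocid].cmin;
}

static CommandId
toy_real_cmax(CommandId combocid)
{
    assert(combocid < toy_used_combo_cids);
    return toy_combo_cids[combocid].cmax;
}
```

    Requesting the same (cmin, cmax) pair twice yields the same combo cid, and toy_real_cmin and toy_real_cmax recover the two original values from it. The real code keeps both structures because the array makes the reverse lookup a constant-time index while the hash table avoids scanning the array when looking for a reusable combo cid.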

    References

    Understanding the puzzling cmin and cmax in one article

    PostgreSQL source code study (16): MVCC ①, version information on tuples (Hehuyi_In's blog, CSDN)

    PG transactions (3): transaction status and hint bits (t_infomask) (Hehuyi_In's blog, CSDN)

    PostgreSQL source code study (19): MVCC ④, visibility checks and the HeapTupleSatisfiesMVCC function (Hehuyi_In's blog, CSDN)

  • Original article: https://blog.csdn.net/Hehuyi_In/article/details/127955762