• PostgreSQL source code study (40): crash recovery ② - the recovery starting point


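    I. Obtaining the recovery starting point

    1. The default recovery starting point

           Normally, crash recovery starts from the most recent checkpoint, whose location is stored in the control file. As we saw earlier in the function that creates checkpoints, the control file is refreshed every time a checkpoint is created.

           The following code is in the StartupXLOG function (xlog.c):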

    /*
     * Get the last valid checkpoint record: take the checkpoint and redo
     * locations from the control file, then read the checkpoint record
     * from XLOG.
     */
    checkPointLoc = ControlFile->checkPoint;
    RedoStartLSN = ControlFile->checkPointCopy.redo;
    record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, true);
    /* If the record was read, emit a debug message */
    if (record != NULL)
    {
        ereport(DEBUG1,
                (errmsg_internal("checkpoint record is at %X/%X",
                                 LSN_FORMAT_ARGS(checkPointLoc))));
    }
    else
    {
        /*
         * If it cannot be read, fail immediately. Older versions, when not
         * in standby mode, would try to read one checkpoint further back;
         * newer versions dropped that fallback for simplicity.
         */
        ereport(PANIC,
                (errmsg("could not locate a valid checkpoint record")));
    }
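
The debug message above prints the checkpoint location with "%X/%X". As a minimal sketch of what that format means, the helper below splits a 64-bit WAL position into its high and low 32-bit halves, which is what LSN_FORMAT_ARGS hands to the format string. DemoLsn and demo_format_lsn are hypothetical names used only for illustration; they are not part of PostgreSQL.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's XLogRecPtr, a 64-bit position in the WAL stream. */
typedef uint64_t DemoLsn;

/*
 * Hypothetical helper: write an LSN as "high/low" in upper-case hex, the
 * same shape as the "%X/%X" output shown in the server log.
 * Returns the number of characters snprintf would have written.
 */
int
demo_format_lsn(DemoLsn lsn, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32),   /* high 32 bits */
                    (uint32_t) lsn);          /* low 32 bits */
}
```

For example, formatting the value 0x0000000116B3F0A8 with this helper yields the text 1/16B3F0A8.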

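    2. Special case

            As mentioned earlier, an exclusive-mode backup creates a backup_label file. If that file exists, the checkpoint information is taken from it in preference to the control file and used as the recovery starting point.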

    if (read_backup_label(&checkPointLoc, &backupEndRequired,
                          &backupFromStandby))
    {
        List       *tablespaces = NIL;

        /*
         * Archive recovery was requested, and thanks to the backup label
         * file, we know how far we need to replay to reach consistency.
         * Enter archive recovery directly.
         */
        InArchiveRecovery = true;
        if (StandbyModeRequested)
            StandbyMode = true;

        /*
         * When a backup_label file is present, we want to roll forward from
         * the checkpoint it identifies, rather than using pg_control.
         */
        record = ReadCheckpointRecord(xlogreader, checkPointLoc, 0, true);

        /*
         * backup_label records the checkpoint location, but the WAL segment
         * holding it may already have been removed (especially when the
         * backup took a long time), so the record may not be found in XLOG.
         */
        /* The checkpoint record was read */
        if (record != NULL)
        {
            memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
            wasShutdown = ((record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN);
            ereport(DEBUG1,
                    (errmsg_internal("checkpoint record is at %X/%X",
                                     LSN_FORMAT_ARGS(checkPointLoc))));
            InRecovery = true;  /* force recovery even if SHUTDOWNED */

            /*
             * Make sure that REDO location exists. This may not be the case
             * if there was a crash during an online backup, which left a
             * backup_label around that references a WAL segment that's
             * already been archived. In other words, even when the
             * checkpoint record is readable, the redo record (slightly
             * earlier than the checkpoint) may not be.
             */
            if (checkPoint.redo < checkPointLoc)
            {
                XLogBeginRead(xlogreader, checkPoint.redo);
                if (!ReadRecord(xlogreader, LOG, false))
                    ereport(FATAL,
                            (errmsg("could not find redo location referenced by checkpoint record"),
                             errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
                                     "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
                                     "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
                                     DataDir, DataDir, DataDir)));
            }
        }
        /* The checkpoint record could not be read */
        else
        {
            ereport(FATAL,
                    (errmsg("could not locate required checkpoint record"),
                     errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
                             "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
                             "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
                             DataDir, DataDir, DataDir)));
            wasShutdown = false;    /* keep compiler quiet */
        }

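    If a backup_label file exists but either the checkpoint record or the redo record cannot be read, database startup fails with a FATAL error.

    According to the error hint:

    • If you are restoring from a backup, create the recovery.signal file and add the required recovery options
    • If you are not restoring from a backup, remove the backup_label file
    • Be careful: once backup_label has been removed, the corresponding exclusive backup can no longer be used for recovery, which amounts to a failed backup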
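
The selection logic from sections 1 and 2 can be sketched as a small model. This is only an illustration under stated assumptions: WAL availability is reduced to a single "oldest LSN still readable" value, and DemoCluster, DemoStartResult and demo_choose_start are hypothetical names, not PostgreSQL APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLsn;   /* stand-in for XLogRecPtr */

typedef enum
{
    DEMO_START_OK,              /* starting point found, recovery can begin */
    DEMO_NO_CHECKPOINT_RECORD,  /* checkpoint record not readable from WAL */
    DEMO_NO_REDO_RECORD         /* checkpoint readable, redo location is not */
} DemoStartResult;

typedef struct
{
    bool    have_backup_label;  /* does a backup_label file exist? */
    DemoLsn label_checkpoint;   /* checkpoint location named in backup_label */
    DemoLsn label_redo;         /* redo location stored in that checkpoint */
    DemoLsn control_checkpoint; /* checkpoint location from the control file */
    DemoLsn control_redo;       /* redo location from the control file */
    DemoLsn oldest_available;   /* assumption: WAL at or after this is readable */
} DemoCluster;

/*
 * Pick the recovery starting point: backup_label wins over the control
 * file. In the backup_label case the redo location is verified as well,
 * mirroring the two FATAL paths in the excerpt above.
 */
DemoStartResult
demo_choose_start(const DemoCluster *c, DemoLsn *checkpoint, DemoLsn *redo)
{
    DemoLsn cp = c->have_backup_label ? c->label_checkpoint : c->control_checkpoint;
    DemoLsn rd = c->have_backup_label ? c->label_redo : c->control_redo;

    assert(rd <= cp);   /* the redo point never lies after its checkpoint */

    if (cp < c->oldest_available)
        return DEMO_NO_CHECKPOINT_RECORD;

    if (c->have_backup_label && rd < cp && rd < c->oldest_available)
        return DEMO_NO_REDO_RECORD;

    *checkpoint = cp;
    *redo = rd;
    return DEMO_START_OK;
}
```

In this model, a backup_label pointing at checkpoint 500 with redo 400 fails with DEMO_NO_REDO_RECORD when only WAL from 450 onward is still available, which corresponds to the "could not find redo location referenced by checkpoint record" error.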

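    3. Important variables

    Before moving on, take a closer look at a few variables that appear frequently in the code that follows:

    • InRecovery: when true, read it as "this process is replaying WAL records", not as "the system is in recovery mode". The latter should be determined with RecoveryInProgress().
    • ArchiveRecoveryRequested: archive recovery has been requested
    • InArchiveRecovery: when true, recovery is currently using archived WAL (typically PITR or a standby); when false, recovery is using only the WAL files in the pg_wal directory (typically the crash recovery phase)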

    /*
     * Are we doing recovery from XLOG?
     *
     * This is only ever true in the startup process; it should be read as meaning
     * "this process is replaying WAL records", rather than "the system is in
     * recovery mode". It should be examined primarily by functions that need
     * to act differently when called from a WAL redo function (e.g., to skip WAL
     * logging). To check whether the system is in recovery regardless of which
     * process you're running in, use RecoveryInProgress() but only after shared
     * memory startup and lock initialization.
     */
    bool InRecovery = false;

    /*
     * When ArchiveRecoveryRequested is set, archive recovery was requested,
     * ie. signal files were present. When InArchiveRecovery is set, we are
     * currently recovering using offline XLOG archives. These variables are only
     * valid in the startup process.
     *
     * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
     * currently performing crash recovery using only XLOG files in pg_wal, but
     * will switch to using offline XLOG archives as soon as we reach the end of
     * WAL in pg_wal.
     */
    bool ArchiveRecoveryRequested = false;
    bool InArchiveRecovery = false;
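
How the two archive-related flags combine can be summarised in a few lines. The sketch below simply restates the source comment above as a function; DemoPhase, demo_recovery_phase and demo_phase_name are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum
{
    DEMO_CRASH_RECOVERY_ONLY,   /* replay from pg_wal, no archive requested */
    DEMO_CRASH_THEN_ARCHIVE,    /* replay from pg_wal first, archive later */
    DEMO_ARCHIVE_RECOVERY       /* currently replaying archived WAL */
} DemoPhase;

/* Classify the startup process's situation from the two flags. */
DemoPhase
demo_recovery_phase(bool archive_recovery_requested, bool in_archive_recovery)
{
    if (in_archive_recovery)
        return DEMO_ARCHIVE_RECOVERY;
    if (archive_recovery_requested)
        return DEMO_CRASH_THEN_ARCHIVE; /* switches at the end of WAL in pg_wal */
    return DEMO_CRASH_RECOVERY_ONLY;
}

/* Human-readable label for each phase. */
const char *
demo_phase_name(DemoPhase phase)
{
    switch (phase)
    {
        case DEMO_CRASH_RECOVERY_ONLY:
            return "crash recovery using pg_wal only";
        case DEMO_CRASH_THEN_ARCHIVE:
            return "crash recovery, archive recovery to follow";
        case DEMO_ARCHIVE_RECOVERY:
            return "archive recovery";
    }
    assert(!"unknown phase");
    return "unknown";
}
```

The middle case is the one that is easy to overlook: archive recovery has been requested, but the startup process is still working through the WAL already present in pg_wal.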

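    II. Entering recovery mode

         WAL replay really begins in the if (InRecovery) branch. It first updates the control file to record that the cluster has entered recovery, and also stores the checkpoint information that was just read in the control file.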

    /* REDO starts here */
    if (InRecovery)
    {
        int         rmid;

        /*
         * Update pg_control to show that we are recovering and to show the
         * selected checkpoint as the place we are starting from. We also mark
         * pg_control with any minimum recovery stop point obtained from a
         * backup history file.
         */
        /* Save the state found in the control file, then update it */
        dbstate_at_startup = ControlFile->state;
        /* If recovering from archived WAL (PITR or standby), update the state */
        if (InArchiveRecovery)
        {
            ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
            SpinLockAcquire(&XLogCtl->info_lck);
            XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
            SpinLockRelease(&XLogCtl->info_lck);
        }
        /* Otherwise we are recovering from the WAL files in pg_wal (crash recovery) */
        else
        {
            ereport(LOG,
                    (errmsg("database system was not properly shut down; "
                            "automatic recovery in progress")));
            /*
             * If a target timeline was specified and it is greater than the
             * current timeline recorded in the control file, log a message.
             */
            if (recoveryTargetTLI > ControlFile->checkPointCopy.ThisTimeLineID)
                ereport(LOG,
                        (errmsg("crash recovery starts in timeline %u "
                                "and has target timeline %u",
                                ControlFile->checkPointCopy.ThisTimeLineID,
                                recoveryTargetTLI)));
            /* Mark the control file to show that crash recovery is in progress */
            ControlFile->state = DB_IN_CRASH_RECOVERY;
            SpinLockAcquire(&XLogCtl->info_lck);
            XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
            SpinLockRelease(&XLogCtl->info_lck);
        }
        /* Update the checkpoint information in the control file */
        ControlFile->checkPoint = checkPointLoc;
        ControlFile->checkPointCopy = checkPoint;
        if (InArchiveRecovery)
        {
            /*
             * initialize minRecoveryPoint if not set yet; the minimum
             * recovery point must not be lower than the redo point
             */
            if (ControlFile->minRecoveryPoint < checkPoint.redo)
            {
                ControlFile->minRecoveryPoint = checkPoint.redo;
                ControlFile->minRecoveryPointTLI = checkPoint.ThisTimeLineID;
            }
        }
        /* If a backup_label file was found */
        if (haveBackupLabel)
        {
            ControlFile->backupStartPoint = checkPoint.redo;
            ControlFile->backupEndRequired = backupEndRequired;
            /* If the backup was taken from a standby */
            if (backupFromStandby)
            {
                /* Only these two states are possible; anything else is an error */
                if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY &&
                    dbstate_at_startup != DB_SHUTDOWNED_IN_RECOVERY)
                    ereport(FATAL,
                            (errmsg("backup_label contains data inconsistent with control file"),
                             errhint("This means that the backup is corrupted and you will "
                                     "have to use another backup for recovery.")));
                /* The state is valid, so set the backup end point */
                ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
            }
        }
        /* Update the timestamp */
        ControlFile->time = (pg_time_t) time(NULL);
        /* No need to hold ControlFileLock yet, we aren't up far enough */
        UpdateControlFile();
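
The control file bookkeeping above can be condensed into a toy model. This is a sketch under simplifying assumptions: timelines, the shared recovery state, and the standby-backup check are left out, and DemoControl and demo_enter_recovery are hypothetical names rather than PostgreSQL structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t DemoLsn;   /* stand-in for XLogRecPtr */

typedef enum
{
    DEMO_DB_IN_CRASH_RECOVERY,
    DEMO_DB_IN_ARCHIVE_RECOVERY
} DemoDbState;

/* A heavily reduced stand-in for the control file contents. */
typedef struct
{
    DemoDbState state;
    DemoLsn     checkpoint;         /* location of the chosen checkpoint */
    DemoLsn     redo;               /* redo point of that checkpoint */
    DemoLsn     min_recovery_point; /* 0 means "not set yet" in this model */
    DemoLsn     backup_start_point;
} DemoControl;

/* Record in the (toy) control file that recovery is starting. */
void
demo_enter_recovery(DemoControl *ctl, bool in_archive_recovery,
                    bool have_backup_label, DemoLsn checkpoint, DemoLsn redo)
{
    assert(redo <= checkpoint);

    ctl->state = in_archive_recovery ? DEMO_DB_IN_ARCHIVE_RECOVERY
                                     : DEMO_DB_IN_CRASH_RECOVERY;
    ctl->checkpoint = checkpoint;
    ctl->redo = redo;

    /* Only archive recovery raises minRecoveryPoint up to the redo point. */
    if (in_archive_recovery && ctl->min_recovery_point < redo)
        ctl->min_recovery_point = redo;

    /* With a backup_label, the backup is considered to start at the redo point. */
    if (have_backup_label)
        ctl->backup_start_point = redo;
}
```

Note that the minimum recovery point is only ever moved forward here: a value already beyond the redo point is left untouched.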

  • Original article: https://blog.csdn.net/Hehuyi_In/article/details/126433746