PostgreSQL source code study (40): crash recovery ②, the recovery starting point

I. Determining the recovery starting point

1. The default starting point

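Normally, crash recovery starts from the most recent checkpoint, whose location is stored in the control file. As we saw earlier in the function that creates checkpoints, the control file is refreshed every time a checkpoint is created.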

The following code is in the StartupXLOG function (xlog.c).


/* Get the last valid checkpoint record.
   Read the checkpoint and redo locations from the control file, then read the checkpoint record from XLOG.
*/
        checkPointLoc = ControlFile->checkPoint;
        RedoStartLSN = ControlFile->checkPointCopy.redo;
        record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, true);
 
/* If the record was read, emit a debug message */
        if (record != NULL)
        {
            ereport(DEBUG1,
                    (errmsg_internal("checkpoint record is at %X/%X",
                                     LSN_FORMAT_ARGS(checkPointLoc))));
        }
        else
        {
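            /* If it cannot be read, fail immediately. Older versions, when not in standby mode, would fall back to the previous checkpoint; newer versions dropped that fallback for simplicity */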
            ereport(PANIC,
                    (errmsg("could not locate a valid checkpoint record")));
        }
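
The "%X/%X" format in the debug message prints a 64-bit WAL position as two 32-bit halves, which is what LSN_FORMAT_ARGS supplies. A minimal sketch of that formatting, assuming a stand-in type lsn_t for XLogRecPtr and a hypothetical helper format_lsn (neither name comes from PostgreSQL):

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's XLogRecPtr, a 64-bit WAL position (assumed name). */
typedef uint64_t lsn_t;

/*
 * Format an LSN the way the "%X/%X" messages do: the high 32 bits,
 * a slash, then the low 32 bits, both in uppercase hex without padding.
 * format_lsn is a hypothetical helper, not a PostgreSQL function.
 * Returns the number of characters snprintf would have written.
 */
static int
format_lsn(lsn_t lsn, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32), (uint32_t) lsn);
}
```

With this helper, a position whose high half is 1 and whose low half is 0x6B3F0A8 is rendered as 1/6B3F0A8, the same shape as the LSNs shown by pg_controldata and in the server log.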

2. Special cases

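As mentioned earlier, an exclusive-mode backup creates a backup_label file. If that file exists, the checkpoint information is taken from it in preference to the control file and used as the recovery starting point.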


if (read_backup_label(&checkPointLoc, &backupEndRequired,
                          &backupFromStandby))
    {
        List       *tablespaces = NIL;
 
        /*
         * Archive recovery was requested, and thanks to the backup label file, we know how far we need to replay to reach consistency. Enter archive recovery directly.
         */
        InArchiveRecovery = true;
        if (StandbyModeRequested)
            StandbyMode = true;
 
        /*
         * When a backup_label file is present, we want to roll forward from the checkpoint it identifies, rather than using pg_control.
         */
        record = ReadCheckpointRecord(xlogreader, checkPointLoc, 0, true);
 
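/* Although backup_label records the checkpoint location, the WAL segment containing it may already have been removed (especially if the backup took a long time), so the record may not be readable from XLOG */
 
/* If the checkpoint record was read */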
        if (record != NULL)
        {
            memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
            wasShutdown = ((record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN);
            ereport(DEBUG1,
                    (errmsg_internal("checkpoint record is at %X/%X",
                                     LSN_FORMAT_ARGS(checkPointLoc))));
            InRecovery = true;  /* force recovery even if SHUTDOWNED */
 
            /*
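             * Make sure that REDO location exists. This may not be the case if there was a crash during an online backup, which left a backup_label around that references a WAL segment that's already been archived. Even when the checkpoint record is readable, the record at the redo location (slightly earlier than the checkpoint) may not be.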
             */
            if (checkPoint.redo < checkPointLoc)
            {
                XLogBeginRead(xlogreader, checkPoint.redo);
                if (!ReadRecord(xlogreader, LOG, false))
                    ereport(FATAL,
                            (errmsg("could not find redo location referenced by checkpoint record"),
                             errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
                                     "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
                                     "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
                                     DataDir, DataDir, DataDir)));
            }
        }
/* If the checkpoint record could not be read */
        else
        {
            ereport(FATAL,
                    (errmsg("could not locate required checkpoint record"),
                     errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
                             "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
                             "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
                             DataDir, DataDir, DataDir)));
            wasShutdown = false;    /* keep compiler quiet */
        }

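If a backup_label file exists but the checkpoint or the redo location cannot be read, database startup fails with an error.

According to the error hint:

If you are restoring from a backup, create a recovery.signal file and add the required recovery options.
If you are not restoring from a backup, remove the backup_label file.
Note that once backup_label is removed, the corresponding exclusive backup can no longer be used for recovery, which amounts to a failed backup; the hint itself warns that removing the file while restoring from a backup results in a corrupt cluster.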
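
The checks in the two excerpts above can be condensed into one decision function. This is only a simplified model of the branches shown here, not PostgreSQL code: StartInputs, StartChoice and choose_start are hypothetical names, and the real StartupXLOG performs further validation that is omitted.

```c
#include <stdbool.h>

/* Possible outcomes of choosing the recovery starting point (hypothetical names). */
typedef enum
{
    START_FROM_CONTROL_FILE,    /* no backup_label: use the checkpoint in pg_control */
    START_FROM_BACKUP_LABEL,    /* backup_label present, checkpoint and redo readable */
    START_FAIL_NO_CHECKPOINT,   /* checkpoint record unreadable: PANIC or FATAL */
    START_FAIL_NO_REDO          /* backup_label case: redo location unreadable, FATAL */
} StartChoice;

/* Facts the startup process discovers while reading files and WAL. */
typedef struct
{
    bool have_backup_label;       /* read_backup_label() succeeded */
    bool checkpoint_readable;     /* ReadCheckpointRecord() returned a record */
    bool redo_before_checkpoint;  /* checkPoint.redo < checkPointLoc */
    bool redo_readable;           /* the record at checkPoint.redo can be read */
} StartInputs;

static StartChoice
choose_start(const StartInputs *in)
{
    /* Both paths give up when the checkpoint record itself is missing. */
    if (!in->checkpoint_readable)
        return START_FAIL_NO_CHECKPOINT;

    /* Without backup_label, the control file's checkpoint is the start. */
    if (!in->have_backup_label)
        return START_FROM_CONTROL_FILE;

    /* With backup_label, the redo record is probed only when it precedes the checkpoint. */
    if (in->redo_before_checkpoint && !in->redo_readable)
        return START_FAIL_NO_REDO;

    return START_FROM_BACKUP_LABEL;
}
```

The model makes the precedence explicit: backup_label wins over pg_control whenever it is present, and a missing redo record is fatal only in the backup_label path shown in the excerpt.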

3. Important variables

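Before moving on, let's look closely at a few variables that appear frequently in the code that follows:

InRecovery: when true, read it as "this process is replaying WAL records", not "the system is in recovery mode"; the latter should be determined with RecoveryInProgress().
ArchiveRecoveryRequested: archive recovery was requested, i.e. signal files were present.
InArchiveRecovery: when true, recovery is currently using archived WAL (typically PITR or a standby); when false, recovery is using only the WAL files in the pg_wal directory (typically the crash recovery phase).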


/*
 * Are we doing recovery from XLOG?
 *
 * This is only ever true in the startup process; it should be read as meaning
 * "this process is replaying WAL records", rather than "the system is in
 * recovery mode".  It should be examined primarily by functions that need
 * to act differently when called from a WAL redo function (e.g., to skip WAL
 * logging).  To check whether the system is in recovery regardless of which
 * process you're running in, use RecoveryInProgress() but only after shared
 * memory startup and lock initialization.
 */
bool        InRecovery = false;


/*
 * When ArchiveRecoveryRequested is set, archive recovery was requested,
 * ie. signal files were present. When InArchiveRecovery is set, we are
 * currently recovering using offline XLOG archives. These variables are only
 * valid in the startup process.
 *
 * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
 * currently performing crash recovery using only XLOG files in pg_wal, but
 * will switch to using offline XLOG archives as soon as we reach the end of
 * WAL in pg_wal.
*/
bool        ArchiveRecoveryRequested = false;
bool        InArchiveRecovery = false;
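
How the two archive-related flags combine can be summarized with a small classification function. This is an illustrative sketch based only on the source comment above; RecoveryPhase and recovery_phase are hypothetical names that do not exist in PostgreSQL.

```c
#include <stdbool.h>

/* The three situations described by the source comment (hypothetical names). */
typedef enum
{
    PHASE_CRASH_ONLY,           /* plain crash recovery from pg_wal */
    PHASE_CRASH_THEN_ARCHIVE,   /* crash recovery now, archive recovery at end of pg_wal */
    PHASE_ARCHIVE               /* currently recovering from archived WAL */
} RecoveryPhase;

/*
 * Classify the startup process's situation from the two flags.
 * The model assumes InArchiveRecovery is decisive whenever it is set.
 */
static RecoveryPhase
recovery_phase(bool archive_recovery_requested, bool in_archive_recovery)
{
    if (in_archive_recovery)
        return PHASE_ARCHIVE;

    return archive_recovery_requested ? PHASE_CRASH_THEN_ARCHIVE
                                      : PHASE_CRASH_ONLY;
}
```

The middle case is the one that is easy to overlook: with signal files present but InArchiveRecovery still false, the server first replays whatever is in pg_wal and only then switches to the archive.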

II. Entering recovery mode

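The actual WAL replay starts from the if (InRecovery) block. It first updates the control file to indicate that the system has entered recovery, and also saves the checkpoint information that was just read into the control file.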


/* REDO starts here */
    if (InRecovery)
    {
        int         rmid;
 
        /*
         * Update pg_control to show that we are recovering and to show the selected checkpoint as the place we are starting from. We also mark
         * pg_control with any minimum recovery stop point obtained from a backup history file.
         */
 
        /* Save the state currently in the control file, then update it */
        dbstate_at_startup = ControlFile->state;
        /* If recovering from archived WAL (PITR or a standby), update the state accordingly */
        if (InArchiveRecovery)
        {
            ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
 
            SpinLockAcquire(&XLogCtl->info_lck);
            XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
            SpinLockRelease(&XLogCtl->info_lck);
        }
        /* Otherwise, recovering from the WAL files in pg_wal (crash recovery) */
        else
        {
            ereport(LOG,
                    (errmsg("database system was not properly shut down; "
                            "automatic recovery in progress")));
 
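            /* If a target timeline is set and it is greater than the current timeline recorded in the control file, log a message (the timeline itself is not changed here) */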
            if (recoveryTargetTLI > ControlFile->checkPointCopy.ThisTimeLineID)
                ereport(LOG,
                        (errmsg("crash recovery starts in timeline %u "
                                "and has target timeline %u",
                                ControlFile->checkPointCopy.ThisTimeLineID,
                                recoveryTargetTLI)));
            /* Set the control file state to show that crash recovery is in progress */
            ControlFile->state = DB_IN_CRASH_RECOVERY;
 
            SpinLockAcquire(&XLogCtl->info_lck);
            XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
            SpinLockRelease(&XLogCtl->info_lck);
        }
 
        /* Update the checkpoint information in the control file */
        ControlFile->checkPoint = checkPointLoc;
        ControlFile->checkPointCopy = checkPoint;
        if (InArchiveRecovery)
        {
            /* initialize minRecoveryPoint if not set yet; the minimum recovery point must not be behind the redo point */
            if (ControlFile->minRecoveryPoint < checkPoint.redo)
            {
                ControlFile->minRecoveryPoint = checkPoint.redo;
                ControlFile->minRecoveryPointTLI = checkPoint.ThisTimeLineID;
            }
        }
 
        /* If there is a backup_label file */
        if (haveBackupLabel)
        {
            ControlFile->backupStartPoint = checkPoint.redo;
            ControlFile->backupEndRequired = backupEndRequired;
 
            /* If the backup was taken from a standby */
            if (backupFromStandby)
            {
                /* Only these two states are possible; anything else is an error */
                if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY &&
                    dbstate_at_startup != DB_SHUTDOWNED_IN_RECOVERY)
                    ereport(FATAL,
                            (errmsg("backup_label contains data inconsistent with control file"),
                             errhint("This means that the backup is corrupted and you will "
                                     "have to use another backup for recovery.")));
                /* If so, set the backup end point */
                ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
            }
        }
        /* Update the timestamp */
        ControlFile->time = (pg_time_t) time(NULL);
        /* No need to hold ControlFileLock yet, we aren't up far enough */
        UpdateControlFile();
…
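
The control-file bookkeeping in the excerpt above can be reduced to a few assignments. The following is a minimal sketch under simplifying assumptions: MiniControl is a hypothetical cut-down stand-in for ControlFileData, enter_recovery is not a PostgreSQL function, and timelines, backup_label handling, locking and the actual write to disk are all omitted.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t lsn_t;     /* stand-in for XLogRecPtr (assumed name) */

/* Only the two states set by the excerpt are modelled. */
typedef enum
{
    STATE_IN_CRASH_RECOVERY,
    STATE_IN_ARCHIVE_RECOVERY
} MiniDbState;

/* Hypothetical, heavily reduced view of the control file. */
typedef struct
{
    MiniDbState state;
    lsn_t       checkpoint;          /* location of the chosen checkpoint record */
    lsn_t       min_recovery_point;  /* replay must reach at least this point */
} MiniControl;

/*
 * Record that recovery is starting: pick the state from the recovery kind,
 * store the chosen checkpoint, and in archive recovery raise the minimum
 * recovery point to the redo location if it is still behind it.
 */
static void
enter_recovery(MiniControl *cf, bool in_archive_recovery,
               lsn_t checkpoint_loc, lsn_t redo)
{
    cf->state = in_archive_recovery ? STATE_IN_ARCHIVE_RECOVERY
                                    : STATE_IN_CRASH_RECOVERY;
    cf->checkpoint = checkpoint_loc;

    if (in_archive_recovery && cf->min_recovery_point < redo)
        cf->min_recovery_point = redo;
}
```

Note that the minimum recovery point only ever moves forward in this model, and that crash recovery leaves it untouched, mirroring the if (InArchiveRecovery) guard in the excerpt.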

References

《PostgreSQL技术内幕：事务处理深度探索》, Chapter 4

https://blog.csdn.net/asmartkiller/article/details/121245772

https://blog.nowcoder.net/n/a21fd782200e4f9f9054e66898bbccf4

Original article: https://blog.csdn.net/Hehuyi_In/article/details/126433746