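Crash recovery normally starts from the most recent checkpoint, whose location is stored in the control file. As we saw earlier in the function that creates checkpoints, every checkpoint refreshes this information in the control file.
The following code is from the StartupXLOG function (xlog.c):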
- /* Get the last valid checkpoint record.
- Take the checkpoint and redo locations from the control file, then read the checkpoint record from WAL.
- */
- checkPointLoc = ControlFile->checkPoint;
- RedoStartLSN = ControlFile->checkPointCopy.redo;
- record = ReadCheckpointRecord(xlogreader, checkPointLoc, 1, true);
-
- /* If the record was read, emit a debug message */
- if (record != NULL)
- {
- ereport(DEBUG1,
- (errmsg_internal("checkpoint record is at %X/%X",
- LSN_FORMAT_ARGS(checkPointLoc))));
- }
- else
- {
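- /* If it cannot be read, fail immediately. Older versions would fall back to the previous checkpoint when not in standby mode; newer versions dropped this to simplify the logic */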
- ereport(PANIC,
- (errmsg("could not locate a valid checkpoint record")));
- }
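The debug message above prints the checkpoint location as two hexadecimal halves ("%X/%X"). As a minimal sketch of that representation, assuming a 64-bit WAL position split into high and low 32-bit words: the helpers `format_lsn` and `parse_lsn` below are hypothetical names used only for illustration, not PostgreSQL functions.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for PostgreSQL's XLogRecPtr: a 64-bit byte position in the WAL. */
typedef uint64_t XLogRecPtr;

/*
 * Hypothetical helper: write an LSN as "hi/lo" in hex, mirroring the
 * "%X/%X" pattern used in the ereport() call above.
 * Returns the number of characters written (as snprintf does).
 */
static int
format_lsn(XLogRecPtr lsn, char *buf, size_t buflen)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "%" PRIX32 "/%" PRIX32,
                    (uint32_t) (lsn >> 32), (uint32_t) lsn);
}

/* Hypothetical helper: parse "hi/lo" back into a 64-bit position. */
static bool
parse_lsn(const char *text, XLogRecPtr *lsn)
{
    unsigned int hi, lo;

    assert(text != NULL && lsn != NULL);
    if (sscanf(text, "%X/%X", &hi, &lo) != 2)
        return false;
    *lsn = ((XLogRecPtr) hi << 32) | lo;
    return true;
}
```

Comparing two locations, as the recovery code does with `checkPoint.redo < checkPointLoc`, is then a plain integer comparison on the 64-bit value.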
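As mentioned earlier, an exclusive-mode backup creates a backup_label file. If that file exists, the checkpoint information is taken from it in preference to the control file and serves as the starting point for recovery.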
- if (read_backup_label(&checkPointLoc, &backupEndRequired,
- &backupFromStandby))
- {
- List *tablespaces = NIL;
-
- /*
- * Archive recovery was requested, and thanks to the backup label file, we know how far we need to replay to reach consistency. Enter archive recovery directly.
- */
- InArchiveRecovery = true;
- if (StandbyModeRequested)
- StandbyMode = true;
-
- /*
- * When a backup_label file is present, we want to roll forward from the checkpoint it identifies, rather than using pg_control.
- */
- record = ReadCheckpointRecord(xlogreader, checkPointLoc, 0, true);
-
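- /* backup_label records the checkpoint location, but the WAL segment containing it may already have been removed (especially when the backup took a long time), so the record may not be readable from WAL */
-
- /* If the checkpoint record was read */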
- if (record != NULL)
- {
- memcpy(&checkPoint, XLogRecGetData(xlogreader), sizeof(CheckPoint));
- wasShutdown = ((record->xl_info & ~XLR_INFO_MASK) == XLOG_CHECKPOINT_SHUTDOWN);
- ereport(DEBUG1,
- (errmsg_internal("checkpoint record is at %X/%X",
- LSN_FORMAT_ARGS(checkPointLoc))));
- InRecovery = true; /* force recovery even if SHUTDOWNED */
-
- /*
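- * Make sure that REDO location exists. This may not be the case if there was a crash during an online backup, which left a backup_label around that references a WAL segment that's already been archived. Even when the checkpoint record is readable, the record at the redo location (slightly earlier than the checkpoint) may not be.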
- */
- if (checkPoint.redo < checkPointLoc)
- {
- XLogBeginRead(xlogreader, checkPoint.redo);
- if (!ReadRecord(xlogreader, LOG, false))
- ereport(FATAL,
- (errmsg("could not find redo location referenced by checkpoint record"),
- errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
- "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
- "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
- DataDir, DataDir, DataDir)));
- }
- }
- /* If the checkpoint record could not be read */
- else
- {
- ereport(FATAL,
- (errmsg("could not locate required checkpoint record"),
- errhint("If you are restoring from a backup, touch \"%s/recovery.signal\" and add required recovery options.\n"
- "If you are not restoring from a backup, try removing the file \"%s/backup_label\".\n"
- "Be careful: removing \"%s/backup_label\" will result in a corrupt cluster if restoring from a backup.",
- DataDir, DataDir, DataDir)));
- wasShutdown = false; /* keep compiler quiet */
- }
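To make the role of backup_label more concrete, here is a minimal sketch of extracting the two locations that matter for the code above. It assumes the file begins with a "START WAL LOCATION: hi/lo (file ...)" line followed by a "CHECKPOINT LOCATION: hi/lo" line; that layout, and the helper name `parse_backup_label`, are assumptions made for illustration and not a copy of read_backup_label.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's WAL position type */

/*
 * Hypothetical parser for the first two lines of a backup_label-style text.
 * On success, *redo receives the start (redo) location and *checkpoint the
 * checkpoint record location.
 */
static bool
parse_backup_label(const char *text, XLogRecPtr *redo, XLogRecPtr *checkpoint)
{
    unsigned int hi, lo;
    char         walfile[25];   /* 24 hex digits of the segment name + NUL */
    int          consumed = 0;

    assert(text != NULL && redo != NULL && checkpoint != NULL);

    /* %n stays 0 if the closing ")" of the first line did not match */
    if (sscanf(text, "START WAL LOCATION: %X/%X (file %24[0-9A-F])\n%n",
               &hi, &lo, walfile, &consumed) != 3 || consumed == 0)
        return false;
    *redo = ((XLogRecPtr) hi << 32) | lo;

    if (sscanf(text + consumed, "CHECKPOINT LOCATION: %X/%X", &hi, &lo) != 2)
        return false;
    *checkpoint = ((XLogRecPtr) hi << 32) | lo;
    return true;
}
```

With both values in hand, the checkpoint location plays the role of checkPointLoc in the listing above, and the start location is what the checkpoint record's redo field is expected to match.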
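If a backup_label file exists but either the checkpoint record or the redo location cannot be read, database startup fails with a FATAL error.
According to the error hint: if you are restoring from a backup, create recovery.signal in the data directory and add the required recovery options; if you are not restoring from a backup, try removing backup_label. Be careful, though: removing backup_label while restoring from a backup results in a corrupt cluster.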
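The two failure paths in the backup_label branch can be summarized as a small decision function. This is only a sketch of the control flow shown above: the enum and the function `check_backup_label_start` are hypothetical, and the two `*_readable` flags stand in for the results of ReadCheckpointRecord and ReadRecord.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;    /* stand-in for PostgreSQL's WAL position type */

typedef enum
{
    START_OK,               /* recovery can start from this checkpoint */
    START_NO_CHECKPOINT,    /* "could not locate required checkpoint record" */
    START_NO_REDO           /* "could not find redo location referenced by checkpoint record" */
} StartPointStatus;

static StartPointStatus
check_backup_label_start(bool checkpoint_readable,
                         XLogRecPtr checkpoint_loc,
                         XLogRecPtr redo_loc,
                         bool redo_readable)
{
    /* Assumption of this sketch: the redo pointer never lies after its checkpoint record */
    assert(redo_loc <= checkpoint_loc);

    if (!checkpoint_readable)
        return START_NO_CHECKPOINT;

    /* The redo record only needs a separate read when it precedes the checkpoint */
    if (redo_loc < checkpoint_loc && !redo_readable)
        return START_NO_REDO;

    return START_OK;
}
```

When the redo location equals the checkpoint location, reading the checkpoint record has already proven that the redo location exists, which is why the second check is skipped in that case.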
Before moving on, take a closer look at a few variables that appear frequently in the code that follows:
- /*
- * Are we doing recovery from XLOG?
- *
- * This is only ever true in the startup process; it should be read as meaning
- * "this process is replaying WAL records", rather than "the system is in
- * recovery mode". It should be examined primarily by functions that need
- * to act differently when called from a WAL redo function (e.g., to skip WAL
- * logging). To check whether the system is in recovery regardless of which
- * process you're running in, use RecoveryInProgress() but only after shared
- * memory startup and lock initialization.
- */
- bool InRecovery = false;
- /*
- * When ArchiveRecoveryRequested is set, archive recovery was requested,
- * ie. signal files were present. When InArchiveRecovery is set, we are
- * currently recovering using offline XLOG archives. These variables are only
- * valid in the startup process.
- *
- * When ArchiveRecoveryRequested is true, but InArchiveRecovery is false, we're
- * currently performing crash recovery using only XLOG files in pg_wal, but
- * will switch to using offline XLOG archives as soon as we reach the end of
- * WAL in pg_wal.
- */
- bool ArchiveRecoveryRequested = false;
- bool InArchiveRecovery = false;
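Read together, the comments above describe four situations. A minimal sketch of that mapping follows; the enum and the function `classify_recovery` are hypothetical and exist only to spell out how the three flags combine.

```c
#include <stdbool.h>

typedef enum
{
    PHASE_NONE,                 /* this process is not replaying WAL */
    PHASE_CRASH_RECOVERY,       /* replaying only the WAL found in pg_wal */
    PHASE_CRASH_THEN_ARCHIVE,   /* crash recovery now, archive recovery once pg_wal is exhausted */
    PHASE_ARCHIVE_RECOVERY      /* replaying from offline WAL archives */
} RecoveryPhase;

static RecoveryPhase
classify_recovery(bool in_recovery,
                  bool archive_recovery_requested,
                  bool in_archive_recovery)
{
    if (!in_recovery)
        return PHASE_NONE;
    if (in_archive_recovery)
        return PHASE_ARCHIVE_RECOVERY;
    if (archive_recovery_requested)
        return PHASE_CRASH_THEN_ARCHIVE;
    return PHASE_CRASH_RECOVERY;
}
```

The third case is the one that is easy to miss: signal files were present, but the startup process is still working through pg_wal before it switches to the archive.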
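WAL replay proper begins in the if (InRecovery) block. It first updates the control file to record that the system has entered recovery, and saves the checkpoint information that was just read into the control file as well.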
- /* REDO starts here */
- if (InRecovery)
- {
- int rmid;
-
- /*
- * Update pg_control to show that we are recovering and to show the selected checkpoint as the place we are starting from. We also mark
- pg_control with any minimum recovery stop point obtained from a backup history file.
- */
-
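- /* Remember the state currently stored in the control file, then update it */
- dbstate_at_startup = ControlFile->state;
- /* If recovering from archived WAL (PITR or a standby), update the state accordingly */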
- if (InArchiveRecovery)
- {
- ControlFile->state = DB_IN_ARCHIVE_RECOVERY;
-
- SpinLockAcquire(&XLogCtl->info_lck);
- XLogCtl->SharedRecoveryState = RECOVERY_STATE_ARCHIVE;
- SpinLockRelease(&XLogCtl->info_lck);
- }
- /* Otherwise we are recovering from the WAL files in pg_wal (crash recovery) */
- else
- {
- ereport(LOG,
- (errmsg("database system was not properly shut down; "
- "automatic recovery in progress")));
-
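- /* If the recovery target timeline is greater than the current timeline recorded in the control file, log that fact; the timeline itself is not changed here */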
- if (recoveryTargetTLI > ControlFile->checkPointCopy.ThisTimeLineID)
- ereport(LOG,
- (errmsg("crash recovery starts in timeline %u "
- "and has target timeline %u",
- ControlFile->checkPointCopy.ThisTimeLineID,
- recoveryTargetTLI)));
- /* Set the control file state to show that crash recovery is in progress */
- ControlFile->state = DB_IN_CRASH_RECOVERY;
-
- SpinLockAcquire(&XLogCtl->info_lck);
- XLogCtl->SharedRecoveryState = RECOVERY_STATE_CRASH;
- SpinLockRelease(&XLogCtl->info_lck);
- }
-
- /* Update the checkpoint information in the control file */
- ControlFile->checkPoint = checkPointLoc;
- ControlFile->checkPointCopy = checkPoint;
- if (InArchiveRecovery)
- {
- /* initialize minRecoveryPoint if not set yet; it must not be behind the redo point */
- if (ControlFile->minRecoveryPoint < checkPoint.redo)
- {
- ControlFile->minRecoveryPoint = checkPoint.redo;
- ControlFile->minRecoveryPointTLI = checkPoint.ThisTimeLineID;
- }
- }
-
- /* If a backup_label file was found */
- if (haveBackupLabel)
- {
- ControlFile->backupStartPoint = checkPoint.redo;
- ControlFile->backupEndRequired = backupEndRequired;
-
- /* If the backup was taken from a standby */
- if (backupFromStandby)
- {
- /* Only these two startup states are possible; anything else is an error */
- if (dbstate_at_startup != DB_IN_ARCHIVE_RECOVERY &&
- dbstate_at_startup != DB_SHUTDOWNED_IN_RECOVERY)
- ereport(FATAL,
- (errmsg("backup_label contains data inconsistent with control file"),
- errhint("This means that the backup is corrupted and you will "
- "have to use another backup for recovery.")));
- /* If the state is valid, set the backup end point */
- ControlFile->backupEndPoint = ControlFile->minRecoveryPoint;
- }
- }
- /* Update the timestamp */
- ControlFile->time = (pg_time_t) time(NULL);
- /* No need to hold ControlFileLock yet, we aren't up far enough */
- UpdateControlFile();
- …