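If the database crashes or the server fails, dirty pages in the buffer cache may not yet have been flushed to disk, which leaves the on-disk data in an inconsistent state. When the database restarts, it uses the WAL to bring itself back to a consistent state.
The core process of crash recovery is the Startup process, and its core function is StartupXLOG. It is called when PostgreSQL starts, reads the recovery configuration, and applies WAL records.
Depending on the parameters it reads, the Startup process serves three purposes: crash recovery, WAL replay on a standby, and PITR (point-in-time recovery). All three depend on applying WAL.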
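The idea of replaying the log to reach a consistent state can be illustrated with a toy model. The sketch below is not PostgreSQL code: `ToyPage`, `ToyWalRecord`, and `toy_redo` are hypothetical names invented for illustration, and a "page" is reduced to a single integer. It only assumes the general rule that a record is applied when the page's LSN is older than the record's LSN, which is what makes replay safe to repeat.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified page: one value plus the LSN of the
 * last change that reached it. */
typedef struct
{
    uint64_t lsn;
    int      value;
} ToyPage;

/* Hypothetical WAL record: "set page page_no to new_value". */
typedef struct
{
    uint64_t lsn;        /* position of this record in the log */
    size_t   page_no;    /* page the change applies to */
    int      new_value;  /* value the page should hold afterwards */
} ToyWalRecord;

/* Replay records in log order; return how many were actually applied. */
static size_t
toy_redo(ToyPage *pages, size_t npages,
         const ToyWalRecord *records, size_t nrecords)
{
    size_t applied = 0;

    assert(pages != NULL && records != NULL);

    for (size_t i = 0; i < nrecords; i++)
    {
        const ToyWalRecord *rec = &records[i];
        ToyPage *page;

        if (rec->page_no >= npages)
            continue;           /* record refers to a page outside the toy array */

        page = &pages[rec->page_no];
        if (page->lsn >= rec->lsn)
            continue;           /* change already on the page: skip it */

        page->value = rec->new_value;
        page->lsn = rec->lsn;
        applied++;
    }
    return applied;
}
```

Running `toy_redo` a second time over the same records applies nothing, because every page LSN has already caught up with the log. The real redo routines are far more involved, but the repeatability shown here is the property crash recovery relies on.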
2. Reading the Configuration
Users can create a standby.signal or recovery.signal file in the data directory. The Startup process checks whether either file exists:
    /*
     * See if there are any recovery signal files and if so, set state for
     * recovery.
     *
     * See if there is a recovery command file (recovery.conf), and if so
     * throw an ERROR since as of PG12 we no longer recognize that.
     */
    static void
    readRecoverySignalFile(void)
    {
        struct stat stat_buf;

        if (IsBootstrapProcessingMode())
            return;

        /*
         * Check for old recovery API file: recovery.conf.
         * If it exists, report a FATAL error.
         */
        if (stat(RECOVERY_COMMAND_FILE, &stat_buf) == 0)
            ereport(FATAL,
                    (errcode_for_file_access(),
                     errmsg("using recovery command file \"%s\" is not supported",
                            RECOVERY_COMMAND_FILE)));

        /*
         * Remove unused .done file, if present. Ignore if absent.
         */
        unlink(RECOVERY_COMMAND_DONE);

        /*
         * If standby.signal exists, set standby_signal_file_found to true:
         * this server is a standby.
         */
        if (stat(STANDBY_SIGNAL_FILE, &stat_buf) == 0)
        {
            int fd;

            fd = BasicOpenFilePerm(STANDBY_SIGNAL_FILE, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
                                   S_IRUSR | S_IWUSR);
            if (fd >= 0)
            {
                (void) pg_fsync(fd);
                close(fd);
            }
            standby_signal_file_found = true;
        }
        /*
         * Otherwise, if recovery.signal exists, set recovery_signal_file_found
         * to true: a PITR is being performed.
         */
        else if (stat(RECOVERY_SIGNAL_FILE, &stat_buf) == 0)
        {
            int fd;

            fd = BasicOpenFilePerm(RECOVERY_SIGNAL_FILE, O_RDWR | PG_BINARY | get_sync_bit(sync_method),
                                   S_IRUSR | S_IWUSR);
            if (fd >= 0)
            {
                (void) pg_fsync(fd);
                close(fd);
            }
            recovery_signal_file_found = true;
        }

        /* Default both flags to false; they are overridden below if a signal file was found. */
        StandbyModeRequested = false;
        ArchiveRecoveryRequested = false;
        if (standby_signal_file_found)
        {
            StandbyModeRequested = true;
            ArchiveRecoveryRequested = true;
        }
        else if (recovery_signal_file_found)
        {
            StandbyModeRequested = false;
            ArchiveRecoveryRequested = true;
        }
        else
            /* Neither file was found: return, this is ordinary crash recovery. */
            return;

        /*
         * We don't support standby mode in standalone backends; that requires
         * other processes such as the WAL receiver to be alive.
         */
        if (StandbyModeRequested && !IsUnderPostmaster)
            ereport(FATAL,
                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                     errmsg("standby mode is not supported by single-user servers")));
    }
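The decision made by readRecoverySignalFile can be condensed into a small standalone sketch. This is not PostgreSQL code: the `StartupMode` enum and the functions `classify_startup_mode`, `signal_file_exists`, and `detect_startup_mode` are hypothetical names introduced here, whereas PostgreSQL itself records the outcome in the two booleans StandbyModeRequested and ArchiveRecoveryRequested. The sketch assumes only what the listing shows: the files are probed with stat(), and standby.signal takes precedence when both files exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical enum for illustration; PostgreSQL keeps two booleans instead. */
typedef enum
{
    MODE_CRASH_RECOVERY,    /* neither signal file is present */
    MODE_ARCHIVE_RECOVERY,  /* only recovery.signal is present (e.g. PITR) */
    MODE_STANDBY            /* standby.signal is present (takes precedence) */
} StartupMode;

/* Pure decision logic mirroring the if / else-if chain in the listing. */
static StartupMode
classify_startup_mode(bool standby_signal_found, bool recovery_signal_found)
{
    if (standby_signal_found)
        return MODE_STANDBY;
    if (recovery_signal_found)
        return MODE_ARCHIVE_RECOVERY;
    return MODE_CRASH_RECOVERY;
}

/* Return true if data_dir/name exists, probing with stat() as the listing does. */
static bool
signal_file_exists(const char *data_dir, const char *name)
{
    char        path[4096];
    struct stat st;
    int         n;

    assert(data_dir != NULL && name != NULL);

    n = snprintf(path, sizeof(path), "%s/%s", data_dir, name);
    if (n < 0 || (size_t) n >= sizeof(path))
        return false;       /* path did not fit: treat the file as absent */

    return stat(path, &st) == 0;
}

/* Inspect a data directory and report which mode startup would choose. */
static StartupMode
detect_startup_mode(const char *data_dir)
{
    return classify_startup_mode(signal_file_exists(data_dir, "standby.signal"),
                                 signal_file_exists(data_dir, "recovery.signal"));
}
```

Calling `detect_startup_mode` on a data directory that contains neither file yields `MODE_CRASH_RECOVERY`, which corresponds to the early `return` in the listing. Keeping the decision in a pure function separate from the file probing makes the precedence rule easy to check in isolation.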
In the next post, we will study the StartupXLOG function itself and see how WAL is applied in each scenario.
References
《PostgreSQL技术内幕:事务处理深度探索》, Chapter 4