MIT6.830-lab6-Rollback and Recovery(数据库的日志回滚和恢复)

文章目录

👉 代码获取和项目文档

任务介绍

MIT6.830的lab6中，我们将实现日志的回滚和崩溃恢复。

rollback：当一个事务显式执行失败后，就要进行回滚，我们要做的就是将数据库回滚到事务开始前的状态；
Recovery：当数据库崩溃后，再一次重启就需要根据log日志进行数据恢复。

概念介绍

在进行我们的任务之前，我们需要了解一些基本概念。

steal/no-force策略

lab6这个实验中，我们实现的是steal/no-force策略，两种策略的区别如下:

steal/no-steal: 是否允许一个uncommitted的事务将修改更新到磁盘，如果是steal策略，那么此时磁盘上就可能包含uncommitted的数据，因此系统需要记录undo log，以防事务abort时进行回滚（roll-back）。如果是no steal策略，就表示磁盘上不会存在uncommitted数据，因此无需回滚操作，也就无需记录undo log；
force/no-force:force策略表示事务在committed之后必须将所有更新立刻持久化到磁盘，这样会导致磁盘发生很多小的写操作（更可能是随机写）。no-force表示事务在committed之后可以不立即持久化到磁盘，这样可以缓存很多的更新批量持久化到磁盘，这样可以降低磁盘操作次数（提升顺序写），但是如果committed之后发生crash，那么此时已经committed的事务数据将会丢失（因为还没有持久化到磁盘），因此系统需要记录redo log，在系统重启时候进行前滚（roll-forward）操作。

日志类型

static final int ABORT_RECORD = 1;
static final int COMMIT_RECORD = 2;
static final int UPDATE_RECORD = 3;
static final int BEGIN_RECORD = 4;
static final int CHECKPOINT_RECORD = 5;
1
2
3
4
5

simpledb的日志记录一共有5种：ABORT, COMMIT, UPDATE, BEGIN, and CHECKPOINT，分别记录事务失败、事务提交、写入磁盘前的脏页、事务开始、检测点，这些格式的日志都记录在同一个日志文件中；日志文件以及每条日志的通用格式如下：

对于ABORT, COMMIT, and BEGIN这三种，中间的content是空的；

对于UPDATE格式的记录，主要是五部分：

type：当前record的类型；
transationId：生成该record记录的事务id；
before data：旧的数据，也就是记录修改前的数据；
after data：新的数据，也就是当前进行刷盘的数据；
start offset：当前record的起始offset。

对于CHECKPOINT记录，在日志的开始时候会记录一个long类型的数据，记录的就是最新checkpoint的offset。

而CHECKPOINT记录的格式如下：

type：当前record的类型；
transationId：生成该record记录的事务id，这里直接就是-1；
transation count：当前所有活跃事务的个数；
transaction detail：每一个活跃事务的id和第一条日志记录的偏移量。

redo log与undo log

在simpledb中，日志不区分redo log和undo log，格式较为简单，也不会记录事务执行过程中对记录的具体修改行为。

simpledb中用UPDATE格式的日志来保存数据的变化，在每次将数据页写入磁盘前需要用logWrite方法来记录变化：

public  synchronized void logWrite(TransactionId tid, Page before,
                                   Page after)
    throws IOException  {
    Debug.log("WRITE, offset = " + raf.getFilePointer());
    preAppend();
    /* update record conists of

           record type
           transaction id
           before page data (see writePageData)
           after page data
           start offset
        */
    raf.writeInt(UPDATE_RECORD);
    raf.writeLong(tid.getId());

    writePageData(raf,before);
    writePageData(raf,after);
    raf.writeLong(currentOffset);
    currentOffset = raf.getFilePointer();

    Debug.log("WRITE OFFSET = " + currentOffset);
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23

前面也说了，update record保存了每一个page的before和after数据，before数据就是修改前的老数据，after数据就是当前要刷盘的数据。

每一个page都会有一个数据结构：

byte[] oldData;
1

该数组就保存了page的旧数据流。

数据页一开始的旧数据是从磁盘中读取后进行设置的，之后当事务提交时，就意味着这个修改已经是持久化到磁盘了，可以更新一次旧数据。而刷盘主要是两种情况：

（1）某一事务提交时刷盘

public synchronized  void flushPages(TransactionId tid) throws IOException {
    // some code goes here
    // not necessary for lab1|lab2
    LRUCache<PageId, Page>.DLinkedNode head = buffer.getHead();
    LRUCache<PageId, Page>.DLinkedNode tail = buffer.getTail();
    while(head!=tail){
        Page page = head.value;
        if(page!=null && page.isDirty()!=null&&page.isDirty().equals(tid) ){
            DbFile dbFile = Database.getCatalog().getDatabaseFile(page.getId().getTableId());
            //记录日志
            try{
                Database.getLogFile().logWrite(page.isDirty(),page.getBeforeImage(),page);
                Database.getLogFile().force();
                page.markDirty(false,null);

                dbFile.writePage(page);
                page.setBeforeImage();
            }catch (IOException e){
                e.printStackTrace();
            }
        }
        head = head.next;
    }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24

（2）进行一次checkpoint时全部刷盘

public void logCheckpoint() throws IOException {
        //make sure we have buffer pool lock before proceeding
        synchronized (Database.getBufferPool()) {
            synchronized (this) {
                //Debug.log("CHECKPOINT, offset = " + raf.getFilePointer());
                preAppend();
                long startCpOffset, endCpOffset;
                Set<Long> keys = tidToFirstLogRecord.keySet();
                Iterator<Long> els = keys.iterator();
                force();
                Database.getBufferPool().flushAllPages();
                ...
1
2
3
4
5
6
7
8
9
10
11
12

这里调用了一次Database.getBufferPool().flushAllPages()将所有脏页进行刷盘，并且记录旧数据。

事务t在CheckPoint前begin，且没有在CheckPoint前commit或abort：
- 在CheckPoint后commit的需要redo：
- 在CheckPoint后abort的需要redo(实际上就是undo)；
- 在CheckPoint后什么都没做的需要undo。
如果在CheckPoint前已经commit或abort，已经刷盘，不管它；
事务t在CheckPoint后begin，或不存在CheckPoint，只把commit的tid刷盘即可，未提交和abort的都直接忽视。

exercise 1 - Rollback

该任务主要完成的是LogFile.rollback()函数，当事务中止时，在事务释放其锁之前调用此函数。它的工作是撤消事务可能对数据库所做的任何更改。

大概逻辑很简单，只需要找到该事务tid第一个update record，然后用它的before data进行刷盘就可以进行恢复。

在介绍任务的完成代码前，我们必须要了解一下LogFile中的一个数据结构：

final Map<Long,Long> tidToFirstLogRecord = new HashMap<>();
1

该map结构很重要，key是活跃事务tid，value就是当前事务的第一条record offset。开始BEGIN日志的时候会往map里put一个tid和对应的offset进去，在事务完成(COMMIT或ABORT)以后会删除这个map里对应的tid的记录。所以这个map里实际上保存的是正在进行的事务的BeginOffset。

public void rollback(TransactionId tid)
    throws NoSuchElementException, IOException {
    synchronized (Database.getBufferPool()) {
        synchronized(this) {
            // some code goes here
            preAppend();
            long tidId = tid.getId();
            Long begin = tidToFirstLogRecord.get(tidId);
            raf.seek(begin);
            while(true){
                try{
                    int type = raf.readInt();
                    Long curTid = raf.readLong();
                    if(curTid!=tidId){
                        //如果不是当前的tid，就直接跳过
                        if(type==3){
                            //update record 还要跳过页数据
                            readPageData(raf);
                            readPageData(raf);
                        }
                    }else{
                        if(type==3){
                            //只需要恢复到最初的状态就行
                            Page before = readPageData(raf);
                            Page after = readPageData(raf);
                            DbFile databaseFile = Database.getCatalog().getDatabaseFile(before.getId().getTableId());
                            databaseFile.writePage(before);
                            Database.getBufferPool().discardPage(after.getId());
                            raf.seek(raf.getFilePointer()+8);
                            break;
                        }
                    }
                    raf.seek(raf.getFilePointer()+8);
                }catch (EOFException e){
                    break;
                }
            }
        }
    }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40

根据tidToFirstLogRecord获取该事务第一条记录的位置，并移动到日志开始的地方；
读取每一条日志，如果不是当前事务的日志就跳过；否则判断是否是update record，是的话就取出其before data并进行刷盘。

exercise 2 - Recovery

该任务主要完成的是LogFile.recover()，当数据库崩溃并重启后，就要在所有的事务开始前开始一次数据恢复。

大概得恢复理论就是借助checkpoint，大概逻辑如下：

事务t在CheckPoint前begin，且没有在CheckPoint前commit或abort：
- 在CheckPoint后commit的需要redo；
- 在CheckPoint后abort的需要undo；
- 在CheckPoint后什么都没做的需要undo。
如果在CheckPoint前已经commit或abort，已经刷盘，不管它；
事务t在CheckPoint后begin，或不存在CheckPoint，只把commit的tid刷盘即可，未提交和abort的都直接忽视。

当然其实我们的数据库是一种简化后的数据库，redo和undo操作只需要将相应的page刷盘就可以了。

public void recover() throws IOException {
    synchronized (Database.getBufferPool()) {
        synchronized (this) {
            recoveryUndecided = false;
            // some code goes here
            HashMap<Long, List<Page[]>> undoMap = new HashMap<>();
            raf.seek(0);
            print();
            long checkpoint = raf.readLong();
            if(checkpoint!=-1){
                HashMap<Long, Long> tidPos = new HashMap<>();
                raf.seek(checkpoint);
                //跳过record type和tid
                raf.seek(raf.getFilePointer()+12);
                //获取正在进行事务的个数
                int num = raf.readInt();
                while(num>0){
                    //获取每一个事务的tid和第一条log record OFFSET
                    long curTid = raf.readLong();
                    long offset = raf.readLong();
                    tidPos.put(curTid,offset);
                    num--;
                }
                for(Long pos:tidPos.keySet()){
                    raf.seek(tidPos.get(pos));
                    recoverSearch(raf,undoMap);
                }
            }else{
                System.out.println(raf.getFilePointer() + "-----------");
                recoverSearch(raf, undoMap);
            }
            //进行undo操作
            for(Long tid:undoMap.keySet()){
                Page[] pages = undoMap.get(tid).get(0);
                Page before = pages[0];
                DbFile databaseFile = Database.getCatalog().getDatabaseFile(before.getId().getTableId());
                databaseFile.writePage(before);
            }
            undoMap.clear();
        }
    }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42

从日志开头部分读取最新checkpoint的offset：
- 如果是-1，则表明没有进行一次checkpoint，需要调用recoverSearch从日志开头进行搜索record，并得到一个undo map；
- 如果不是-1，则跳转到对应的offset，获取每一个事务的tid和第一条log record OFFSET，分别进行跳转并调用recoverSearch开始搜索record，得到一个undo map；
undo map中的就是所有要进行undo的事务和其对应的所有page，只需要获取记录的第一个page进行刷盘就可以恢复了。

看下recoverSearch函数：

/**
     * 从指定位置开始检索每一条记录，update记录就直接放入map中，commit就直接将最终page刷盘，abort就直接将开始前的page刷盘
     * @param raf
     * @param map
     */
private void recoverSearch(RandomAccessFile raf,Map<Long,List<Page[]>> map) throws IOException {
    while(true){
        try{
            int type = raf.readInt();
            long curTid = raf.readLong();
            if(type==3){
                //update
                if(!map.containsKey(curTid)){
                    map.put(curTid,new ArrayList<>());
                }
                Page before = readPageData(raf);
                Page after = readPageData(raf);
                map.get(curTid).add(new Page[]{before,after});
            }else if(type==2 && map.containsKey(curTid)){
                //commit
                Page[] pages = map.get(curTid).get(map.get(curTid).size() - 1);
                Page after = pages[1];
                DbFile databaseFile = Database.getCatalog().getDatabaseFile(after.getId().getTableId());
                databaseFile.writePage(after);
                map.remove(curTid);
            }else if(type==1 && map.containsKey(curTid)){
                //abort
                Page[] pages = map.get(curTid).get(0);
                Page before = pages[0];
                DbFile databaseFile = Database.getCatalog().getDatabaseFile(before.getId().getTableId());
                databaseFile.writePage(before);
                map.remove(curTid);
            }
            raf.seek(raf.getFilePointer()+8);
        }catch (EOFException e){
            break;
        }
    }
}
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39

从指定的文件位置开始进行读取每一个record：

如果是update record，直接记录到map中；
如果是commit record，说明在崩溃前已经进行提交了，这里可以直接将map中记录的最后一个page刷盘，并在map中删除该tid的数据；
如果是abort record，说明在崩溃前已经进行rollback了，这里可以直接将map中记录的第一个page刷盘，并在map中删除该id的数据。

这里特地再说明一下，在这个map中保存的都是一些需要undo的事务数据。

相关阅读:
java-net-php-python-jspm校园闲置物品拍卖系统计算机毕业设计程序
 C语言学习笔记
 PSP - 蛋白质复合物结构预测 Template Pair 特征 Mask 可视化
 基于 FPGA 实现数字时钟详细原理讲解及验证结果
 【Linux】Linux常用命令—文件管理(上)
【大模型的一些基本结论】
墨天轮沙龙 | 庚顿数据姚羽：实时数据技术赋能流程工业，保障业务连续性
 Springboot毕设项目基于Springboot的手机电商网站lmo47（java+VUE+Mybatis+Maven+Mysql）
Mysql中获取所有表名以及表名带时间字符串使用BetweenAnd筛选区间范围
 二十、泛型（5）
原文地址：https://blog.csdn.net/qq_44766883/article/details/127415009