• golang的垃圾回收算法之九写屏障


    说明

    为了方便阅读把写屏障,先写出来,然后再接着写垃圾回收的清理等操作。

    一、写屏障

    在前面提到了,在标记的过程中是要插入写屏障的。在三色标记算法中,由于支持并行,就需要保证标记过程的安全性。标记有两大类强弱三色不变。应用在具体的场景有插入屏障和删除屏障。在前面的增量式GC中提到过,三色标记算法如果没有写屏障很容易导致误GC,安全性无法得到保障。那么如何才能保障GC的安全也就是正确的标记相关的内存对象呢?其实就是启动屏障机制。
    写屏障其实就是一种机制,它能够在各种环境下保证数据的一致性。这里首先说明一下强弱三色不变:
    1、强三色
    强三色就是没有黑色对象引用白色对象的指针。结合前面增量垃圾回收中的说明可以知道,这样做的目的是为了解决内存对象的遗漏问题。防止被引用的白色对象被误清理。

    2、弱三色
    弱三色指被黑色对象引用的白色对象都处于灰色保护状。

    根据上述的两种三色标记,Golang引入了插入屏障和删除屏障两种机制:
    1、插入屏障
    使用强三色标记,A引用的对象必须指定为灰色。

    2、删除屏障
    使用弱三色标记,内存对象被删除时,如果自身为灰色或者白色,则标记为灰色。这样可以保证被删除费用的对象如果还有其它引用,就不会被误清理。

    在Golang1.8中的混合写屏障机制:
    在实际应用中可以看到,插入写屏障结束时需要STW来重新扫描栈,标记栈上引用的白色对象的存活;删除写屏障回收精度低,GC开始时STW扫描堆栈来记录初始快照,这个过程会保护开始时刻的所有存活对象。换句话说,这个对STW是一种强需求,那么可不可以有一种更优的机制呢?在golang1.8中便引入了混合写屏障机制:
    1、GC开始将栈上的对象全部扫描并标记为黑色(不再二次扫描,不用STW),
    2、GC期间,栈上创建的新对象设置为黑色。
    3、被删除的对象标记为灰色。
    4、被添加的对象标记为灰色。
    这是一种符合弱三色不变标记的屏障

    ** 上述的机制在网上和书上有的是相关介绍,这里就不再拾人牙慧,重复了。建议大家在网上搜索相关即有大把的文章,或者找一些相关的GC算法书籍,也有很多介绍。

    二、源码分析

    下面就来分析一下相关的源码,在前面的分析代码中提到,在gcStart中,有一个启动写屏障:

    //go:nosplit
    func setGCPhase(x uint32) {
    	atomic.Store(&gcphase, x)
    	writeBarrier.needed = gcphase == _GCmark || gcphase == _GCmarktermination
    	writeBarrier.enabled = writeBarrier.needed || writeBarrier.cgo
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6

    mstkbar.go和mbarrier.go这两个文件中是对写屏障的基本的汇集文件。
    golang 内部是通过一个队列+掩码位图来实现三色状态的,并没有大家想象的有三个标志属性即:
    白色对象:对象所在 span 的 gcmarkBits 中对应的 bit 为 0,不在队列;
    灰色对象:对象所在 span 的 gcmarkBits 中对应的 bit 为 1,且对象在扫描队列中;
    黑色对象:对象所在 span 的 gcmarkBits 中对应的 bit 为 1,且对象已经从扫描队列中处理并摘除掉。
    在前面分析过堆上的标记过程,后面会分析栈上的扫描过程,这其中都用到了写屏障:

    // cgoCheckWriteBarrier is called whenever a pointer is stored into memory.
    // It throws if the program is storing a Go pointer into non-Go memory.
    //go:nosplit
    //go:nowritebarrier
    func cgoCheckWriteBarrier(dst *uintptr, src uintptr) {
    	if !cgoIsGoPointer(unsafe.Pointer(src)) {
    		return
    	}
    	if cgoIsGoPointer(unsafe.Pointer(dst)) {
    		return
    	}
    
    	// If we are running on the system stack then dst might be an
    	// address on the stack, which is OK.
    	g := getg()
    	if g == g.m.g0 || g == g.m.gsignal {
    		return
    	}
    
    	// Allocating memory can write to various mfixalloc structs
    	// that look like they are non-Go memory.
    	if g.m.mallocing != 0 {
    		return
    	}
    
    	systemstack(func() {
    		println("write of Go pointer", hex(src), "to non-Go memory", hex(uintptr(unsafe.Pointer(dst))))
    		throw(cgoWriteBarrierFail)
    	})
    }
    
    // cgoCheckMemmove is called when moving a block of memory.
    // dst and src point off bytes into the value to copy.
    // size is the number of bytes to copy.
    // It throws if the program is copying a block that contains a Go pointer
    // into non-Go memory.
    //go:nosplit
    //go:nowritebarrier
    func cgoCheckMemmove(typ *_type, dst, src unsafe.Pointer, off, size uintptr) {
    	if typ.kind&kindNoPointers != 0 {
    		return
    	}
    	if !cgoIsGoPointer(src) {
    		return
    	}
    	if cgoIsGoPointer(dst) {
    		return
    	}
    	cgoCheckTypedBlock(typ, src, off, size)
    }
    
    // cgoCheckSliceCopy is called when copying n elements of a slice from
    // src to dst.  typ is the element type of the slice.
    // It throws if the program is copying slice elements that contain Go pointers
    // into non-Go memory.
    //go:nosplit
    //go:nowritebarrier
    func cgoCheckSliceCopy(typ *_type, dst, src slice, n int) {
    	if typ.kind&kindNoPointers != 0 {
    		return
    	}
    	if !cgoIsGoPointer(src.array) {
    		return
    	}
    	if cgoIsGoPointer(dst.array) {
    		return
    	}
    	p := src.array
    	for i := 0; i < n; i++ {
    		cgoCheckTypedBlock(typ, p, 0, typ.size)
    		p = add(p, typ.size)
    	}
    }
    
    // cgoCheckTypedBlock checks the block of memory at src, for up to size bytes,
    // and throws if it finds a Go pointer. The type of the memory is typ,
    // and src is off bytes into that type.
    //go:nosplit
    //go:nowritebarrier
    func cgoCheckTypedBlock(typ *_type, src unsafe.Pointer, off, size uintptr) {
    	// Anything past typ.ptrdata is not a pointer.
    	if typ.ptrdata <= off {
    		return
    	}
    	if ptrdataSize := typ.ptrdata - off; size > ptrdataSize {
    		size = ptrdataSize
    	}
    
    	if typ.kind&kindGCProg == 0 {
    		cgoCheckBits(src, typ.gcdata, off, size)
    		return
    	}
    
    	// The type has a GC program. Try to find GC bits somewhere else.
    	for _, datap := range activeModules() {
    		if cgoInRange(src, datap.data, datap.edata) {
    			doff := uintptr(src) - datap.data
    			cgoCheckBits(add(src, -doff), datap.gcdatamask.bytedata, off+doff, size)
    			return
    		}
    		if cgoInRange(src, datap.bss, datap.ebss) {
    			boff := uintptr(src) - datap.bss
    			cgoCheckBits(add(src, -boff), datap.gcbssmask.bytedata, off+boff, size)
    			return
    		}
    	}
    
    	aoff := uintptr(src) - mheap_.arena_start
    	idx := aoff >> _PageShift
    	s := mheap_.spans[idx]
    	if s.state == _MSpanStack {
    		// There are no heap bits for value stored on the stack.
    		// For a channel receive src might be on the stack of some
    		// other goroutine, so we can't unwind the stack even if
    		// we wanted to.
    		// We can't expand the GC program without extra storage
    		// space we can't easily get.
    		// Fortunately we have the type information.
    		systemstack(func() {
    			cgoCheckUsingType(typ, src, off, size)
    		})
    		return
    	}
    
    	// src must be in the regular heap.
    
    	hbits := heapBitsForAddr(uintptr(src))
    	for i := uintptr(0); i < off+size; i += sys.PtrSize {
    		bits := hbits.bits()
    		if i >= off && bits&bitPointer != 0 {
    			v := *(*unsafe.Pointer)(add(src, i))
    			if cgoIsGoPointer(v) {
    				systemstack(func() {
    					throw(cgoWriteBarrierFail)
    				})
    			}
    		}
    		hbits = hbits.next()
    	}
    }
    
    // cgoCheckBits checks the block of memory at src, for up to size
    // bytes, and throws if it finds a Go pointer. The gcbits mark each
    // pointer value. The src pointer is off bytes into the gcbits.
    //go:nosplit
    //go:nowritebarrier
    func cgoCheckBits(src unsafe.Pointer, gcbits *byte, off, size uintptr) {
    	skipMask := off / sys.PtrSize / 8
    	skipBytes := skipMask * sys.PtrSize * 8
    	ptrmask := addb(gcbits, skipMask)
    	src = add(src, skipBytes)
    	off -= skipBytes
    	size += off
    	var bits uint32
    	for i := uintptr(0); i < size; i += sys.PtrSize {
    		if i&(sys.PtrSize*8-1) == 0 {
    			bits = uint32(*ptrmask)
    			ptrmask = addb(ptrmask, 1)
    		} else {
    			bits >>= 1
    		}
    		if off > 0 {
    			off -= sys.PtrSize
    		} else {
    			if bits&1 != 0 {
    				v := *(*unsafe.Pointer)(add(src, i))
    				if cgoIsGoPointer(v) {
    					systemstack(func() {
    						throw(cgoWriteBarrierFail)
    					})
    				}
    			}
    		}
    	}
    }
    
    // cgoCheckUsingType is like cgoCheckTypedBlock, but is a last ditch
    // fall back to look for pointers in src using the type information.
    // We only use this when looking at a value on the stack when the type
    // uses a GC program, because otherwise it's more efficient to use the
    // GC bits. This is called on the system stack.
    //go:nowritebarrier
    //go:systemstack
    func cgoCheckUsingType(typ *_type, src unsafe.Pointer, off, size uintptr) {
    	if typ.kind&kindNoPointers != 0 {
    		return
    	}
    
    	// Anything past typ.ptrdata is not a pointer.
    	if typ.ptrdata <= off {
    		return
    	}
    	if ptrdataSize := typ.ptrdata - off; size > ptrdataSize {
    		size = ptrdataSize
    	}
    
    	if typ.kind&kindGCProg == 0 {
    		cgoCheckBits(src, typ.gcdata, off, size)
    		return
    	}
    	switch typ.kind & kindMask {
    	default:
    		throw("can't happen")
    	case kindArray:
    		at := (*arraytype)(unsafe.Pointer(typ))
    		for i := uintptr(0); i < at.len; i++ {
    			if off < at.elem.size {
    				cgoCheckUsingType(at.elem, src, off, size)
    			}
    			src = add(src, at.elem.size)
    			skipped := off
    			if skipped > at.elem.size {
    				skipped = at.elem.size
    			}
    			checked := at.elem.size - skipped
    			off -= skipped
    			if size <= checked {
    				return
    			}
    			size -= checked
    		}
    	case kindStruct:
    		st := (*structtype)(unsafe.Pointer(typ))
    		for _, f := range st.fields {
    			if off < f.typ.size {
    				cgoCheckUsingType(f.typ, src, off, size)
    			}
    			src = add(src, f.typ.size)
    			skipped := off
    			if skipped > f.typ.size {
    				skipped = f.typ.size
    			}
    			checked := f.typ.size - skipped
    			off -= skipped
    			if size <= checked {
    				return
    			}
    			size -= checked
    		}
    	}
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29
    • 30
    • 31
    • 32
    • 33
    • 34
    • 35
    • 36
    • 37
    • 38
    • 39
    • 40
    • 41
    • 42
    • 43
    • 44
    • 45
    • 46
    • 47
    • 48
    • 49
    • 50
    • 51
    • 52
    • 53
    • 54
    • 55
    • 56
    • 57
    • 58
    • 59
    • 60
    • 61
    • 62
    • 63
    • 64
    • 65
    • 66
    • 67
    • 68
    • 69
    • 70
    • 71
    • 72
    • 73
    • 74
    • 75
    • 76
    • 77
    • 78
    • 79
    • 80
    • 81
    • 82
    • 83
    • 84
    • 85
    • 86
    • 87
    • 88
    • 89
    • 90
    • 91
    • 92
    • 93
    • 94
    • 95
    • 96
    • 97
    • 98
    • 99
    • 100
    • 101
    • 102
    • 103
    • 104
    • 105
    • 106
    • 107
    • 108
    • 109
    • 110
    • 111
    • 112
    • 113
    • 114
    • 115
    • 116
    • 117
    • 118
    • 119
    • 120
    • 121
    • 122
    • 123
    • 124
    • 125
    • 126
    • 127
    • 128
    • 129
    • 130
    • 131
    • 132
    • 133
    • 134
    • 135
    • 136
    • 137
    • 138
    • 139
    • 140
    • 141
    • 142
    • 143
    • 144
    • 145
    • 146
    • 147
    • 148
    • 149
    • 150
    • 151
    • 152
    • 153
    • 154
    • 155
    • 156
    • 157
    • 158
    • 159
    • 160
    • 161
    • 162
    • 163
    • 164
    • 165
    • 166
    • 167
    • 168
    • 169
    • 170
    • 171
    • 172
    • 173
    • 174
    • 175
    • 176
    • 177
    • 178
    • 179
    • 180
    • 181
    • 182
    • 183
    • 184
    • 185
    • 186
    • 187
    • 188
    • 189
    • 190
    • 191
    • 192
    • 193
    • 194
    • 195
    • 196
    • 197
    • 198
    • 199
    • 200
    • 201
    • 202
    • 203
    • 204
    • 205
    • 206
    • 207
    • 208
    • 209
    • 210
    • 211
    • 212
    • 213
    • 214
    • 215
    • 216
    • 217
    • 218
    • 219
    • 220
    • 221
    • 222
    • 223
    • 224
    • 225
    • 226
    • 227
    • 228
    • 229
    • 230
    • 231
    • 232
    • 233
    • 234
    • 235
    • 236
    • 237
    • 238
    • 239
    • 240
    • 241

    需要注意的是,写屏障是在编译期确定的,之后就不再动了。这就涉及到一个问题即“内存逃逸”,这个在前面分析过,也就是说,只有确定的了内存分配到底在堆上还是栈上(堆才有写屏障),才能正确的应用写屏障。有兴趣可以看看前面的相关文章。
    也就是说,上面写代码不是在函数调用时产生的,所以需要看一下相关的汇编代码:

    // void gogo(Gobuf*)
    // restore state from Gobuf; longjmp
    TEXT runtime·gogo(SB), NOSPLIT, $16-8
    	MOVQ	buf+0(FP), BX		// gobuf
    
    	// If ctxt is not nil, invoke deletion barrier before overwriting.
    	MOVQ	gobuf_ctxt(BX), AX
    	TESTQ	AX, AX
    	JZ	nilctxt
    	LEAQ	gobuf_ctxt(BX), AX
    	MOVQ	AX, 0(SP)
    	MOVQ	$0, 8(SP)
    	CALL	runtime·writebarrierptr_prewrite(SB)
    	MOVQ	buf+0(FP), BX
    ......
    // void gogo(Gobuf*)
    // restore state from Gobuf; longjmp
    TEXT runtime·gogo(SB), NOSPLIT, $16-8
    	MOVQ	buf+0(FP), BX		// gobuf
    
    	// If ctxt is not nil, invoke deletion barrier before overwriting.
    	MOVQ	gobuf_ctxt(BX), AX
    	TESTQ	AX, AX
    	JZ	nilctxt
    	LEAQ	gobuf_ctxt(BX), AX
    	MOVQ	AX, 0(SP)
    	MOVQ	$0, 8(SP)
    	CALL	runtime·writebarrierptr_prewrite(SB)
    	MOVQ	buf+0(FP), BX
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21
    • 22
    • 23
    • 24
    • 25
    • 26
    • 27
    • 28
    • 29

    可以看到,在这些汇编文件中调用了相关的写屏障的代码。这里最重要的其实是第一个函数gcmarkwb_m:

    func gcmarkwb_m(slot *uintptr, ptr uintptr) {
    	if writeBarrier.needed {
    		// Note: This turns bad pointer writes into bad
    		// pointer reads, which could be confusing. We avoid
    		// reading from obviously bad pointers, which should
    		// take care of the vast majority of these. We could
    		// patch this up in the signal handler, or use XCHG to
    		// combine the read and the write. Checking inheap is
    		// insufficient since we need to track changes to
    		// roots outside the heap.
    		if slot1 := uintptr(unsafe.Pointer(slot)); slot1 >= minPhysPageSize {
    			if optr := *slot; optr != 0 {
    				shade(optr)
    			}
    		}
    		// TODO: Make this conditional on the caller's stack color.
    		if ptr != 0 && inheap(ptr) {
    			shade(ptr)
    		}
    	}
    }
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15
    • 16
    • 17
    • 18
    • 19
    • 20
    • 21

    在这个文件的开头的说明中,对此函数进行了简化:
    gcmarkwb_m is the mark-phase write barrier, the only barrier we have.The rest of this file exists only to make calls to this function. This is a hybrid barrier that combines a Yuasa-style deletion barrier—which shades the object whose reference is being overwritten—with Dijkstra insertion barrier—which shades the object whose reference is being written. The insertion part of the barrier is necessary while the calling goroutine’s stack is grey. In pseudocode, the barrier is:

    writePointer(slot, ptr):
    shade(*slot)
    if current stack is grey:
    shade(ptr)
    *slot = ptr
    slot is the destination in Go code. ptr is the value that goes into the slot in Go code.

    整个开头文件的说明还是非常清晰的,英文好的可以直接看,不好的去Google翻译上看。不要想当然,这是最可怕的。
    slot is the destination in Go code.
    ptr is the value that goes into the slot in Go code.

    Shade indicates that it has seen a white pointer by adding the referent
    to wbuf as well as marking it.

    The two shades and the condition work together to prevent a mutator
    from hiding an object from the garbage collector:

    1. shade(*slot) prevents a mutator from hiding an object by moving
      the sole pointer to it from the heap to its stack. If it attempts
      to unlink an object from the heap, this will shade it.

    2. shade(ptr) prevents a mutator from hiding an object by moving
      the sole pointer to it from its stack into a black object in the
      heap. If it attempts to install the pointer into a black object,
      this will shade it.

    3. Once a goroutine’s stack is black, the shade(ptr) becomes
      unnecessary. shade(ptr) prevents hiding an object by moving it from
      the stack to the heap, but this requires first having a pointer
      hidden on the stack. Immediately after a stack is scanned, it only
      points to shaded objects, so it’s not hiding anything, and the
      shade(*slot) prevents it from hiding any other pointers on its
      stack.

    For a detailed description of this barrier and proof of
    correctness, see https://github.com/golang/proposal/blob/master/design/17503-eliminate-rescan.md
    Dealing with memory ordering:

    Both the Yuasa and Dijkstra barriers can be made conditional on the
    color of the object containing the slot. We chose not to make these
    conditional because the cost of ensuring that the object holding
    the slot doesn’t concurrently change color without the mutator
    noticing seems prohibitive.

    Consider the following example where the mutator writes into
    a slot and then loads the slot’s mark bit while the GC thread
    writes to the slot’s mark bit and then as part of scanning reads
    the slot.

    Initially both [slot] and [slotmark] are 0 (nil)
    Mutator thread GC thread
    st [slot], ptr st [slotmark], 1

    ld r1, [slotmark] ld r2, [slot]

    Without an expensive memory barrier between the st and the ld, the final
    result on most HW (including 386/amd64) can be r1r20. This is a classic
    example of what can happen when loads are allowed to be reordered with older
    stores (avoiding such reorderings lies at the heart of the classic
    Peterson/Dekker algorithms for mutual exclusion). Rather than require memory
    barriers, which will slow down both the mutator and the GC, we always grey
    the ptr object regardless of the slot’s color.
    //
    Another place where we intentionally omit memory barriers is when
    accessing mheap_.arena_used to check if a pointer points into the
    heap. On relaxed memory machines, it’s possible for a mutator to
    extend the size of the heap by updating arena_used, allocate an
    object from this new region, and publish a pointer to that object,
    but for tracing running on another processor to observe the pointer
    but use the old value of arena_used. In this case, tracing will not
    mark the object, even though it’s reachable. However, the mutator
    is guaranteed to execute a write barrier when it publishes the
    pointer, so it will take care of marking the object. A general
    consequence of this is that the garbage collector may cache the
    value of mheap_.arena_used. (See issue #9984.)

    Stack writes:
    The compiler omits write barriers for writes to the current frame,
    but if a stack pointer has been passed down the call stack, the
    compiler will generate a write barrier for writes through that
    pointer (because it doesn’t know it’s not a heap pointer).

    One might be tempted to ignore the write barrier if slot points
    into to the stack. Don’t do it! Mark termination only re-scans
    frames that have potentially been active since the concurrent scan,
    so it depends on write barriers to track changes to pointers in
    stack frames that have not been active.

    Global writes:

    The Go garbage collector requires write barriers when heap pointers
    are stored in globals. Many garbage collectors ignore writes to
    globals and instead pick up global -> heap pointers during
    termination. This increases pause time, so we instead rely on write
    barriers for writes to globals so that we don’t have to rescan
    global during mark termination.

    Publication ordering:

    三、总结

    所谓写屏障其实就类似于多线程中的内存保护,保证每个线程访问相关的内存都是安全。而不能每个线程想删除就删除想更改就更改。并行和并发在大多数情况下确实是能提高效率的,不然也不会朝着这个方向发展。但一个事务的出现,不光有好的一面,必然也有负面的一面。并行和并发的引入,复杂性和不易维护、难于调试就是负面的东西。但人总不能因噎废食,强化好的一方面,尽量减少不好的一方面,这才是每个程序员应该考虑的。
    看看Golang代码中很多的Debug部分,就明白,写一个程序真得需要花精气神的。

  • 相关阅读:
    Pytorch学习笔记9——AutoEncoder
    Linux从入门到实战 ---- 磁盘分区
    等差数列和特殊矩阵压缩公式
    一篇文章教你自动化测试如何解析excel文件?
    极光认证——手机号一键登录
    Spring使用的注解大全和解释
    centos安装docker
    node.js与内置模块
    MQTT.js 入门教程:学习笔记
    ConcurrentHashMap 1.7与1.8的区别
  • 原文地址:https://blog.csdn.net/fpcc/article/details/126361944