其实谈内存管理内存分配,这都吐沫乱飞了。搞得学习的人估计心里都麻木了。程序员如果不接触到比较深的底层或者说比较大的程序,一般也不会遇到内存这种问题。举一个例子,搞前端的,绝大多数可能对这个一无所知,搞后端的如果仅仅是搞业务开发也一般遇不到这种问题。而且,经常搞业务功能开发的,一般来说,在每个企业里都是大多数。
说这个是什么意思呢?大家都熟悉金字塔模型,也就是说,搞到需要对内存进行精细化管理的地步,应该就是金字塔比较靠上的了。有的人会问,那其它人就不能搞,不想搞,不会搞了么,这倒不是。就是说,学习实践是没问题的,但真正想在一个正式的商业项目中实践,就比较困难了。
话又说回来,再高大的金字塔也是从地基一步步的造出来的,不积跬步,无以致千里。所以,还是要认真搞一下内存的学习和实践。最好的办法就是多看看有名气的开源框架的相关代码,比如Redis、Java还及本文提到的Golang的内存管理部分。
这里分析一下Golang的内存分配,内存分配一般来说会分成三种,即小内存,大内存和巨型内存。小内存一般指栈或者比较小的堆空间分配,大内存一般指堆上的内存分配,巨型内存一般是指特殊情况,如果需要开辟一块上G的缓冲区等等。后者一般都是在特定场合下应用,比如内存型数据库或者图形图像处理等。
而不同大小的内存的分配,往往细节上又有不同,在Go语言里,对内存的分配使用的是Tcmalloc算法,在前面分析时提到,Go会提前申请一块内存(类似于内存池),然后将其划分成三个部分:
1、arena: 即堆区,runtime动态分配内存均在此区,其内存块划分为 8kb 的页,而 mspan由其中一些组合起来形成,前面提到过,其是 go 中内存管理的基本单元。
2、bitmap: 即堆区使用的内存映射表,其记录了哪些区域保存了对象,对象是否包含指针,以及 GC 的标记信息.
3、spans: 即mspan 的指针,由spans 区域的信息可以定位到 mspan.。从而在GC时快速发现大块的内存 mspan.
在Go的内存管理单元中,为了管理方便,把相关的内存抽象成为了一个对象。在早期的内存池技术中,也基本是类似的处理机制,针对常用的内存的大小,把内存划分成具体的Object,这种大小,就看实际应用的场景和具体的内存成本来决定。在Go的内存管理机制中,把mspans当做基本的管理单元,每个基本的单元又按照Object Size的大小划分成不同的管理块,而每一块或者说内存由前面的Bitmap来映射管理。同样,为了处理是否含有指针对象,又把mspan划分成两个,也就是说有一个spans size的概念,它是object size的两倍。
在go的内存分配中,对分配对象也同样进行了分类:
1、极小对象(小于16byte)在当前P的mcache上的tiny缓存上分配;
2、小对象(大等于16byte小等32k)在当前P的mcache上对应slot的空闲列表中分配,否则向mcentral申请,如果仍然无法分配则申请向mheap分配,如果还没有则向OS申请分配;
3、大对象(大于32k)在mheap分配。
内存管理单元的代码主要分布在mheap.go,内存的分配管理单元由三部分mcache, mcentral, mheap组成,这个在前面提到过,今天主要分析分配的过程。分配的代码主要在runtime文件夹下的malloc.go,在Go中使用的是Tcmalloc这种算法,回头有时间分析一下Google这种算法和Facebook 的Jemalloc以及Glib中的Ptmalloc(malloc)的不同。这里先埋个桩。三类内存的管理单元定义即arena,mspan,bitmap组成,先看一下上面提到的Object Size的相关定义:
//runtime/sizeclasses.go
const (
_MaxSmallSize = 32768
smallSizeDiv = 8
smallSizeMax = 1024
largeSizeDiv = 128
_NumSizeClasses = 67
_PageShift = 13
)
//此变量即为Object size定义
var class_to_size = [_NumSizeClasses]uint16{0, 8, 16, 32, 48, 64, 80, 96, 112, 128, 144, 160, 176, 192, 208, 224, 240, 256, 288, 320, 352, 384, 416, 448, 480, 512, 576, 640, 704, 768, 896, 1024, 1152, 1280, 1408, 1536, 1792, 2048, 2304, 2688, 3072, 3200, 3456, 4096, 4864, 5376, 6144, 6528, 6784, 6912, 8192, 9472, 9728, 10240, 10880, 12288, 13568, 14336, 16384, 18432, 19072, 20480, 21760, 24576, 27264, 28672, 32768}
var class_to_allocnpages = [_NumSizeClasses]uint8{0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 3, 2, 3, 1, 3, 2, 3, 4, 5, 6, 1, 7, 6, 5, 4, 3, 5, 7, 2, 9, 7, 5, 8, 3, 10, 7, 4}
type divMagic struct {
shift uint8
shift2 uint8
mul uint16
baseMask uint16
}
var class_to_divmagic = [_NumSizeClasses]divMagic{{0, 0, 0, 0}, {3, 0, 1, 65528}, {4, 0, 1, 65520}, {5, 0, 1, 65504}, {4, 9, 171, 0}, {6, 0, 1, 65472}, {4, 10, 205, 0}, {5, 9, 171, 0}, {4, 11, 293, 0}, {7, 0, 1, 65408}, {4, 9, 57, 0}, {5, 10, 205, 0}, {4, 12, 373, 0}, {6, 7, 43, 0}, {4, 13, 631, 0}, {5, 11, 293, 0}, {4, 13, 547, 0}, {8, 0, 1, 65280}, {5, 9, 57, 0}, {6, 9, 103, 0}, {5, 12, 373, 0}, {7, 7, 43, 0}, {5, 10, 79, 0}, {6, 10, 147, 0}, {5, 11, 137, 0}, {9, 0, 1, 65024}, {6, 9, 57, 0}, {7, 6, 13, 0}, {6, 11, 187, 0}, {8, 5, 11, 0}, {7, 8, 37, 0}, {10, 0, 1, 64512}, {7, 9, 57, 0}, {8, 6, 13, 0}, {7, 11, 187, 0}, {9, 5, 11, 0}, {8, 8, 37, 0}, {11, 0, 1, 63488}, {8, 9, 57, 0}, {7, 10, 49, 0}, {10, 5, 11, 0}, {7, 10, 41, 0}, {7, 9, 19, 0}, {12, 0, 1, 61440}, {8, 9, 27, 0}, {8, 10, 49, 0}, {11, 5, 11, 0}, {7, 13, 161, 0}, {7, 13, 155, 0}, {8, 9, 19, 0}, {13, 0, 1, 57344}, {8, 12, 111, 0}, {9, 9, 27, 0}, {11, 6, 13, 0}, {7, 14, 193, 0}, {12, 3, 3, 0}, {8, 13, 155, 0}, {11, 8, 37, 0}, {14, 0, 1, 49152}, {11, 8, 29, 0}, {7, 13, 55, 0}, {12, 5, 7, 0}, {8, 14, 193, 0}, {13, 3, 3, 0}, {7, 14, 77, 0}, {12, 7, 19, 0}, {15, 0, 1, 32768}}
var size_to_class8 = [smallSizeMax/smallSizeDiv + 1]uint8{0, 1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 16, 17, 17, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 20, 20, 21, 21, 21, 21, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 26, 26, 26, 26, 26, 26, 26, 26, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 28, 28, 28, 28, 28, 28, 29, 29, 29, 29, 29, 29, 29, 29, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31, 31}
var size_to_class128 = [(_MaxSmallSize-smallSizeMax)/largeSizeDiv + 1]uint8{31, 32, 33, 34, 35, 36, 36, 37, 37, 38, 38, 39, 39, 39, 40, 40, 40, 41, 42, 42, 43, 43, 43, 43, 43, 44, 44, 44, 44, 44, 44, 45, 45, 45, 45, 46, 46, 46, 46, 46, 46, 47, 47, 47, 48, 48, 49, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 51, 51, 51, 52, 52, 53, 53, 53, 53, 54, 54, 54, 54, 54, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 56, 56, 57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 58, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 59, 60, 60, 60, 60, 60, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 61, 62, 62, 62, 62, 62, 62, 62, 62, 62, 62, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 65, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66, 66}
说明在在Object的定义中,有_NumSizeClasses(67)种,而后面的页数分配定义相同,也是67种。
再看一下在mheap定义的bitmap相关:
type mheap struct {
......
// range of addresses we might see in the heap
bitmap uintptr // Points to one byte past the end of the bitmap
bitmap_mapped uintptr
arena_start uintptr
arena_used uintptr // always mHeap_Map{Bits,Spans} before updating
arena_end uintptr
arena_reserved bool
......
}
这几个基本都是用unitptr定义,这玩意儿其实就是c++中的指针,bitmap指向最后一个字节,需要注意的是其是从高向低增长的,而arena_start正好相反,所以二者指向的是同一个地址。换句话说,bitmap指向的是bitmap区的最后的地址,而arena_start指向的是arena开始的地址,而两个区域是挨在一起的。就像一个环形队列,开始就是结束,结束也是开始。
通过上面的分析可以看出来,go的内存分成三部分来管理,即span,bitmap和arena,基本的内存定义单元为mspan,内存分配管理的单元为mcache, mcentral, mheap,mcache管理线程在本地缓存的mspan;mcentral管理全局的mspan供所有线程使用;mheap管理Go的所有动态分配内存。这样从宏观上掌握了Go中对内存的设计管理,就好分析相关的代码和实际的执行流程了。
更具体的一些数据结构的定义,在上篇基本分析过了,有不清楚的可以看一下或者翻一下源码。
下面分析一下内存的分配流程,主要是按照上面提到的三类,极小,小和大三种情况分别来说明分析:
先看一下内存对象的New函数:
// implementation of new builtin
// compiler (both frontend and SSA backend) knows the signature
// of this function
func newobject(typ *_type) unsafe.Pointer {
return mallocgc(typ.size, typ, true)
}
//go:linkname reflect_unsafe_New reflect.unsafe_New
func reflect_unsafe_New(typ *_type) unsafe.Pointer {
return newobject(typ)
}
它会调用 :
// Allocate an object of size bytes.
// Small objects are allocated from the per-P cache's free lists.
// Large objects (> 32 kB) are allocated straight from the heap.
func mallocgc(size uintptr, typ *_type, needzero bool) unsafe.Pointer {
if gcphase == _GCmarktermination {
throw("mallocgc called with gcphase == _GCmarktermination")
}
if size == 0 {
return unsafe.Pointer(&zerobase)
}
if debug.sbrk != 0 {
align := uintptr(16)
if typ != nil {
align = uintptr(typ.align)
}
return persistentalloc(size, align, &memstats.other_sys)
}
// assistG is the G to charge for this allocation, or nil if
// GC is not currently active.
var assistG *g
if gcBlackenEnabled != 0 {
// Charge the current user G for this allocation.
assistG = getg()
if assistG.m.curg != nil {
assistG = assistG.m.curg
}
// Charge the allocation against the G. We'll account
// for internal fragmentation at the end of mallocgc.
assistG.gcAssistBytes -= int64(size)
if assistG.gcAssistBytes < 0 {
// This G is in debt. Assist the GC to correct
// this before allocating. This must happen
// before disabling preemption.
gcAssistAlloc(assistG)
}
}
// Set mp.mallocing to keep from being preempted by GC.
mp := acquirem()
if mp.mallocing != 0 {
throw("malloc deadlock")
}
if mp.gsignal == getg() {
throw("malloc during signal")
}
mp.mallocing = 1
shouldhelpgc := false
dataSize := size
c := gomcache()
var x unsafe.Pointer
noscan := typ == nil || typ.kind&kindNoPointers != 0
if size <= maxSmallSize {
if noscan && size < maxTinySize {
// Tiny allocator.
//
// Tiny allocator combines several tiny allocation requests
// into a single memory block. The resulting memory block
// is freed when all subobjects are unreachable. The subobjects
// must be noscan (don't have pointers), this ensures that
// the amount of potentially wasted memory is bounded.
//
// Size of the memory block used for combining (maxTinySize) is tunable.
// Current setting is 16 bytes, which relates to 2x worst case memory
// wastage (when all but one subobjects are unreachable).
// 8 bytes would result in no wastage at all, but provides less
// opportunities for combining.
// 32 bytes provides more opportunities for combining,
// but can lead to 4x worst case wastage.
// The best case winning is 8x regardless of block size.
//
// Objects obtained from tiny allocator must not be freed explicitly.
// So when an object will be freed explicitly, we ensure that
// its size >= maxTinySize.
//
// SetFinalizer has a special case for objects potentially coming
// from tiny allocator, it such case it allows to set finalizers
// for an inner byte of a memory block.
//
// The main targets of tiny allocator are small strings and
// standalone escaping variables. On a json benchmark
// the allocator reduces number of allocations by ~12% and
// reduces heap size by ~20%.
off := c.tinyoffset
// Align tiny pointer for required (conservative) alignment.
if size&7 == 0 {
off = round(off, 8)
} else if size&3 == 0 {
off = round(off, 4)
} else if size&1 == 0 {
off = round(off, 2)
}
if off+size <= maxTinySize && c.tiny != 0 {
// The object fits into existing tiny block.
x = unsafe.Pointer(c.tiny + off)
c.tinyoffset = off + size
c.local_tinyallocs++
mp.mallocing = 0
releasem(mp)
return x
}
// Allocate a new maxTinySize block.
span := c.alloc[tinySizeClass]
v := nextFreeFast(span)
if v == 0 {
v, _, shouldhelpgc = c.nextFree(tinySizeClass)
}
x = unsafe.Pointer(v)
(*[2]uint64)(x)[0] = 0
(*[2]uint64)(x)[1] = 0
// See if we need to replace the existing tiny block with the new one
// based on amount of remaining free space.
if size < c.tinyoffset || c.tiny == 0 {
c.tiny = uintptr(x)
c.tinyoffset = size
}
size = maxTinySize
} else {
var sizeclass uint8
if size <= smallSizeMax-8 {
sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
} else {
sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
}
size = uintptr(class_to_size[sizeclass])
span := c.alloc[sizeclass]
v := nextFreeFast(span)
if v == 0 {
v, span, shouldhelpgc = c.nextFree(sizeclass)
}
x = unsafe.Pointer(v)
if needzero && span.needzero != 0 {
memclrNoHeapPointers(unsafe.Pointer(v), size)
}
}
} else {
var s *mspan
shouldhelpgc = true
systemstack(func() {
s = largeAlloc(size, needzero)
})
s.freeindex = 1
s.allocCount = 1
x = unsafe.Pointer(s.base())
size = s.elemsize
}
var scanSize uintptr
if noscan {
heapBitsSetTypeNoScan(uintptr(x))
} else {
// If allocating a defer+arg block, now that we've picked a malloc size
// large enough to hold everything, cut the "asked for" size down to
// just the defer header, so that the GC bitmap will record the arg block
// as containing nothing at all (as if it were unused space at the end of
// a malloc block caused by size rounding).
// The defer arg areas are scanned as part of scanstack.
if typ == deferType {
dataSize = unsafe.Sizeof(_defer{})
}
heapBitsSetType(uintptr(x), size, dataSize, typ)
if dataSize > typ.size {
// Array allocation. If there are any
// pointers, GC has to scan to the last
// element.
if typ.ptrdata != 0 {
scanSize = dataSize - typ.size + typ.ptrdata
}
} else {
scanSize = typ.ptrdata
}
c.local_scan += scanSize
}
//其下都是各种情况判断和异常管理
//GC管理和GC的处理
// Ensure that the stores above that initialize x to
// type-safe memory and set the heap bits occur before
// the caller can make x observable to the garbage
// collector. Otherwise, on weakly ordered machines,
// the garbage collector could follow a pointer to x,
// but see uninitialized memory or stale heap bits.
//这个提醒是注意不同的机器的内存序
publicationBarrier()
// Allocate black during GC.
// All slots hold nil so no scanning is needed.
// This may be racing with GC so do it atomically if there can be
// a race marking the bit.
if gcphase != _GCoff {
gcmarknewobject(uintptr(x), size, scanSize)
}
if raceenabled {
racemalloc(x, size)
}
if msanenabled {
msanmalloc(x, size)
}
mp.mallocing = 0
releasem(mp)
if debug.allocfreetrace != 0 {
tracealloc(x, size, typ)
}
if rate := MemProfileRate; rate > 0 {
if size < uintptr(rate) && int32(size) < c.next_sample {
c.next_sample -= int32(size)
} else {
mp := acquirem()
profilealloc(mp, x, size)
releasem(mp)
}
}
if assistG != nil {
// Account for internal fragmentation in the assist
// debt now that we know it.
assistG.gcAssistBytes -= int64(size - dataSize)
}
//启动GC
if shouldhelpgc && gcShouldStart(false) {
gcStart(gcBackgroundMode, false)
}
return x
}
极小内存的分析过程:
off := c.tinyoffset
// Align tiny pointer for required (conservative) alignment.
if size&7 == 0 {
off = round(off, 8)
} else if size&3 == 0 {
off = round(off, 4)
} else if size&1 == 0 {
off = round(off, 2)
}
if off+size <= maxTinySize && c.tiny != 0 {
// The object fits into existing tiny block.
x = unsafe.Pointer(c.tiny + off)
c.tinyoffset = off + size
c.local_tinyallocs++
mp.mallocing = 0
releasem(mp)
return x
}
// Allocate a new maxTinySize block.
span := c.alloc[tinySizeClass]
v := nextFreeFast(span)
if v == 0 {
v, _, shouldhelpgc = (tinySizeClass)
}
x = unsafe.Pointer(v)
(*[2]uint64)(x)[0] = 0
(*[2]uint64)(x)[1] = 0
// See if we need to replace the existing tiny block with the new one
// based on amount of remaining free space.
if size < c.tinyoffset || c.tiny == 0 {
c.tiny = uintptr(x)
c.tinyoffset = size
}
size = maxTinySize
这里分配首先处理内存对齐,然后将小对象根据条件拼接分配,如果空间不足,则调用nextFreeFast、nextFree分配内存。
小内存的分配过程:
var sizeclass uint8
if size <= smallSizeMax-8 {
sizeclass = size_to_class8[(size+smallSizeDiv-1)/smallSizeDiv]
} else {
sizeclass = size_to_class128[(size-smallSizeMax+largeSizeDiv-1)/largeSizeDiv]
}
size = uintptr(class_to_size[sizeclass])
span := c.alloc[sizeclass]
v := nextFreeFast(span)
if v == 0 {
v, span, shouldhelpgc = c.nextFree(sizeclass)
}
x = unsafe.Pointer(v)
if needzero && span.needzero != 0 {
memclrNoHeapPointers(unsafe.Pointer(v), size)
}
这块是比较麻烦的,调用也比较多,深度也是到处乱跳。看到sizeclass 的计算没有,就是为了前面提到的那个67个的数组。从而寻找合适的mspan。看一下相关的代码:
// nextFreeFast returns the next free object if one is quickly available.
// Otherwise it returns 0.
func nextFreeFast(s *mspan) gclinkptr {
theBit := sys.Ctz64(s.allocCache) // Is there a free object in the allocCache?
if theBit < 64 {
result := s.freeindex + uintptr(theBit)
if result < s.nelems {
freeidx := result + 1
if freeidx%64 == 0 && freeidx != s.nelems {
return 0
}
s.allocCache >>= (theBit + 1)
s.freeindex = freeidx
v := gclinkptr(result*s.elemsize + s.base())
s.allocCount++
return v
}
}
return 0
}
// nextFree returns the next free object from the cached span if one is available.
// Otherwise it refills the cache with a span with an available object and
// returns that object along with a flag indicating that this was a heavy
// weight allocation. If it is a heavy weight allocation the caller must
// determine whether a new GC cycle needs to be started or if the GC is active
// whether this goroutine needs to assist the GC.
func (c *mcache) nextFree(sizeclass uint8) (v gclinkptr, s *mspan, shouldhelpgc bool) {
s = c.alloc[sizeclass]
shouldhelpgc = false
freeIndex := s.nextFreeIndex()
if freeIndex == s.nelems {
// The span is full.
if uintptr(s.allocCount) != s.nelems {
println("runtime: s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
throw("s.allocCount != s.nelems && freeIndex == s.nelems")
}
systemstack(func() {
c.refill(int32(sizeclass))
})
shouldhelpgc = true
s = c.alloc[sizeclass]
freeIndex = s.nextFreeIndex()
}
if freeIndex >= s.nelems {
throw("freeIndex is not valid")
}
v = gclinkptr(freeIndex*s.elemsize + s.base())
s.allocCount++
if uintptr(s.allocCount) > s.nelems {
println("s.allocCount=", s.allocCount, "s.nelems=", s.nelems)
throw("s.allocCount > s.nelems")
}
return
}
nextFreeFast这个函数看名字就知道,是从缓存mcache中快速分配相关Object,而nextFree就比较麻烦了,需要向mcentral甚至mheap来分配,看它的说明,先判断有没有空闲的索引对象s.nextFreeIndex,如果有,则使用之。如果没有,则调用c.refill,如果越界直接抛出异常。refill则是从mcentral开始分配内存了。
// Gets a span that has a free object in it and assigns it
// to be the cached span for the given sizeclass. Returns this span.
func (c *mcache) refill(sizeclass int32) *mspan {
_g_ := getg()
_g_.m.locks++
// Return the current cached span to the central lists.
s := c.alloc[sizeclass]
if uintptr(s.allocCount) != s.nelems {
throw("refill of span with free space remaining")
}
if s != &emptymspan {
s.incache = false
}
// Get a new cached span from the central lists.
s = mheap_.central[sizeclass].mcentral.cacheSpan()
if s == nil {
throw("out of memory")
}
if uintptr(s.allocCount) == s.nelems {
throw("span has no free space")
}
c.alloc[sizeclass] = s
_g_.m.locks--
return s
}
它调用:
// Allocate a span to use in an MCache.
func (c *mcentral) cacheSpan() *mspan {
// Deduct credit for this span allocation and sweep if necessary.
spanBytes := uintptr(class_to_allocnpages[c.sizeclass]) * _PageSize
deductSweepCredit(spanBytes, 0)
lock(&c.lock)
sg := mheap_.sweepgen
retry:
var s *mspan
for s = c.nonempty.first; s != nil; s = s.next {
if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
c.nonempty.remove(s)
c.empty.insertBack(s)
unlock(&c.lock)
s.sweep(true)
goto havespan
}
if s.sweepgen == sg-1 {
// the span is being swept by background sweeper, skip
continue
}
// we have a nonempty span that does not require sweeping, allocate from it
c.nonempty.remove(s)
c.empty.insertBack(s)
unlock(&c.lock)
goto havespan
}
for s = c.empty.first; s != nil; s = s.next {
if s.sweepgen == sg-2 && atomic.Cas(&s.sweepgen, sg-2, sg-1) {
// we have an empty span that requires sweeping,
// sweep it and see if we can free some space in it
c.empty.remove(s)
// swept spans are at the end of the list
c.empty.insertBack(s)
unlock(&c.lock)
s.sweep(true)
freeIndex := s.nextFreeIndex()
if freeIndex != s.nelems {
s.freeindex = freeIndex
goto havespan
}
lock(&c.lock)
// the span is still empty after sweep
// it is already in the empty list, so just retry
goto retry
}
if s.sweepgen == sg-1 {
// the span is being swept by background sweeper, skip
continue
}
// already swept empty span,
// all subsequent ones must also be either swept or in process of sweeping
break
}
unlock(&c.lock)
// Replenish central list if empty.
s = c.grow()
if s == nil {
return nil
}
lock(&c.lock)
c.empty.insertBack(s)
unlock(&c.lock)
// At this point s is a non-empty span, queued at the end of the empty list,
// c is unlocked.
havespan:
cap := int32((s.npages << _PageShift) / s.elemsize)
n := cap - int32(s.allocCount)
if n == 0 || s.freeindex == s.nelems || uintptr(s.allocCount) == s.nelems {
throw("span has no free objects")
}
usedBytes := uintptr(s.allocCount) * s.elemsize
if usedBytes > 0 {
reimburseSweepCredit(usedBytes)
}
atomic.Xadd64(&memstats.heap_live, int64(spanBytes)-int64(usedBytes))
if trace.enabled {
// heap_live changed.
traceHeapAlloc()
}
if gcBlackenEnabled != 0 {
// heap_live changed.
gcController.revise()
}
s.incache = true
freeByteBase := s.freeindex &^ (64 - 1)
whichByte := freeByteBase / 8
// Init alloc bits cache.
s.refillAllocCache(whichByte)
// Adjust the allocCache so that s.freeindex corresponds to the low bit in
// s.allocCache.
s.allocCache >>= s.freeindex % 64
return s
}
// grow allocates a new empty span from the heap and initializes it for c's size class.
func (c *mcentral) grow() *mspan {
npages := uintptr(class_to_allocnpages[c.sizeclass])
size := uintptr(class_to_size[c.sizeclass])
n := (npages << _PageShift) / size
s := mheap_.alloc(npages, c.sizeclass, false, true)
if s == nil {
return nil
}
p := s.base()
s.limit = p + size*n
heapBitsForSpan(s.base()).initSpan(s)
return s
}
func (h *mheap) alloc(npage uintptr, sizeclass int32, large bool, needzero bool) *mspan {
// Don't do any operations that lock the heap on the G stack.
// It might trigger stack growth, and the stack growth code needs
// to be able to allocate heap.
var s *mspan
systemstack(func() {
s = h.alloc_m(npage, sizeclass, large)
})
if s != nil {
if needzero && s.needzero != 0 {
memclrNoHeapPointers(unsafe.Pointer(s.base()), s.npages<<_PageShift)
}
s.needzero = 0
}
return s
}
// Allocate a new span of npage pages from the heap for GC'd memory
// and record its size class in the HeapMap and HeapMapCache.
func (h *mheap) alloc_m(npage uintptr, sizeclass int32, large bool) *mspan {
_g_ := getg()
if _g_ != _g_.m.g0 {
throw("_mheap_alloc not on g0 stack")
}
lock(&h.lock)
// To prevent excessive heap growth, before allocating n pages
// we need to sweep and reclaim at least n pages.
if h.sweepdone == 0 {
// TODO(austin): This tends to sweep a large number of
// spans in order to find a few completely free spans
// (for example, in the garbage benchmark, this sweeps
// ~30x the number of pages its trying to allocate).
// If GC kept a bit for whether there were any marks
// in a span, we could release these free spans
// at the end of GC and eliminate this entirely.
h.reclaim(npage)
}
// transfer stats from cache to global
memstats.heap_scan += uint64(_g_.m.mcache.local_scan)
_g_.m.mcache.local_scan = 0
memstats.tinyallocs += uint64(_g_.m.mcache.local_tinyallocs)
_g_.m.mcache.local_tinyallocs = 0
s := h.allocSpanLocked(npage)
if s != nil {
// Record span info, because gc needs to be
// able to map interior pointer to containing span.
atomic.Store(&s.sweepgen, h.sweepgen)
h.sweepSpans[h.sweepgen/2%2].push(s) // Add to swept in-use list.
s.state = _MSpanInUse
s.allocCount = 0
s.sizeclass = uint8(sizeclass)
if sizeclass == 0 {
s.elemsize = s.npages << _PageShift
s.divShift = 0
s.divMul = 0
s.divShift2 = 0
s.baseMask = 0
} else {
s.elemsize = uintptr(class_to_size[sizeclass])
m := &class_to_divmagic[sizeclass]
s.divShift = m.shift
s.divMul = m.mul
s.divShift2 = m.shift2
s.baseMask = m.baseMask
}
// update stats, sweep lists
h.pagesInUse += uint64(npage)
if large {
memstats.heap_objects++
atomic.Xadd64(&memstats.heap_live, int64(npage<<_PageShift))
// Swept spans are at the end of lists.
if s.npages < uintptr(len(h.free)) {
h.busy[s.npages].insertBack(s)
} else {
h.busylarge.insertBack(s)
}
}
}
// heap_scan and heap_live were updated.
if gcBlackenEnabled != 0 {
gcController.revise()
}
if trace.enabled {
traceHeapAlloc()
}
// h.spans is accessed concurrently without synchronization
// from other threads. Hence, there must be a store/store
// barrier here to ensure the writes to h.spans above happen
// before the caller can publish a pointer p to an object
// allocated from s. As soon as this happens, the garbage
// collector running on another processor could read p and
// look up s in h.spans. The unlock acts as the barrier to
// order these writes. On the read side, the data dependency
// between p and the index in h.spans orders the reads.
unlock(&h.lock)
return s
}
// Allocates a span of the given size. h must be locked.
// The returned span has been removed from the
// free list, but its state is still MSpanFree.
func (h *mheap) allocSpanLocked(npage uintptr) *mspan {
var list *mSpanList
var s *mspan
// Try in fixed-size lists up to max.
for i := int(npage); i < len(h.free); i++ {
list = &h.free[i]
if !list.isEmpty() {
s = list.first
goto HaveSpan
}
}
// Best fit in list of large spans.
list = &h.freelarge
s = h.allocLarge(npage)
if s == nil {
if !h.grow(npage) {
return nil
}
s = h.allocLarge(npage)
if s == nil {
return nil
}
}
HaveSpan:
// Mark span in use.
if s.state != _MSpanFree {
throw("MHeap_AllocLocked - MSpan not free")
}
if s.npages < npage {
throw("MHeap_AllocLocked - bad npages")
}
list.remove(s)
if s.inList() {
throw("still in list")
}
if s.npreleased > 0 {
sysUsed(unsafe.Pointer(s.base()), s.npages<<_PageShift)
memstats.heap_released -= uint64(s.npreleased << _PageShift)
s.npreleased = 0
}
if s.npages > npage {
// Trim extra and put it back in the heap.
t := (*mspan)(h.spanalloc.alloc())
t.init(s.base()+npage<<_PageShift, s.npages-npage)
s.npages = npage
p := (t.base() - h.arena_start) >> _PageShift
if p > 0 {
h.spans[p-1] = s
}
h.spans[p] = t
h.spans[p+t.npages-1] = t
t.needzero = s.needzero
s.state = _MSpanStack // prevent coalescing with s
t.state = _MSpanStack
h.freeSpanLocked(t, false, false, s.unusedsince)
s.state = _MSpanFree
}
s.unusedsince = 0
p := (s.base() - h.arena_start) >> _PageShift
for n := uintptr(0); n < npage; n++ {
h.spans[p+n] = s
}
memstats.heap_inuse += uint64(npage << _PageShift)
memstats.heap_idle -= uint64(npage << _PageShift)
//println("spanalloc", hex(s.start<<_PageShift))
if s.inList() {
throw("still in list")
}
return s
}
// Try to add at least npage pages of memory to the heap,
// returning whether it worked.
//
// h must be locked.
func (h *mheap) grow(npage uintptr) bool {
// Ask for a big chunk, to reduce the number of mappings
// the operating system needs to track; also amortizes
// the overhead of an operating system mapping.
// Allocate a multiple of 64kB.
npage = round(npage, (64<<10)/_PageSize)
ask := npage << _PageShift
if ask < _HeapAllocChunk {
ask = _HeapAllocChunk
}
v := h.sysAlloc(ask)
if v == nil {
if ask > npage<<_PageShift {
ask = npage << _PageShift
v = h.sysAlloc(ask)
}
if v == nil {
print("runtime: out of memory: cannot allocate ", ask, "-byte block (", memstats.heap_sys, " in use)\n")
return false
}
}
// Create a fake "in use" span and free it, so that the
// right coalescing happens.
s := (*mspan)(h.spanalloc.alloc())
s.init(uintptr(v), ask>>_PageShift)
p := (s.base() - h.arena_start) >> _PageShift
for i := p; i < p+s.npages; i++ {
h.spans[i] = s
}
atomic.Store(&s.sweepgen, h.sweepgen)
s.state = _MSpanInUse
h.pagesInUse += uint64(s.npages)
h.freeSpanLocked(s, false, true, 0)
return true
}
// sysAlloc allocates the next n bytes from the heap arena. The
// returned pointer is always _PageSize aligned and between
// h.arena_start and h.arena_end. sysAlloc returns nil on failure.
// There is no corresponding free function.
func (h *mheap) sysAlloc(n uintptr) unsafe.Pointer {
if n > h.arena_end-h.arena_used {
// We are in 32-bit mode, maybe we didn't use all possible address space yet.
// Reserve some more space.
p_size := round(n+_PageSize, 256<<20)
new_end := h.arena_end + p_size // Careful: can overflow
if h.arena_end <= new_end && new_end-h.arena_start-1 <= _MaxArena32 {
// TODO: It would be bad if part of the arena
// is reserved and part is not.
var reserved bool
p := uintptr(sysReserve(unsafe.Pointer(h.arena_end), p_size, &reserved))
if p == 0 {
return nil
}
if p == h.arena_end {
h.arena_end = new_end
h.arena_reserved = reserved
} else if h.arena_start <= p && p+p_size-h.arena_start-1 <= _MaxArena32 {
// Keep everything page-aligned.
// Our pages are bigger than hardware pages.
h.arena_end = p + p_size
used := p + (-p & (_PageSize - 1))
h.mapBits(used)
h.mapSpans(used)
h.arena_used = used
h.arena_reserved = reserved
} else {
// We haven't added this allocation to
// the stats, so subtract it from a
// fake stat (but avoid underflow).
stat := uint64(p_size)
sysFree(unsafe.Pointer(p), p_size, &stat)
}
}
}
if n <= h.arena_end-h.arena_used {
// Keep taking from our reservation.
p := h.arena_used
sysMap(unsafe.Pointer(p), n, h.arena_reserved, &memstats.heap_sys)
h.mapBits(p + n)
h.mapSpans(p + n)
h.arena_used = p + n
if raceenabled {
racemapshadow(unsafe.Pointer(p), n)
}
if p&(_PageSize-1) != 0 {
throw("misrounded allocation in MHeap_SysAlloc")
}
return unsafe.Pointer(p)
}
// If using 64-bit, our reservation is all we have.
if h.arena_end-h.arena_start > _MaxArena32 {
return nil
}
// On 32-bit, once the reservation is gone we can
// try to get memory at a location chosen by the OS.
p_size := round(n, _PageSize) + _PageSize
p := uintptr(sysAlloc(p_size, &memstats.heap_sys))
if p == 0 {
return nil
}
if p < h.arena_start || p+p_size-h.arena_start > _MaxArena32 {
top := ^uintptr(0)
if top-h.arena_start-1 > _MaxArena32 {
top = h.arena_start + _MaxArena32 + 1
}
print("runtime: memory allocated by OS (", hex(p), ") not in usable range [", hex(h.arena_start), ",", hex(top), ")\n")
sysFree(unsafe.Pointer(p), p_size, &memstats.heap_sys)
return nil
}
p_end := p + p_size
p += -p & (_PageSize - 1)
if p+n > h.arena_used {
h.mapBits(p + n)
h.mapSpans(p + n)
h.arena_used = p + n
if p_end > h.arena_end {
h.arena_end = p_end
}
if raceenabled {
racemapshadow(unsafe.Pointer(p), n)
}
}
if p&(_PageSize-1) != 0 {
throw("misrounded allocation in MHeap_SysAlloc")
}
return unsafe.Pointer(p)
}
不用看别的,就看函数的类成员变量定义,就知道这一路从mcache->mcentral->mheap->os。
大内存的分配过程:
var s *mspan
shouldhelpgc = true
systemstack(func() {
s = largeAlloc(size, needzero)
})
s.freeindex = 1
s.allocCount = 1
x = unsafe.Pointer(s.base())
size = s.elemsize
func largeAlloc(size uintptr, needzero bool) *mspan {
// print("largeAlloc size=", size, "\n")
if size+_PageSize < size {
throw("out of memory")
}
npages := size >> _PageShift
if size&_PageMask != 0 {
npages++
}
// Deduct credit for this span allocation and sweep if
// necessary. mHeap_Alloc will also sweep npages, so this only
// pays the debt down to npage pages.
deductSweepCredit(npages*_PageSize, npages)
s := mheap_.alloc(npages, 0, true, needzero)
if s == nil {
throw("out of memory")
}
s.limit = s.base() + size
heapBitsForSpan(s.base()).initSpan(s)
return s
}
GC是个技术活,但分配也是个技术活。没有分配,哪来的GC?等程序启动后,GC和分配就成了一个循环的过程。阅读代码,重点掌握设计思想 、框架流程以及重点算法、关键路径的实现。对一些小细节该放要放开。