xerces-c++是一XML解析器。在讲其内存管理策略之前,需要先讲一下一个奇怪的new用法,之后会继续介绍它的内存管理策略,最后会说明它是如何耗费大量内存的。
DOMAttr *DOMDocumentImpl::createAttribute(const XMLCh *nam)
{
if(!nam || !isXMLName(nam))
throw DOMException(DOMException::INVALID_CHARACTER_ERR,0, getMemoryManager());
return new (this, DOMMemoryManager::ATTR_OBJECT) DOMAttrImpl(this,nam);
}
placement new的用法一般为 new (address) (type) initializer 的形式,address是一个指针指向已经存在的一块内存,而上面的代码却有两个参数紧跟在new后面。于是查了下官方文档,还真有这种用法:
new expression
C++ C++ language Expressions
Creates and initializes objects with dynamic storage duration, that is, objects whose lifetime is not necessarily limited by the scope in which they were created.
Syntax
::(optional) new ( type ) initializer(optional) (1)
::(optional) new new-type initializer(optional) (2)
::(optional) new (placement-params) ( type ) initializer(optional) (3)
::(optional) new (placement-params) new-type initializer(optional) (4)
官方文档还有一个例子与xerces代码很像:
new(2,f) T; // calls operator new(sizeof(T), 2, f)
都是相当于下面两步:
inline void * operator new(size_t amt, DOMDocumentImpl *doc, DOMMemoryManager::NodeObjectType type)
{
void *p = doc->allocate(amt, type);
return p;
}
DOMAttrImpl::DOMAttrImpl(DOMDocument *ownerDoc, const XMLCh *aName)
: fNode(ownerDoc), fParent (ownerDoc), fSchemaType(0)
{
DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
fName = docImpl->getPooledString(aName);
fNode.isSpecified(true);
}
所以,看着怪异,其实和一般的placement new没什么大的区别。
通过上面被重载的new函数可以看到它调用了DOMDocumentImpl::allocate(amt, type), 而后者又会调用到:
void* DOMDocumentImpl::allocate(XMLSize_t amount)
{
// Align the request size so that suballocated blocks
// beyond this one will be maintained at the same alignment.
amount = XMLPlatformUtils::alignPointerForNewBlockAllocation(amount);
// If the request is for a largish block, hand it off to the system
// allocator. The block still must be linked into the list of
// allocated blocks so that it will be deleted when the time comes.
if (amount > kMaxSubAllocationSize)
{
// The size of the header we add to our raw blocks
XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));
// Try to allocate the block
void* newBlock;
newBlock = fMemoryManager->allocate(sizeOfHeader + amount);
// Link it into the list beyond current block, as current block
// is still being subdivided. If there is no current block
// then track that we have no bytes to further divide.
if (fCurrentBlock)
{
*(void **)newBlock = *(void **)fCurrentBlock;
*(void **)fCurrentBlock = newBlock;
}
else
{
*(void **)newBlock = 0;
fCurrentBlock = newBlock;
fFreePtr = 0;
fFreeBytesRemaining = 0;
}
void *retPtr = (char*)newBlock + sizeOfHeader;
return retPtr;
}
// It's a normal (sub-allocatable) request.
// Are we out of room in our current block?
if (amount > fFreeBytesRemaining)
{
// Request doesn't fit in the current block.
// The size of the header we add to our raw blocks
XMLSize_t sizeOfHeader = XMLPlatformUtils::alignPointerForNewBlockAllocation(sizeof(void *));
// Get a new block from the system allocator.
void* newBlock;
newBlock = fMemoryManager->allocate(fHeapAllocSize);
*(void **)newBlock = fCurrentBlock;
fCurrentBlock = newBlock;
fFreePtr = (char *)newBlock + sizeOfHeader;
fFreeBytesRemaining = fHeapAllocSize - sizeOfHeader;
if(fHeapAllocSize<kMaxHeapAllocSize)
fHeapAllocSize*=2;
}
// Subdivide the request off current block
void *retPtr = fFreePtr;
fFreePtr += amount;
fFreeBytesRemaining -= amount;
return retPtr;
}
这便是内存分配的核心代码,逻辑不复杂,可以总结为以下几点:
DOMDocumentImpl是对外的接口,要想创建节点(Node)就必须通过一系列的createXXX来创建(工厂模式?),比如createAttribute,createElement,而这些create函数都会走allocate函数。也就是说每个Attribute/Element实例都来自链表上的大块内存。这个策略让我想起了《Effecive C++》也有类似的代码。
这些节点中途不会释放,直到最后要释放整个Document时才一起释放。请参考以下代码:
DOMDocumentImpl::~DOMDocumentImpl()
{
...
// Delete the heap for this document. This uncerimoniously yanks the storage
// out from under all of the nodes in the document. Destructors are NOT called.
this->deleteHeap();``
}`
void DOMDocumentImpl::deleteHeap()
{
while (fCurrentBlock != 0)
{
void *nextBlock = *(void **)fCurrentBlock;
fMemoryManager->deallocate(fCurrentBlock);
fCurrentBlock = nextBlock;
}
}
根本原因是描述节点、属性等的数据结构太大,可以想像成重型卡车(每个节点或属性)只拉一袋大米。
既然所有的节点、属性等类的实例内存分配都走allocate,那我们就让它打印出为哪个类分配了多少字节,看看每辆卡车自身多重?
void * DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type)
{
static std::map<int, std::string> maps = {
{DOMMemoryManager::NodeObjectType::ATTR_OBJECT , "ATTR_OBJECT"},
{DOMMemoryManager::NodeObjectType::ATTR_NS_OBJECT , "ATTR_NS_OBJECT"},
{DOMMemoryManager::NodeObjectType::CDATA_SECTION_OBJECT , "CDATA_SECTION_OBJECT"},
{DOMMemoryManager::NodeObjectType::COMMENT_OBJECT , "COMMENT_OBJECT"},
{DOMMemoryManager::NodeObjectType::DOCUMENT_FRAGMENT_OBJECT , "DOCUMENT_FRAGMENT_OBJECT"},
{DOMMemoryManager::NodeObjectType::DOCUMENT_TYPE_OBJECT , "DOCUMENT_TYPE_OBJECT"},
{DOMMemoryManager::NodeObjectType::ELEMENT_OBJECT , "ELEMENT_OBJECT"},
{DOMMemoryManager::NodeObjectType::ELEMENT_NS_OBJECT , "ELEMENT_NS_OBJECT"},
{DOMMemoryManager::NodeObjectType::ENTITY_OBJECT , "ENTITY_OBJECT"},
{DOMMemoryManager::NodeObjectType::ENTITY_REFERENCE_OBJECT , "ENTITY_REFERENCE_OBJECT"},
{DOMMemoryManager::NodeObjectType::NOTATION_OBJECT , "NOTATION_OBJECT"},
{DOMMemoryManager::NodeObjectType::PROCESSING_INSTRUCTION_OBJECT , "PROCESSING_INSTRUCTION_OBJECT"},
{DOMMemoryManager::NodeObjectType::TEXT_OBJECT , "TEXT_OBJECT"}
};
std::cout<<"New for "<<maps[type]<<" size=0x"<<std::hex<<amount<<std::endl;
if (!fRecycleNodePtr)
return allocate(amount);
DOMNodePtr* ptr = fRecycleNodePtr->operator[](type);
if (!ptr || ptr->empty())
return allocate(amount);
return (void*) ptr->pop();
}
一个简单的XML及日志:
<TopNode>
<SectionOfDataA>
<TestData>MEMORY COST 1 1TestData>
<TestData>MEMORY COST 2 1TestData>
SectionOfDataA>
TopNode>
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for ELEMENT_OBJECT size=0x68
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
New for TEXT_OBJECT size=0x38
可见DOMElementImpl的SIZE为0x68, DOMTextImpl为0x38, 这得顶多少个字符串!
打开DOMElementImpl.hpp便可以看到这个类包含多少member,这还不包括父类的。
// New data introduced in DOM Level 3
const XMLCh* fInputEncoding;
const XMLCh* fXmlEncoding;
bool fXmlStandalone;
const XMLCh* fXmlVersion;
const XMLCh* fDocumentURI;
DOMConfiguration* fDOMConfiguration;
XMLStringPool fUserDataTableKeys;
RefHash2KeysTableOf<DOMUserDataRecord, PtrHasher>* fUserDataTable;
// Per-Document heap Variables.
// The heap consists of one or more biggish blocks which are
// sub-allocated for individual allocations of nodes, strings, etc.
// The big blocks form a linked list, allowing them to be located for deletion.
//
// There is no provision for deleting suballocated blocks, other than
// deleting the entire heap when the document is deleted.
//
// There is no header on individual sub-allocated blocks.
// The header on big blocks consists only of a single back pointer to
// the previously allocated big block (our linked list of big blocks)
//
//
// revisit - this heap should be encapsulated into its own
// class, rather than hanging naked on Document.
//
void* fCurrentBlock;
char* fFreePtr;
XMLSize_t fFreeBytesRemaining,
fHeapAllocSize;
// To recycle the DOMNode pointer
RefArrayOf<DOMNodePtr>* fRecycleNodePtr;
// To recycle DOMBuffer pointer
RefStackOf<DOMBuffer>* fRecycleBufferPtr;
// Pool of DOMNodeList for getElementsByTagName
DOMDeepNodeListPool<DOMDeepNodeListImpl>* fNodeListPool;
// Other data
DOMDocumentType* fDocType;
DOMElement* fDocElement;
DOMStringPoolEntry** fNameTable;
XMLSize_t fNameTableSize;
DOMNormalizer* fNormalizer;
Ranges* fRanges;
NodeIterators* fNodeIterators;
MemoryManager* fMemoryManager; // configurable memory manager
DOMImplementation* fDOMImplementation;
int fChanges;
bool errorChecking; // Bypass error checking.
然而,只在DOMDocumentImpl::allocate(XMLSize_t amount, DOMMemoryManager::NodeObjectType type)上加log输出还不全,它仅仅统计了Node实例的分配,这一点在第二个参数type上有所体现。让我们仔细看看DOMElementImpl的构造函数:
DOMElementImpl::DOMElementImpl(DOMDocument *ownerDoc, const XMLCh *eName)
: fNode(ownerDoc), fParent(ownerDoc), fAttributes(0), fDefaultAttributes(0)
{
DOMDocumentImpl *docImpl = (DOMDocumentImpl *)ownerDoc;
fName = docImpl->getPooledString(eName); //(1)
setupDefaultAttributes();
if (!fDefaultAttributes) {
fDefaultAttributes = new (docImpl) DOMAttrMapImpl(this); //(2)
fAttributes = new (docImpl) DOMAttrMapImpl(this); //(3)
}
else {
fAttributes = new (docImpl) DOMAttrMapImpl(this, fDefaultAttributes);
}
}
(1) 是为节点字符串分配内存,如“TopNode”,“SectionOfDataA”
(2)(3) 应该和节点的属性有关,这一点我没有验证。
docImpl->getPooledString 在大块内存的基础上又加入了防重复算法(hash的办法),此处不再细说。
最后,如果想把所有的内存分配揪出来,只需要在DOMDocumentImpl::allocate(XMLSize_t amount)内加点log即可,如下,可见耗内存之多。
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3f10 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ea8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e88 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e68 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3e48 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3e10 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3df0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3dc8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3d60 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d30 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3d10 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3cf0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3cb8 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3c98 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3c68 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3c00 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3be0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3bc0 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3ba0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3b68 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3b48 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3b08 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3ad0 this=000000000067FEF8
subdivide from old block with amount=30 old fFreeBytesRemainin=3ab0 this=000000000067FEF8
New for ELEMENT_OBJECT size=0x68
subdivide from old block with amount=68 old fFreeBytesRemainin=3a80 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3a18 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39f8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=39d8 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=39a0 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3980 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=3940 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3908 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=38e8 this=000000000067FEF8
New for TEXT_OBJECT size=0x38
subdivide from old block with amount=38 old fFreeBytesRemainin=38c0 this=000000000067FEF8
subdivide from old block with amount=20 old fFreeBytesRemainin=3888 this=000000000067FEF8
subdivide from old block with amount=28 old fFreeBytesRemainin=3868 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3840 this=000000000067FEF8
subdivide from old block with amount=40 old fFreeBytesRemainin=3800 this=000000000067FEF8
subdivide from old block with amount=18 old fFreeBytesRemainin=37c0 this=000000000067FEF8
抽取了xerces-c++关于内存管理的代码,便于demo或学习使用,请移步下面的链接下载。