2

I'm trying to figure out the fastest way to count the number of child elements of a Xerces C++ DOMNode object, as I'm trying to optimise the performance of a Windows application which uses the Xerces 2.6 DOMParser.

It seems most of the time is spent counting and accessing children. Our application needs to iterate every single node in the document to attach data to it using DOMNode::setUserData() and we were initially using DOMNode::getChildNodes(), DOMNodeList::getLength() and DOMNodeList::item(int index) to count and access children, but these are comparatively expensive operations.

A large performance improvement was observed when we used a different idiom of calling DOMNode:: getFirstChild() to get the first child node and invoke DOMNode::getNextSibling() to either access a child at a specific index or count the number of siblings of the first child element to get a total child node count.

However, getNextSibling() remains a bottleneck in our parsing step, so I'm wondering is there an even faster way to traverse and access child elements using Xerces.

4

2 回答 2

0

是的,在我发布后不久,我添加了代码来存储和管理每个节点的子数,这产生了很大的不同。重复访问相同的节点并且每次都重新计算子计数。这是一项相当昂贵的操作,因为 Xerces 本质上是为该节点重建 DOM 结构以保证其活跃性。我们有自己的对象,它封装了一个 Xerces DOMNode 以及我们需要的额外信息,我们使用 DOMNode::setUserData 将我们的对象与相关的 DOMnode 关联起来,现在这似乎是最后剩下的瓶颈。

于 2012-09-19T02:04:08.903 回答
0

问题DOMNodeList在于,它实际上是一个非常简单的列表,因此在代码中可以看到像这样的操作length并且item(i)具有 as 的成本,例如这里的长度:O(n)

XMLSize_t DOMNodeListImpl::getLength() const{
    XMLSize_t count = 0;
    if (fNode) {
        DOMNode *node = fNode->fFirstChild;
        while(node != 0){
            ++count;
            node = castToChildImpl(node)->nextSibling;
        }
    }

    return count;
}

因此,DOMNodeList如果不希望在迭代时更改 DOM 树,则不应使用,因为访问项目O(n)从而使迭代成为一种O(n^2)操作 - 等待发生的灾难(即 xml 文件足够大)。

使用[DOMNode::getFistChild()][2]andDOMNode::getNextSibling()是一个足够好的迭代解决方案:

DOMNode *child = docNode->getFirstChild();
while (child != nullptr) {
    // do something with the node
    ...
    child = child->getNextSibling();
}

正如预期的那样发生在O(n^2).

也可以使用[DOMNodeIterator][3],但为了创建它,需要正确DOMDocument的,当需要迭代时并不总是在手边。

于 2020-09-11T08:28:27.153 回答