I have a very large (1.8GB) XML document. I'd like to simply find the number of elements with the tag <Product>.

I've got this far:

from lxml import etree

# Only 'end' events for <Product> elements are reported
context = etree.iterparse('./test.xml', tag='Product')
num_elems = 0
for event, elem in context:
    num_elems += 1
print(num_elems)

It works, but is there a faster way of doing it?

1 Answer

Since this works, I take it that memory use is not an issue (iterparse builds a tree of the entire file in memory unless you prune it while iterating over the elements). In that case, save yourself the trouble of iterating and counting in Python and let lxml/libxml2 do the counting in C:

from lxml import etree

tree = etree.parse("./test.xml")
num_elems = tree.xpath("count(//Product)")    # note: returns a float; wrap in int() if needed
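If memory does turn out to be a problem, the pruning mentioned above could look roughly like the sketch below. It is not part of the original answer: the helper name count_tag and the tiny in-memory sample document are made up for illustration, and it assumes you do not need the elements once they have been counted.

```python
import io

from lxml import etree


def count_tag(source, tag):
    """Count elements named `tag`, pruning the tree while iterating.

    `source` is a filename or a binary file-like object.
    (count_tag is a hypothetical helper, not an lxml function.)
    """
    count = 0
    for _event, elem in etree.iterparse(source, events=("end",), tag=tag):
        count += 1
        # Free the element's own children, text and attributes.
        elem.clear()
        # Drop siblings that were already parsed so the partial tree
        # does not keep growing with the size of the file.
        while elem.getprevious() is not None:
            del elem.getparent()[0]
    return count


# Made-up sample document standing in for ./test.xml
sample = (
    b"<Catalog><Product id='1'/><Other/>"
    b"<Group><Product id='2'/></Group><Product id='3'/></Catalog>"
)
print(count_tag(io.BytesIO(sample), "Product"))
```

This trades some of the speed of the XPath version for roughly constant memory use, so it is mainly worth it when the parsed tree does not fit in RAM.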
Answered 2012-05-22T13:59:30.170