python - iterparse throws "no element found: line 1, column 0" and I'm not sure why

Question

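I have a network application (using Twisted) that receives chunks of XML over the Internet, because the whole XML document may not arrive in a single packet. My idea is to build up the received XML message gradually. I've "settled" on iterparse from xml.etree.ElementTree. I've been dabbling with some code, and the following (non-Twisted) code works fine: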

import xml.etree.ElementTree as etree
from io import StringIO

buff = StringIO(unicode('<notorious><burger/></notorious>'))

for event, elem in etree.iterparse(buff, events=('end',)):
    if elem.tag == 'notorious':
        print(etree.tostring(elem))

I then built the following code to simulate how the data arrives on my end:

import xml.etree.ElementTree as etree
from io import StringIO

chunks = ['<notorious>','<burger/>','</notorious>']
buff = StringIO()

for ch in chunks:
    buff.write(unicode(ch))
    if buff.getvalue() == '<notorious><burger/></notorious>':
        print("it should work now")
    try:
        for event, elem in etree.iterparse(buff, events=('end',)):
            if elem.tag == 'notorious':
                print(etree.tostring(elem))
    except Exception as e:
        print(e)

But the code spits out:

'no element found: line 1, column 0'

I can't get around it. Why does this error occur when the StringIO in the second example has the same content as the StringIO in the first example?

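PS:

I know I'm not the first person to ask this, but none of the other threads answered my question. If I'm wrong, please point me to the relevant thread.
If you have suggestions for using other modules, please don't put them in an answer; add a comment instead.

Thanks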

score 3 · Accepted Answer

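File objects and file-like objects have a file position. Every read or write advances that position. You need to move the position back (using <file_object>.seek(..)) before passing the file object to etree.iterparse, so that it reads from the beginning of the file.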

...
buff.seek(0) # <-----
for event, elem in etree.iterparse(buff, events=('end',)):
    if elem.tag == 'notorious':
        print(etree.tostring(elem))
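A complete, runnable version of the chunked simulation from the question with this fix applied might look like the sketch below. It assumes Python 3 (so plain str replaces unicode) and that re-parsing the whole buffer after every chunk is acceptable; the names positions, results and errors are hypothetical helpers added only to record what happens. Note that seeking alone does not make partial input parseable: the first two buffers are still incomplete XML, so iterparse still raises ParseError for them, and only the final, complete buffer yields the notorious element.

```python
import xml.etree.ElementTree as etree
from io import StringIO

chunks = ['<notorious>', '<burger/>', '</notorious>']
buff = StringIO()
positions = []  # hypothetical: file position right after each write()
results = []    # hypothetical: serialized <notorious> elements parsed successfully
errors = []     # hypothetical: messages of parse errors for incomplete buffers

for ch in chunks:
    buff.write(ch)
    # After write(), the position sits at the end of the data written so far,
    # so an iterparse() started from here would see no input at all.
    positions.append(buff.tell())
    buff.seek(0)  # rewind so iterparse() reads from the beginning
    try:
        for event, elem in etree.iterparse(buff, events=('end',)):
            if elem.tag == 'notorious':
                results.append(etree.tostring(elem))
    except etree.ParseError as e:
        # Raised while the buffer still holds an incomplete document.
        errors.append(str(e))
    # Move back to the end (whence=2) so the next write() appends
    # instead of overwriting the start of the buffer.
    buff.seek(0, 2)
```

Re-parsing the full buffer on every chunk keeps the example close to the question's structure, but the work grows with the size of the message, so it is best suited to small messages like this one.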

score 1 · Accepted Answer

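Even after you have finished writing (without closing the file), the file position still points to the last position written. So you have to move it back with fd.seek(0); after that you can parse the file with et.parse.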
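The same idea applied to a real file can be sketched as follows. It assumes a scratch file from tempfile.TemporaryFile opened in "w+" mode, so the same file object can be written and then read without closing and reopening it; the fd and et names simply follow the answer above.

```python
import tempfile
import xml.etree.ElementTree as et

# Open a scratch file for both writing and reading ("w+").
with tempfile.TemporaryFile(mode='w+') as fd:
    fd.write('<notorious><burger/></notorious>')
    # The position now sits at the end of what was written; parsing from
    # here would find no data and raise "no element found".
    fd.seek(0)  # rewind to the start of the file
    tree = et.parse(fd)

root = tree.getroot()
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```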
