
I am using xml.etree.ElementTree.tostring() to convert an etree element to a string, but sometimes it fails:

xpath = "..."
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
result = tree.xpath(xpath)
xml.etree.ElementTree.tostring(result[0], encoding='utf-8')

The error is:

File "../abc.py", line 165, in abc
    results.append(xml.etree.ElementTree.tostring(result[0], encoding='utf-8'))
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1127, in tostring
    ElementTree(element).write(file, encoding, method=method)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 818, in write
    self._root, encoding, default_namespace
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 887, in _namespaces
    _raise_serialization_error(tag)
  File "C:\Python27\lib\xml\etree\ElementTree.py", line 1053, in _raise_serialization_error
    "cannot serialize %r (type %s)" % (text, type(text).__name__)
TypeError: cannot serialize <built-in function Comment> (type builtin_function_or_method)

How can I fix this?


1 Answer


It looks like result[0] is a comment, which you probably want to skip. Something like this should do it:

etree.HTMLParser(remove_comments=True)

From the documentation:

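While ElementTree ignores comments and processing instructions when parsing XML, etree will read them in and treat them as Comment or ProcessingInstruction elements respectively. This is especially visible where comments are found inside text content, which is then split by the Comment element.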

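You can disable this behaviour by passing the boolean remove_comments and/or remove_pis keyword arguments to the parser you use. For convenience and to support portable code, you can also use etree.ETCompatXMLParser instead of the default etree.XMLParser. It tries to provide a default setup that is as close to the ElementTree parser as possible.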
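The difference described in the documentation can be seen with a small experiment. This is only a sketch, assuming lxml is installed; the sample XML string and the variable names are invented for illustration:

```python
from lxml import etree

# Hypothetical input: text content interrupted by a comment and a
# processing instruction.
xml = b"<root>before<!-- note --><?target data?>after</root>"

# Default lxml parser: the comment and the PI become child nodes of <root>.
default_root = etree.fromstring(xml)

# Explicitly ask the parser to drop comments and processing instructions.
stripping_parser = etree.XMLParser(remove_comments=True, remove_pis=True)
stripped_root = etree.fromstring(xml, stripping_parser)

# ETCompatXMLParser aims for ElementTree-like defaults.
compat_root = etree.fromstring(xml, etree.ETCompatXMLParser())

print(len(default_root), len(stripped_root), len(compat_root))
# A comment node's tag is the etree.Comment factory, not a string.
print(default_root[0].tag is etree.Comment)
```

Comparing the child counts shows which parsers kept the comment and the processing instruction as nodes in the tree.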

Answered 2013-04-23T04:21:30.237