I'm having a hard time figuring out how to bind my own ResolveEntityHandler to the SAX parser. There is this answer on SO, but unfortunately I can't reproduce its results.

When I run the following code, which is essentially copied from that answer and only updated to Python 3,

import io
import xml.sax
from xml.sax.handler import ContentHandler

# Inheriting from EntityResolver and DTDHandler is not necessary
class TestHandler(ContentHandler):

    # This method is only called for external entities. Must return a value.
    def resolveEntity(self, publicID, systemID):
        print ("TestHandler.resolveEntity(): %s %s" % (publicID, systemID))
        return systemID

    def skippedEntity(self, name):
        print ("TestHandler.skippedEntity(): %s" % (name))

    def unparsedEntityDecl(self, name, publicID, systemID, ndata):
        print ("TestHandler.unparsedEntityDecl(): %s %s" % (publicID, systemID))

    def startElement(self, name, attrs):
        summary = attrs.get('summary', '')
        print ('TestHandler.startElement():', summary)

def main(xml_string):
    try:
        parser = xml.sax.make_parser()
        curHandler = TestHandler()
        parser.setContentHandler(curHandler)
        parser.setEntityResolver(curHandler)
        parser.setDTDHandler(curHandler)

        stream = io.StringIO(xml_string)
        parser.parse(stream)
        stream.close()
    except xml.sax.SAXParseException as e:
        print ("ERROR %s" % e)

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""

main(XML)

together with the external test.dtd

<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>

what I get is

TestHandler.startElement(): step: 
TestHandler.skippedEntity(): not

Process finished with exit code 0

So my questions are:

  1. Why is resolveEntity never called?
  2. How do you bind a ResolveEntityHandler to your parser?

1 Answer

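What you are seeing is related to a change in Python 3.7.1:

Changed in version 3.7.1: The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. The feature can be enabled again with method setFeature() on the parser object and argument feature_external_ges.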

To get the same behavior as in earlier versions, add the following import:

from xml.sax.handler import feature_external_ges

and, in the main function, before the call to parser.parse():

parser.setFeature(feature_external_ges, True)
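
For reference, here is a minimal self-contained sketch of the same fix. It is not the code from the question: RecordingHandler, DTD_BYTES and parse_with_external_entities are hypothetical names introduced for illustration, and the resolver returns an in-memory InputSource instead of the bare system ID, so the sketch does not need a test.dtd file on disk.

```python
import io
import xml.sax
from xml.sax.handler import (ContentHandler, DTDHandler, EntityResolver,
                             feature_external_ges)
from xml.sax.xmlreader import InputSource

# Hypothetical in-memory stand-in for the external test.dtd file.
DTD_BYTES = b"""<!ENTITY num "FOO">
<!ENTITY pic SYSTEM 'bar.gif' NDATA gif>
"""

XML = """<!DOCTYPE test SYSTEM "test.dtd">
<test summary='step: &num;'>Entity: &not;</test>
"""


class RecordingHandler(ContentHandler, EntityResolver, DTDHandler):
    """Records callbacks instead of printing them (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.resolved = []   # (publicID, systemID) pairs seen by resolveEntity
        self.summaries = []  # 'summary' attribute value of each start tag

    def resolveEntity(self, publicID, systemID):
        self.resolved.append((publicID, systemID))
        # Hand the parser an in-memory byte stream so that it does not
        # open a file or URL for the system identifier.
        source = InputSource(systemID)
        source.setByteStream(io.BytesIO(DTD_BYTES))
        return source

    def startElement(self, name, attrs):
        self.summaries.append(attrs.get('summary', ''))


def parse_with_external_entities(xml_string):
    parser = xml.sax.make_parser()
    handler = RecordingHandler()
    parser.setContentHandler(handler)
    parser.setEntityResolver(handler)
    parser.setDTDHandler(handler)
    # Re-enable external general entities; must be set before parsing starts.
    parser.setFeature(feature_external_ges, True)
    parser.parse(io.StringIO(xml_string))
    return handler


handler = parse_with_external_entities(XML)
print(handler.resolved)
print(handler.summaries)
```

With the feature enabled, the resolver should be consulted for the test.dtd system identifier, and the &num; reference in the attribute should be expanded from the DTD. Since the feature was switched off by default for security reasons, it is probably best enabled only for input you trust.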
Answered 2019-10-24T06:25:14.437