
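The Python standard library provides the xml.sax.xmlreader.IncrementalParser interface, which has a feed() method. Jython also provides the xml.sax package, implemented on top of the Java SAX parser, but it does not seem to provide IncrementalParser.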
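
For reference, here is a minimal sketch of the incremental feeding described above, as it works in CPython, where xml.sax.make_parser() returns the expat-based reader that implements IncrementalParser. The NameCollector handler and the parse_in_chunks helper are hypothetical names made up for this illustration, and the chunk boundaries are arbitrary. This is the behaviour that appears to be missing in Jython, not something expected to run there.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class NameCollector(ContentHandler):
    """Hypothetical handler: records element names as start tags arrive."""

    def __init__(self):
        ContentHandler.__init__(self)
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


def parse_in_chunks(chunks):
    """Feed XML text to the parser piece by piece (hypothetical helper)."""
    # In CPython, make_parser() returns the expat reader, which
    # implements IncrementalParser (feed / close / reset).
    parser = xml.sax.make_parser()
    handler = NameCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        # A chunk may end in the middle of a tag; the parser buffers it.
        parser.feed(chunk)
    # close() signals the end of input and checks well-formedness.
    parser.close()
    return handler.names


chunks = [
    "<employees><emp",
    "loyee id='111'>",
    "<firstName>Rakesh</firstName>",
    "</employee></employees>",
]
names = parse_in_chunks(chunks)
print(names)
```

Handler callbacks fire as soon as enough data has been fed, so the caller keeps control of where the data comes from, for example a socket read loop.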

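Is there no way to parse XML chunks incrementally in Jython? At first glance I thought it could be done with coroutines, for example greenlet, but I immediately realized that greenlet cannot be used in Jython.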


2 Answers


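You can use StAX. A StAX parser streams like SAX, but it maintains a cursor and lets you pull the content at the cursor using hasNext() and next().

The following code is adapted from this Java example. Note that this is my first attempt at Jython, so please don't hang me if I've done something unconventional, but the example works.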

http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html

from javax.xml.stream import XMLStreamConstants, XMLInputFactory
from java.io import ByteArrayInputStream
from java.lang import String

xml = String(
"""<?xml version="1.0" encoding="ISO-8859-1"?>
<employees>
  <employee id="111">
    <firstName>Rakesh</firstName>
    <lastName>Mishra</lastName>
    <location>Bangalore</location>
  </employee>
  <employee id="112">
    <firstName>John</firstName>
    <lastName>Davis</lastName>
    <location>Chennai</location>
  </employee>
  <employee id="113">
    <firstName>Rajesh</firstName>
    <lastName>Sharma</lastName>
    <location>Pune</location>
  </employee>
</employees>
""")

class Employee:
    id = None
    firstName = None
    lastName = None
    location = None

    def __str__(self):
        return self.firstName + " " + self.lastName + "(" + self.id + ") " + self.location

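factory = XMLInputFactory.newInstance()
# Encode with the charset named in the XML declaration rather than the platform default.
reader = factory.createXMLStreamReader(ByteArrayInputStream(xml.getBytes("ISO-8859-1")))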
employees = []
employee = None
tagContent = None

while reader.hasNext():
    event = reader.next()

    if event == XMLStreamConstants.START_ELEMENT:
        if "employee" == reader.getLocalName():
            employee = Employee()
            employee.id = reader.getAttributeValue(0)
    elif event == XMLStreamConstants.CHARACTERS:
        tagContent = reader.getText()
    elif event == XMLStreamConstants.END_ELEMENT:
        if "employee" == reader.getLocalName():
            employees.append(employee)
        elif "firstName" == reader.getLocalName():
            employee.firstName = tagContent
        elif "lastName" == reader.getLocalName():
            employee.lastName = tagContent
        elif "location" == reader.getLocalName():
            employee.location = tagContent

for employee in employees:
    print employee
answered 2013-10-22T03:04:14.440

You can use Java's SAX parser directly.

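from javax.xml.parsers import SAXParserFactory
from org.xml.sax import InputSource
from org.xml.sax.helpers import DefaultHandler
from java.io import StringReader

factory = SAXParserFactory.newInstance()
xmlReader = factory.newSAXParser().getXMLReader()

handler = DefaultHandler()  # or use your own handler
xmlReader.setContentHandler(handler)

# streamReader can be any java.io.Reader over the XML; this StringReader is only a placeholder.
streamReader = StringReader("<employees/>")
xmlReader.parse(InputSource(streamReader))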
answered 2013-10-16T18:10:54.127