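The Python standard library provides xml.sax.xmlreader.IncrementalParser, an interface with a feed() method. Jython also provides an xml.sax package, implemented on top of the Java SAX parser, but it does not seem to provide IncrementalParser.
Is there a way to parse XML chunks incrementally in Jython? At first glance I thought this could be done with a coroutine library such as greenlet, but I quickly realized that greenlet cannot be used in Jython.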
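For reference, here is a minimal sketch of the feed()-style usage the question refers to, as it works on CPython, where xml.sax.make_parser() returns an expat-based parser that implements IncrementalParser. The names TagCollector and parse_in_chunks are hypothetical, chosen only for illustration, and per the question this is not expected to work the same way on Jython.

```python
import xml.sax


class TagCollector(xml.sax.ContentHandler):
    """Hypothetical handler that records the name of every start tag."""

    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_in_chunks(chunks):
    """Feed the document to the parser piece by piece (hypothetical helper)."""
    parser = xml.sax.make_parser()  # expat-based on CPython
    handler = TagCollector()
    parser.setContentHandler(handler)
    for chunk in chunks:
        parser.feed(chunk)  # chunks may split tags or text anywhere
    parser.close()  # signals end of input
    return handler.tags


tags = parse_in_chunks(["<root><a>te", "xt</a><b", "/></root>"])
print(tags)
```

Handler callbacks fire as soon as enough input has arrived, so the caller never needs the whole document in memory.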
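You can use StAX. A StAX parser streams like SAX, but it maintains a cursor and lets you pull the content at the cursor using hasNext() and next().
The following code is adapted from this Java example. Note that this is my first attempt at using Jython, so please don't hang me if I did something unconventional, but the example works.
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html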
from javax.xml.stream import XMLStreamConstants, XMLInputFactory
from java.io import ByteArrayInputStream
from java.lang import String

# Sample document, wrapped in a Java String so it can be turned into bytes
xml = String(
"""<?xml version="1.0" encoding="ISO-8859-1"?>
<employees>
<employee id="111">
<firstName>Rakesh</firstName>
<lastName>Mishra</lastName>
<location>Bangalore</location>
</employee>
<employee id="112">
<firstName>John</firstName>
<lastName>Davis</lastName>
<location>Chennai</location>
</employee>
<employee id="113">
<firstName>Rajesh</firstName>
<lastName>Sharma</lastName>
<location>Pune</location>
</employee>
</employees>
""")
class Employee:
    """Simple record for one <employee> element."""
    id = None
    firstName = None
    lastName = None
    location = None

    def __str__(self):
        return self.firstName + " " + self.lastName + "(" + self.id + ") " + self.location

factory = XMLInputFactory.newInstance()
# Encode with the charset named in the XML declaration
reader = factory.createXMLStreamReader(ByteArrayInputStream(xml.getBytes("ISO-8859-1")))

employees = []
employee = None
tagContent = None

# Pull events from the cursor one at a time
while reader.hasNext():
    event = reader.next()
    if event == XMLStreamConstants.START_ELEMENT:
        if "employee" == reader.getLocalName():
            employee = Employee()
            employee.id = reader.getAttributeValue(0)
    elif event == XMLStreamConstants.CHARACTERS:
        # Remember the latest text; it is assigned when the element closes
        tagContent = reader.getText()
    elif event == XMLStreamConstants.END_ELEMENT:
        if "employee" == reader.getLocalName():
            employees.append(employee)
        elif "firstName" == reader.getLocalName():
            employee.firstName = tagContent
        elif "lastName" == reader.getLocalName():
            employee.lastName = tagContent
        elif "location" == reader.getLocalName():
            employee.location = tagContent

for employee in employees:
    print employee
You can use Java's SAX parser directly.
from javax.xml.parsers import SAXParserFactory
from org.xml.sax import InputSource
from org.xml.sax.helpers import DefaultHandler

factory = SAXParserFactory.newInstance()
xmlReader = factory.newSAXParser().getXMLReader()
handler = DefaultHandler()  # or use your own handler
xmlReader.setContentHandler(handler)
# streamReader is a java.io.Reader (or InputStream) that you supply
xmlReader.parse(InputSource(streamReader))