
I have the following XML file (it contains more than 2 GB of data):

<events version="1.0">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home"  />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car"  />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1"  />
....
</events>

To read and analyze the data, I tried the approach described here: http://boscoh.com/programming/reading-xml-serially.html

But when I try the namespace part:

nsmap = {}
for event, elem in etree.iterparse(xmL, events=('start-ns')):
  ns, url = elem
  nsmap[ns] = url
print(nsmap)

I get this error:

Traceback (most recent call last):

  File "<ipython-input-16-6baf583a11d5>", line 1, in <module>
    runfile('C:/Codezeug/Pypy/01/PlayingAround.py', wdir='C:/Codezeug/Pypy/01')

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 668, in runfile
    execfile(filename, namespace)

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 108, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/Codezeug/Pypy/01/PlayingAround.py", line 22, in <module>
    for event, elem in etree.iterparse(one, events=('start-ns')):

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\xml\etree\ElementTree.py", line 1218, in iterparse
    pullparser = XMLPullParser(events=events, _parser=parser)

  File "C:\Users\AppData\Local\Continuum\anaconda3\lib\xml\etree\ElementTree.py", line 1261, in __init__
    self._parser._setevents(self._events_queue, events)

ValueError: unknown event 's'

How does this code work, and why is it looking for 's'?


1 Answer


You need to pass a tuple:

for event, elem in etree.iterparse(xmL, events=('start-ns',)): # added , to make it a tuple

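Otherwise the string itself is treated as the iterable of event names and each character is tried separately. Parentheses alone do not create a tuple, so ('start-ns') is just the string 'start-ns', and the first event name the parser sees is 's', which it does not know.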
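The difference can be checked directly. The following is a minimal sketch, assuming a small in-memory document (xml_bytes is made up for illustration) in place of the real file; it shows that the parenthesized value is a str, which character comes first when it is iterated, and that iterparse rejects it with a ValueError:

```python
import io
import xml.etree.ElementTree as etree

not_a_tuple = ('start-ns')   # parentheses only group: this is still a str
a_tuple = ('start-ns',)      # the trailing comma makes a one-element tuple

print(type(not_a_tuple).__name__)
print(type(a_tuple).__name__)

# Iterating a string yields its characters one by one, so the first
# "event name" handed to the parser is a single letter.
first_requested = list(not_a_tuple)[0]
print(first_requested)

# Hypothetical stand-in for the real file, small enough to keep in memory.
xml_bytes = b'<events version="1.0"><event time="10998.0" /></events>'

try:
    for _ in etree.iterparse(io.BytesIO(xml_bytes), events=not_a_tuple):
        pass
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

The same check with events=a_tuple runs without raising; it simply yields nothing for a document that declares no namespaces.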


Your XML does not contain any namespaces:

t = """<events version="1.0">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home"  />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car"  />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1"  />
</events>"""

with open("data.xml", "w") as f:
    f.write(t)

import xml.etree.ElementTree as etree

with open("data.xml") as f:
    for event, elem in etree.iterparse(f, events=('start-ns',)):
        print(event, elem)

This runs but prints nothing. Change the XML to one that declares a namespace to get output:

t = """<events version="1.0" xmlns:k="some_namespace">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home"  />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car"  />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1"  />
</events>"""

Output:

start-ns ('k', 'some_namespace')
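Putting the pieces together, here is a rough sketch of how the namespace map from the question could be built while the events are processed in the same streaming pass. The function name scan, the counting of event types, and the in-memory sample are assumptions made for illustration; for the real 2 GB file you would pass the file path or an open file object instead of the BytesIO.

```python
import io
import xml.etree.ElementTree as etree

# Sample data (assumption): the question's structure plus a namespace
# declaration, so that 'start-ns' events are actually produced.
sample = b"""<events version="1.0" xmlns:k="some_namespace">
    <event time="10998.0" type="actend" person="1" link="link36" actType="home" />
    <event time="10998.0" type="departure" person="1" link="link36" legMode="car" />
    <event time="10998.0" type="PersonEntersVehicle" person="1" vehicle="1" />
</events>"""


def scan(source):
    """Collect namespace prefixes and count <event> types in one pass."""
    nsmap = {}
    type_counts = {}
    for event, payload in etree.iterparse(source, events=('start-ns', 'end')):
        if event == 'start-ns':
            prefix, uri = payload          # payload is a (prefix, uri) tuple
            nsmap[prefix] = uri
        elif payload.tag == 'event':       # payload is a completed Element
            kind = payload.get('type')
            type_counts[kind] = type_counts.get(kind, 0) + 1
            payload.clear()                # discard attributes already handled
    return nsmap, type_counts


nsmap, type_counts = scan(io.BytesIO(sample))
print(nsmap)
print(type_counts)
```

Note that clear() empties each element but the emptied elements stay attached to the root, so for very large inputs you may also want to detach them from the root as you go.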
Answered 2019-01-03T13:09:32.653