python - Python XML parsing

Question

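I'm trying to use Python to parse an XML file retrieved from OCTranspo (the Ottawa city bus company). My problem is that I can't seem to access the subfields, such as Latitude and Longitude.

Here is a heavily shortened version of a sample XML file that still causes the problem: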

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>

<Route xmlns="http://tempuri.org/">

<Trips>
<Trip><TripDestination>Barrhaven Centre</TripDestination>
<TripStartTime>19:32</TripStartTime><Latitude>45.285458</Latitude>
<Longitude>-75.746786</Longitude></Trip>
</Trips>

</Route>

</soap:Body>
</soap:Envelope>

Here is my code:

import xml.etree.ElementTree as ET
import urllib

u = urllib.urlopen('https://api.octranspo1.com/v1.1/GetNextTripsForStop', 'appID=7a51d100&apiKey=5c5a8438efc643286006d82071852789&routeNo=95&stopNo=3044')
data = u.read()

f = open('route3044.xml', 'wb')
f.write(data)
f.close()

doc = ET.parse('route3044.xml')

for bus in doc.findall('Trip'):
    lat = bus.findtext('Latitude')
    #NEVER EXECUTES
    print trip

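If I run the same code against a very simple XML file (one without soap:Envelope ...), it works perfectly. However, since the XML I need is generated by OCTranspo, I have no control over the format.

I'm not sure whether the problem is a "namespace" issue or a bug in Python.

Any assistance would be appreciated.

Update: September 21, 2013

I changed the code that searches for Lat and Lon to: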

doc = ET.parse('Stop1A.xml')

for a in doc.findall('{http://schemas.xmlsoap.org/soap/envelope/}Body'):
    for b in a.findall('{http://octranspo.com}GetNextTripsForStopResponse'): 
        for c in b.findall('{http://octranspo.com}GetNextTripsForStopResult'):   
            for d in c.findall('{http://tempuri.org/}Route'):
                for e in d.findall('{http://tempuri.org/}RouteDirection'):
                    direction = e.findtext('{http://tempuri.org/}Direction')
                    if direction == 'Eastbound':
                        for f in e.findall('{http://tempuri.org/}Trips'):
                            for g in f.findall('{http://tempuri.org/}Trip'):
                                lat = g.findtext('{http://tempuri.org/}Latitude')
                                lon = g.findtext('{http://tempuri.org/}Longitude')
                                print lat + ',' + lon
                                print 'Done'

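The end result is that I can now see the "Eastbound" buses on route 95. I know this code isn't pretty, but it works. My next goal is probably to optimize it using namespace tricks.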
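The nested loops above can probably be collapsed, because ElementTree can search descendants directly. The following is only a sketch, written for Python 3. The RESPONSE string is a made-up document that mirrors the element nesting used in the loops (it is not real API output), and the names T and positions_for_direction are hypothetical helpers introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample that follows the nesting used in the loops above.
# The values are illustrative only, not real API output.
RESPONSE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetNextTripsForStopResponse xmlns="http://octranspo.com">
<GetNextTripsForStopResult>
<Route xmlns="http://tempuri.org/">
<RouteDirection>
<Direction>Eastbound</Direction>
<Trips><Trip><Latitude>45.285458</Latitude><Longitude>-75.746786</Longitude></Trip></Trips>
</RouteDirection>
<RouteDirection>
<Direction>Westbound</Direction>
<Trips><Trip><Latitude>45.400000</Latitude><Longitude>-75.700000</Longitude></Trip></Trips>
</RouteDirection>
</Route>
</GetNextTripsForStopResult>
</GetNextTripsForStopResponse>
</soap:Body>
</soap:Envelope>"""

# Namespace prefix in ElementTree's "{uri}tag" notation (hypothetical name T).
T = "{http://tempuri.org/}"


def positions_for_direction(root, wanted="Eastbound"):
    """Return (lat, lon) string pairs for every Trip in the wanted direction."""
    results = []
    # A single descendant search replaces the five outer loops.
    for direction in root.findall(".//" + T + "RouteDirection"):
        if direction.findtext(T + "Direction") != wanted:
            continue
        for trip in direction.findall(".//" + T + "Trip"):
            results.append(
                (trip.findtext(T + "Latitude"), trip.findtext(T + "Longitude"))
            )
    return results


root = ET.fromstring(RESPONSE)
for lat, lon in positions_for_direction(root):
    print(lat + "," + lon)
```

The direction filter still needs its own loop, since the Direction element is a sibling of Trips rather than an ancestor of Trip.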

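If anyone wants to try accessing the URL, note that you will often see "no buses" for 5-7 minutes at a stretch, because the URL only returns the 6 buses nearest the stop: three eastbound and three westbound. If the nearest bus is more than 7 minutes away, the result comes back empty. The code returns each bus's latitude and longitude, which I can then plot using Google Maps.

Kelly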

score 2 · Accepted Answer

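According to the ElementTree documentation:

Element.findall() finds only elements with a tag which are direct children of the current element. (emphasis added)

Fortunately, ElementTree supports a subset of XPath.

Change doc.findall('Trip') (which searches only doc's direct children) to doc.findall('.//Trip') (which searches all of doc's descendants recursively) and it should work as expected. Note that in the sample document Trip falls under the default namespace http://tempuri.org/, so the tag also has to be namespace-qualified: doc.findall('.//{http://tempuri.org/}Trip').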
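To make the difference concrete, here is a small sketch, written for Python 3, that runs three searches against the shortened sample from the question (with its closing tags completed and the XML declaration dropped so it can be parsed from a string). The variable names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Shortened sample from the question, embedded as a string.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<Route xmlns="http://tempuri.org/">
<Trips>
<Trip><TripDestination>Barrhaven Centre</TripDestination>
<TripStartTime>19:32</TripStartTime><Latitude>45.285458</Latitude>
<Longitude>-75.746786</Longitude></Trip>
</Trips>
</Route>
</soap:Body>
</soap:Envelope>"""

root = ET.fromstring(SAMPLE)
NS = "{http://tempuri.org/}"

# Direct children of the root only: the root's sole child is soap:Body.
direct = root.findall("Trip")

# All descendants, but the tag name lacks the namespace, so nothing matches.
unqualified = root.findall(".//Trip")

# All descendants, with the namespace-qualified tag name.
qualified = root.findall(".//" + NS + "Trip")

print(len(direct), len(unqualified), len(qualified))  # prints: 0 0 1
print(qualified[0].tag)  # prints: {http://tempuri.org/}Trip
```

Printing the tag of any element is a quick way to check which namespace, if any, the parser has attached to it.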

score 1 · Accepted Answer

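Here is a simple way to get the latitude and longitude of every trip. You don't need to loop over each level of elements. Note the use of .// to find all {http://tempuri.org/}Trip elements.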

import xml.etree.ElementTree as ET

doc = ET.parse("temp.xml")     # Your shortened XML document

for bus in doc.findall('.//{http://tempuri.org/}Trip'):
    lat = bus.findtext('{http://tempuri.org/}Latitude')
    lon = bus.findtext('{http://tempuri.org/}Longitude')
    print lat, lon

Output:

45.285458 -75.746786
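As a possible variation, findall() and findtext() also accept a namespaces mapping, which avoids repeating the full URI in every path. The sketch below is written for Python 3 and assumes the shortened sample from the question. The prefix "oc" and the function name trip_positions are arbitrary choices made for this example, and the conversion to float is an added assumption for plotting, not something the original code does.

```python
import xml.etree.ElementTree as ET

# Shortened sample from the question, embedded as a string.
SAMPLE = """<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<Route xmlns="http://tempuri.org/">
<Trips>
<Trip><TripDestination>Barrhaven Centre</TripDestination>
<TripStartTime>19:32</TripStartTime><Latitude>45.285458</Latitude>
<Longitude>-75.746786</Longitude></Trip>
</Trips>
</Route>
</soap:Body>
</soap:Envelope>"""

# "oc" is a local alias chosen for this sketch; only the URI must match
# the namespace declared in the document.
NAMESPACES = {"oc": "http://tempuri.org/"}


def trip_positions(root):
    """Return a list of (latitude, longitude) floats, one per Trip element."""
    positions = []
    for trip in root.findall(".//oc:Trip", NAMESPACES):
        lat = trip.findtext("oc:Latitude", namespaces=NAMESPACES)
        lon = trip.findtext("oc:Longitude", namespaces=NAMESPACES)
        # Defensive check: skip a Trip whose coordinates are missing or empty.
        if lat and lon:
            positions.append((float(lat), float(lon)))
    return positions


root = ET.fromstring(SAMPLE)
print(trip_positions(root))
```

The prefix used in the path does not have to match any prefix in the XML itself, which is convenient here because the document declares http://tempuri.org/ as a default namespace with no prefix at all.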
