python - Extracting values (only) from parsed XML in Python

Question

I'm trying to extract a value from some XML in Python using Beautiful Soup (though I'll happily drop it for anything else if recommended). Consider the following code:

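import urllib2
from bs4 import BeautifulSoup  # assumed; Beautiful Soup 3 uses "from BeautifulSoup import BeautifulSoup"

global humidity, temperature, weatherdescription, winddescription

# fetch the raw XML, then parse it
query = urllib2.urlopen('http://www.google.com/ig/api?weather="Aberdeen+Scotland"')
weatherxml = query.read()
weathersoup = BeautifulSoup(weatherxml)
query.close()

print weatherxml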

This prints the current weather forecast for Aberdeen, Scotland as XML, like so (much of the XML removed to avoid wall-of-text syndrome):

<?xml version="1.0"?><xml_api_reply version="1"><weather module_id="0"
tab_id="0" mobile_row="0" mobile_zipped="1" row="0" section="0"
><forecast_information><city data="Aberdeen, Aberdeen City"/><postal_code data="&quot;Aberdeen Scotland&quot;"/><latitude_e6
data=""/><longitude_e6 data=""/><forecast_date
data="2012-07-31"/><current_date_time data="1970-01-01 00:00:00
+0000"/><unit_system data="US"/></forecast_information><current_conditions><condition
data="Clear"/><temp_f data="55"/><temp_c data="13"/><humidity
data="Humidity: 82%"/><icon
data="/ig/images/weather/sunny.gif"/><wind_condition data="Wind: SE at
8 mph"/></current_conditions>

Now, I want to be able to populate variables with the weather values in this XML, for example make temperature = 13. Parsing it is proving to be a nightmare.

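If I use any of the find functions on weathersoup, I get the entire tag (for instance, for temp_c it returns "<temp_c data="13">"). Various other functions return nothing, or the entire sheet, or part of it.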

How do I simply return the VALUE for any given XML tag, without a messy "strip", resorting to regex, or basically hacking it?

score 2 · Accepted Answer


To access the attribute data in the element temp_c:

weathersoup.temp_c['data']
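
The same attribute lookup can also be sketched with only the standard library. This is a minimal sketch, assuming Python 3 and `xml.etree.ElementTree` in place of the Beautiful Soup call above. The sample string is a trimmed copy of the question's XML with closing tags added (an assumption, since the excerpt in the question is cut off), and `read_current_conditions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question. The closing </weather> and
# </xml_api_reply> tags are assumed, because the question's excerpt is truncated.
SAMPLE_XML = (
    '<?xml version="1.0"?><xml_api_reply version="1"><weather>'
    '<current_conditions><condition data="Clear"/><temp_f data="55"/>'
    '<temp_c data="13"/><humidity data="Humidity: 82%"/>'
    '<wind_condition data="Wind: SE at 8 mph"/></current_conditions>'
    '</weather></xml_api_reply>'
)


def read_current_conditions(xml_text):
    """Map each child tag of current_conditions to its data attribute."""
    root = ET.fromstring(xml_text)  # root is the xml_api_reply element
    conditions = root.find("./weather/current_conditions")
    return {child.tag: child.get("data") for child in conditions}


values = read_current_conditions(SAMPLE_XML)
temperature = int(values["temp_c"])       # the attribute is a string, so convert it
humidity = values["humidity"]             # still carries the "Humidity: " label
weatherdescription = values["condition"]
winddescription = values["wind_condition"]
print(temperature, humidity, weatherdescription, winddescription)
```

The values live in the `data` attribute rather than in element text, so `child.get("data")` is the piece that avoids stripping the surrounding tag by hand.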

Answered 2012-07-31T23:08:34.700

score 0 · Accepted Answer

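Use lxml, and make friends with XPath. Some things in this example don't make sense against the XML you provided, since it doesn't parse properly, but hopefully it gives you an idea of the power of XPath.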

from lxml import etree
# xmlstr is the string of the input XML data
root = etree.fromstring(xmlstr)

# print the text in all current_date_time elements
for elem in root.xpath('//current_date_time'):
    print elem.text

# print the values for every data attribute in every temp_c element
for value in root.xpath('//temp_c/@data'):
    print value

# print the text for only the temp_c elements whose data attribute is 'Celsius'
for elem in root.xpath('//temp_c[@data="Celsius"]'):
    print elem.text

# print the text for only the temp_c elements that are under the temperatures element, which is under the root.
for elem in root.xpath('/temperatures/temp_c'):
    print elem.text
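
Applied to the XML in the question, where every value sits in a `data` attribute and the elements have no text, similar path expressions might look like the sketch below. It assumes Python 3 and the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath: it has no attribute-selection step such as `/@data`, so `.get("data")` is used instead. The sample string is a shortened, well-formed stand-in for the question's XML (the closing tags are an assumption).

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the question's XML; closing tags are assumed.
xml_text = (
    '<xml_api_reply version="1"><weather>'
    '<forecast_information><city data="Aberdeen, Aberdeen City"/>'
    '<forecast_date data="2012-07-31"/></forecast_information>'
    '<current_conditions><condition data="Clear"/>'
    '<temp_f data="55"/><temp_c data="13"/></current_conditions>'
    '</weather></xml_api_reply>'
)
root = ET.fromstring(xml_text)

# data attribute of every temp_c element, at any depth
temps_c = [elem.get("data") for elem in root.findall(".//temp_c")]

# tag names of all elements whose data attribute equals "Clear"
clear_tags = [elem.tag for elem in root.findall(".//*[@data='Clear']")]

# a single element reached by an explicit path from the root
city = root.find("./weather/forecast_information/city").get("data")

print(temps_c, clear_tags, city)
```

With lxml, the attribute step is available, so an expression like `//temp_c/@data` passed to `root.xpath` would return the attribute values directly.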
