
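As far as I can tell this isn't a duplicate: I've been searching for a solution for days and simply can't pin down the problem. I'm trying to use Python to print a nested attribute from a tag in an XML document. I believe the error I'm getting has to do with the fact that the tag I'm reading from has multiple attributes. Is there some way to specify that I want the "status" value from the "second-tag" tag? Any help is greatly appreciated.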

My XML document, "test.xml":

<?xml version="1.0" encoding="UTF-8"?>
<first-tag xmlns="http://somewebsite.com/" date-produced="20130703" lang="en" produced-by="steve" status="OFFLINE">
    <second-tag country="US" id="3651653" lang="en" status="ONLINE">
    </second-tag>
</first-tag>

My Python file:

import xml.etree.ElementTree as ET
tree = ET.parse('test.xml')
root = tree.getroot()
whatiwant = root.find('second-tag').get('status')
print(whatiwant)

Error:

AttributeError: 'NoneType' object has no attribute 'get'

3 Answers


You're failing on .find('second-tag'), not on .get.
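A quick way to confirm this is to look at what find() returns before calling anything on it. The sketch below is a minimal illustration, assuming Python 3; the question's document is inlined as bytes (instead of read from test.xml) so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# The question's document, inlined so the example is self-contained
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<first-tag xmlns="http://somewebsite.com/" date-produced="20130703" lang="en" produced-by="steve" status="OFFLINE">
    <second-tag country="US" id="3651653" lang="en" status="ONLINE">
    </second-tag>
</first-tag>"""

root = ET.fromstring(XML)

# find() returns None when no child matches the given tag name
match = root.find('second-tag')
print(match)  # None

# Calling .get on that None is what raises the AttributeError from the question
try:
    root.find('second-tag').get('status')
except AttributeError as exc:
    print('AttributeError:', exc)
```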

For what you want, and the way you're going about it, BeautifulSoup shines.

from BeautifulSoup import BeautifulStoneSoup  # BeautifulSoup 3 (Python 2)
xml_string = open('test.xml').read()  # raw text of the document above
soup = BeautifulStoneSoup(xml_string)
whatyouwant = soup.find('second-tag')['status']
answered 2013-07-03 19:41:54

The problem here is that there is no tag named second-tag. There's a tag named {http://somewebsite.com/}second-tag.

You can see this pretty easily:

>>> print(list(root))
[<Element '{http://somewebsite.com/}second-tag' at 0x105b24190>]

A non-namespace-compliant XML parser might do the wrong thing and ignore that, making your code work. A parser that bends over backward to be friendly (like BeautifulSoup) will, in effect, automatically try {http://somewebsite.com/}second-tag when you ask for second-tag. But ElementTree is neither.
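For completeness, here is a minimal sketch of what ElementTree does accept: either the fully qualified {uri}tag name, or a prefix-to-URI mapping passed as the namespaces argument of find(). The prefix sw is an arbitrary, hypothetical name chosen for this example (any prefix works, since only the URI has to match), and the document is inlined as bytes so the snippet runs without test.xml.

```python
import xml.etree.ElementTree as ET

# The question's document, inlined so the example is self-contained
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<first-tag xmlns="http://somewebsite.com/" date-produced="20130703" lang="en" produced-by="steve" status="OFFLINE">
    <second-tag country="US" id="3651653" lang="en" status="ONLINE">
    </second-tag>
</first-tag>"""

root = ET.fromstring(XML)

# Option 1: spell out the full {namespace}tag name
status = root.find('{http://somewebsite.com/}second-tag').get('status')

# Option 2: map a prefix of your choosing ('sw' is hypothetical) to the URI
ns = {'sw': 'http://somewebsite.com/'}
status_via_prefix = root.find('sw:second-tag', ns).get('status')

print(status, status_via_prefix)  # ONLINE ONLINE
```

The prefix mapping is usually the more readable choice once a path mentions the namespace more than once.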

If that isn't all you need to know, you first need to read a tutorial on namespaces (maybe this one).
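As a small illustration of how ElementTree represents namespaces, the sketch below prints the root's qualified tag and extracts the URI from it instead of hard-coding it. The {*} wildcard at the end assumes Python 3.8 or newer; the inlined bytes document stands in for test.xml.

```python
import xml.etree.ElementTree as ET

# The question's document, inlined so the example is self-contained
XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<first-tag xmlns="http://somewebsite.com/" date-produced="20130703" lang="en" produced-by="steve" status="OFFLINE">
    <second-tag country="US" id="3651653" lang="en" status="ONLINE">
    </second-tag>
</first-tag>"""

root = ET.fromstring(XML)

# ElementTree stores namespaced names as '{uri}localname'
print(root.tag)  # {http://somewebsite.com/}first-tag

# Pull the URI out of the root tag rather than hard-coding it
uri = root.tag[1:root.tag.index('}')]
status = root.find('{%s}second-tag' % uri).get('status')

# Python 3.8+: '{*}' matches the local name in any namespace (or none)
status_any_ns = root.find('{*}second-tag').get('status')

print(uri, status, status_any_ns)
```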

answered 2013-07-03 19:57:24

I don't know ElementTree, but I would use ehp (easyhtmlparser); here is the link: http://easyhtmlparser.sourceforge.net/ A friend told me about this tool. I'm still learning it, but it's very nice and simple.

from ehp import *

data = '''<?xml version="1.0" encoding="UTF-8"?>
<first-tag xmlns="http://somewebsite.com/" date-produced="20130703" lang="en" produced-by="steve" status="OFFLINE">
    <second-tag country="US" id="3651653" lang="en" status="ONLINE">
    </second-tag>
</first-tag>'''

html  = Html()
dom   = html.feed(data)
item = dom.fst('second-tag')
value = item.attr['status']
print(value)
answered 2013-07-03 19:43:32