In Python 2.6 (and earlier), you have to register the namespaces yourself and resolve prefixed names into Clark notation before `find()` will recognize them.
First, register the namespaces as described at http://effbot.org/zone/element-namespaces.htm:
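```python
from xml.etree import ElementTree

try:
    # ElementTree 1.3 (Python 2.7+) provides register_namespace().
    register_namespace = ElementTree.register_namespace
except AttributeError:
    # Older versions: write to the private URI-to-prefix map directly.
    def register_namespace(prefix, uri):
        ElementTree._namespace_map[uri] = prefix

# NS_PREFIXES (prefix -> URI) is defined in the example further down.
for short_name, url in NS_PREFIXES.items():
    register_namespace(short_name, url)
```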
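It may help to see what registration actually buys you. On newer versions, `register_namespace()` controls which prefix the serializer writes; it does not by itself make `find()` understand prefixed names, which is why the resolution step below is still needed. A minimal Python 3 sketch, reusing the `awis` URI from this answer:

```python
import xml.etree.ElementTree as ET

AWIS = "http://awis.amazonaws.com/doc/2005-07-11"

# Register the prefix globally; this affects how tags are serialized.
ET.register_namespace("awis", AWIS)

# Build an element whose tag is written in Clark notation.
title = ET.Element("{%s}Title" % AWIS)
title.text = "Example"

# The serializer now emits the registered "awis" prefix instead of "ns0".
serialized = ET.tostring(title, encoding="unicode")
print(serialized)
```

Without the `register_namespace()` call, the same element would be written with an auto-generated prefix such as `ns0`.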
Next, you have to resolve namespaced XPath expressions yourself into the Clark notation that `find()` uses internally. For example, `awis:Title` resolves to `{http://awis.amazonaws.com/doc/2005-07-11}Title`:
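```python
import re

def resolved_xpath(xpath, namespace):
    """Rewrite every 'prefix:' in xpath as '{uri}' (Clark notation)."""
    result = xpath
    for short_name, url in namespace.items():
        result = re.sub(r'\b' + re.escape(short_name) + ':', '{' + url + '}', result)
    return result
```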
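The reason this rewriting works is that ElementTree stores namespaced tags in Clark notation, so a path written that way matches with no prefix handling at all. A small Python 3 sketch with a made-up document (only the `awis` URI comes from this answer):

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal document for illustration.
XML = """<awis:Site xmlns:awis="http://awis.amazonaws.com/doc/2005-07-11">
  <awis:Title>Example</awis:Title>
</awis:Site>"""

site = ET.fromstring(XML)

# The parser stores the child's tag in Clark notation, not as "awis:Title".
child_tag = site[0].tag
print(child_tag)  # {http://awis.amazonaws.com/doc/2005-07-11}Title

# A Clark-notation path therefore matches directly.
title = site.find("{http://awis.amazonaws.com/doc/2005-07-11}Title")
print(title.text)  # Example
```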
Now it is easy, even in Python 2.6, to write namespace-aware versions of `find()` and `findall()`:
```python
def find_with_namespace(element, xpath, namespace):
    return element.find(resolved_xpath(xpath, namespace))

def findall_with_namespace(element, xpath, namespace):
    return element.findall(resolved_xpath(xpath, namespace))
```
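For comparison, newer ElementTree versions (including `xml.etree.ElementTree` in current Python 3) let `find()` and `findall()` take the prefix mapping as a second argument and do this resolution themselves, so the helpers above matter mainly on 2.6 and earlier. A minimal sketch against a small, made-up document:

```python
import xml.etree.ElementTree as ET

NS_PREFIXES = {
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

# Made-up document for illustration only.
XML = """<awis:SitesLinkingIn xmlns:awis="http://awis.amazonaws.com/doc/2005-07-11">
  <awis:Site><awis:Title>First</awis:Title></awis:Site>
  <awis:Site><awis:Title>Second</awis:Title></awis:Site>
</awis:SitesLinkingIn>"""

root = ET.fromstring(XML)

# The mapping is passed directly; no manual Clark-notation rewriting.
titles = [t.text for t in root.findall("awis:Site/awis:Title", NS_PREFIXES)]
first = root.find("awis:Site/awis:Title", NS_PREFIXES)
print(titles)  # ['First', 'Second']
```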
Your example can then be written as:
```python
NS_PREFIXES = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

tree = api.sites_linking_in(domain + ".eu", count=10, start=0)
alexa_sites_linkin_in = {}
# './/' searches all descendants; a bare leading '//' raises SyntaxError
# when findall() is called on an Element.
for element in findall_with_namespace(tree, './/awis:SitesLinkingIn/awis:Site', NS_PREFIXES):
    title = find_with_namespace(element, 'awis:Title', NS_PREFIXES).text
    url = find_with_namespace(element, 'awis:Url', NS_PREFIXES).text
    alexa_sites_linkin_in[title] = url
```
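To try the example without access to the Alexa API, the sketch below swaps `api.sites_linking_in()` for a hypothetical, hand-written response and reuses the helpers from this answer (with `re.escape` on the prefix). The XML structure is an assumption chosen to match the paths above, not a real AWIS reply:

```python
import re
import xml.etree.ElementTree as ET

NS_PREFIXES = {
    "alexa": "http://alexa.amazonaws.com/doc/2005-10-05/",
    "awis": "http://awis.amazonaws.com/doc/2005-07-11",
}

def resolved_xpath(xpath, namespace):
    # Rewrite every 'prefix:' as '{uri}' (Clark notation).
    result = xpath
    for short_name, url in namespace.items():
        result = re.sub(r"\b" + re.escape(short_name) + ":", "{" + url + "}", result)
    return result

def find_with_namespace(element, xpath, namespace):
    return element.find(resolved_xpath(xpath, namespace))

def findall_with_namespace(element, xpath, namespace):
    return element.findall(resolved_xpath(xpath, namespace))

# Hypothetical stand-in for api.sites_linking_in(); made up to match the
# paths used in the example, not copied from a real Alexa response.
SAMPLE_RESPONSE = """<alexa:Response xmlns:alexa="http://alexa.amazonaws.com/doc/2005-10-05/"
    xmlns:awis="http://awis.amazonaws.com/doc/2005-07-11">
  <awis:SitesLinkingIn>
    <awis:Site>
      <awis:Title>Example One</awis:Title>
      <awis:Url>one.example.eu</awis:Url>
    </awis:Site>
    <awis:Site>
      <awis:Title>Example Two</awis:Title>
      <awis:Url>two.example.eu</awis:Url>
    </awis:Site>
  </awis:SitesLinkingIn>
</alexa:Response>"""

tree = ET.fromstring(SAMPLE_RESPONSE)
alexa_sites_linkin_in = {}
for element in findall_with_namespace(tree, ".//awis:SitesLinkingIn/awis:Site", NS_PREFIXES):
    title = find_with_namespace(element, "awis:Title", NS_PREFIXES).text
    url = find_with_namespace(element, "awis:Url", NS_PREFIXES).text
    alexa_sites_linkin_in[title] = url

print(alexa_sites_linkin_in)
```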
So yes: if you possibly can, use lxml.