python - AttributeError: NoneType with lxml parse getroot method

Question

I'm trying to scrape a website using lxml and mechanize, but I get this error:

AttributeError: 'NoneType' object has no attribute 'xpath'

After some checking, I found that no HTML is being returned: html is None.

Interestingly, this code works on other websites; it only fails on this particular one (http://www.selangortimes.com).

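import mechanize
import lxml.html

expr = '//a'  # the question does not show expr; '//a' is assumed here
url = 'http://www.selangortimes.com'
br = mechanize.Browser()
br.set_handle_robots(False)   # ignore robots.txt
br.set_handle_refresh(False)  # do not follow HTTP Refresh redirects
br.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]
br.open(url)
resp = br.response()
html = lxml.html.parse(resp).getroot()  # None for this site, hence the AttributeError
link_targets = [link.attrib.get('href') for link in html.xpath(expr)]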
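One way to make this failure easier to diagnose is to read the body explicitly and check it before calling xpath(), so you get a descriptive error instead of a NoneType one. This is a minimal sketch, assuming the cause is an empty or unparseable response body (the thread does not confirm that); parse_root is a hypothetical helper name, not part of lxml or mechanize.

```python
import io

import lxml.html


def parse_root(body, url="<unknown>"):
    """Parse HTML bytes with lxml and raise instead of returning None.

    parse_root is a hypothetical helper, not an lxml or mechanize API.
    """
    # An empty or whitespace-only body cannot produce a root element.
    if not body or not body.strip():
        raise ValueError("empty response body from %s" % url)
    root = lxml.html.parse(io.BytesIO(body)).getroot()
    # getroot() gives None when the parser built no document element.
    if root is None:
        raise ValueError("no HTML root element parsed from %s" % url)
    return root


# Usage with mechanize (not executed here):
#   body = br.response().read()
#   root = parse_root(body, url)
#   link_targets = root.xpath('//a/@href')
```

Checking the raw bytes first separates "the server sent nothing" from "lxml could not find a root", which are the two cases that end in the same AttributeError.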

Thanks for your help :)

Update: an example of a site where the above code works: http://www.themalaysianinsider.com

score 1 · Accepted Answer

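The following slightly modified version of the code you posted, run with lxml 2.3.6 and mechanize 0.2.5, produces a list of the href attributes of all <a> elements at http://www.selangortimes.com. Note the recent comment about needing to import lxml.html.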

import mechanize
import lxml.html

url = 'http://www.selangortimes.com'
br = mechanize.Browser()
br.set_handle_robots(False)   # ignore robots.txt
br.set_handle_refresh(False)  # do not follow HTTP Refresh redirects
br.addheaders = [('User-Agent', 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)')]
br.open(url)
resp = br.response()
# Parse the response and get the <html> root element
html = lxml.html.parse(resp).getroot()
# Collect the href of every <a> element (None for anchors without href)
link_targets = [link.attrib.get('href') for link in html.xpath('//a')]
print(link_targets)
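For comparison, the same list can be built with only the standard library's html.parser.HTMLParser. This is a different technique from the answer's lxml/XPath approach, sketched under the assumption of Python 3 and that you already have the page as a str (for example, br.response().read() decoded with the page's charset); HrefCollector and collect_hrefs are hypothetical names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "a":
            # Like link.attrib.get('href'): None when href is missing.
            self.hrefs.append(dict(attrs).get("href"))


def collect_hrefs(html_text):
    """Return the href values of all <a> tags in html_text."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return parser.hrefs


print(collect_hrefs('<p><a href="/a">A</a> <A HREF="/b">B</A> <a name="x">C</a></p>'))
# ['/a', '/b', None]
```

Note that this does not help if the response body itself is empty; it only removes the dependency on lxml's tree building.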