python - Multiple values for class attribute

Question

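I am trying to use BeautifulSoup to get the birthdays of people from Wikipedia. For example, the birthday for http://en.wikipedia.org/wiki/Ezra_Taft_Benson is August 4, 1899. To get to the bday, I use the following code: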

bday = url.find("span", class_="bday")

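However, it is picking up an instance where bday appears in the HTML as part of a tag with several classes, i.e. <span class="bday dtstart published updated">1985-11-10 </span>.

Is there a way to match only the exact class bday?

I hope the question is clear; at the moment the bday I get is 1985-11-10, which is not the correct date.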

score 4 · Accepted Answer

When all of BeautifulSoup's other matching methods fail, you can use a function that takes a single argument (the tag):

>>> url.find(lambda tag: tag.name == 'span' and tag.get('class', []) == ['bday'])
<span class="bday">1899-08-04</span>

The above searches for a span tag whose class attribute is a list with a single element ('bday').
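
To see why the function form is needed, here is a minimal sketch, assuming Beautiful Soup 4 (the bs4 package) and a made-up HTML fragment rather than the real Wikipedia page. The helper name has_only_bday is hypothetical and not part of the library.

```python
from bs4 import BeautifulSoup

# Made-up fragment for illustration; the multi-class span comes first,
# as it does in the question.
HTML = """
<div>
  <span class="bday dtstart published updated">1985-11-10</span>
  <span class="bday">1899-08-04</span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# class_ matches any tag whose class list contains "bday".
loose = soup.find("span", class_="bday")


def has_only_bday(tag):
    """Hypothetical helper: True for a <span> whose class list is exactly ['bday']."""
    return tag.name == "span" and tag.get("class", []) == ["bday"]


# Passing a function makes find() call it on every tag in the tree.
strict = soup.find(has_only_bday)

print(loose.get_text(strip=True))   # 1985-11-10
print(strict.get_text(strip=True))  # 1899-08-04
```

Comparing against the list ['bday'] works because Beautiful Soup 4 stores class as a list of values, so a tag with extra classes never compares equal.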

score 1 · Accepted Answer

I would do it like this:

import urllib
from BeautifulSoup import BeautifulSoup

url = 'http://en.wikipedia.org/wiki/Ezra_Taft_Benson'
file_pointer = urllib.urlopen(url)
html_object = BeautifulSoup(file_pointer)

bday = html_object('span',{'class':'bday'})[0].contents[0]

This returns 1899-08-04 as the value of bday.
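
The snippet above targets Python 2 and the older BeautifulSoup 3 package. Below is a rough sketch of the same idea on Python 3 with Beautiful Soup 4, assuming the page has already been downloaded into a string. The sample fragment and the function name first_exact_bday are made up for illustration. In Beautiful Soup 4 the {'class': 'bday'} filter also matches spans that carry additional classes, so the sketch keeps only tags whose class list is exactly ['bday'].

```python
from bs4 import BeautifulSoup


def first_exact_bday(html):
    """Return the text of the first <span> whose only class is bday, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # The dict filter is loose in Beautiful Soup 4: it matches any span
    # that has "bday" among its classes, so each hit is checked again.
    for span in soup.find_all("span", {"class": "bday"}):
        if span.get("class") == ["bday"]:
            return span.get_text(strip=True)
    return None


# Made-up fragment standing in for the downloaded page.
sample = (
    '<span class="bday dtstart published updated">1985-11-10</span>'
    '<span class="bday">1899-08-04</span>'
)
print(first_exact_bday(sample))  # 1899-08-04
```

Returning None instead of indexing with [0] avoids an IndexError on pages that have no such span.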

score 0 · Accepted Answer

Try using lxml with the beautifulsoup parser. The following finds the <span> tags whose only class is bday (there is just one on this page):

>>> from lxml.html.soupparser import fromstring
>>> root = fromstring(open('Ezra_Taft_Benson'))
>>> span_bday_nodes = root.findall('.//span[@class="bday"]')
>>> span_bday_nodes
[<Element span at 0x1be9290>]
>>> span_bday_nodes[0].text
'1899-08-04'
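
The path predicate [@class="bday"] compares the whole attribute string, which is why multi-class spans are skipped. A small sketch of that behaviour, using the standard library's xml.etree.ElementTree instead of lxml (a swapped-in parser, chosen only so the example has no third-party dependency) and a made-up, well-formed fragment:

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed fragment; real Wikipedia HTML would need an
# HTML-tolerant parser such as the lxml/BeautifulSoup combination above.
XML = (
    "<div>"
    '<span class="bday dtstart published updated">1985-11-10</span>'
    '<span class="bday">1899-08-04</span>'
    "</div>"
)

root = ET.fromstring(XML)

# The predicate is a plain string comparison on the attribute value,
# so "bday dtstart published updated" does not equal "bday".
exact = root.findall(".//span[@class='bday']")

print([node.text for node in exact])  # ['1899-08-04']
```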
