
I want to get the location information from the infobox of the following wiki page.

Here is what I tried:

r = requests.get('https://en.wikipedia.org/wiki/Alabama_Department_of_Youth_Services_Schools', proxies = proxies)
html_source = r.text
soup = BeautifulSoup(html_source)

school_d['name'] = soup.find('h1', 'firstHeading').get_text()
print soup.find('th', text=re.compile("location")).find_next_sibling()

Output: None

I'm guessing I can't reach the <td> element because it isn't a sibling?

Any suggestions?


1 Answer

>>> table = soup.find("table", class_="infobox")
>>> name = table.find("th").text
>>> country = table.find("th", text="Country").parent.find("td").text
>>> country = table.find("th", text="Country").find_next_sibling().text  # also works
>>> location = table.find("th", text="Location").parent.find("td").text
>>> location = table.find("th", text="Location").find_next_sibling().text  # also works

Something like this?
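If it helps, here is a minimal, self-contained sketch of the same idea that runs without a network request. The HTML snippet, its values, and the helper name `infobox_value` are made up for illustration, and the real page's markup may differ. It matches the header case-insensitively and returns None instead of raising when a label is missing:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified infobox markup (an assumption, not the real page).
SAMPLE_HTML = """
<table class="infobox">
  <tr><th colspan="2">Example School</th></tr>
  <tr><th>Location</th><td>Example City, Example State</td></tr>
  <tr><th>Country</th><td>Exampleland</td></tr>
</table>
"""


def infobox_value(soup, label):
    """Return the text of the <td> next to the <th> matching label, or None."""
    table = soup.find("table", class_="infobox")
    if table is None:
        return None
    # Case-insensitive match, so "location" also finds "Location".
    header = table.find("th", string=re.compile(label, re.IGNORECASE))
    if header is None:
        return None
    # Passing "td" skips any whitespace text nodes between the two cells.
    cell = header.find_next_sibling("td")
    return cell.get_text(strip=True) if cell is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
location = infobox_value(soup, "location")
country = infobox_value(soup, "country")
missing = infobox_value(soup, "founded")  # no such row in the sample
print(location)
print(country)
print(missing)
```

Note that `string=` is the newer spelling of the `text=` argument used above; on an older Beautiful Soup 4 release you may need `text=` instead.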

Answered 2013-08-12T19:03:06.133