python - Beautiful Soup: accessing <li> elements from a <ul> with no id

Question

Here is the existing code:

# Python 2 (urllib2); imports added for completeness
import urllib2
from bs4 import BeautifulSoup

hdr = {'User-Agent': 'Mozilla/5.0'}
site = "http://en.wikipedia.org/wiki/"+"january"+"_"+"1"
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

print soup

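This all works and I get the whole HTML page, but I want specific data, and I don't know how to access it with Beautiful Soup when there is no id. The <ul> tag has no id, and neither do the <li> tags. I also can't just ask for every <li> tag, because there are other lists on the page. Is there a specific way to select a given list? (I can't use a fix that only works for this one page, because I plan to iterate over all the dates and get the births from each page, and I can't guarantee that every page has exactly the same layout as this one.)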

score 9 · Accepted Answer

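The idea is to get the span with the id Births, find its parent's next sibling (which is the ul), and iterate over its li elements. Here is a complete example using requests (though that part is not relevant):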

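from bs4 import BeautifulSoup as Soup, Tag

import requests


response = requests.get("http://en.wikipedia.org/wiki/January_1")
# Name the parser explicitly so bs4 does not have to guess one
soup = Soup(response.content, "html.parser")

# The "Births" heading wraps a span with id="Births"
births_span = soup.find("span", {"id": "Births"})
# The heading's next sibling is the ul that holds the births
births_ul = births_span.parent.find_next_sibling()

for item in births_ul.find_all('li'):
    if isinstance(item, Tag):
        print(item.text)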

Prints:

871 – Zwentibold, Frankish son of Arnulf of Carinthia (d. 900)
1431 – Pope Alexander VI (d. 1503)
1449 – Lorenzo de' Medici, Italian politician (d. 1492)
1467 – Sigismund I the Old, Polish king (d. 1548)
1484 – Huldrych Zwingli, Swiss pastor and theologian (d. 1531)
1511 – Henry, Duke of Cornwall (d. 1511)
1516 – Margaret Leijonhufvud, Swedish wife of Gustav I of Sweden (d. 1551)
...

Hope that helps.
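
To try the span-to-parent-to-sibling navigation without hitting the network, here is a minimal offline sketch in Python 3. The HTML string is a made-up, simplified stand-in for the structure this answer relies on (a heading wrapping the span, followed by a ul); it is not a copy of the real page, and the live markup may differ. It passes "ul" to find_next_sibling to make the intent explicit, which is a small variation on the call above.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page structure (not real Wikipedia HTML).
SAMPLE_HTML = """
<div>
<h2><span id="Events">Events</span></h2>
<ul><li>(some event)</li></ul>
<h2><span id="Births">Births</span></h2>
<ul>
<li>1431 – Pope Alexander VI (d. 1503)</li>
<li>1449 – Lorenzo de' Medici, Italian politician (d. 1492)</li>
</ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Step 1: locate the span by its id.
births_span = soup.find("span", {"id": "Births"})

# Step 2: go up to the heading, then across to the next <ul> sibling.
births_ul = births_span.parent.find_next_sibling("ul")

# Step 3: collect the text of every list item in that list only.
birth_lines = [li.get_text() for li in births_ul.find_all("li")]
for line in birth_lines:
    print(line)
```

Only the items under the Births heading are collected; the list under Events is never touched, even though it is also an unlabelled ul.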

score 6 · Accepted Answer

Find the Births section:

section = soup.find('span', id='Births').parent

Then find the next unordered list:

births = section.find_next('ul').find_all('li')
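
Since the question mentions looping over every date, where some page might lack the expected markup, the two steps above could be wrapped in a small helper that returns an empty list instead of raising AttributeError when the span or the list is missing. This is only a sketch in Python 3: `extract_births` and `sample_page` are illustrative names and data, not part of this answer or of Beautiful Soup.

```python
from bs4 import BeautifulSoup


def extract_births(html):
    """Return the text of each <li> in the first <ul> after the Births heading.

    Hypothetical helper: returns [] when the span or the list is not found.
    """
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id="Births")
    if span is None:
        return []
    # find_next searches forward in document order from the heading.
    births_list = span.parent.find_next("ul")
    if births_list is None:
        return []
    return [li.get_text(strip=True) for li in births_list.find_all("li")]


# Made-up sample markup, with a paragraph between the heading and the list.
sample_page = (
    '<h2><span id="Births">Births</span></h2>'
    "<p>Intro text between heading and list.</p>"
    "<ul><li>1431 – Pope Alexander VI (d. 1503)</li></ul>"
    '<h2><span id="Deaths">Deaths</span></h2>'
    "<ul><li>(some death)</li></ul>"
)

print(extract_births(sample_page))
print(extract_births("<p>No births section here.</p>"))
```

Because find_next('ul') walks forward through the document, the intervening paragraph in the sample does not stop it from reaching the list.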
