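I am working on a project in which I am trying to scrape data from this Wikipedia page. I want the column with the years (which happens to be a <th>) and the fourth column, "Walt Disney Parks and Resorts".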
Code:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen("https://en.wikipedia.org/wiki/The_Walt_Disney_Company#Revenues")
bsObj = BeautifulSoup(html, "html.parser")
t = open("scrape_project.txt", "w")
year = bsObj.find("table", {"class":"wikitable"}).tr.next_sibling.next_sibling.th
money = bsObj.find("table", {"class":"wikitable"}).td.next_sibling.next_sibling.next_sibling.next_sibling
for year_data in year:
    year.sup.clear()
    print(year.get_text())
for revenue in money:
    print(money.get_text())
t.close()
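Right now, when I run it from the terminal, all it prints is 1991 (twice) and 2,794. I need it to print every year and the associated revenue for Walt Disney Parks and Resorts. I am also trying to get it to write to the file "scrape_project.txt".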
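My guess is that the loops above iterate over the children of a single cell rather than over the table's rows, which would explain why only one year and one value come out. Below is a minimal sketch of the row-by-row approach I think I need. It runs against a small inline table that I made up for illustration (it is not the real Wikipedia markup, and every figure other than 1991 / 2,794 is a placeholder). The names `SAMPLE_HTML`, `extract_rows`, `write_rows` and `column_index` are my own, and the sketch assumes the year sits in a <th> and the wanted value in the third <td> of each row.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; values are placeholders.
SAMPLE_HTML = """
<table class="wikitable">
<tr><th>Year</th><th>Col A</th><th>Col B</th><th>Walt Disney Parks and Resorts</th></tr>
<tr><th>1991<sup>[1]</sup></th><td>111</td><td>222</td><td>2,794</td></tr>
<tr><th>1992<sup>[2]</sup></th><td>333</td><td>444</td><td>9,999</td></tr>
</table>
"""


def extract_rows(html, column_index=2):
    """Return (year, value) pairs, one per data row of the first wikitable."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", {"class": "wikitable"})
    rows = []
    for tr in table.find_all("tr"):
        year_cell = tr.find("th")
        cells = tr.find_all("td")
        # Skip the header row (it has no <td>) and any row that is too short.
        if year_cell is None or len(cells) <= column_index:
            continue
        # Remove footnote markers such as <sup>[1]</sup> from the year cell.
        for sup in year_cell.find_all("sup"):
            sup.decompose()
        rows.append((year_cell.get_text(strip=True),
                     cells[column_index].get_text(strip=True)))
    return rows


def write_rows(rows, path):
    """Write one tab-separated 'year<TAB>value' line per row."""
    with open(path, "w") as out:
        for year, revenue in rows:
            out.write(f"{year}\t{revenue}\n")


rows = extract_rows(SAMPLE_HTML)
for year, revenue in rows:
    print(year, revenue)
```

Calling `write_rows(rows, "scrape_project.txt")` afterwards should put the same pairs into the file. For the real page, the string passed to `extract_rows` would be the downloaded HTML, and `column_index` may need adjusting if the live table is laid out differently from my sample.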
Any help would be greatly appreciated!