I am fairly new to web scraping, and I want to scrape COVID-19 data from worldometers.info. However, both Selenium and BeautifulSoup only find the 7 most recent tags. Here is the Selenium code:
from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.worldometers.info/coronavirus/country/india/')
# Selenium 3 API; Selenium 4 uses driver.find_elements(By.CLASS_NAME, "news_li")
rise = driver.find_elements_by_class_name("news_li")
num_days = len(rise)
print(len(rise))
print()
for i in range(num_days):
    print(rise[i].text)
Here is the BeautifulSoup code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen

url = "https://www.worldometers.info/coronavirus/country/india/"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
web_byte = urlopen(req).read()
webpage = web_byte.decode('utf-8')
bsobj = BeautifulSoup(webpage, 'html.parser')
# Print the text that follows the <strong> tag in each news item
for k in bsobj.findAll("li", {"class": "news_li"}):
    print(k.find("strong").next_sibling.next_sibling.get_text())
# Print the date attached to each date button
for b in bsobj.findAll("button", {"class": "btn btn-light date-btn"}):
    print(b['data-date'])
Here is the Selenium output:
7
1,125 new cases and 12 new deaths in India [source]
Here is the BeautifulSoup output:
2,006 new deaths
395 new deaths
321 new deaths
309 new deaths
389 new deaths
2020-06-17
2020-06-16
2020-06-15
2020-06-14
2020-06-13
2020-06-12