python - Unable to iterate over elements to scrape Rotten Tomatoes rating data with Beautiful Soup and Selenium

Question

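I am trying to locate the elements that contain the rating data, but I can't work out how to iterate over them (image linked below). The span elements for the critics' rating and the audience rating share the same class (mop-ratings-wrap__percentage). I tried to get them by iterating over their respective divs ('mop-ratings-wrap__half' and 'mop-ratings-wrap__half audience-score'), but I get this error: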

runfile('/Users/*/.spyder-py3/temp.py', wdir='/Users/*/.spyder-py3')
Traceback (most recent call last):

  File "/Users/*/.spyder-py3/temp.py", line 22, in <module>
    cr=a.find('span', attrs={'class':'mop-ratings-wrap__percentage'})

TypeError: find() takes no keyword arguments

Here is my code:

# -*- coding: utf-8 -*-
from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd

driver = webdriver.Chrome("/Users/*/Downloads/chromedriver")


critics_rating=[]
audience_rating=[]
driver.get("https://www.rottentomatoes.com/m/bill_and_ted_face_the_music")

content = driver.page_source
soup = BeautifulSoup(content, "lxml")

for a in soup.find('div', attrs={'class':'mop-ratings-wrap__half'}):
      cr=a.find('span', attrs={'class':'mop-ratings-wrap__percentage'})
      critics_rating.append(cr.text)


for b in soup.find('div', attrs={'class':'mop-ratings-wrap__half audience-score'}):
      ar=b.find('span', attrs={'class':'mop-ratings-wrap__percentage'})
      audience_rating.append(ar.text) 

print(critics_rating)

I am following this article: https://www.edureka.co/blog/web-scraping-with-python/#demo

This is the data I want to extract.

score 0 · Accepted Answer

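I suspect that iterating over the result of

soup.find()

gives you strings rather than the bs4 objects you expect. So you end up calling

"somestring".find()

which does not accept keyword arguments.

(I would have left this as a comment, but I don't have enough reputation, sorry.)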
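A minimal sketch of where the string comes from, assuming markup shaped like the class names in the question. The HTML snippet and the 82% value are made-up placeholders, not the real Rotten Tomatoes page. It suggests that find() itself returns a single Tag, and that looping over that Tag walks its children, some of which are text nodes that behave like ordinary strings:

```python
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the class names from the question;
# the real page and the 82% value are assumptions for illustration.
html = """
<div class="mop-ratings-wrap__half">
  <span class="mop-ratings-wrap__percentage">82%</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", attrs={"class": "mop-ratings-wrap__half"})

# find() returns one element object, not a string.
print(type(div).__name__)

# Iterating over that element walks its direct children. The whitespace
# between the tags appears as text nodes, which are str subclasses.
children = list(div)
print([type(child).__name__ for child in children])

first = children[0]
error_name = None
try:
    # On a text node this resolves to str.find, which rejects keywords.
    first.find("span", attrs={"class": "mop-ratings-wrap__percentage"})
except TypeError as exc:
    error_name = type(exc).__name__
    print("TypeError:", exc)
```

The first child here is the whitespace before the span, so the loop in the question fails on its very first pass.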

score 0 · Accepted Answer

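The problem is in your loop: for a in soup.find('div', attrs={'class':'mop-ratings-wrap__half'}): find returns a single element, and you then try to iterate over it. Iterating over one element walks its child nodes, which include plain text strings, and you cannot call find with keyword arguments on a string. Solution: if you want to loop over elements and call find on each of them, use find_all instead. It returns a list-like collection of the matching elements (bs4 Tag objects), which you can go through one at a time.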

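content = driver.page_source
soup = BeautifulSoup(content, 'html.parser')
ratings = []
# find_all returns every matching div; a single class name also matches
# divs that carry extra classes, such as the audience-score one.
for a in soup.find_all('div', attrs={'class': 'mop-ratings-wrap__half'}):
    cr = a.find('span', attrs={'class': 'mop-ratings-wrap__percentage'})
    if cr is not None:  # skip a div that has no percentage span
        ratings.append(cr.text)

# Strip the newlines and padding that surround each percentage.
for rating in ratings:
    print(rating.replace("\n", "").strip())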

Output: the code above prints the two ratings, the critics' rating first and then the audience rating.

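Note: this is not the most common way to print the result you want, but I tried to answer your question rather than give a better solution. You can use ratings[0] to print the critics' rating and ratings[1] to print the audience rating.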
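For comparison, here is one possible way to fetch each rating directly without a loop. It is a sketch under assumptions: the markup, the 82% and 71% values, and the helper name rating_text are all hypothetical, and the critics block is assumed to come first in the page, as the ratings[0] and ratings[1] remark implies.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modeled on the class names in the question;
# the percentages are placeholder values, not real scores.
html = """
<div class="mop-ratings-wrap__half">
  <span class="mop-ratings-wrap__percentage">
    82%
  </span>
</div>
<div class="mop-ratings-wrap__half audience-score">
  <span class="mop-ratings-wrap__percentage">
    71%
  </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def rating_text(container):
    """Return the stripped percentage text inside container, or None."""
    if container is None:
        return None
    span = container.find("span", class_="mop-ratings-wrap__percentage")
    return span.get_text(strip=True) if span is not None else None


# find() returns the first match in document order, assumed here to be
# the critics block.
critics_div = soup.find("div", class_="mop-ratings-wrap__half")
# Only the audience block carries the audience-score class.
audience_div = soup.find("div", class_="audience-score")

critics_rating = rating_text(critics_div)
audience_rating = rating_text(audience_div)
print(critics_rating, audience_rating)
```

Returning None for a missing container or span keeps the lookup from raising AttributeError if the page layout changes.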
