
I'm new to Python, and I've been trying to scrape the audience score (79%) from a movie page.

But it always comes back empty.

    <div class="score-board-wrap">
      <slot name="title"></slot>
      <div class="info-container">
        <span id="rating">R</span>
        <slot name="info"></slot>
      </div>
      <div class="scores-container">
        <div class="tomatometer-container" data-qa="tomatometer-container">
          <div tabindex="0">
            <score-icon-critic alignment="right" size="medium" data-qa="score-icon-critic" state="certified_fresh" percentage="98"></score-icon-critic>
          </div>
          <span class="score-type">Tomatometer</span>
          <slot name="critics-count"></slot>
        </div>
        <div class="audience-container" data-qa="audience-score-container">
          <div tabindex="0">
            <score-icon-audience alignment="left" size="medium" skeleton="chip" data-qa="score-icon-audience" state="upright" percentage="79"></score-icon-audience>
          </div>
          <span class="score-type">Audience Score</span>
          <slot name="audience-count"></slot>
        </div>
      </div>
    </div>

The closest I've gotten is with x=soup.select('score-board'): I get back what looks like a string with the text in it, but I can't parse anything out of it.

    from bs4 import BeautifulSoup
    import requests
    
    url= 'https://www.rottentomatoes.com/m/one_night_in_miami'
    page=requests.get(url)
    soup=BeautifulSoup(page.content, 'html.parser')
    x=soup.select('score-board')
    x
    
    Output:
    
    [<score-board audiencescore="79" audiencestate="upright" class="scoreboard" data-qa="score-panel" rating="R" skeleton="panel" tomatometerscore="98" tomatometerstate="certified-fresh">
     <h1 class="scoreboard__title" data-qa="score-panel-movie-title" slot="title">One Night in Miami</h1>
     <p class="scoreboard__info" slot="info">2020, Drama, 1h 50m</p>
     <a class="scoreboard__link scoreboard__link--tomatometer" data-qa="tomatometer-review-count" href="/m/one_night_in_miami/reviews?intcmp=rt-scorecard_tomatometer-reviews" slot="critics-count">330 Reviews</a>
     <a class="scoreboard__link scoreboard__link--audience" data-qa="audience-rating-count" href="/m/one_night_in_miami/reviews?type=user&amp;intcmp=rt-scorecard_audience-score-reviews" slot="audience-count">1,000+ Ratings</a>
     </score-board>]


    #first_mscore = soup.find('span', class_ = 'percentage')
    #print(first_mscore)

    #mov=soup.find('span',{'class':'audience-score'})

All I need is the scraped audience score. Any help would be greatly appreciated. TIA!


2 Answers


You can read an attribute's value by indexing the tag with the attribute name, or you can use the .get method.

    from bs4 import BeautifulSoup
    import requests

    url= 'https://www.rottentomatoes.com/m/one_night_in_miami'
    page=requests.get(url)
    soup=BeautifulSoup(page.content, 'html.parser')
    per=soup.find("score-board")['audiencescore']
    print(per)

Output:

    79

Method 2:

    soup.find("score-board").get("audiencescore")
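
To make the difference between the two lookups concrete, here is a minimal sketch that runs offline. The `SAMPLE_HTML` string is an assumption: a trimmed copy of the `score-board` element from the question's output rather than a live response, and the `missing-attr` name is made up for illustration. Indexing raises `KeyError` when the attribute is absent, while `.get` returns `None` (or a default you supply).

```python
from bs4 import BeautifulSoup

# Assumption: trimmed copy of the element shown in the question's output,
# used instead of a live request so the example runs offline.
SAMPLE_HTML = """
<score-board audiencescore="79" audiencestate="upright" class="scoreboard"
             rating="R" tomatometerscore="98" tomatometerstate="certified-fresh">
  <h1 class="scoreboard__title" slot="title">One Night in Miami</h1>
</score-board>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
board = soup.find("score-board")  # would be None if the tag were absent

# Bracket access: raises KeyError when the attribute does not exist.
audience = board["audiencescore"]

# .get access: returns None (or the given default) instead of raising.
critics = board.get("tomatometerscore")
missing = board.get("missing-attr")          # hypothetical attribute name
fallback = board.get("missing-attr", "n/a")  # default value instead of None

print(audience, critics, missing, fallback)
```

Attribute values come back as strings, so wrap them in `int(...)` if you need a number. Note that `find` itself returns `None` when the tag is not on the page, in which case both forms fail with an error, so it is worth checking `board is not None` first on live data.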
Answered 2021-08-24T04:05:30.063

You're almost there.

Just pull the audiencescore attribute from x[0]: select returns a list, so you need to take the first item and then read the attribute.

    x[0]['audiencescore']

Using select

    soup=BeautifulSoup(page.content, 'html.parser')
    x = soup.select('score-board')

    print(x[0]['audiencescore'])

Using select_one

    x = soup.select_one('score-board')['audiencescore']

Output

    79
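
As a hedged follow-up, the sketch below wraps the `select_one` lookup in a small helper that returns the score as an `int` and tolerates a missing element. The helper name `audience_score` and the inline `SAMPLE_HTML` are assumptions made for illustration (the markup is a trimmed copy of the question's output), and the real page may change its markup at any time.

```python
from typing import Optional

from bs4 import BeautifulSoup

# Assumption: trimmed copy of the element from the question's output.
SAMPLE_HTML = """
<score-board audiencescore="79" audiencestate="upright" class="scoreboard"
             tomatometerscore="98" tomatometerstate="certified-fresh">
  <h1 class="scoreboard__title" slot="title">One Night in Miami</h1>
</score-board>
"""


def audience_score(html: str) -> Optional[int]:
    """Return the audience score as an int, or None if it cannot be found."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one() returns the first match or None; the [audiencescore]
    # part only matches elements that actually carry that attribute.
    board = soup.select_one("score-board[audiencescore]")
    if board is None:
        return None
    value = board["audiencescore"]
    # Guard against an empty or non-numeric attribute value.
    return int(value) if value.isdigit() else None


print(audience_score(SAMPLE_HTML))
print(audience_score("<p>no score here</p>"))
```

On markup without the element, `select` returns an empty list, so `x[0]` would raise `IndexError`; that is why this sketch uses `select_one` and checks for `None` before reading the attribute.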
Answered 2021-08-24T13:23:20.323