I am getting a KeyError: 'title' in my web scraping program and I'm not sure what the issue is. When I use inspect element on the webpage, I can see the element that I am trying to find:

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re

url = 'https://www.ncaagamesim.com/college-basketball-predictions.asp'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

table = soup.find('table')

# Get column names
headers = table.find_all('th')
cols = [x.text for x in headers]

# Get all rows in table body
table_rows = table.find_all('tr')

rows = []
# Grab the text of each td, and put into a rows list
for each in table_rows[1:]:
    odd_avail = True
    data = each.find_all('td')
    time = data[0].text.strip()
    try:
        matchup, odds = data[1].text.strip().split('\xa0')
        odd_margin = float(odds.split('by')[-1].strip())
    except:
        matchup = data[1].text.strip()
        odd_margin = '-'
        odd_avail = False
    odd_team_win = data[1].find_all('img')[-1]['title']

    sim_team_win = data[2].find('img')['title']
    sim_margin = float(re.findall("\d+\.\d+", data[2].text)[-1])

    if odd_avail == True:
        if odd_team_win == sim_team_win:
            diff = sim_margin - odd_margin
        else:
            diff = -1 * odd_margin - sim_margin
    else:
        diff = '-'

    row = {cols[0]: time, 'Matchup': matchup, 'Odds Winner': odd_team_win, 'Odds': odd_margin,
           'Simulation Winner': sim_team_win, 'Simulation Margin': sim_margin, 'Diff': diff}
    rows.append(row)

df = pd.DataFrame(rows)
print (df.to_string())
# df.to_csv('odds.csv', index=False)

The error occurs on the line that sets sim_team_win. That line takes data[2], which is the third column on the website, and reads the img title to get the team name. Is it failing because the img is nested inside another div? Also, when I run this code it does not print the "Odds" column, which is stored in the odd_margin variable. Is something wrong with how that variable is set? Thanks in advance for the help!

1 Answer

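As for the img title not being found: if you look at the row with New Mexico @ Dixie State, there is no image in the third column, and there is no img title in the page source either.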

As for the Odds column: after wrapping the sim_team_win assignment in a try/except, I got all of the Odds values in the table.
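Here is a minimal sketch of one way to guard that lookup, using an explicit check instead of a try/except. The sample HTML, the img_title helper, and the '-' placeholder are made up for illustration and are not taken from the real page. The helper covers both failure modes: a cell with no img at all (where find returns None and indexing it would raise a TypeError) and an img that has no title attribute (which raises the KeyError).

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the predictions table: the second data row has
# no <img> in its third cell, like the New Mexico @ Dixie State row.
SAMPLE_HTML = """
<table>
  <tr><th>Time</th><th>Matchup</th><th>Prediction</th></tr>
  <tr><td>7:00 PM</td><td>A @ B</td>
      <td><div><img src="b.png" title="Team B"> 71.2 - 65.0</div></td></tr>
  <tr><td>9:00 PM</td><td>C @ D</td><td>No prediction</td></tr>
</table>
"""


def img_title(cell, default='-'):
    """Return the title of the first <img> in cell, or default if absent."""
    img = cell.find('img')           # searches nested tags too, so a wrapping div is fine
    if img is None:                  # no image in this cell at all
        return default
    return img.get('title', default) # image present but possibly without a title


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

winners = []
for tr in soup.find('table').find_all('tr')[1:]:
    cells = tr.find_all('td')
    winners.append(img_title(cells[2]))

print(winners)
```

In the original loop, this would amount to replacing data[2].find('img')['title'] with a call like img_title(data[2]). The sim_margin line would likely need a similar guard, since re.findall returns an empty list when the cell contains no number.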

Answered 2021-01-14T03:37:09.527