I am new to BeautifulSoup and am trying to use it to grab some test data from NHL.com, but I am pretty lost.

Here is a snippet of the HTML code I want to extract data from:

<tr>
    <td rowspan="1" colspan="1"> … </td>
    <td style="text-align: left;" rowspan="1" colspan="1">
        <a href="/ice/player.htm?id=8474564">

            Steven Stamkos

        </a>
    </td>
    <td style="text-align: center;" rowspan="1" colspan="1">
        <a href="javascript:void(0);" rel="TBL" onclick="loadTeamSpotlight(jQuery(this));" style="border-bottom:1px dotted;">

            TBL

        </a>
    </td>
    <td style="text-align: center;" rowspan="1" colspan="1">

        C

    </td>
    <td style="center" rowspan="1" colspan="1">

        16

    </td>
    <td style="center" rowspan="1" colspan="1">

        14

    </td>
    <td style="center" rowspan="1" colspan="1">

        9

    </td>

I would like to extract these fields from every row on the page, which has about 30 table rows. Here is my Python code so far; I'm not really sure where to go from here.

from bs4 import BeautifulSoup
import requests

r  = requests.get("http://www.nhl.com/ice/playerstats.htm?fetchKey=20142ALLSASAll&viewName=summary&sort=points&pg=1")

data = r.text
t_data=[]
soup = BeautifulSoup(data)
table = soup.find('table', {'class': 'data stats'})

I know it isn't much, but I have no idea how to go about this. Thanks for the help, everyone.

EDIT: I solved the problem, and hopefully this will help anyone in the future. Here is my code:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.nhl.com/ice/playerstats.htm?fetchKey=20142ALLSASAll&viewName=summary&sort=points&pg=1")

# one list per column of interest; the same index refers to the same player
player = []
team = []
goals = []
assists = []
points = []

data = r.text
soup = BeautifulSoup(data)
table = soup.find('table', {'class': 'data stats'})
for row in table.find_all('tr'):
    cells = row.find_all('td')
    # only rows with exactly 19 cells are player rows; skip everything else
    if len(cells) == 19:
        player.append(cells[1].find(text=True))
        team.append(cells[2].find(text=True))
        goals.append(cells[5].find(text=True))
        assists.append(cells[6].find(text=True))
        points.append(cells[7].find(text=True))
        # print the row that was just appended
        print(player[-1], team[-1], goals[-1], assists[-1], points[-1])

2 Answers

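I just want to post another approach, so that you don't have to use 6 different lists to store related data. There is also a shorter and more elegant way to get all the expected rows.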

# getting data
#...
from bs4 import BeautifulSoup
from collections import namedtuple
soup = BeautifulSoup(data)
# this is where the data is collected
rows = []
# named tuple to store the relevant data of one player
Player = namedtuple('Player', ['name', 'team', 'goals', 'assists', 'points'])
# get every row of the tbody in the specified table
for tr in soup.select('table.data.stats tbody tr'):
    # put the first text node of each cell in the row into a list
    cell_strings = [cell.find(text=True) for cell in tr.find_all('td')]
    # add the player to the collected rows
    rows.append(
        Player(
            name=cell_strings[1],
            team=cell_strings[2],
            goals=cell_strings[5],
            assists=cell_strings[6],
            points=cell_strings[7]
        )
    )

rows then looks like this:

[Player(name=u'Steven Stamkos', team=u'TBL', goals=u'14', assists=u'9', points=u'23'),
 Player(name=u'Sidney Crosby', team=u'PIT', goals=u'8', assists=u'15', points=u'23'),
 Player(name=u'Ryan Getzlaf', team=u'ANA', goals=u'10', assists=u'12', points=u'22'),
 Player(name=u'Alexander Steen', team=u'STL', goals=u'14', assists=u'7', points=u'21'),
 Player(name=u'Corey Perry', team=u'ANA', goals=u'11', assists=u'10', points=u'21'),
 Player(name=u'Alex Ovechkin', team=u'WSH', goals=u'13', assists=u'7', points=u'20'),
 ....

and can be accessed like this:

>>> rows[20].name
u'Bryan Little'
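
Note that every field in the output above is still a string, so comparing or sorting by goals would go character by character ('8' sorts after '14'). The following is only a sketch of one way to post-process the rows, assuming the stat cells always contain plain integers; the helper name clean and the three sample rows (copied from the output above) are illustrative and not part of the scraping code.

```python
from collections import namedtuple

Player = namedtuple('Player', ['name', 'team', 'goals', 'assists', 'points'])

def clean(player):
    """Return a copy with whitespace stripped and the stat fields as ints.

    Assumption: goals, assists and points hold plain integers, possibly
    surrounded by whitespace. int() raises ValueError otherwise.
    """
    return Player(
        name=player.name.strip(),
        team=player.team.strip(),
        goals=int(player.goals),
        assists=int(player.assists),
        points=int(player.points),
    )

# sample rows taken from the output shown above
rows = [
    Player('Steven Stamkos', 'TBL', '14', '9', '23'),
    Player('Sidney Crosby', 'PIT', '8', '15', '23'),
    Player('Ryan Getzlaf', 'ANA', '10', '12', '22'),
]

cleaned = [clean(p) for p in rows]
# with ints, max() compares goals numerically rather than as text
top_scorer = max(cleaned, key=lambda p: p.goals)
print(top_scorer.name, top_scorer.goals)
```

Since namedtuples are immutable, clean builds a new Player instead of modifying the old one.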
answered 2013-11-11T10:58:28.887

You didn't say exactly which data you need, but you can proceed as follows:

from bs4 import BeautifulSoup
...
table = soup.find('table', {'class': 'data stats'})
# find_all returns every row; find would return only the first <tr>
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    for col in cols:
        print(col.text)
        # the player and team cells wrap their text in a link
        link = col.find("a")
        if link:
            print(link.get("href"), link.get("rel"), link.get("onclick"), link.text)
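
The href values read from the player links are site-relative (the question's snippet shows /ice/player.htm?id=8474564). If an absolute URL or the numeric player id is wanted, the standard library's urllib.parse can produce both. This is only a sketch: BASE_URL and the helper name player_link are assumptions made for illustration, not something the page or BeautifulSoup provides.

```python
from urllib.parse import urljoin, urlparse, parse_qs

# Assumption: links are resolved against the stats page from the question.
BASE_URL = "http://www.nhl.com/ice/playerstats.htm"

def player_link(href):
    """Return (absolute_url, player_id) for an href taken from a cell link.

    player_id is None when the link has no id query parameter,
    for example the javascript:void(0); team links.
    """
    absolute = urljoin(BASE_URL, href)
    ids = parse_qs(urlparse(absolute).query).get("id")
    return absolute, (ids[0] if ids else None)

url, player_id = player_link("/ice/player.htm?id=8474564")
print(url, player_id)
```

The id comes back as a string; convert it with int() if a number is needed.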
answered 2013-11-11T08:02:16.350