I have some code that outputs the teams and all of their score values (without spaces) from the page http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01.

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01")

content = url.read()

soup = BeautifulSoup(content)

listnames = ''
listscores = ''

for table in soup.find_all('table', class_='scores'):
    for row in table.find_all('tr'):
        for cell in row.find_all('td', class_='yspscores'):
            if cell.text.isdigit():
                listscores += cell.text
        for cell in row.find_all('td', class_='yspscores team'):
            listnames += cell.text

print (listnames)
print (listscores)

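The problem I can't solve is that I don't really understand how to get Python to take the extracted information and give the correct integer values to the correct team, in a format like this: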

Team X: 1, 5, 11.

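The problem with the site is that all the scores share the same class, and all the tables share the same class. The only thing that differs is the href.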

1 Answer

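When you want to associate values with names, a dict is usually the way to go. Here is a modified version of your code that demonstrates the principle: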

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen('http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01')

content = url.read()

soup = BeautifulSoup(content)

results = {}

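for table in soup.find_all('table', class_='scores'):
    for row in table.find_all('tr'):
        scores = []
        name = None
        # class_='yspscores' also matches cells whose class is 'yspscores team'
        for cell in row.find_all('td', class_='yspscores'):
            link = cell.find('a')
            if link:
                # The cell containing a link holds the team name
                name = link.text
            elif cell.text.isdigit():
                scores.append(cell.text)
        # Skip rows in which no team name was found
        if name is not None:
            results[name] = scores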

for name, scores in results.items():
    print('%s: %s' % (name, ', '.join(scores)))

...which gives this output when run:

$ python3 get_scores.py
St. Louis: 1, 2, 1
San Jose: 0, 3, 0
Colorado: 0, 0, 2
Dallas: 0, 0, 0
New Jersey: 0, 1, 0
NY Islanders: 2, 0, 1
Nashville: 0, 0, 2, 0
Minnesota: 0, 1, 0
Detroit: 1, 2, 0
NY Rangers: 1, 1, 2
Anaheim: 0, 3, 1
Winnipeg: 2, 0, 0
Chicago: 1, 1, 0, 0
Calgary: 0, 0, 1
Vancouver: 0, 1, 1
Edmonton: 3, 0, 1
Montreal: 1, 1, 2
Carolina: 1, 0, 0

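Besides using a dictionary, the other significant change is that we now check for the presence of an a element to get the team's name, rather than an additional team class. That's really a stylistic choice, but to me the code seems more expressive this way.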
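If it helps to see the dictionary idea on its own, without the scraping, here is a minimal sketch. The rows list is made-up sample data standing in for the table rows, and collect_scores and format_line are hypothetical helper names, not part of the code above. It also converts the scores to integers and adds the trailing period from the format in the question.

```python
# Made-up sample data standing in for the scraped rows:
# each pair is (team name, texts of the remaining cells in that row).
rows = [
    ("St. Louis", ["1", "2", "1", "Final"]),
    ("San Jose", ["0", "3", "0", "Final"]),
    ("Nashville", ["0", "0", "2", "0", "Final"]),
]


def collect_scores(rows):
    """Map each team name to its list of scores as integers."""
    results = {}
    for name, cells in rows:
        # Keep only purely numeric cells, mirroring the isdigit() check.
        results[name] = [int(text) for text in cells if text.isdigit()]
    return results


def format_line(name, scores):
    """Render one team in the 'Team X: 1, 5, 11.' format."""
    return "%s: %s." % (name, ", ".join(str(score) for score in scores))


results = collect_scores(rows)
for name, scores in results.items():
    print(format_line(name, scores))
```

Storing integers rather than strings means you can do arithmetic later, for example sum(results["St. Louis"]) for a team's total.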

Answered 2013-08-16T15:25:49.367