python - Associating multiple teams with multiple headers/values

Question

I'd like you to take a look at this website:

http://www.nhl.com/ice/teamstats.htm

Here is my code so far. It only prints the headers at the top of the table:

from bs4 import BeautifulSoup
from urllib.request import urlopen

url = urlopen("http://www.nhl.com/ice/teamstats.htm")

content = url.read()

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(content, "html.parser")

results = {}

for table in soup.find_all('table', class_='data stats'):
    for row in table.find_all('tr'):
        name = None
        # Only header cells (th) are visited here, so only headers print
        for cell in row.find_all('th'):
            link = cell.find('a')
            if link:
                name = cell.a.string
                print(name)

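Admittedly, this one is more complicated. With a lot of help, and after relearning some Python classes I had forgotten, I was able to associate teams with scores on this page: http://sports.yahoo.com/nhl/scoreboard?d=2013-04-01

However, the first page linked above has multiple headers, each with its own associated values.

I'm only asking for a few pointers so that I can work out the rest on my own without problems (or maybe with a few, who knows). Roughly, this is what I'm hoping to achieve: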

Team X: GP: 30. W: 16. L: 4, etc.

Thanks!

score 1 · Accepted Answer

Your code only handles th cells. It should handle td cells as well.

Try the following:

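from bs4 import BeautifulSoup
from urllib.request import urlopen

u = urlopen("http://www.nhl.com/ice/teamstats.htm")
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(u, "html.parser")
u.close()

for table in soup.find_all('table', class_='data stats'):
    # The first row holds the column headers; skip its first cell
    row = table.find('tr')
    header = []
    for cell in row.find_all('th')[1:]:
        # get_text() avoids the None that .string gives for cells with several children
        name = cell.get_text(strip=True)
        header.append(name)
    # Every later row is one team; pair each td with its column header
    for row in table.find_all('tr')[1:]:
        for name, cell in zip(header, row.find_all('td')[1:]):
            value = cell.get_text(strip=True)
            print('{}: {}'.format(name, value), end=', ')
        print()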
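
If you would rather keep the values than print them straight away, the same zip idea can fill a dictionary keyed by team name. This is only a rough sketch: rows_to_records and format_record are hypothetical helper names, the sample header and rows are made up rather than taken from the NHL page, and it assumes the team column's header text is "Team". The lists stand in for the header and td text collected in the loop above.

```python
def rows_to_records(header, rows, key="Team"):
    """Pair each row's cell values with the column names, keyed by team.

    header: list of column names (text of the th cells)
    rows:   list of lists of cell values (text of the td cells)
    key:    column holding the team name (assumed to be "Team")
    """
    records = {}
    for cells in rows:
        record = dict(zip(header, cells))  # column name -> value
        team = record.pop(key)             # take the team name out of the stats
        records[team] = record
    return records


def format_record(team, stats):
    """Render one team as 'Team X: GP: 30, W: 16, L: 4'."""
    parts = ["{}: {}".format(name, value) for name, value in stats.items()]
    return "{}: {}".format(team, ", ".join(parts))


# Made-up sample data, only to show the shape of the input
sample_header = ["Team", "GP", "W", "L"]
sample_rows = [
    ["Team X", "30", "16", "4"],
    ["Team Y", "30", "12", "10"],
]

for team, stats in rows_to_records(sample_header, sample_rows).items():
    print(format_record(team, stats))
```

Keeping the records in a dictionary means a single value can be looked up later, for example records["Team X"]["W"], without parsing the page again.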
