python-2.7 - What is wrong with how I scrape the table and data I need?

Question

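I am trying to get the data for the Miami Heat and their opponent from the table at http://www.scoresandodds.com/grid_20111225.html. The problem I have is that the tables for the NBA, the NFL, and the other sports are all tagged identically, so all the data I get comes from the NFL table. Another problem is that I want to scrape the data for the whole season, and both the number of tables and Miami's position within the table change from day to day. This is the code I have been using for the different tables so far: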

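So why doesn't this get the job done? Thanks for your patience; I am a real beginner, and I have been trying to solve this for several days without success.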

def tableSnO(htmlSnO):
    gameSections = soup.findAll('div', 'gameSection')
    for gameSection in gameSections:
        header = gameSection.find('div', 'header')
        if header.get('id') == 'nba':
            rows = gameSections.findAll('tr')
            def parse_string(el):
                text = ''.join(el.findAll(text=True))
                return text.strip()
            for row in rows:
                data = map(parse_string, row.findAll('td'))
                return data

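Recently I decided to try a different approach. If I scrape the whole page and get the index of the relevant data (this is where it stalls), I can take the next set of data from the list, because the structure of the table never changes. I could also get the opponent's team name the same way I get htmlSnO. It feels like such a basic thing, and yet I cannot get it right.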

def tableSnO(htmlSnO):
    oddslist = soupSnO.find('table', {"width" : "100%", "cellspacing" : "0", "cellpadding" : "0"})
    rows = oddslist.findAll('tr',)
    def parse_string(el):
        text = ''.join(el.findAll(text=True))
        return text.strip()
    for row in rows:
        data = map(parse_string, row.findAll('td'))

        for teamName in data:
            if re.match("(.*)MIAMI HEAT(.*)", teamName):
                return teamName
                return data.index(teamName)
score 0 · Accepted Answer

New and final answer, with working code:

The part of the page you want contains this:

<div class="gameSection">
    <div class="header" id="nba">

This should get you to the NBA table:

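def tableSnO(htmlSnO):
    # Assumes htmlSnO holds the page's raw HTML and that BeautifulSoup
    # has been imported from bs4.
    soup = BeautifulSoup(htmlSnO)
    gameSections = soup.findAll('div', 'gameSection')
    for gameSection in gameSections:
        header = gameSection.find('div', 'header')
        # find() returns None when a section has no header div, so guard for it.
        if header is not None and header.get('id') == 'nba':
            # process this gameSection
            print gameSection.prettify()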

As a complete example, here is the full code I used to test:

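import urllib2
from bs4 import BeautifulSoup

# Download the page and parse it with the parser bundled with Python.
f = urllib2.urlopen('http://www.scoresandodds.com/grid_20111225.html')
html = f.read()
soup = BeautifulSoup(html, 'html.parser')

# Every sport sits in its own <div class="gameSection">; only the id on the
# inner header div tells them apart.
gameSections = soup.findAll('div', 'gameSection')
for gameSection in gameSections:
    header = gameSection.find('div', 'header')
    # Skip sections without a header div instead of failing on None.
    if header is not None and header.get('id') == 'nba':
        table = gameSection.find('table', 'data')
        print table.prettify()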

This prints the NBA data table.
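The question also asks about two things the code above does not cover: finding Miami's row when its position changes from day to day, and walking through a whole season of pages. The following is only a rough sketch of both. It works on rows that have already been turned into lists of cell strings (as the `map(parse_string, ...)` line in the question does), so it needs nothing beyond the standard library. The helper names `find_team_rows` and `grid_urls` are made up for this illustration, the sample rows are invented placeholders rather than real page data, and the URL pattern is only inferred from the single address in the question; I have not checked that other dates follow it.

```python
from datetime import date, timedelta

# Assumption: every day's grid follows the pattern of the one URL in the question.
URL_TEMPLATE = 'http://www.scoresandodds.com/grid_%s.html'


def grid_urls(first_day, last_day):
    """Return one grid URL per calendar day, first_day to last_day inclusive."""
    urls = []
    day = first_day
    while day <= last_day:
        urls.append(URL_TEMPLATE % day.strftime('%Y%m%d'))
        day += timedelta(days=1)
    return urls


def find_team_rows(rows, team):
    """Return the indexes of the rows in which some cell mentions the team.

    rows is a list of rows, each row being a list of cell strings.
    The match is a case-insensitive substring test, so it does not depend
    on where the team appears in the table.
    """
    wanted = team.upper()
    matches = []
    for index, cells in enumerate(rows):
        if any(wanted in cell.upper() for cell in cells):
            matches.append(index)
    return matches


# Invented placeholder rows standing in for the parsed <td> text of one table.
sample_rows = [
    ['MIAMI HEAT', 'placeholder'],
    ['DALLAS MAVERICKS', 'placeholder'],
    ['CHICAGO BULLS', 'placeholder'],
]

print(find_team_rows(sample_rows, 'Miami Heat'))  # prints [0]
print(grid_urls(date(2011, 12, 25), date(2011, 12, 27)))
```

Searching by content rather than by a fixed index is what makes this tolerant of Miami moving around in the table. Where the opponent sits relative to the matched row depends on the page layout, so check that against the real table before relying on a neighbouring index.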
