I am trying to parse a web page with robobrowser. Part of the HTML looks like this:

<table class="lineScore mlbBoxScore postEvent"><tr class="gameInfo">
<td class="gameStatus"></td><td class="finalStatus" colspan="12">Final Top 
9th</td></tr><tr class="periodLabels MLBHOU"><td></td><td><div>1</div></td>
<td><div>2</div></td><td><div>3</div></td><td><div>4</div></td><td>
<div>5</div></td><td><div>6</div></td><td><div>7</div></td><td><div>8</div>
</td><td><div>9</div></td><td><div>R</div></td></tr><tr class="teamInfo 
awayTeam MLBHOU"><td class="teamName"><a href="/mlb/teams/page/HOU/houston-
astros"><img 
delaysrc="http://sports.cbsimg.net/images/mlb/logos/40x40/HOU.png" 
width="40" height="40" border="0" class="teamLogo"></a><div 
class="teamLocation"><a href="/mlb/teams/page/HOU/houston-
astros">Houston</a> </div></td><td class="periodScore">1</td><td 
class="periodScore">0</td><td class="periodScore">0</td><td 
class="periodScore">0</td><td class="periodScore">0</td><td 
class="periodScore">0</td><td class="periodScore">2</td><td 
class="periodScore">0</td><td class="periodScore">0</td><td 
class="runsScore">3</td></tr></tr><tr class="teamInfo homeTeam MLBWAS"><td 
class="teamName"><a href="/mlb/teams/page/WAS/washington-nationals"><img 
delaysrc="http://sports.cbsimg.net/images/mlb/logos/40x40/WAS.png" 
width="40" height="40" border="0" class="teamLogo"></a><div 
class="teamLocation"><a href="/mlb/teams/page/WAS/washington-
nationals">Washington</a> </div></td><td class="periodScore">0</td><td 
class="periodScore">1</td><td class="periodScore">2</td><td 
class="periodScore">0</td><td class="periodScore">0</td><td 
class="periodScore">0</td><td class="periodScore">1</td><td 
class="periodScore">1</td><td class="periodScore">0</td><td 
class="runsScore">5</td></tr></tr></table>

However, when I call find_all(class_="lineScore mlbBoxScore postEvent") on it, it returns:

<table class="lineScore mlbBoxScore postEvent"><tr class="gameInfo"><td 
class="gameStatus"></td><td class="finalStatus" colspan="12">Final 9th</td>
</tr><tr class="periodLabels MLBBOS"><td></td><td><div>1</div></td><td>
<div>2</div></td><td><div>3</div></td><td><div>4</div></td><td><div>5</div>
</td><td><div>6</div></td><td><div>7</div></td><td><div>8</div></td><td>
<div>9</div></td><td><div>R</div></td></tr><tr class="teamInfo awayTeam     
MLBBOS"><td class="teamName"><a href="/mlb/teams/page/BOS/boston-red-sox">
<img border="0" class="teamLogo" 
delaysrc="http://sports.cbsimg.net/images/mlb/logos/40x40/BOS.png" 
height="40" width="40"/></a><div class="teamLocation"><a 
href="/mlb/teams/page/BOS/boston-red-sox">Boston</a> </div></td><td 
class="periodScore">0</td><td class="periodScore">2</td><td 
 class="periodScore">1</td><td class="periodScore">0</td><td 
class="periodScore">0</td><td class="periodScore">0</td><td 
class="periodScore">1</td><td class="periodScore">0</td><td 
class="periodScore">0</td><td class="runsScore">4</td></tr></table>

It stops at the first </table> tag. How can I prevent this? Does the same thing happen with BeautifulSoup and other parsers? Any help is appreciated.

Edit:

My current code is as follows:

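from robobrowser import RoboBrowser

league = "mlb"  # league slug; "mlb" is the value used for the scoreboard URL in this question
browser = RoboBrowser(parser='html.parser')
browser.open("http://www.cbssports.com/"+league+"/scoreboard")
doneGames = browser.find_all(class_="lineScore "+league+"BoxScore postEvent")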

The URL is www.cbssports.com/mlb/scoreboard
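
One thing that stands out in the HTML above is the doubled `</tr></tr>` after each team row, which may be related to the truncated result, although I have not confirmed that. Below is a minimal sketch for locating such unmatched closing tags, using only the standard library's `html.parser.HTMLParser`. The class name `StrayEndTagFinder`, the `VOID_TAGS` set, and the shortened `sample` string are hypothetical and only there for illustration; they are not part of robobrowser.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class StrayEndTagFinder(HTMLParser):
    """Records closing tags that do not match the innermost open tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.stray = []      # (tag, (line, column)) for each unmatched close

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. a self-closing <img .../> also triggers this handler
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.stray.append((tag, self.getpos()))


# Shortened, hypothetical stand-in for the scoreboard markup, keeping the
# doubled </tr></tr> that appears after each team row.
sample = (
    '<table class="lineScore"><tr class="teamInfo">'
    '<td class="runsScore">3</td></tr></tr>'
    '<tr class="teamInfo"><td class="runsScore">5</td></tr></tr></table>'
)

finder = StrayEndTagFinder()
finder.feed(sample)
finder.close()

stray_tags = [tag for tag, _pos in finder.stray]
print(stray_tags)
```

If stray tags show up on the real page, it might be worth trying a more lenient parser (for example passing `parser='lxml'` or `parser='html5lib'` to RoboBrowser, assuming those packages are installed) and comparing the results.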