import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://espnfc.com/tables/_/league/esp.1/spanish-la-liga?cc=5901"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div', attrs={'class': 'content'})

rows = soup.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        print text,  
    print

When I run this, I get the following (this is only a small part of the output; what I am after is the standings for a soccer league):

  Overall None Home None Away None  
POS None TEAM P W D L F A None W D L F A None W D L F A None GD Pts
1 
Barcelona 38 32 4 2 115 40 None 18 1 0 63 15 None 14 3 

My question is: why is there a "None" after every word? Is there a way to make it stop doing that?

2 Answers

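If you look at the website, you will notice that there are gaps between some groups of information. Each of those gaps is itself a td.

You may also notice that all of these spacer cells have a width. So you can do this: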

cols = tr.findAll('td', width=None)

If you decide at any stage to switch to BeautifulSoup 4, use:

cols = tr.findAll('td', width=False)
answered 2013-07-03T00:41:49.407

None occurs when an element has more than one child element, as the documentation says.

The simplest way to get rid of the None is this:

for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        if text is not None:
            print text,  
    print  

This checks whether text is None and, if it is, does not print it.
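
The same check can be tried without fetching the page. In the loop above, td.find(text=True) returns the first text node under a cell and gives None when the cell contains no text at all, which is what the empty spacer cells produce. The sketch below is only an illustration in Python 3 syntax: format_row is a hypothetical helper, and the list is hand-written to stand in for the values td.find(text=True) returned for the Barcelona row in the question's output.

```python
def format_row(cell_texts):
    """Join the texts of one table row, skipping cells that had no text.

    cell_texts stands in for the values td.find(text=True) returned for
    each <td> in the row; empty spacer cells show up as None.
    """
    return " ".join(text.strip() for text in cell_texts if text is not None)


# Hand-written sample mirroring the partial Barcelona row from the question.
row = ["Barcelona", "38", "32", "4", "2", "115", "40", None,
       "18", "1", "0", "63", "15", None, "14", "3"]

line = format_row(row)
print(line)
```

Filtering before joining also avoids the trailing-comma print trick, so the same helper works unchanged if the script is later moved to Python 3.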

answered 2013-07-03T00:55:36.423