python - Why is this BeautifulSoup code outputting "None"?

Question

import urllib2
from BeautifulSoup import BeautifulSoup

contenturl = "http://espnfc.com/tables/_/league/esp.1/spanish-la-liga?cc=5901"
soup = BeautifulSoup(urllib2.urlopen(contenturl).read())

table = soup.find('div id', attrs={'class': 'content'})

rows = soup.findAll('tr')
for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        print text,  
    print

and I get the following (this is only a small part of the output; what I am after is the standings for a soccer league):

&nbsp; Overall None Home None Away None &nbsp;
POS None TEAM P W D L F A None W D L F A None W D L F A None GD Pts
1 
Barcelona 38 32 4 2 115 40 None 18 1 0 63 15 None 14 3

My question is: why is there a "None" after every word? Is there a way to make it stop doing that?

score 1 · Accepted Answer

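If you look at the website, you will notice gaps between some groups of information. Each gap is itself a td in the row.

You may also notice that all of those spacer cells have a width attribute. So you can skip them by selecting only the cells that have no width: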

cols = tr.findAll('td', width=None)

If you decide to switch to BeautifulSoup 4 at any stage, use:

cols = tr.findAll('td', width=False)
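As a rough illustration of the BeautifulSoup 4 variant, here is a minimal sketch. It assumes Python 3 with the bs4 package installed, and it runs on a small hypothetical HTML snippet rather than the live page, so the markup (including the width="10" spacer cells) is an assumption about how the real table is built.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating one standings row; the empty cells with a
# width attribute stand in for the spacer cells on the real page.
HTML = """
<table>
  <tr>
    <td>1</td><td>Barcelona</td><td>38</td>
    <td width="10"></td>
    <td>18</td>
    <td width="10"></td>
    <td>14</td>
  </tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

rows = []
for tr in soup.find_all("tr"):
    # width=False keeps only the <td> tags that have no width attribute.
    cols = tr.find_all("td", width=False)
    rows.append([td.get_text(strip=True) for td in cols])

for row in rows:
    print(" ".join(row))
```

With the spacer cells filtered out before any text is read, there is nothing left that could produce None.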

score 0 · Accepted Answer

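The None values come from td.find(text=True), which returns None when a cell contains no text at all, as the empty spacer cells do. (The rule in the documentation about an element with multiple children giving None applies to .string.)

The simplest way to get rid of the None is this: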

for tr in rows:
    cols = tr.findAll('td')
    for td in cols:
        text = td.find(text=True)
        if text is not None:
            print text,  
    print

This checks whether text is None and, if it is, does not print it.
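The same check can be sketched for BeautifulSoup 4. This assumes Python 3 and the bs4 package, uses the string=True keyword (the newer spelling of text=True in bs4), and again runs on a hypothetical snippet, not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical row: the second-to-last cell is empty, like a spacer cell.
HTML = "<table><tr><td>1</td><td>Barcelona</td><td></td><td>38</td></tr></table>"

soup = BeautifulSoup(HTML, "html.parser")
cells = soup.find("tr").find_all("td")

# find(string=True) returns the first text node inside the cell,
# or None when the cell contains no text at all.
raw = [td.find(string=True) for td in cells]

# Keep only the cells that actually produced text.
cleaned = [text for text in raw if text is not None]

print(raw)
print(" ".join(cleaned))
```

Here raw still contains a None for the empty cell, while cleaned holds only the values that would be printed.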
