
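I'm trying to extract a table with BeautifulSoup. When I scrape this American Samoa page, findAll() doesn't find any <td>, and that is correct, since the page has none. How do I catch this exception?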

Here is my code:

from bs4 import BeautifulSoup
import urllib2
import re

url = "http://www.howtocallabroad.com/american-samoa"
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)

areatable = soup.find('table',{'id':'codes'})
d = {}

def chunks(l, n):
    return [l[i:i+n] for i in range(0, len(l), n)]

li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
if li != []:
    print li

    for key in li:
            print key, ":", li[key]
else:
    print "list is empty"

Here is the error I get:

Traceback (most recent call last):
  File "extract_table.py", line 15, in <module>
    li = dict(chunks([i.text for i in areatable.findAll('td')], 2))
AttributeError: 'NoneType' object has no attribute 'findAll'

I also tried this, but it doesn't work either:

def gettdtag(tag):
    return "empty" if areatable.findAll(tag) is None else tag

all_td = gettdtag('td')
print all_td

1 Answer


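The error means that areatable is None: soup.find() returns None when it finds no <table id="codes"> on the page. Check for that before calling findAll():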

import sys

areatable = soup.find('table', {'id': 'codes'})
# areatable = soup.find('table', id='codes')  # Also works

if areatable is None:
    print 'No table with id="codes" found'
    sys.exit(1)  # Exit out
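The same guard can be wrapped in a function that returns an empty dict when the table is missing. The caller then decides what to do, instead of the script exiting. This is a minimal sketch: table_codes is a hypothetical helper name, and the inline HTML strings are made-up stand-ins for the fetched page so the example runs without network access. It pairs cells with zip() rather than the question's chunks() helper.

```python
from bs4 import BeautifulSoup


def table_codes(html, table_id="codes"):
    """Return {first cell: second cell} pairs from <table id=table_id>.

    Returns an empty dict when the table is absent, instead of letting
    None.findAll() raise AttributeError.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:  # find() returns None when nothing matches
        return {}
    cells = [td.get_text(strip=True) for td in table.find_all("td")]
    # Pair cells two at a time: (cell0, cell1), (cell2, cell3), ...
    return dict(zip(cells[0::2], cells[1::2]))


# Made-up sample pages (hypothetical data, not from the real site).
page_with_table = """
<table id="codes">
  <tr><td>Example City</td><td>123</td></tr>
  <tr><td>Another Town</td><td>456</td></tr>
</table>
"""
page_without_table = "<p>No area codes here</p>"

print(table_codes(page_with_table))
print(table_codes(page_without_table))  # -> {}
```

One design difference: zip() silently drops an unpaired trailing cell, whereas dict(chunks(...)) in the question raises ValueError on it. If you literally want to catch the exception, wrapping the findAll() call in try/except AttributeError also works, but the explicit None check states more precisely what went wrong.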

Also, I would use find_all instead of findAll, and get_text() instead of text.
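A small sketch of those two suggestions on made-up inline HTML (no network access). In bs4, findAll() is the older camelCase name carried over from BeautifulSoup 3, and find_all() is the current name. The .text attribute returns the same string as get_text() called with no arguments; get_text() additionally accepts a separator and strip=True.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; the first cell has surrounding spaces on purpose.
html = '<table id="codes"><tr><td> Example City </td><td>123</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="codes")

# find_all() is the bs4 name for the old BS3-style findAll().
cells = table.find_all("td")

raw = cells[0].text                     # ' Example City ' (whitespace kept)
clean = cells[0].get_text(strip=True)   # 'Example City'
# get_text() also takes a separator, handy for whole rows or tables.
row = table.get_text("|", strip=True)   # 'Example City|123'
```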

Answered 2013-06-06T06:05:56.973