When scraping this Afghanistan page, I get an error message:
Traceback (most recent call last):
  File "extract_table.py", line 23, in <module>
    li = dict(chunks([i.text for i in all_td], 2))
ValueError: dictionary update sequence element #28 has length 1; 2 is required
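But when scraping the Argentina page, the code works fine.
Is there a way to tell whether all_td returned a new list? I'd like to know which Python functions to use.
Something like this pseudocode: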
if all_td is new list:
    execute dict(chunks([i.text for i in all_td], 2))
else:
    execute dict(chunks([i.text for i in areatable.findAll('td')], 2))
What I want is for the code to run for both countries, Afghanistan and Argentina.
Here is my code:
from bs4 import BeautifulSoup
import urllib2
import re

url = "http://www.howtocallabroad.com/afghanistan" # argentina works fine
html_page = urllib2.urlopen(url)
soup = BeautifulSoup(html_page)

areatable = soup.find('table',{'id':'codes'})
if areatable is None:
    print "areatable is None"
else:
    d = {}

    def chunks(l, n):
        return [l[i : i + n] for i in range(0, len(l), n)]

    all_td = areatable.findAll('td')
    all_td = filter(lambda x: x.attrs == {}, all_td)
    print ">>>>> all_td=", all_td

    li = dict(chunks([i.text for i in all_td], 2))
    print ">>>>> li=", li
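
For reference, the error can be reproduced without the network. Since chunks only ever produces a short slice at the very end, "element #28 has length 1" implies that the filtered all_td list has an odd number of cells (57), so the last chunk holds a single item. The sketch below uses made-up cell texts (not data from the real page) and a hypothetical helper, pairs_to_dict, that keeps only complete pairs. Whether dropping the leftover cell is the right fix is an assumption: if the extra cell sits in the middle of the table, every pair after it would be misaligned.

```python
def chunks(l, n):
    # Same helper as in the question: consecutive slices of length n.
    return [l[i:i + n] for i in range(0, len(l), n)]

def pairs_to_dict(texts):
    # Hypothetical helper: keep only complete [key, value] chunks,
    # silently dropping a trailing single-item chunk.
    return dict(c for c in chunks(texts, 2) if len(c) == 2)

# Made-up cell texts for illustration only.
even_cells = ["Kabul", "20", "Herat", "40"]
odd_cells = even_cells + ["stray cell"]

ok = pairs_to_dict(even_cells)
guarded = pairs_to_dict(odd_cells)

# Without the guard, the odd-length list raises the ValueError from the question.
try:
    dict(chunks(odd_cells, 2))
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Checking len(all_td) % 2 before building the dict would be another way to detect the problem page explicitly instead of silently discarding a cell.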