python - Exception handling when an input link doesn't have the proper form

Question

For example, I have a list of links like this:

linklists = ['www.right1.com', 'www.right2.com', 'www.wrong.com', 'www.right3.com']

The HTML of right1, right2 and right3 each looks like this:

<html>
<p>
hi
</p>
<strong>
hello
</strong>
</html>

The HTML of www.wrong.com looks like this (the real HTML is much more complex):

<html>
<p>
hi
</p>
</html>

I'm using code like this:

import re
import urllib2
from BeautifulSoup import BeautifulSoup

stronglist = []
for httplink in linklists:
    url = httplink
    page = urllib2.urlopen(url)
    html = page.read()
    soup = BeautifulSoup(html)
    findstrong = soup.findAll("strong")     # list of all <strong> tags
    findstrong = str(findstrong)
    findstrong = re.sub(r'\[|\]|\s*<[^>]*>\s*', '', findstrong)  # remove list brackets and tags
    stronglist.append(findstrong)

What I want to do is:

Take each HTML link from the list 'linklists'
Find the data between <strong> and </strong>
Add it to the list 'stronglist'

But the problem is: there is one faulty link (www.wrong.com) that has no <strong> tag, and then the code raises an error...

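What I want is exception handling (or something else) so that if a link has no "strong" field (i.e. it is the faulty one), the code adds the string "null" to stronglist, because it can't get any data from that link.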

I've been trying to solve this with 'if' statements, but it's a bit hard for me.

Any suggestions?

score 1 · Accepted Answer

There's no need for exception handling. Just detect when the findAll method returns an empty list and handle that case.

import urllib2
from BeautifulSoup import BeautifulSoup

strong_list = []
for url in link_list:   # link_list: the list of URLs (linklists in the question)
    soup = BeautifulSoup(urllib2.urlopen(url).read())
    strong_tags = soup.findAll("strong")
    if not strong_tags:
        # findAll returned an empty list: this page has no <strong> tags
        strong_list.append('null')
        continue
    for strong_tag in strong_tags:
        strong_list.append(strong_tag.text)
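
The empty-list check above can also be combined with the exception handling the question originally asked about, so that a link which cannot be fetched at all also yields 'null'. Below is a minimal sketch, assuming Python 3: BeautifulSoup 3 and urllib2 are swapped for the standard library's html.parser and urllib.request, and the names StrongCollector, find_strong, fetch_html and collect_strong are hypothetical. Note that urlopen rejects a URL without a scheme (such as 'www.wrong.com') with ValueError, so that case is caught as well.

```python
import urllib.request
from html.parser import HTMLParser


class StrongCollector(HTMLParser):
    """Hypothetical helper: collects the text inside every <strong>...</strong>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while inside a <strong> element
        self.current = []   # text pieces of the element being read
        self.found = []     # finished, whitespace-stripped texts

    def handle_starttag(self, tag, attrs):
        if tag == "strong":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "strong" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.found.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def find_strong(html):
    """Return the texts of all <strong> elements (an empty list if there are none)."""
    parser = StrongCollector()
    parser.feed(html)
    parser.close()
    return parser.found


def fetch_html(url):
    """Download a page and decode it (assumes UTF-8)."""
    with urllib.request.urlopen(url, timeout=10) as page:
        return page.read().decode("utf-8", errors="replace")


def collect_strong(urls, fetch=fetch_html):
    strong_list = []
    for url in urls:
        try:
            html = fetch(url)
        except (OSError, ValueError):
            # OSError covers URLError/HTTPError and timeouts;
            # ValueError covers malformed URLs such as 'www.wrong.com'
            strong_list.append("null")
            continue
        texts = find_strong(html)
        if not texts:
            # Page loaded but contains no <strong> element
            strong_list.append("null")
        else:
            strong_list.extend(texts)
    return strong_list


right = "<html>\n<p>\nhi\n</p>\n<strong>\nhello\n</strong>\n</html>"
wrong = "<html>\n<p>\nhi\n</p>\n</html>"
print(find_strong(right))  # ['hello']
print(find_strong(wrong))  # []
```

Passing a different fetch callable (for example the __getitem__ of a dict of saved HTML strings) lets you test the parsing logic without touching the network. The try/except only covers fetching; the empty-list check is still what handles a page that loads fine but has no <strong> element, which is the case described in the question.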
