
I'm having trouble figuring out what code I need so that Python will move on and try the next URL in my CSV file. Each URL is on its own line, like this:


http://www.indexedamerica.com/states/PR/Adjuntas/Restaurants-Adjuntas-00601.html
http://www.indexedamerica.com/states/PR/Aguada/Restaurants-Aguada-00602.html
http://www.indexedamerica.com/states/PR/Aguadilla/Restaurants-Aguadilla-00603.html
http://www.indexedamerica.com/states/PR/Aguadilla/Restaurants-Aguadilla-00604.html
http://www.indexedamerica.com/states/PR/Aguadilla/Restaurants-Aguadilla-00605.html
http://www.indexedamerica.com/states/PR/Maricao/Restaurants-Maricao-00606.html
http://www.indexedamerica.com/states/MI/Kent/Restaurants-Grand-Rapids-49503.html


#open csv file
#read csv file line by line
#Pass each line to beautiful soup to try
#If URL raises a 404 error continue to next line
#extract tables from url

from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import csv

mech = Browser()
indexed = open('C://python27/longlist.csv')
reader = csv.reader(indexed)
html = mech.open(reader)

for line in html:
    try:
        mechanize.open(html)
        table = soup.find("table", border=3)
else:
#!!!! try next url from file. How do I do this?

for row in table.findAll('tr')[2:]:
    col = row.findAll('td')
    BusinessName = col[0].string
    Phone = col[1].string
    Address = col[2].string
    City = col[3].string
    State = col[4].string
    Zip = col[5].string
    Restaurantinfo = (BusinessName, Phone, Address, City, State)
print "|".join(Restaurantinfo)

2 Answers

for line in html:
    try:
        mechanize.open(html)
        table = soup.find("table", border=3)
    except Exception:
        continue
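
One way the pieces from the question could fit together is roughly the sketch below. It assumes Python 2.7 with mechanize and BeautifulSoup 3 as in the question; the file path, the border=3 table and the column layout are taken from the question and may need adjusting. The error handling simply skips any URL that fails to open or any page without the expected table.

from mechanize import Browser
from BeautifulSoup import BeautifulSoup
import csv

mech = Browser()

with open('C://python27/longlist.csv') as indexed:
    for row in csv.reader(indexed):
        if not row:
            continue                        # skip blank lines in the CSV
        url = row[0]                        # one URL per line
        try:
            page = mech.open(url)           # raises HTTPError on a 404
        except Exception:
            continue                        # skip this URL, try the next one
        soup = BeautifulSoup(page.read())
        table = soup.find("table", border=3)
        if table is None:
            continue                        # no matching table on this page
        for tr in table.findAll('tr')[2:]:
            col = tr.findAll('td')
            print "|".join(c.string or '' for c in col[:5])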

Alternatively, you can check the page's status code (inside the for loop) and skip it if you get a 404:

if urllib.urlopen(url).getcode() == 404:
    continue

Inside a loop, continue stops executing the rest of the loop body and moves on to the next entry in the loop.
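
Dropped into a loop over the URLs, that check looks roughly like this (a sketch assuming Python 2's standard urllib module; the two sample URLs are lifted from the question's list):

import urllib

urls = [
    "http://www.indexedamerica.com/states/PR/Adjuntas/Restaurants-Adjuntas-00601.html",
    "http://www.indexedamerica.com/states/MI/Kent/Restaurants-Grand-Rapids-49503.html",
]

for url in urls:
    # getcode() returns an int, so compare against 404, not the string '404'
    if urllib.urlopen(url).getcode() == 404:
        continue                 # dead link: skip to the next URL
    print url, "is reachable"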

Answered 2012-11-29T14:46:12.497

Add all the URLs you want to search to a list. Then loop over the list, opening each URL in turn. If a given URL returns any kind of error, you can choose to ignore it with continue and move on to the next one.
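
A minimal sketch of that approach, assuming Python 2, mechanize and the CSV path from the question (the parsing step is left as a comment):

from mechanize import Browser
import csv

# Build the list of URLs first, one per CSV line.
urls = []
with open('C://python27/longlist.csv') as indexed:
    for row in csv.reader(indexed):
        if row:
            urls.append(row[0])

# Then try each URL in turn, ignoring any that error out.
mech = Browser()
for url in urls:
    try:
        page = mech.open(url)
    except Exception:
        continue              # this URL failed; move on to the next one
    # parse `page` with BeautifulSoup here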

Answered 2012-11-29T07:32:15.810