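from difflib import *
import urllib.request, urllib.parse, urllib.error
from urllib.parse import unquote
import http.client  # needed for the http.client.* except clauses below
import socket       # needed for the socket.* except clauses below
import time
import pdb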

try:
    file2 = urllib.request.Request('site goes here')
    file2.add_header("User-Agent", 'Opera/9.61 (Windows NT 5.1; U; en) Presto/2.1.1')
    ResponseData = urllib.request.urlopen(file2).read().decode("utf8", 'ignore')
except urllib.error.URLError as e: print('http'); ResponseData = ''
except socket.error as e: ResponseData = ''
except socket.timeout as e: ResponseData = ''
except UnicodeEncodeError as e: ResponseData = ''
except http.client.BadStatusLine as e: ResponseData = ''
except http.client.IncompleteRead as e: ResponseData = ''
except urllib.error.HTTPError as e: ResponseData = ''

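Hi, when I run the code above against a page that contains an error such as "Microsoft VBScript runtime error", the request fails and is caught as urllib.error.URLError, even though the page contains plenty of other markup. How can I get all of the HTML from the page rather than just the exception? I would like to keep as much of my current code as possible. Thanks.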


2 Answers


Thanks, I have solved the problem:

except urllib.error.URLError as e: ResponseData = e.read().decode("utf8", 'ignore')
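One caveat on the line above: read() is provided by urllib.error.HTTPError (a subclass of URLError), so it works when the server actually replied with an error status, but a plain URLError (for instance a DNS failure) has no body to read. A minimal sketch of how the two cases could be separated is below. The function name fetch_html, the opener parameter (there only so the function can be exercised without a network connection), and the 10-second timeout are assumptions for illustration, not part of the original code.

```python
import urllib.error
import urllib.request


def fetch_html(url, opener=urllib.request.urlopen, timeout=10):
    """Return the page body as text, including the body of HTTP error pages.

    `opener` is a hypothetical hook: any callable with the same calling
    convention as urllib.request.urlopen.
    """
    request = urllib.request.Request(url)
    request.add_header("User-Agent", "Opera/9.61 (Windows NT 5.1; U; en) Presto/2.1.1")
    try:
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf8", "ignore")
    except urllib.error.HTTPError as e:
        # The server answered with an error status (e.g. 500) but still sent
        # a body, which is where the rest of the page's HTML lives.
        return e.read().decode("utf8", "ignore")
    except urllib.error.URLError:
        # No HTTP response at all (DNS failure, refused connection, ...),
        # so there is nothing to read.
        return ""
```

Calling fetch_html('site goes here') with a real URL would then return the HTML of the error page instead of an empty string.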
answered 2012-08-19T11:44:42.913

URLError has a "reason" attribute, so you can write:

except urllib.error.URLError as e: ResponseData = e.reason

(For example, this would be "Forbidden".)

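You should also be careful to catch subclass errors before their superclasses. In your example, that means putting HTTPError before URLError; otherwise the HTTPError clause is never reached, because the URLError clause matches first.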
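The ordering point can be checked offline with a small sketch. The helper name classify and the example.invalid URL are made up for illustration, and the exceptions are constructed by hand rather than raised by a real request. Note also that "reason" is only a short description of the failure, not the page's HTML.

```python
import io
import urllib.error


def classify(exc):
    """Raise `exc` and report which except clause handled it."""
    try:
        raise exc
    except urllib.error.HTTPError as e:
        # Subclass first: only reached for real HTTP error responses.
        return f"http {e.code} {e.reason}"
    except urllib.error.URLError as e:
        # Superclass second: everything else that urllib reports.
        return f"url {e.reason}"


# Hand-built exceptions standing in for what urlopen would raise.
http_err = urllib.error.HTTPError(
    "http://example.invalid/", 403, "Forbidden", {}, io.BytesIO(b"")
)
url_err = urllib.error.URLError("connection refused")

print(classify(http_err))
print(classify(url_err))
```

If the two except clauses were swapped, both exceptions would land in the URLError branch, since HTTPError inherits from URLError.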

answered 2017-10-20T13:16:44.857