
I have this code:

from urllib.request import urlopen
from bs4 import BeautifulSoup

page = urlopen("http://www.doctoralia.com")
soup = BeautifulSoup(page)
myfile = open('data.txt','w')
myfile.write(soup.prettify())
myfile.close()
print('done boy !')

It works fine! But when I change urlopen("http://www.doctoralia.com") to urlopen("http://www.doctoralia.com/healthpros"), it throws this error:

Traceback (most recent call last):
  File "test.py", line 4, in <module>
    page = urlopen("http://www.doctoralia.com/healthpros")
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)

What is the problem? Thanks.


1 Answer


If you still want to see the actual page source (the HTML the server returned), you have to handle this HTTPError. Example:

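from urllib.request import urlopen
from urllib.error import HTTPError
from bs4 import BeautifulSoup

try:
    page = urlopen("http://www.doctoralia.com/healthpros")
except HTTPError as e:
    # The error object still carries the response body the server sent.
    if e.code == 404:
        # Name the parser explicitly so bs4 does not have to guess one.
        soup = BeautifulSoup(e.fp.read(), "html.parser")
        print(soup.prettify())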

This prints the page source if the page returns a 404 HTTPError.

You can remove the if statement to do this for every HTTPError.
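
As a rough sketch of that last variant, the helper below returns the status code and body whether or not the server answered with an error. The names fetch_body and fake_opener and the example.invalid URL are made up for illustration and are not part of the question. The opener parameter is also an assumption, added only so the error path can be tried without touching the network.

```python
import io
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_body(url, opener=urlopen):
    """Return (status, body_bytes), even when the server answers with an HTTP error."""
    try:
        with opener(url) as response:
            return response.status, response.read()
    except HTTPError as e:
        # HTTPError is itself a file-like response, so the error page can still be read.
        return e.code, e.read()


def fake_opener(url):
    # Hypothetical stand-in for urlopen that always fails with a 404.
    raise HTTPError(url, 404, "Not Found", {}, io.BytesIO(b"<html>missing</html>"))


# Exercise the error path offline; pass no opener to use the real urlopen.
status, body = fetch_body("http://example.invalid/healthpros", opener=fake_opener)
print(status, body)
```

With the default opener, fetch_body("http://www.doctoralia.com/healthpros") should hand back whatever status and HTML the site sends, which you can then pass to BeautifulSoup as in the example above.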

Answered 2013-10-22T17:43:56.213