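I'm still working through this web scraping problem. I get an error when I try to scrape an HTTPS site. Could it be related to the SSL certificate, with the site refusing my connection? Here is my code: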
from bs4 import BeautifulSoup
import requests
import csv
with open('UrlsList.csv', newline='') as f_urls, open('Output.csv', 'w', newline='') as f_output:
    csv_urls = csv.reader(f_urls)
    csv_output = csv.writer(f_output)
    for line in csv_urls:
        page = requests.get(line[0], verify='.\Cert.cer').text
        soup = BeautifulSoup(page, 'html.parser')
        results = soup.findAll('td', {'class' :' alpha'})
        for r in range(len(results)):
            csv_output.writerow([results[r].text])
...which gives me a screenful of traceback, ending with the following error:
raise exception_type(errors)
OpenSSL.SSL.Error: []
I also tried passing verify=False instead, which gives me the following error:
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
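I've tried to research the answer myself, but so far I can't make sense of any of the solutions I've found. I also recently updated PyOpenSSL to version 18. It looks as though the site I'm trying to scrape won't accept my connection, but the URL is real and I can view the site in Chrome without any problem?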
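To narrow down whether the failure happens during the TLS handshake or only later at the HTTP level, one option is to probe the handshake on its own, outside of requests. The following is a minimal sketch using only the Python standard library. The helper names `host_and_port` and `probe_tls` are hypothetical, and it assumes Python 3.7 or later and the system CA store rather than my Cert.cer file, so a result here only hints at the cause rather than confirming it.

```python
import socket
import ssl
from urllib.parse import urlsplit


def host_and_port(url):
    """Split a URL into (hostname, port); default to 443 for https, 80 otherwise."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def probe_tls(url, timeout=10.0):
    """Attempt only a TLS handshake with the server and describe the outcome.

    Hypothetical diagnostic helper: it sends no HTTP request, so it cannot
    reproduce an error that the server raises after the handshake.
    """
    host, port = host_and_port(url)
    # Default context: system CA store, hostname checking enabled.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                return "handshake ok: {} / {}".format(tls.version(), tls.cipher()[0])
    except ssl.SSLCertVerificationError as exc:
        # The certificate chain or hostname could not be verified.
        return "certificate verification failed: {}".format(exc.verify_message)
    except ssl.SSLError as exc:
        # Other TLS-level failures, e.g. no shared protocol version or cipher.
        return "TLS error: {}".format(exc)
    except OSError as exc:
        # DNS failure, refused connection, timeout, or the peer closing the socket.
        return "connection error: {}".format(exc)
```

Calling `probe_tls(line[0])` for each row of UrlsList.csv would show whether the handshake itself succeeds. If it does, the RemoteDisconnected error is more likely caused by something in the HTTP request (for example the request headers) than by the certificate.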
Many thanks!