I have been trying out requests-html in a venv (Python 3.7.0, macOS 10.15.1), but I am running into certificate problems (I am not behind any proxy or firewall).
The main call is:
from requests_html import HTMLSession
sessao = HTMLSession()
r1 = sessao.get(url=url_inicio)
Running the GET method raises the following exception:
/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/bin/python "/Users/ricardobarroslourenco/Library/Application Support/JetBrains/Toolbox/apps/PyCharm-P/ch-0/192.6817.19/PyCharm.app/Contents/helpers/pydev/pydevd.py" --multiproc --qt-support=auto --client 127.0.0.1 --port 50377 --file /Users/ricardobarroslourenco/PycharmProjects/zarc/zarc_scraper/main.py
pydev debugger: process 9369 is connecting
Connected to pydev debugger (build 192.6817.19)
[W:pyppeteer.chromium_downloader] start chromium download.
Download may take a few minutes.
Traceback (most recent call last):
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 376, in _make_request
self._validate_conn(conn)
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 994, in _validate_conn
conn.connect()
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connection.py", line 394, in connect
ssl_context=context,
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/util/ssl_.py", line 370, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 412, in wrap_socket
session=session
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 850, in _create
self.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1108, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/requests_html.py", line 714, in browser
self._browser = await pyppeteer.launch(ignoreHTTPSErrors=not(self.verify), headless=True, args=self.__browser_args)
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/pyppeteer/launcher.py", line 311, in launch
return await Launcher(options, **kwargs).launch()
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/pyppeteer/launcher.py", line 125, in __init__
download_chromium()
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/pyppeteer/chromium_downloader.py", line 136, in download_chromium
extract_zip(download_zip(get_url()), DOWNLOADS_FOLDER / REVISION)
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/pyppeteer/chromium_downloader.py", line 78, in download_zip
data = http.request('GET', url, preload_content=False)
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/request.py", line 76, in request
method, url, fields=fields, headers=headers, **urlopen_kw
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/request.py", line 97, in request_encode_url
return self.urlopen(method, url, **extra_kw)
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/poolmanager.py", line 330, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 760, in urlopen
**response_kw
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 760, in urlopen
**response_kw
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 760, in urlopen
**response_kw
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/Users/ricardobarroslourenco/PycharmProjects/zarc/venv/lib/python3.7/site-packages/urllib3/util/retry.py", line 436, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='storage.googleapis.com', port=443): Max retries exceeded with url: /chromium-browser-snapshots/Mac/575458/chrome-mac.zip (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1045)')))
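Any hints on how to solve this? The idea is to scrape some websites that use JavaScript to generate cookies, and requests-html is said to handle the rendering problem (which occurs with the regular requests package).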
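
A possible first diagnostic step, sketched under assumptions rather than offered as a confirmed fix: the traceback shows the handshake failing inside the ssl module of the framework Python under /Library/Frameworks/Python.framework, so it may help to check which CA locations that interpreter resolves by default. The helper name describe_default_ca_sources is hypothetical; it relies only on standard-library calls.

```python
import ssl


def describe_default_ca_sources():
    """Report where this interpreter's ssl module looks for trusted CA certificates."""
    # Paths OpenSSL was built to use, plus the ones that actually exist on disk.
    paths = ssl.get_default_verify_paths()
    # A default context loads whatever CA certificates it can find up front.
    context = ssl.create_default_context()
    return {
        "cafile": paths.cafile,  # None if the default CA file does not exist
        "capath": paths.capath,  # None if the default CA directory does not exist
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        "loaded_ca_certs": context.cert_store_stats()["x509_ca"],
    }


if __name__ == "__main__":
    for name, value in describe_default_ca_sources().items():
        print(f"{name}: {value}")
```

If cafile and capath both come back as None and loaded_ca_certs is 0, the interpreter probably has no CA bundle to verify against, which would be consistent with the "unable to get local issuer certificate" message. Certificates in a capath directory are loaded lazily, so a count of 0 is not conclusive by itself.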