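To scrape binance.com, I use the pyppeteer library to render the page and get the final HTML rather than the JavaScript source.
My problem: the first run on a remote Ubuntu 20.04 server works fine, but when I run the code again I get pyppeteer.errors.PageError: Page crashed! or pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 100000 ms exceeded. The same code works when I run it from PyCharm on my main Windows machine; the problem occurs only on Ubuntu.
I think the problem is related to leftover pyppeteer sessions that were never closed, but I'm not sure.
Here is my code: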
from requests_html import HTMLSession
from bs4 import BeautifulSoup
import time
from datetime import datetime
from sql import *

if __name__ == "__main__":
    while True:
        session = HTMLSession()
        r = session.get('https://www.binance.com/ru/trade/ETH_BTC')
        r.html.render(sleep = 1, keep_page=True, scrolldown=1, timeout=1000)
        soup = BeautifulSoup(r.html.html, "lxml")
        price = soup.find("div", class_ = lambda value: value and value.startswith("showPrice"))
        now = datetime.now()
        dt_string = now.strftime("%d/%m/%Y %H:%M:%S")
        sql(dt_string, price.text)
        print(dt_string + " ETH/BTC: " + price.text)
        r.close()
        session.close()
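One thing worth noting about the loop: if render() raises, as it does in the log below, the r.close() and session.close() calls at the bottom are never reached. Whether that really leaves a Chromium process behind on the server is an assumption, not something the log confirms. A minimal sketch of guaranteeing cleanup with try/finally follows; poll_prices, make_session and fetch_price are hypothetical names introduced for illustration and are not part of requests_html.

```python
import time
from typing import Any, Callable, List, Optional


def poll_prices(
    make_session: Callable[[], Any],
    fetch_price: Callable[[Any], str],
    iterations: int,
    delay: float = 0.0,
) -> List[Optional[str]]:
    """Call fetch_price(session) `iterations` times, always closing the session.

    make_session and fetch_price are hypothetical hooks: the first builds an
    object with a close() method, the second returns the scraped price text.
    """
    results: List[Optional[str]] = []
    for _ in range(iterations):
        session = make_session()
        try:
            results.append(fetch_price(session))
        except Exception:
            # Broad on purpose for this sketch: a failed render should not
            # stop the loop. A failed attempt is recorded as None.
            results.append(None)
        finally:
            # Runs whether fetch_price returned or raised, so the session
            # (and any browser it owns) is closed on every iteration.
            session.close()
        time.sleep(delay)
    return results
```

With requests_html, make_session would be HTMLSession and fetch_price a function wrapping the get, render and find steps from my loop; this only addresses cleanup and may not be the cause of the crash.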
Here is the crash error log:
Traceback (most recent call last):
File "binance.py", line 13, in <module>
r.html.render(sleep = 1, keep_page=True, scrolldown=1, timeout=1000)
File "/usr/local/lib/python3.8/dist-packages/requests_html.py", line 598, in render
content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
return future.result()
File "/usr/local/lib/python3.8/dist-packages/requests_html.py", line 512, in _async_render
await page.goto(url, options={'timeout': int(timeout * 1000)})
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/page.py", line 885, in goto
raise error
pyppeteer.errors.TimeoutError: Navigation Timeout Exceeded: 1000000 ms exceeded.
[E:pyppeteer.connection] connection unexpectedly closed
Task exception was never retrieved
future: <Task finished name='Task-105' coro=<Connection._async_send() done, defined at /usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py:69> exception=InvalidStateError('invalid state')>
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 827, in transfer_data
message = await self.read_message()
File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 895, in read_message
frame = await self.read_data_frame(max_size=self.max_size)
File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 971, in read_data_frame
frame = await self.read_frame(max_size)
File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 1047, in read_frame
frame = await Frame.read(
File "/usr/local/lib/python3.8/dist-packages/websockets/framing.py", line 105, in read
data = await reader(2)
File "/usr/lib/python3.8/asyncio/streams.py", line 721, in readexactly
raise exceptions.IncompleteReadError(incomplete, n)
asyncio.exceptions.IncompleteReadError: 0 bytes read on a total of 2 expected bytes
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 73, in _async_send
await self.connection.send(msg)
File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 555, in send
await self.ensure_open()
File "/usr/local/lib/python3.8/dist-packages/websockets/protocol.py", line 803, in ensure_open
raise self.connection_closed_exc()
websockets.exceptions.ConnectionClosedError: code = 1006 (connection closed abnormally [internal]), no reason
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 79, in _async_send
await self.dispose()
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 170, in dispose
await self._on_close()
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 151, in _on_close
cb.set_exception(_rewriteError(
asyncio.exceptions.InvalidStateError: invalid state
Task exception was never retrieved
future: <Task finished name='Task-2' coro=<Connection._recv_loop() done, defined at /usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py:53> exception=PageError('Page crashed!')>
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 61, in _recv_loop
await self._on_message(resp)
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 143, in _on_message
self._on_query(msg)
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 123, in _on_query
session._on_message(params.get('message'))
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/connection.py", line 276, in _on_message
self.emit(obj.get('method'), obj.get('params'))
File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 108, in emit
handled = self._call_handlers(event, args, kwargs)
File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 91, in _call_handlers
self._emit_run(f, args, kwargs)
File "/usr/local/lib/python3.8/dist-packages/pyee/_compat.py", line 49, in _emit_run
coro = f(*args, **kwargs)
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/page.py", line 205, in <lambda>
lambda event: self._onTargetCrashed())
File "/usr/local/lib/python3.8/dist-packages/pyppeteer/page.py", line 228, in _onTargetCrashed
self.emit('error', PageError('Page crashed!'))
File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 111, in emit
self._emit_handle_potential_error(event, args[0] if args else None)
File "/usr/local/lib/python3.8/dist-packages/pyee/_base.py", line 83, in _emit_handle_potential_error
raise error
pyppeteer.errors.PageError: Page crashed!
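To check the suspicion about leftover sessions, one option is to look for browser processes that are still alive between runs. The sketch below is only an illustration: it assumes a Linux ps that accepts `-eo pid,comm`, and it assumes the headless browser shows up under a command name containing "chrome" or "chromium". The function names are hypothetical.

```python
import subprocess
from typing import List

# Assumption: the browser started by pyppeteer appears under one of these names.
BROWSER_NAMES = ("chrome", "chromium")


def parse_browser_pids(ps_output: str) -> List[int]:
    """Return the PIDs of lines in `ps -eo pid,comm` output that look like a browser."""
    pids: List[int] = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue  # skips the "PID COMMAND" header and malformed lines
        pid, command = parts
        if any(name in command.lower() for name in BROWSER_NAMES):
            pids.append(int(pid))
    return pids


def leftover_browser_pids() -> List[int]:
    """Run ps and return the PIDs of browser-like processes currently alive."""
    completed = subprocess.run(
        ["ps", "-eo", "pid,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_browser_pids(completed.stdout)
```

Calling leftover_browser_pids() after the script has exited and getting a non-empty list would support the idea that browsers from earlier runs are still around; an empty list would point elsewhere.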