I'm using pyppeteer to do some lightweight crawling of ~2K pages, but I keep seeing this error recur:
  File "/env/local/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 106, in evaluateHandle
    'userGesture': True,
pyppeteer.errors.NetworkError: Protocol error (Runtime.callFunctionOn): Cannot find context with specified id

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  ...
  File "/user_code/main.py", line 434, in main_program
    crawl_data = asyncio.get_event_loop().run_until_complete(crawl(browser, url))
  File "/opt/python3.7/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "/user_code/main.py", line 394, in crawl
    title = await page.title()
  File "/env/local/lib/python3.7/site-packages/pyppeteer/page.py", line 1437, in title
    return await frame.title()
  File "/env/local/lib/python3.7/site-packages/pyppeteer/frame_manager.py", line 752, in title
    return await self.evaluate('() => document.title')
  File "/env/local/lib/python3.7/site-packages/pyppeteer/frame_manager.py", line 295, in evaluate
    pageFunction, *args, force_expr=force_expr)
  File "/env/local/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 55, in evaluate
    pageFunction, *args, force_expr=force_expr)
  File "/env/local/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 109, in evaluateHandle
    _rewriteError(e)
  File "/env/local/lib/python3.7/site-packages/pyppeteer/execution_context.py", line 238, in _rewriteError
    raise type(error)(msg)
pyppeteer.errors.NetworkError: Execution context was destroyed, most likely because of a navigation.
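I don't understand how this triggers an error related to frame.title(), since my code only looks up the actual page title, not a title inside any of its frames.
In addition, it requests the page title before navigating into any of the frame contents: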
try:
    # max timeout of 12 seconds (12000 ms)
    response = await page.goto(
        url,
        {'timeout': 12000}
    )
    if response.status != 200:
        await page.close()
        return False
except TimeoutError:
    return False
except Exception as e:
    print(e)
    return False

# had this in before, but it was causing too many timeouts. Error still persists
# await page.waitForNavigation()

try:
    source_code = await page.content()
except Exception:
    return False

# title
title = await page.title()
title = title[:1000]

# get all the frames
frames = page.frames
content = ""
for frame in frames:
    content_new = await frame.content()
    content += content_new

await page.close()
What could be causing this recurring error?
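
The error text itself points to a navigation, so one thing worth trying is to retry the failing call after a short pause. This assumes (it is not confirmed by anything above) that some pages navigate again after goto() resolves, for example through a script-driven redirect, and that the old execution context is gone by the time page.title() runs. The helper name retry_on_context_loss and its retries and delay parameters are hypothetical and not part of pyppeteer; this is a minimal sketch, not a definitive fix:

```python
import asyncio

# Substrings copied from the two error messages in the traceback.
CONTEXT_LOST_MARKERS = (
    "Execution context was destroyed",
    "Cannot find context with specified id",
)


async def retry_on_context_loss(make_call, retries=3, delay=0.5):
    """Await make_call(), retrying if the execution context was lost.

    make_call is a zero-argument callable that returns an awaitable,
    such as the bound method page.title. Hypothetical helper, not a
    pyppeteer API.
    """
    for attempt in range(retries + 1):
        try:
            return await make_call()
        except Exception as exc:
            context_lost = any(m in str(exc) for m in CONTEXT_LOST_MARKERS)
            # Re-raise unrelated errors, and give up once retries run out.
            if not context_lost or attempt == retries:
                raise
            # Give any in-flight navigation a moment to settle.
            await asyncio.sleep(delay)
```

In the crawl code this would replace the direct call, as in `title = await retry_on_context_loss(page.title)`. If the retries make the error disappear, that would support the late-navigation explanation; if they do not, the cause is probably something else.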