Hey, I have this Python Playwright code for fetching a page's source:
import json
import sys
import bs4
import urllib.parse
from bs4 import BeautifulSoup
server_proxy = urllib.parse.unquote(sys.argv[1])  # proxy address, URL-encoded on the command line
link = urllib.parse.unquote(sys.argv[2])  # page URL, URL-encoded on the command line
from playwright.sync_api import sync_playwright
with sync_playwright() as p:
    #browser = p.chromium.launch(headless = False)
    browser = p.chromium.launch(proxy={"server": server_proxy,'username': 'xxx',"password": 'xxx' })
    context = browser.new_context(user_agent='Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36')
    page = context.new_page()
    cookie_file = open('cookies_tessco.json')
    cookies = json.load(cookie_file)
    context.add_cookies(cookies)
    page.goto(link)
    try:
        page.wait_for_timeout(10000)  # fixed 10 s pause after navigation
        cont = page.content()
        print(cont)
        page.close()
        context.close()
        browser.close()
    except Exception as e:
        print("Error in playwright script.", e)
        page.close()
        context.close()
        browser.close()
This works fine, but sometimes I get this error:
Traceback (most recent call last):
  File "page_tessco.py", line 17, in <module>
    page.goto(link)
  File "/usr/local/lib/python3.9/site-packages/playwright/sync_api/_generated.py", line 5774, in goto
    self._sync(
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_sync_base.py", line 103, in _sync
    return task.result()
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_page.py", line 464, in goto
    return await self._main_frame.goto(**locals_to_params(locals()))
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_frame.py", line 117, in goto
    await self._channel.send("goto", locals_to_params(locals()))
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 36, in send
    return await self.inner_send(method, params, False)
  File "/usr/local/lib/python3.9/site-packages/playwright/_impl/_connection.py", line 47, in inner_send
    result = await callback.future
playwright._impl._api_types.TimeoutError: Timeout 30000ms exceeded.
=========================== logs ===========================
navigating to "https://www.tessco.com/product/207882", waiting until "load"
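I tried adding
page.wait_for_timeout(10000)
but the error still shows up sometimes. Any help would be appreciated. I'm also confused about why this error only appears occasionally and what causes it. If anyone has experience with this, please share.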
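
For context, the traceback shows the timeout being raised inside page.goto itself, while it waits for the "load" event with the default 30000 ms limit, so a page.wait_for_timeout call placed after goto never gets a chance to run. One possible direction, shown here only as a rough sketch, is to pass page.goto a longer timeout and a less strict wait_until value, and to retry a few times. The helper names goto_with_retry and fetch_page_source are hypothetical (they are not part of Playwright), and the numbers are arbitrary assumptions; the real cause of the intermittent slowness (a slow proxy, a slow third-party resource, or something else) is not known from the post.

```python
import time


def goto_with_retry(page, url, attempts=3, timeout_ms=60_000,
                    wait_until="domcontentloaded", retry_on=(Exception,),
                    backoff_s=2.0):
    """Call page.goto(url), retrying when it raises one of retry_on.

    Hypothetical helper, not part of Playwright. Defaults are assumptions.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # timeout (ms) and wait_until are keyword arguments of page.goto.
            return page.goto(url, timeout=timeout_ms, wait_until=wait_until)
        except retry_on as error:
            last_error = error
            if attempt < attempts:
                # Linear back-off before the next attempt.
                time.sleep(backoff_s * attempt)
    # Every attempt failed: surface the last error to the caller.
    raise last_error


def fetch_page_source(link, server_proxy):
    """Sketch of how the helper might be wired into the original script."""
    # Imported lazily so the helper above can be used without Playwright.
    from playwright.sync_api import sync_playwright
    from playwright.sync_api import TimeoutError as PlaywrightTimeoutError

    with sync_playwright() as p:
        browser = p.chromium.launch(proxy={"server": server_proxy})
        try:
            page = browser.new_page()
            # Retry only on Playwright's timeout, not on every exception.
            goto_with_retry(page, link, retry_on=(PlaywrightTimeoutError,))
            return page.content()
        finally:
            # Runs whether navigation succeeded or failed.
            browser.close()
```

Calling fetch_page_source(link, server_proxy) would return the HTML or re-raise the last timeout after three attempts. Waiting for "domcontentloaded" instead of "load" is a trade-off: it avoids blocking on slow images or trackers, but content rendered late by scripts may not be in the page yet.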