I am scraping a list of similar web pages, and sometimes I get an error (shown at the end).

The code I use:

from requests_html import HTMLSession
import pyppdf.patch_pyppeteer

link = 'https://www.wildberries.ru/catalog/1588749/detail.aspx?targetUrl=BP'
# It's always a different link from the list, but here I simplified it.

session = HTMLSession()
resp = session.get(link)
resp.html.render()

Most pages load without problems, but a few raise an error. It occurs either at resp = session.get(link) or at resp.html.render(). Here is the traceback:

Traceback (most recent call last):
  File "/Users/max/Dropbox/WORK/projects/wildberries_parser/parsers/catalog_parser_3.py", line 133, in <module>
    row = parse_item_page(link)
  File "/Users/max/Dropbox/WORK/projects/wildberries_parser/parsers/catalog_parser_3.py", line 36, in parse_item_page
    resp.html.render()
  File "/Users/max/opt/anaconda3/envs/wildberries_parser/lib/python3.6/site-packages/requests_html.py", line 598, in render
    content, result, page = self.session.loop.run_until_complete(self._async_render(url=self.url, script=script, sleep=sleep, wait=wait, content=self.html, reload=reload, scrolldown=scrolldown, timeout=timeout, keep_page=keep_page))
  File "/Users/max/opt/anaconda3/envs/wildberries_parser/lib/python3.6/asyncio/base_events.py", line 488, in run_until_complete
    return future.result()
  File "/Users/max/opt/anaconda3/envs/wildberries_parser/lib/python3.6/site-packages/requests_html.py", line 512, in _async_render
    await page.goto(url, options={'timeout': int(timeout * 1000)})
  File "/Users/max/opt/anaconda3/envs/wildberries_parser/lib/python3.6/site-packages/pyppeteer/page.py", line 856, in goto
    raise PageError(result)
pyppeteer.errors.PageError: net::ERR_NAME_NOT_RESOLVED at https://www.wildberries.ru/catalog/1588749/detail.aspx?targetUrl=BP

I can't make sense of it and haven't been able to figure it out on my own. Can you tell me what is going on?

1 Answer

ERR_NAME_NOT_RESOLVED means the host name could not be resolved to an IP address. The cause may be your computer, your router, or your DNS resolver.
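
To see whether name resolution is really what fails, you could check the host name from Python before rendering. The following is only a rough diagnostic sketch: host_resolves is a hypothetical helper name, and it asks the operating system's resolver through the standard library, which may not behave exactly like the headless browser's own lookup.

```python
import socket
from urllib.parse import urlsplit


def host_resolves(url, resolver=socket.getaddrinfo):
    """Return True if the host name in `url` resolves to at least one address.

    `resolver` is injectable (an assumption of this sketch) so the function
    can be exercised without touching the network.
    """
    host = urlsplit(url).hostname
    if host is None:
        # No recognizable host in the URL, so there is nothing to resolve.
        return False
    try:
        # Port 443 is only a placeholder; the lookup is about the name.
        return len(resolver(host, 443)) > 0
    except socket.gaierror:
        # Raised by getaddrinfo when the name cannot be resolved.
        return False
```

Calling host_resolves(link) just before session.get(link) and logging the result for the failing links should show whether the failures line up with DNS lookups that do not succeed.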

You may want to try switching your DNS provider to Google's (8.8.8.8 and 8.8.4.4).
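
Since only some requests fail, the lookups may be failing intermittently rather than permanently. In that case, retrying after a short pause might be enough to get past them. This is a sketch of that idea and not a fix for the underlying DNS problem; call_with_retries and its parameters are hypothetical names chosen for illustration.

```python
import time


def call_with_retries(func, retriable=(Exception,), attempts=3,
                      delay=1.0, sleep=time.sleep):
    """Call func(); on a retriable exception, wait and try again.

    Gives up after `attempts` calls and re-raises the last error.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retriable:
            if attempt == attempts:
                # Out of attempts: let the last error propagate.
                raise
            # Linear backoff: delay, 2 * delay, 3 * delay, ...
            sleep(delay * attempt)
```

With the code from the question, this could be used as call_with_retries(resp.html.render, retriable=(pyppeteer.errors.PageError,)), where PageError is the exception class shown in the traceback. If the same links still fail after several attempts, the problem is more likely persistent and the DNS settings are the place to look.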

Answered 2020-04-30T20:52:46.683