cookies - Supporting session cookies with Asyncio

Question

I wrote a scraping script in Python that needs to work with Asyncio. I also need it to support web cookies. It was originally written with urllib.request and looks like this:

import bs4
import urllib.request

urls = ['example.com/subdomain-A', 'example.com/subdomain-B', 'example.com/subdomain-C', ...]
for url in urls:
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor)
    page = bs4.BeautifulSoup(opener.open(url))
    # Do some stuff with page

This currently works fine, but I also need to make it concurrent using Asyncio. Since I couldn't find any documentation on this, I tried the following:

import asyncio
import aiohttp
import bs4

@asyncio.coroutine
def get(*args, **kwargs):
    response = yield from aiohttp.request('GET', *args, **kwargs)
    res = yield from response.read()
    return res
        
@asyncio.coroutine
def scraper(url):
    connector = aiohttp.connector.BaseConnector(share_cookies=True)
    response = yield from get(url, connector=connector)
    page = bs4.BeautifulSoup(response)
    # Do some stuff with page

urls = ['example.com/subdomain-A', 'example.com/subdomain-B', 'example.com/subdomain-C', ...]
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
tasks = [scraper(url) for url in urls]
loop.run_until_complete(asyncio.wait(tasks))

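For context, the pages I am trying to scrape check for a session cookie when they load; if none exists, they create one and then redirect to themselves. With the simple scraping approach I get stuck in a loop, and with Asyncio/Aiohttp nothing happens at all.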
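
For illustration, here is a minimal sketch of one way to fetch such pages concurrently while keeping a shared cookie jar. It assumes a swapped-in technique: only the standard library is used instead of aiohttp, and the blocking urllib calls run in worker threads through the event loop's default executor. Every request goes through a single opener, so a session cookie set by the first response is sent again when the redirect is followed. The helper names make_opener, fetch, and scrape_all are hypothetical and not part of the original script.

```python
import asyncio
import http.cookiejar
import urllib.request


def make_opener():
    """Build one opener whose cookie jar is shared by every request made through it."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def fetch(opener, url):
    """Blocking fetch: Set-Cookie headers are stored in the jar and resent on redirects."""
    with opener.open(url, timeout=10) as response:
        return response.read()


async def scrape_all(urls):
    """Fetch all URLs concurrently in worker threads, sharing a single cookie jar."""
    opener, jar = make_opener()
    loop = asyncio.get_running_loop()
    # run_in_executor(None, ...) uses the loop's default thread pool
    pages = await asyncio.gather(
        *(loop.run_in_executor(None, fetch, opener, url) for url in urls)
    )
    return pages, jar
```

It could be called as `pages, jar = asyncio.run(scrape_all(urls))`, where each URL includes its scheme (for example `http://`). The returned pages are in the same order as the input URLs and can then be passed to bs4.BeautifulSoup as in the original loop.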
