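I'm trying to reimplement the logic of the code below, which makes requests to Google Search, using aiohttp. My solution seems equivalent, but for some reason it doesn't set cookies as needed. Any help?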

import os
from http.cookiejar import LWPCookieJar
from urllib.request import Request, urlopen

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
cookie_jar = LWPCookieJar(os.path.join(home_folder, '.google-cookie'))
cookie_jar.load()


def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    request = Request(url)
    request.add_header('User-Agent', user_agent)
    cookie_jar.add_cookie_header(request)
    response = urlopen(request)
    cookie_jar.extract_cookies(response, request)
    html = response.read()
    response.close()
    try:
        cookie_jar.save()
    except Exception:
        pass
    return html

My solution:

import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
abs_cookie_jar = aiohttp.CookieJar()
abs_cookie_jar.load('.aiogoogle-cookie')


async def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    async with aiohttp.ClientSession(headers={'User-Agent': user_agent}, cookie_jar=abs_cookie_jar) as session:
        response = await session.get(url)
        if response.cookies:
            abs_cookie_jar.update_cookies(cookies=response.cookies)
            abs_cookie_jar.save('.aiogoogle-cookie')
        html = await response.text()
    return html

1 Answer

When you request google.com, you get redirected. As a result, three HTTP requests are performed, with response codes 301, 302, and 200 (you can inspect the intermediate responses through the response.history attribute).

The Set-Cookie header is sent with the first response, but the response variable holds the last one, which carries no cookies, so response.cookies is empty.
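
The effect is easy to reproduce in isolation. Below is a minimal sketch that uses only the standard library (urllib and http.cookiejar, as in the question's original snippet) against a throwaway local server rather than Google or aiohttp. The handler class, the /final path, the cookie name session, and the helper fetch_with_jar are all made up for illustration. It shows that the final response carries no Set-Cookie header, while a jar attached to the client still picks up the cookie from the redirect.

```python
import threading
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener


class RedirectHandler(BaseHTTPRequestHandler):
    """Toy server: '/' sets a cookie and redirects to '/final'."""

    def do_GET(self):
        if self.path == '/':
            # The cookie is attached to the redirect response only.
            self.send_response(302)
            self.send_header('Location', '/final')
            self.send_header('Set-Cookie', 'session=abc123; Path=/')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            body = b'done'
            self.send_response(200)
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch_with_jar():
    """Follow the redirect and report what the final response and the jar hold."""
    server = HTTPServer(('127.0.0.1', 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        jar = CookieJar()
        # ProxyHandler({}) ignores any proxy configured in the environment.
        opener = build_opener(ProxyHandler({}), HTTPCookieProcessor(jar))
        url = 'http://127.0.0.1:%d/' % server.server_port
        with opener.open(url) as response:
            final_url = response.geturl()
            final_has_set_cookie = response.headers.get('Set-Cookie') is not None
        jar_cookies = {cookie.name: cookie.value for cookie in jar}
    finally:
        server.shutdown()
        server.server_close()
    return final_url, final_has_set_cookie, jar_cookies
```

Calling fetch_with_jar() returns the final URL, a flag saying whether the final response had a Set-Cookie header, and the cookies collected by the jar along the way.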

The update step in your implementation, abs_cookie_jar.update_cookies(cookies=response.cookies), is not needed, because aiohttp already updates the session's cookie jar automatically for every request, redirects included (see the aiohttp source).

How your solution could be fixed:

import asyncio

import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
abs_cookie_jar = aiohttp.CookieJar()
abs_cookie_jar.load('.aiogoogle-cookie')

async def get_page(url, user_agent=None, verify_ssl=True):
    if user_agent is None:
        user_agent = USER_AGENT
    async with aiohttp.ClientSession(headers={'User-Agent': user_agent}, cookie_jar=abs_cookie_jar) as session:
        response = await session.get(url)

        html = await response.text()

        # display the intermediate redirect responses
        for resp in response.history:
            print(resp)

        # print the cookies in a human-readable form
        for cookie in abs_cookie_jar:
            print(cookie)

        # save the jar, which already contains the cookies from every response
        abs_cookie_jar.save('.aiogoogle-cookie')

    return html

# asyncio.run creates the event loop, runs the coroutine, and closes the loop
asyncio.run(get_page('https://google.com'))
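
A possible variation on the code above, offered as a sketch rather than the definitive fix: it builds the jar inside the coroutine (aiohttp recommends creating such objects within an async function) and only loads the cookie file if it already exists, so the first run does not fail on a missing file. The helper load_jar, the COOKIE_FILE constant, and the cookie_file parameter are names introduced here for illustration. It assumes aiohttp is installed.

```python
import asyncio
import os

import aiohttp

USER_AGENT = 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0)'
COOKIE_FILE = '.aiogoogle-cookie'  # hypothetical default path


def load_jar(path):
    """Create a CookieJar and load saved cookies if the file exists.

    Meant to be called from inside a coroutine, so that the jar is
    created while an event loop is running.
    """
    jar = aiohttp.CookieJar()
    if os.path.exists(path):
        jar.load(path)
    return jar


async def get_page(url, user_agent=USER_AGENT, cookie_file=COOKIE_FILE):
    jar = load_jar(cookie_file)
    headers = {'User-Agent': user_agent}
    async with aiohttp.ClientSession(headers=headers, cookie_jar=jar) as session:
        async with session.get(url) as response:
            html = await response.text()
    # The session has already stored the cookies from every response,
    # redirects included, so the jar can be saved as-is.
    jar.save(cookie_file)
    return html
```

It can be run with asyncio.run(get_page('https://google.com')).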
Answered 2021-08-21T16:44:35.220