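I am trying to log in to a grading website and scrape it. I set up the code below to access the site and submit a payload containing:

- username/email
- password
- csrf_token

Does the payload need anything else in order to log in?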

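I am using Python 2.7. I added code to print the last page the script opened, and it prints the login page, which makes me think it never logged in successfully.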

import requests
from lxml import html

payload = {
    "username": "...",
    "password": "...",
    "csrf_token": "ImE2N2E1YzkzZGU2ZjY3NjQ0YTc4YmZiYWJjNWRiN2Y3MjlhYWZmYjQi.XBvDVg.ALSRF6Ui7Y2L7ST0kQG-CC4HTzQ"
}

session_requests = requests.session()

login_url = "https://www.zipgrade.com/login"
user_url = 'https://www.zipgrade.com/user'

result = session_requests.get(login_url)

# make HTML parse tree from page
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='csrf_token']")))[0]

# send payload through
result = session_requests.post(
    login_url,
    data = payload,
    headers = dict(referer=login_url)
)

result = session_requests.get(
    user_url,
    headers = dict(referer = user_url)
)

tree = html.fromstring(result.content)
bucket_names = tree.xpath("//div[@class='row']")
print(result.ok)

print(bucket_names[0].text_content().strip())

I expected it to take me to the "https://www.zipgrade.com/user" page, but it appears to stay on the "https://www.zipgrade.com/login" page.

1 Answer

Hmm, it looks like a session token is passed in the Cookie header. I just tried to mimic the login, and my request looks like this:

import http.client

# The site is served over HTTPS (see the Origin and Referer headers below).
conn = http.client.HTTPSConnection("www.zipgrade.com")

payload = "username=some%40email.com&password=some%40password&csrf_token=IjhmNWU1Y2EzYWExMjcwM2FiZmY5MjEzOGUwNDQ2N2UxZWQ4ODY0OTMi.XBwSeg.RU2oZBM15U7-ECl1Ldfv7LYlcnQ%5E&origURL="

headers = {
    'Connection': "keep-alive",
    'Cache-Control': "max-age=0",
    'Origin': "https://www.zipgrade.com",
    'Upgrade-Insecure-Requests': "1",
    'Content-Type': "application/x-www-form-urlencoded",
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36",
    'Accept': "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    'Referer': "https://www.zipgrade.com/login/",
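    # 'Accept-Encoding' is left out: http.client does not decompress gzip/br
    # bodies, so data.decode("utf-8") below could fail on a compressed response.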
    'Accept-Language': "en-US,en;q=0.9",
    'Cookie': "session=eyJfcGVybWFuZW50Ijp0cnVlLCJjc3JmX3Rva2VuIjp7IiBiIjoiT0dZMVpUVmpZVE5oWVRFeU56QXpZV0ptWmpreU1UTTRaVEEwTkRZM1pURmxaRGc0TmpRNU13PT0ifX0.XBwSeg.EPMMH0CcBMif4qUoxGPKFvcnzRw",
    'cache-control': "no-cache",
    'Postman-Token': "865a89b0-c5cc-49b1-9e24-df413be64fc0"
    }

conn.request("POST", "/login/", payload, headers)

res = conn.getresponse()
data = res.read()

print(data.decode("utf-8"))

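Note that your payload is correct and you are passing the parameters properly, but a session is also sent in the headers. You need to obtain the session token and pass it along with your headers.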
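
To see what that session cookie carries, it can be inspected offline. This is only a sketch: it assumes the cookie uses the Flask-style `payload.timestamp.signature` layout (a guess from its shape, not something the site documents), and `decode_session_payload` is a hypothetical helper name. The cookie value is the one from the request above.

```python
import base64
import json

# Value of the "session" cookie from the request above.
SESSION_COOKIE = (
    "eyJfcGVybWFuZW50Ijp0cnVlLCJjc3JmX3Rva2VuIjp7IiBiIjoiT0dZMVpUVmpZVE5oWVRFeU56"
    "QXpZV0ptWmpreU1UTTRaVEEwTkRZM1pURmxaRGc0TmpRNU13PT0ifX0"
    ".XBwSeg.EPMMH0CcBMif4qUoxGPKFvcnzRw"
)


def decode_session_payload(cookie_value):
    """Decode the first dot-separated segment as URL-safe base64 JSON.

    Assumption: the cookie is 'payload.timestamp.signature' and the payload
    is not compressed (compressed payloads start with a leading dot).
    """
    if cookie_value.startswith("."):
        raise ValueError("compressed payloads are not handled in this sketch")
    payload = cookie_value.split(".", 1)[0]
    # Restore the base64 padding that is stripped from the cookie.
    padded = payload + "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


decoded = decode_session_payload(SESSION_COOKIE)
print(sorted(decoded))
```

The decoded payload has a `csrf_token` entry, which suggests the server ties the form token to the session. That would explain why a token copied from a browser is rejected when it arrives with a different session cookie, or with none.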

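I would make two requests. The first is a plain request to the login page, https://www.zipgrade.com/login/, which returns a cookie containing the session value you need. Parse the cookie and extract the session. Once you have it, go back to your scraping function and make sure the headers variable is updated with that session.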
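
The first of those two requests could look roughly like this. It is a sketch, not the site's documented flow: `open_login_page` is a hypothetical helper, and the cookie name `session` is taken from the request shown earlier. It accepts any object that behaves like a `requests.Session`.

```python
def open_login_page(session, login_url):
    """GET the login page; return its HTML and the 'session' cookie (or None)."""
    response = session.get(login_url)
    response.raise_for_status()
    # A requests.Session stores cookies from Set-Cookie and resends them on
    # later requests, so the value is read here only for inspection.
    return response.text, session.cookies.get("session")


# Usage (performs a real network request, so it is left commented out):
#   import requests
#   with requests.Session() as s:
#       page_html, cookie = open_login_page(s, "https://www.zipgrade.com/login/")
```

If you stay inside one `requests.Session`, you should not need to copy the cookie into a `Cookie` header by hand. That step only matters with a lower-level client such as `http.client`.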

When you hit the URL for the session, you can grab the CSRF token from the hidden input field at the same time.
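
A minimal sketch of pulling the hidden fields out of the login page, using only the Python 3 standard library. The sample markup is an assumption, modelled on the field names in the payload above (`csrf_token`, `origURL`); the real page may differ. `extract_hidden_fields` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if attr.get("type") == "hidden" and "name" in attr:
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(page_html):
    parser = HiddenInputParser()
    parser.feed(page_html)
    return parser.fields


# Assumed shape of the login form, for illustration only.
SAMPLE_PAGE = """
<form method="post" action="/login/">
  <input id="csrf_token" name="csrf_token" type="hidden" value="example-token">
  <input name="origURL" type="hidden" value="">
  <input name="username" type="text">
  <input name="password" type="password">
</form>
"""

hidden = extract_hidden_fields(SAMPLE_PAGE)
print(hidden["csrf_token"])
```

With lxml, as in the question, the equivalent lookup would be the XPath `//input[@name='csrf_token']/@value`, which returns the attribute value and not the element.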

That way, your first call prepares everything for your scraping call by collecting the dynamic tokens from the cookie and the hidden input field.
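
Putting the pieces together, the login POST might be sketched as below. `submit_login` is a hypothetical helper. It expects the hidden fields already scraped from the login page, and it assumes a failed login leaves you on a URL containing `/login`, which is what the question describes.

```python
def submit_login(session, login_url, hidden_fields, username, password):
    """POST the credentials plus the freshly scraped hidden fields.

    Returns (logged_in, response). `session` is expected to behave like a
    requests.Session that already fetched the login page, so it holds the
    matching session cookie.
    """
    payload = dict(hidden_fields)  # e.g. {"csrf_token": "...", "origURL": ""}
    payload.update({"username": username, "password": password})
    response = session.post(
        login_url,
        data=payload,
        headers={"Referer": login_url},
    )
    # Heuristic (assumption): a rejected login ends up back on the login page.
    logged_in = response.ok and "/login" not in response.url
    return logged_in, response
```

The key difference from the code in the question is that the `csrf_token` comes from the page fetched by the same session and is not a hard-coded string.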

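Keep in mind that sessions on different sites have different expiry times. Some session tokens can be reused to scrape many pages, while others require a fresh session on every hop. Just a tip, but I think this should point you in the right direction.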
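
One way to cope with sessions that expire mid-scrape is to retry once after logging in again. This is a sketch under an assumption: that an expired session lands you on a URL containing `/login`. `get_with_relogin` and the `relogin` callback are hypothetical names.

```python
def get_with_relogin(session, url, relogin, login_marker="/login"):
    """GET `url`; if we were bounced to the login page, log in and retry once.

    `relogin` is a caller-supplied function that takes the session and
    performs the login steps again (fetch page, scrape token, POST).
    """
    response = session.get(url)
    if login_marker in response.url:
        relogin(session)
        response = session.get(url)
    return response
```

Retrying only once keeps the scraper from looping forever when the credentials themselves are wrong.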

Answered 2018-12-20T22:14:25.277