python-3.x - Can MechanicalSoup log in to a page that requires SAML authentication?

Question

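I am trying to download some files from behind an SSO (single sign-on) site. It appears to use SAML authentication, and that is where I am stuck. Once authenticated, I will be able to make API requests that return JSON, so no parsing/scraping is needed.

I'm not quite sure how to handle this in MechanicalSoup (and I'm relatively new to web programming), so any help would be much appreciated.

Here is what I have so far: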

import mechanicalsoup
from getpass import getpass
import json

verbose = True  # print each response while debugging

login_url = ...
br = mechanicalsoup.StatefulBrowser()
response = br.open(login_url)
if verbose: print(response)

# provide the username + password
br.select_form('form[id="loginForm"]')
br.get_current_form().print_summary()  # Just to see what's there; print_summary() prints by itself.
br['UserName'] = input('Email: ')
br['Password'] = getpass()
response = br.submit_selected().text
if verbose: print(response)

At this point I get a page telling me that JavaScript is disabled and that I must click submit to continue. So I do that:

br.select_form()
response = br.submit_selected().text
if verbose: print(response)

This is where it complains that state information has been lost.

Output:

<h2>State information lost</h2>

State information lost, and no way to restart the request<h3>Suggestions for resolving this problem:</h3><ul><li>Go back to the previous page and try again.</li><li>Close the web browser, and try again.</li></ul><h3>This error may be caused by:</h3><ul><li>Using the back and forward buttons in the web browser.</li><li>Opened the web browser with tabs saved from the previous session.</li><li>Cookies may be disabled in the web browser.</li></ul>
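When debugging a step like this, it can help to list what the intermediate "JavaScript is disabled" form would actually post before submitting it. The following is a minimal sketch using only the standard library; `FormInspector`, `inspect_forms`, and the sample page are hypothetical names and data for illustration. The sample is modelled on the SAML HTTP-POST binding, where a form carries hidden `SAMLResponse` and `RelayState` fields; the real page in this question may look different.

```python
from html.parser import HTMLParser


class FormInspector(HTMLParser):
    """Collect each form's action, method, and named inputs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.forms.append({
                "action": attrs.get("action"),
                "method": (attrs.get("method") or "get").lower(),
                "inputs": [],
            })
        elif tag == "input" and self.forms and attrs.get("name"):
            # Only named inputs are sent when the form is submitted.
            self.forms[-1]["inputs"].append(attrs["name"])


def inspect_forms(html):
    """Return a list of dicts describing every form found in the HTML text."""
    parser = FormInspector()
    parser.feed(html)
    return parser.forms


# Hypothetical intermediate page; the URL and values are placeholders.
sample = (
    '<form method="post" action="https://sp.example.org/acs">'
    '<input type="hidden" name="SAMLResponse" value="PHNhbWw+"/>'
    '<input type="hidden" name="RelayState" value="abc"/>'
    '<input type="submit" value="Submit"/>'
    '</form>'
)

for form in inspect_forms(sample):
    print(form["method"], form["action"], form["inputs"])
```

With the code from the question, the HTML text stored in `response` after the first `submit_selected()` could be passed to `inspect_forms` in the same way.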

The only hits I found on scraping behind a SAML login all use a selenium approach (sometimes dropping down to requests).

Is this possible with MechanicalSoup?

score 0 · Accepted Answer

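My situation turned out to require JavaScript to log in. My original question about getting through SAML authentication did not describe the real situation, so that question has not really been answered.
Thanks to @Daniel Hemberger for helping me work this out in the comments.

In this case MechanicalSoup is not the right tool (because of the JavaScript). I ended up using selenium to get through authentication and then switched to requests.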
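A rough sketch of that selenium-then-requests handoff is below. It is not the exact code used here: `to_requests_kwargs`, `fetch_json_after_browser_login`, and both URL parameters are hypothetical, the choice of Firefox is an assumption, and the login itself is done by hand in the browser window. The idea is to copy the cookies from the authenticated browser into a `requests.Session` and use that session for the JSON API calls.

```python
def to_requests_kwargs(cookie):
    """Map one cookie dict from driver.get_cookies() to session.cookies.set() arguments."""
    return {
        "name": cookie["name"],
        "value": cookie["value"],
        "domain": cookie.get("domain", ""),
        "path": cookie.get("path", "/"),
    }


def fetch_json_after_browser_login(login_url, api_url):
    """Log in through a real browser, then call a JSON API with requests.

    login_url and api_url are placeholders supplied by the caller.
    """
    # Imported here so the helper above can be used without these packages.
    import requests
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: any supported browser would do
    try:
        driver.get(login_url)
        # The JavaScript-driven login runs in the browser; wait for the user.
        input("Finish logging in in the browser window, then press Enter... ")
        browser_cookies = driver.get_cookies()
    finally:
        driver.quit()

    session = requests.Session()
    for cookie in browser_cookies:
        session.cookies.set(**to_requests_kwargs(cookie))

    response = session.get(api_url)
    response.raise_for_status()
    return response.json()
```

This only works if the site's API accepts the same session cookies the browser received; if it relies on tokens held in JavaScript instead, those would have to be extracted separately.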
