python - Reading the page source before a POST

Question

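I want to know whether there is a way to POST parameters after reading the page source. For example: read the captcha before posting the ID number.

My current code: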

import requests
id_number = "1"
url = "http://www.submitmyforum.com/page.php"
data = dict(id = id_number, name = 'Alex')
post = requests.post(url, data=data)

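After every request to http://submitforum.com/page.php (obviously not a real site), there is a captcha that changes. I want to read that parameter and submit it in the "data" variable.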

score 0 · Accepted Answer

As discussed in the comments on the OP, Selenium can be used, and there may also be a way to do this without browser emulation!
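For the route without browser emulation, here is a minimal sketch. It assumes the changing captcha value is embedded in the page as a hidden form input; the field name "captcha" and the helper names `read_hidden_fields` and `submit_with_captcha` are hypothetical and not taken from the question. A `requests.Session` is used so that the GET and the POST share the same cookies, which matters if the server ties the captcha to the session.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def read_hidden_fields(html):
    """Return a dict of the hidden input fields found in the HTML text."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields


def submit_with_captcha(url, id_number, name, field="captcha"):
    """GET the page, read the hidden field, then POST it with the form data.

    Assumption: the value lives in a hidden input called `field`.
    """
    import requests  # third-party; imported here so the parser works without it

    with requests.Session() as session:
        page = session.get(url)
        page.raise_for_status()
        fields = read_hidden_fields(page.text)
        data = {"id": id_number, "name": name, field: fields[field]}
        return session.post(url, data=data)
```

This only helps when the value can be read directly from the page source. An image captcha or Google reCAPTCHA still has to be solved, so reading the source alone is not enough there.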

Using Selenium ( http://selenium-python.readthedocs.io/ ) instead of the requests module approach:

import re
from selenium import webdriver
from selenium.webdriver.common.by import By

# Captures the site key between "k=" and "&co=" in the reCAPTCHA iframe URL
regexCaptcha = r"k=(.*?)&co="
url = "http://submitforum.com/page.php"

# Get to the URL
browser = webdriver.Chrome()
browser.get(url)

# Example for getting page elements (using CSS selectors)
# In this example, I'm getting the Google reCAPTCHA ID if present on the current page
# Selenium 4 locator style; the find_element_by_* helpers were removed
try:
    element = browser.find_element(By.CSS_SELECTOR, 'iframe[src*="https://www.google.com/recaptcha/api2/anchor?k"]')
    captchaID = re.findall(regexCaptcha, element.get_attribute("src"))[0]
    captchaFound = True
    print("Captcha found!", captchaID)
except Exception:
    print("No captcha found!")
    captchaFound = False

# Treat captcha
# --> Your treatment code
captcha_answer = ""  # placeholder: set this from your treatment code

# Enter captcha response on page
captchaResponse = browser.find_element(By.ID, 'captcha-response')
captchaResponse.send_keys(captcha_answer)

# Validate the form
validateButton = browser.find_element(By.ID, 'submitButton')
validateButton.click()

# --> Analysis of returned page if needed
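
The regular expression in the code above only matches when the `k` parameter is immediately followed by `&co=` in the iframe URL. As a hedged alternative, the query string can be parsed with the standard library so that parameter order does not matter. The function name `recaptcha_site_key` and the sample URL are made up for illustration.

```python
from urllib.parse import urlparse, parse_qs


def recaptcha_site_key(iframe_src):
    """Return the value of the "k" query parameter, or None if it is absent."""
    query = parse_qs(urlparse(iframe_src).query)
    values = query.get("k")
    return values[0] if values else None


# Made-up iframe URL, shaped like the one the CSS selector above targets
src = "https://www.google.com/recaptcha/api2/anchor?k=SITEKEY123&co=aHR0cDov&hl=en"
print(recaptcha_site_key(src))
```

With Selenium, the string passed in would be `element.get_attribute("src")`.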
