I am trying to use 2captcha with Selenium and Python.

I just copied the example from their documentation:
https://github.com/2captcha/2captcha-api-examples/blob/master/ReCaptcha%20v2%20API%20Examples/Python%20Example/2captcha_python_api_example.py

Here is my code:

from selenium import webdriver
from time import sleep
from selenium.webdriver.support.select import Select
import requests

driver = webdriver.Chrome('chromedriver.exe')
driver.get('the_url')

current_url = driver.current_url



captcha = driver.find_element_by_id("captcha-box")
captcha2 = captcha.find_element_by_xpath("//div/div/iframe").get_attribute("src")
captcha3 = captcha2.split('=')
#print(captcha3[2])

# Add these values
API_KEY = 'my_api_key'  # Your 2captcha API KEY
site_key = captcha3[2]  # site-key, read the 2captcha docs on how to get this
url = current_url  # example url
proxy = 'Myproxy'  # example proxy

proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}

s = requests.Session()

# here we post site key to 2captcha to get captcha ID (and we parse it here too)
captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url), proxies=proxy).text.split('|')[1]
# then we parse gresponse from 2captcha response
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
print("solving ref captcha...")
while 'CAPCHA_NOT_READY' in recaptcha_answer:
    sleep(5)
    recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id), proxies=proxy).text
recaptcha_answer = recaptcha_answer.split('|')[1]

# we make the payload for the post data here, use something like mitmproxy or fiddler to see what is needed
payload = {
    'key': 'value',
    'gresponse': recaptcha_answer  # This is the response from 2captcha, which is needed for the post request to go through.
    }


# then send the post request to the url
response = s.post(url, payload, proxies=proxy)

# And that's all there is to it other than scraping data from the website, which is dynamic for every website.

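This is my error:

solving ref captcha...
Traceback (most recent call last):
  File "main.py", line 38, in <module>
    recaptcha_answer = recaptcha_answer.split('|')[1]
IndexError: list index out of range

The captcha is being solved, because I can see it on the 2captcha dashboard. So if this is the official example, what is the error?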

Edit: After some modifications I now get the solved captcha back from 2captcha, but then I get this error:

solving ref captcha...
OK|this_is_the_2captch_answer
Traceback (most recent call last):
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
    conn.connect()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
    self._tunnel()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
    (version, code, message) = response._read_status()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: <html>


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 449, in send
    timeout=timeout
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\util\retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\packages\six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 594, in urlopen
    self._prepare_proxy(conn)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connectionpool.py", line 805, in _prepare_proxy
    conn.connect()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\urllib3\connection.py", line 308, in connect
    self._tunnel()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 906, in _tunnel
    (version, code, message) = response._read_status()
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\http\client.py", line 278, in _read_status
    raise BadStatusLine(line)
urllib3.exceptions.ProtocolError: ('Connection aborted.', BadStatusLine('<html>\r\n'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 49, in <module>
    response = s.post(url, payload, proxies=proxy)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "C:\Users\Usuari\AppData\Local\Programs\Python\Python37-32\lib\site-packages\requests\adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine('<html>\r\n'))

Why am I getting this error?

I am setting site_key = current_url_where_captcha_is_located

Is this correct?

2 Answers

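It looks like you are not providing any valid proxy connection parameters, yet you still pass this proxy to requests when connecting to the API.

Just comment out these two lines: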

#proxy = 'Myproxy'  # example proxy
#proxy = {'http': 'http://' + proxy, 'https': 'https://' + proxy}

Then remove proxies=proxy from these four lines:

captcha_id = s.post("http://2captcha.com/in.php?key={}&method=userrecaptcha&googlekey={}&pageurl={}".format(API_KEY, site_key, url)).text.split('|')[1]
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
recaptcha_answer = s.get("http://2captcha.com/res.php?key={}&action=get&id={}".format(API_KEY, captcha_id)).text
response = s.post(url, payload)
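
If you do want to keep the proxy optional instead of deleting it, here is a minimal sketch of one way to do that. The helper name build_proxies is hypothetical (it is not part of the 2captcha example), and it assumes the proxy is given as a plain "host:port" string like in the question. It returns None when no proxy is configured, which requests treats the same as not passing proxies at all.

```python
def build_proxies(proxy_address=None):
    """Return a requests-style proxies mapping, or None when no proxy is set.

    build_proxies is a hypothetical helper for illustration only.
    proxy_address is assumed to be a "host:port" string.
    """
    if not proxy_address:
        # No proxy configured: let requests connect directly.
        return None
    # Same mapping shape as in the question's code.
    return {
        "http": "http://" + proxy_address,
        "https": "https://" + proxy_address,
    }


if __name__ == "__main__":
    print(build_proxies())                 # no proxy configured
    print(build_proxies("10.0.0.1:8080"))  # example address, not a real proxy
```

You would then call, for example, s.post(url, payload, proxies=build_proxies()) while testing without a proxy, and pass a real address only once you have one that works.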
answered 2019-08-01T13:42:55.770

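Use your debugger, or put a print(recaptcha_answer) just before the failing line, to see what its value is before you call recaptcha_answer.split('|') on it. There is no | in the string, so the resulting list has only one element, and taking its second element with [1] fails.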
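
As a rough sketch of how to make that failure visible, you could check the response before indexing into it. The function name parse_2captcha_response is hypothetical, and the only assumption is the one already visible in the question's output: a successful answer looks like "OK|<token>", and anything else (for example CAPCHA_NOT_READY or an error code) has no "|" part to take.

```python
def parse_2captcha_response(text):
    """Return the part after 'OK|', or raise ValueError showing the raw text.

    parse_2captcha_response is a hypothetical helper for illustration only.
    """
    status, separator, payload = text.strip().partition("|")
    if status != "OK" or not separator:
        # Surface the real server reply instead of an IndexError.
        raise ValueError("unexpected 2captcha response: {!r}".format(text))
    return payload


if __name__ == "__main__":
    print(parse_2captcha_response("OK|this_is_the_2captch_answer"))
    try:
        parse_2captcha_response("CAPCHA_NOT_READY")
    except ValueError as error:
        print(error)
```

With this in place of .split('|')[1], the traceback would show the text the API actually returned, which should tell you what went wrong.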

answered 2019-04-27T14:11:42.010