
Here is the script:

import requests
import json
import urlparse
from requests.adapters import HTTPAdapter

s = requests.Session()
s.mount('http://', HTTPAdapter(max_retries=1))

with open('proxies.txt') as proxies:
    for line in proxies:
        proxy=json.loads(line)

    with open('urls.txt') as urls:
        for line in urls:

            url=line.rstrip()
            data=requests.get(url, proxies=proxy)
            data1=data.content
            print data1
            print {'http': line}

As you can see, it tries to access a list of URLs through a list of proxies. Here is the urls.txt file:

http://api.exip.org/?call=ip

Here is the proxies.txt file:

{"http":"http://107.17.92.18:8080"}

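I got this proxy from www.hidemyass.com. Could it be a bad proxy? I have tried several and this is always the result. Note: if you try to reproduce this, you may need to replace the proxy with a recent one from hidemyass.com, since they seem to stop working eventually. Here is the full error and traceback: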

Traceback (most recent call last):
  File "test.py", line 17, in <module>
    data=requests.get(url, proxies=proxy)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 335, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 454, in send
    history = [resp for resp in gen] if allow_redirects else []
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 144, in resolve_redirects
    allow_redirects=False,
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 438, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 327, in send
    raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host=u'219.231.143.96', port=18186): Max retries exceeded with url: http://www.google.com/ (Caused by <class 'httplib.BadStatusLine'>: '')

4 Answers


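Looking at the stack trace you provided, your error is caused by an httplib.BadStatusLine exception, which, according to the docs, is:

Raised if a server responds with a HTTP status code that we don't understand.

In other words, whatever the proxy server returned (if it returned anything at all) could not be parsed by httplib, which performs the actual request.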

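From my experience with (writing) HTTP proxies, I can say that some implementations do not follow the specification very strictly (the RFCs on HTTP are not easy reading), or use hacks to work around old browsers that have flaws in their own implementations.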

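So, to answer this:

Could it be a bad proxy?

...I'd say that it is possible. The only way to be sure is to look at what the proxy server returns.

Try debugging it with a debugger, or use a packet sniffer (such as Wireshark or Network Monitor) to analyze what happens on the network. Knowing exactly what the proxy server returns should give you the key to solving this issue.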
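
If a packet sniffer is not at hand, the raw reply can also be inspected from Python itself. What follows is only a minimal sketch in Python 3, not part of the original answer. The helper names `build_proxy_request`, `looks_like_status_line`, and `fetch_first_line` are made up for illustration, and the sketch assumes a plain HTTP proxy that accepts absolute-form GET requests.

```python
import socket
from urllib.parse import urlsplit


def build_proxy_request(url: str) -> bytes:
    """Build the absolute-form GET request a client sends to an HTTP proxy."""
    host = urlsplit(url).netloc
    return (
        "GET {} HTTP/1.1\r\n"
        "Host: {}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).format(url, host).encode("ascii")


def looks_like_status_line(first_line: bytes) -> bool:
    """Return True if the line resembles 'HTTP/1.1 200 OK'."""
    parts = first_line.strip().split(None, 2)
    return (
        len(parts) >= 2
        and parts[0].startswith(b"HTTP/")
        and len(parts[1]) == 3
        and parts[1].isdigit()
    )


def fetch_first_line(proxy_host: str, proxy_port: int, url: str,
                     timeout: float = 10.0) -> bytes:
    """Send one request through the proxy and return the raw first reply line."""
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_proxy_request(url))
        data = b""
        while b"\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:  # the proxy closed the connection
                break
            data += chunk
    return data.split(b"\n", 1)[0]
```

With the proxy from the question this would be called as `fetch_first_line("107.17.92.18", 8080, "http://api.exip.org/?call=ip")`, and the result passed to `looks_like_status_line`. An empty result would mean the proxy closed the connection without sending anything, which would be consistent with the empty `''` shown in the BadStatusLine message.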

Answered 2013-09-07T20:48:20.993

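Maybe you are overloading the proxy server by sending too many requests in a short period of time. You say you got the proxy from a popular free proxy website, which means you are not the only one using that server, and it is often under heavy load.

If you add some delay between your requests, like this: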

from time import sleep

[...]

data=requests.get(url, proxies=proxy)
data1=data.content
print data1
print {'http': line}
sleep(1)

(note the sleep(1), which pauses the execution of the code for one second)

Does it work?
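
If a fixed one-second pause is not enough, the same idea can be extended to retry a failed call with a growing delay. This is only a sketch in Python 3 and not part of the original answer: `call_with_delay` and its parameters are hypothetical names, and the `sleep` argument exists only so the waiting can be replaced in tests.

```python
import time


def call_with_delay(func, attempts=3, delay=1.0, backoff=2.0,
                    exceptions=(Exception,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and retry with a longer delay."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the last error
            sleep(delay)
            delay *= backoff  # wait longer before the next attempt
```

Assuming `requests` is installed, the request from the question could be wrapped as `call_with_delay(lambda: requests.get(url, proxies=proxy, timeout=10), exceptions=(requests.exceptions.ConnectionError,))`. Whether this helps depends on whether the proxy is merely overloaded or actually broken.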

Answered 2013-09-12T13:55:50.310
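import requests

# Method of a client class that defines self.user_agent (the class itself is not shown in the answer).
def hello(self):
    # Reuse one Session and send an explicit User-Agent header with every request.
    self.s = requests.Session()
    self.s.headers.update({'User-Agent': self.user_agent})
    return True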

Try this, it worked for me :)

Answered 2018-11-20T10:23:10.907

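This happens when you send too many requests to the public IP address of https://anydomainname.example.com/. As you can see, something is disallowing or blocking access through the public IP address mapped to https://anydomainname.example.com/. A better solution is the following Python script, which looks up the public IP address of any domain and writes that mapping to the /etc/hosts file.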

import re
import socket
import subprocess
from typing import Tuple

ENDPOINT = 'https://anydomainname.example.com/'

def get_public_ip() -> Tuple[str, str, str]:
    """
    Command to get public_ip address of host machine and endpoint domain
    Returns
    -------
    my_public_ip : str
        Ip address string of host machine.
    end_point_ip_address : str
        Ip address of endpoint domain host.
    end_point_domain : str
        domain name of endpoint.

    """
    # bash_command = """host myip.opendns.com resolver1.opendns.com | \
    #     grep "myip.opendns.com has" | awk '{print $4}'"""
    # bash_command = """curl ifconfig.co"""
    # bash_command = """curl ifconfig.me"""
    bash_command = """ curl icanhazip.com"""
    my_public_ip = subprocess.getoutput(bash_command)
    my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
    end_point_domain = (
        ENDPOINT.replace("https://", "")
        .replace("http://", "")
        .replace("/", "")
    )
    end_point_ip_address = socket.gethostbyname(end_point_domain)
    return my_public_ip, end_point_ip_address, end_point_domain


def set_etc_host(ip_address: str, domain: str) -> str:
    """
    A function to write mapping of ip_address and domain name in /etc/hosts.
    Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build

    Parameters
    ----------
    ip_address : str
        IP address of the domain.
    domain : str
        domain name of endpoint.

    Returns
    -------
    str
        Message to identify success or failure of the operation.

    """
    bash_command = """echo "{}    {}" >> /etc/hosts""".format(ip_address, domain)
    output = subprocess.getoutput(bash_command)
    return output


if __name__ == "__main__":
    my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
    output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
    print("My public IP address:", my_public_ip)
    print("ENDPOINT public IP address:", end_point_ip_address)
    print("ENDPOINT Domain Name:", end_point_domain)
    print("Command output:", output)

You can call the script above before running your desired function :)
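
The hosts-file step can also be done without going through a shell. The following is a hedged sketch rather than the answer's own method: `hostname_of` and `add_hosts_entry` are hypothetical helpers, the file is written directly from Python, and a domain that is already mapped is skipped so repeated runs do not append duplicate lines. Writing to the real /etc/hosts still requires root privileges.

```python
from urllib.parse import urlsplit


def hostname_of(endpoint: str) -> str:
    """Extract the host name from a URL such as 'https://example.com/path'."""
    host = urlsplit(endpoint).hostname
    if host is None:
        raise ValueError("no host name in {!r}".format(endpoint))
    return host


def add_hosts_entry(ip_address: str, domain: str,
                    hosts_path: str = "/etc/hosts") -> bool:
    """Append 'ip domain' to hosts_path unless the domain is already mapped.

    Returns True if a line was written, False if the domain was already present.
    """
    with open(hosts_path, "r", encoding="utf-8") as f:
        content = f.read()
    for line in content.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore trailing comments
        if domain in fields[1:]:
            return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if content.endswith("\n") or not content else "\n"
    with open(hosts_path, "a", encoding="utf-8") as f:
        f.write("{}{}    {}\n".format(prefix, ip_address, domain))
    return True
```

In the script above, the domain could then be obtained with `hostname_of(ENDPOINT)` and resolved with `socket.gethostbyname`, as before, and the result passed to `add_hosts_entry`.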

Answered 2021-09-26T12:14:42.083