Before laying my question bare, some context is needed. I'm trying to issue HTTP GET and POST requests to a website, with the following caveats:

  • Redirects are expected
  • Cookies are required
  • Requests must pass through a SOCKS proxy (v4a)

Up until now, I've been using twisted.web.client.Agent and its subclasses (e.g. BrowserLikeRedirectAgent), but unfortunately it seems that SOCKS proxies are not supported yet (and ProxyAgent is a no-go because that class is for HTTP proxies).

I stumbled upon twisted-socks, which seems to allow me to do what I want, but I noticed that it uses HTTPClientFactory instead of Agent. Hence my question: what is the difference between HTTPClientFactory and Agent, and when should I use each one?

Below is some example code using twisted-socks. I have two additional questions:

  1. How can I use cookies in this example? I tried passing a dict and a cookielib.CookieJar instance to HTTPClientFactory's cookies kwarg, but this raises an error (something about a string being expected... how on earth do I send cookies as a string?)

  2. Can this code be refactored to use Agent? This would be ideal, as I already have a reasonably large codebase that is written with Agent in mind.


import sys
from urlparse import urlparse
from twisted.internet import reactor, endpoints
from socksclient import SOCKSv4ClientProtocol, SOCKSWrapper
from twisted.web import client

class mything:
    def __init__(self):
        self.npages = 0
        self.timestamps = {}

    def wrappercb(self, proxy):
        print "connected to proxy", proxy

    def clientcb(self, content):
        print "ok, got: %s" % content[:120]
        print "timestamps " + repr(self.timestamps)
        self.npages -= 1
        if self.npages == 0:
            # Last outstanding page has arrived, so shut the reactor down.
            reactor.stop()

    def sockswrapper(self, proxy, url):
        dest = urlparse(url)
        assert dest.port is not None, 'Must specify port number.'
        endpoint = endpoints.TCP4ClientEndpoint(reactor, dest.hostname, dest.port)
        return SOCKSWrapper(reactor, proxy[1], proxy[2], endpoint, self.timestamps)

def main():
    thing = mything()

    # Mandatory first argument is a URL to fetch over Tor (or whatever
    # SOCKS proxy that is running on localhost:9050).
    url = sys.argv[1]
    proxy = (None, 'localhost', 9050, True, None, None)

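    f = client.HTTPClientFactory(url)
    # Fires with the page body once the HTTP request completes.
    f.deferred.addCallback(thing.clientcb)
    sw = thing.sockswrapper(proxy, url)
    d = sw.connect(f)
    # Fires once the SOCKS handshake with the proxy is done.
    d.addCallback(thing.wrappercb)
    thing.npages += 1

    reactor.run()

if '__main__' == __name__:
    main()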


I think you would normally not use HTTPClientFactory, since it seems to be just a thing that performs an HTTP request and nothing more. It is quite low-level.

If you simply want to fire off a request, there are functions (twisted.web.client.getPage and downloadPage) that build the factory for you and handle both HTTP and HTTPS.

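Agent gives you a higher level of abstraction: it keeps a connection pool, chooses between HTTP and HTTPS based on the URL, handles proxies, and so on. That is the thing you normally want to use.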

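The two do not seem to share much code. The older HTTPClientFactory (and its protocol, HTTPPageGetter) sits on one side, while Agent is built on HTTP11ClientProtocol (and HTTP11ClientFactory) on the other. So there is a duality: HTTPClientFactory plus HTTPPageGetter (with getPage as its public API) versus twisted.web.client._newclient (with Agent as its public API). I guess this is down to historical reasons and backward compatibility.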

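In any case, the twisted-socks library does not mix well with Agent out of the box, because its API is broken. SOCKSWrapper claims to implement the IStreamClientEndpoint interface, but that interface requires the .connect method to return a deferred that fires with an IProtocol provider (see the docs), whereas SOCKSWrapper returns a deferred that fires with an address (this is the line that does it). It looks like that line can easily be changed so that the deferred fires with the protocol instead.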


Once you have done that, you should be able to use Agent. Here is an example (using inlineCallbacks and the new react, but you could just as well use plain .addCallback on deferreds and reactor.run()):

from twisted.internet.endpoints import TCP4ClientEndpoint
from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react
from twisted.web.client import ProxyAgent, readBody

from socksclient import SOCKSWrapper

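@inlineCallbacks
def main(reactor):
    # Endpoint for the final destination, wrapped by the SOCKS endpoint.
    target = TCP4ClientEndpoint(reactor, 'example.com', 80)
    proxy = SOCKSWrapper(reactor, 'localhost', 9050, target)
    agent = ProxyAgent(proxy)
    response = yield agent.request('GET', 'http://example.com/')
    print (yield readBody(response))

# react() starts the reactor and stops it when main's deferred fires.
react(main, [])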

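Also, there is the txsocksx library, which seems nicer to use (and is pip-installable!). The API is almost the same, except that you pass the proxy endpoint where you previously passed the target endpoint: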

from twisted.internet.endpoints import TCP4ClientEndpoint
from twisted.internet.defer import inlineCallbacks
from twisted.internet.task import react
from twisted.web.client import ProxyAgent, readBody

from txsocksx.client import SOCKS5ClientEndpoint

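@inlineCallbacks
def main(reactor):
    # Endpoint for the SOCKS proxy itself; the target is given by host and port.
    proxy = TCP4ClientEndpoint(reactor, 'localhost', 9050)
    proxied_endpoint = SOCKS5ClientEndpoint('example.com', 80, proxy)
    agent = ProxyAgent(proxied_endpoint)
    response = yield agent.request('GET', 'http://example.com/')
    print (yield readBody(response))

# react() starts the reactor and stops it when main's deferred fires.
react(main, [])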
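
As for the cookie question: as far as I can tell, HTTPClientFactory's cookies argument is meant to be a plain dict mapping cookie names to string values, which it then joins into a single Cookie header. A CookieJar holds Cookie objects rather than strings, which may be what the error was complaining about. Below is a minimal sketch of flattening a jar into such a dict, using only the standard library. The helper names cookiejar_to_dict, cookie_header and make_cookie are made up for illustration, and the flattening throws away domain, path and expiry, so it only makes sense when you are talking to a single site.

```python
try:
    from http.cookiejar import Cookie, CookieJar  # Python 3
except ImportError:
    from cookielib import Cookie, CookieJar  # Python 2, as in the question


def cookiejar_to_dict(jar):
    """Flatten a CookieJar into a {name: value} dict of strings.

    Domain, path and expiry are dropped; a later cookie with the same
    name overwrites an earlier one.
    """
    return dict((cookie.name, cookie.value) for cookie in jar)


def cookie_header(cookies):
    """Render a {name: value} dict the way a Cookie request header looks."""
    return '; '.join('%s=%s' % (name, cookies[name]) for name in sorted(cookies))


def make_cookie(name, value, domain):
    """Hypothetical helper: build a session cookie for the demo below."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


if __name__ == '__main__':
    jar = CookieJar()
    jar.set_cookie(make_cookie('session', 'abc123', 'example.com'))
    jar.set_cookie(make_cookie('lang', 'en', 'example.com'))
    # Prints the flattened cookies as one header value.
    print(cookie_header(cookiejar_to_dict(jar)))
```

The resulting dict is what I would try passing as cookies=... to HTTPClientFactory. If you stay with Agent instead, twisted.web.client.CookieAgent wraps another agent together with a CookieJar, so no conversion should be needed there.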
