Before laying my question bare, some context is needed. I'm trying to issue HTTP `GET` and `POST` requests to a website, with the following caveats:
- Redirects are expected
- Cookies are required
- Requests must pass through a SOCKS proxy (v4a)
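To make the SOCKS v4a requirement concrete, here is a minimal sketch of the CONNECT request a client sends to such a proxy. It is only an illustration of the wire format, not code taken from twisted-socks; the function name `socks4a_connect_request` and the target `example.com:80` are made up for the example.

```python
import struct


def socks4a_connect_request(hostname, port, user_id=b""):
    """Build the bytes of a SOCKS4a CONNECT request (hypothetical helper)."""
    # Version 4, command 1 (CONNECT), destination port in network byte order.
    header = struct.pack("!BBH", 4, 1, port)
    # An address of the form 0.0.0.x (x != 0) tells the proxy that a
    # hostname follows the user id, so the proxy does the DNS lookup.
    placeholder_ip = b"\x00\x00\x00\x01"
    return (header + placeholder_ip + user_id + b"\x00"
            + hostname.encode("ascii") + b"\x00")


request = socks4a_connect_request("example.com", 80)
print(repr(request))
```

The point of v4a over plain v4 is that the hostname travels to the proxy unresolved, which matters when DNS lookups must not happen locally (for example when going through Tor).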
Up until now, I've been using `twisted.web.client.Agent` and its subclasses (e.g. `BrowserLikeRedirectAgent`), but unfortunately it seems that SOCKS proxies are not supported yet (and `ProxyAgent` is a no-go because that class is for HTTP proxies).
I stumbled upon twisted-socks, which seems to allow me to do what I want, but I noticed that it uses `HTTPClientFactory` instead of `Agent`. Hence my question: what is the difference between `HTTPClientFactory` and `Agent`, and when should I use each one?
Below is some example code using twisted-socks. I have two additional questions:
- How can I use cookies in this example? I tried passing a `dict` and a `cookielib.CookieJar` instance to `HTTPClientFactory`'s `cookies` kwarg, but this raises an error (something about a string being expected... how on earth do I send cookies as a string?)
- Can this code be refactored to use `Agent`? This would be ideal, as I already have a reasonably large codebase that is written with `Agent` in mind.
```
import sys
from urlparse import urlparse

from twisted.internet import reactor, endpoints
from twisted.web import client

from socksclient import SOCKSv4ClientProtocol, SOCKSWrapper


class mything:
    def __init__(self):
        self.npages = 0        # number of fetches still in flight
        self.timestamps = {}   # passed to SOCKSWrapper

    def wrappercb(self, proxy):
        print "connected to proxy", proxy

    def clientcb(self, content):
        print "ok, got: %s" % content[:120]
        print "timestamps " + repr(self.timestamps)
        self.npages -= 1
        # Stop the reactor once every pending page has been fetched.
        if self.npages == 0:
            reactor.stop()

    def sockswrapper(self, proxy, url):
        dest = urlparse(url)
        assert dest.port is not None, 'Must specify port number.'
        endpoint = endpoints.TCP4ClientEndpoint(reactor, dest.hostname, dest.port)
        return SOCKSWrapper(reactor, proxy[1], proxy[2], endpoint, self.timestamps)


def main():
    thing = mything()

    # Mandatory first argument is a URL to fetch over Tor (or whatever
    # SOCKS proxy that is running on localhost:9050).
    url = sys.argv[1]
    proxy = (None, 'localhost', 9050, True, None, None)

    f = client.HTTPClientFactory(url)
    f.deferred.addCallback(thing.clientcb)

    sw = thing.sockswrapper(proxy, url)
    d = sw.connect(f)
    d.addCallback(thing.wrappercb)
    thing.npages += 1

    reactor.run()


if '__main__' == __name__:
    main()
```
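As a point of reference for the cookie question above: on the wire, cookies are sent as a single `Cookie` request header whose value is a string of `name=value` pairs separated by `; `. The sketch below shows that serialization for a plain dict of string names and values. The helper name `cookie_header` and the sample cookies are made up, and whether this is what the `cookies` kwarg expects is an assumption that has not been verified here.

```python
def cookie_header(cookies):
    """Serialize a dict of cookie names and values into a Cookie header value.

    Hypothetical helper; names and values are assumed to be plain strings
    that need no extra quoting or escaping.
    """
    pairs = ("%s=%s" % (name, value) for name, value in sorted(cookies.items()))
    return "; ".join(pairs)


header_value = cookie_header({"sessionid": "abc123", "lang": "en"})
print(header_value)
```

Sorting the items only keeps the output deterministic; servers do not depend on the order of the pairs.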