Before laying my question bare, some context is needed. I'm trying to issue HTTP GET and POST requests to a website, with the following caveats:
- Redirects are expected
- Cookies are required
- Requests must pass through a SOCKS proxy (v4a)
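To be concrete about the v4a requirement: as I understand SOCKS4a, the client sends the destination hostname to the proxy and lets the proxy do the DNS lookup, signalling this with a placeholder IP of the form 0.0.0.x. Below is a rough sketch of the CONNECT request bytes. `socks4a_connect_request` is a hypothetical helper of my own naming, not part of Twisted or twisted-socks.

```python
import struct


def socks4a_connect_request(host, port, user_id=b""):
    """Build the bytes of a SOCKS4a CONNECT request (illustrative helper only)."""
    # Version 4, command 1 (CONNECT), destination port in network byte order.
    header = struct.pack("!BBH", 4, 1, port)
    # 0.0.0.1 is a deliberately invalid address: it tells the proxy to
    # resolve the hostname that follows the user ID.
    placeholder_ip = b"\x00\x00\x00\x01"
    # The user ID and the hostname are each terminated by a NUL byte.
    return header + placeholder_ip + user_id + b"\x00" + host.encode("ascii") + b"\x00"


if __name__ == "__main__":
    print(repr(socks4a_connect_request("example.com", 80)))
```

This is why a plain SOCKS4 client is not enough for my use case: the hostname has to reach the proxy unresolved.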
Up until now, I've been using twisted.web.client.Agent and its subclasses (e.g. BrowserLikeRedirectAgent), but unfortunately it seems that SOCKS proxies are not supported yet (and ProxyAgent is a no-go because that class is for HTTP proxies).
I stumbled upon twisted-socks, which seems to allow me to do what I want, but I noticed that it uses HTTPClientFactory instead of Agent. Hence my question: what is the difference between HTTPClientFactory and Agent, and when should I use each one?
Below is some example code using twisted-socks. I have two additional questions:
- How can I use cookies in this example? I tried passing a `dict` and a `cookielib.CookieJar` instance to `HTTPClientFactory`'s `cookies` kwarg, but this raises an error (something about a string being expected... how on earth do I send cookies as a string?)
- Can this code be refactored to use `Agent`? This would be ideal, as I already have a reasonably large codebase that is written with `Agent` in mind.
```
# Python 2 code (print statements, urlparse module).
import sys
from urlparse import urlparse
from twisted.internet import reactor, endpoints
from socksclient import SOCKSv4ClientProtocol, SOCKSWrapper
from twisted.web import client


class mything:
    def __init__(self):
        self.npages = 0        # number of fetches still outstanding
        self.timestamps = {}   # filled in by SOCKSWrapper

    def wrappercb(self, proxy):
        print "connected to proxy", proxy

    def clientcb(self, content):
        print "ok, got: %s" % content[:120]
        print "timestamps " + repr(self.timestamps)
        self.npages -= 1
        # Stop the reactor once every outstanding page has arrived.
        if self.npages == 0:
            reactor.stop()

    def sockswrapper(self, proxy, url):
        dest = urlparse(url)
        assert dest.port is not None, 'Must specify port number.'
        # Endpoint for the final destination; SOCKSWrapper reaches it via the proxy.
        endpoint = endpoints.TCP4ClientEndpoint(reactor, dest.hostname, dest.port)
        return SOCKSWrapper(reactor, proxy[1], proxy[2], endpoint, self.timestamps)


def main():
    thing = mything()
    # Mandatory first argument is a URL to fetch over Tor (or whatever
    # SOCKS proxy that is running on localhost:9050).
    url = sys.argv[1]
    proxy = (None, 'localhost', 9050, True, None, None)
    f = client.HTTPClientFactory(url)
    f.deferred.addCallback(thing.clientcb)
    sw = thing.sockswrapper(proxy, url)
    d = sw.connect(f)
    d.addCallback(thing.wrappercb)
    thing.npages += 1
    reactor.run()


if '__main__' == __name__:
    main()
```
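On the cookie question: whatever the factory expects internally, my understanding is that cookies ultimately travel as a single `Cookie` request header made of `name=value` pairs separated by `; `. Here is a minimal sketch of that serialization, assuming plain ASCII names and values that need no quoting. `cookie_header_value` is a hypothetical helper, not something provided by Twisted or twisted-socks, and I have not verified how `HTTPClientFactory` would accept the result.

```python
def cookie_header_value(cookies):
    """Join a {name: value} mapping into a Cookie header value (illustrative only)."""
    # Sorting by name just keeps the output deterministic for this example.
    pairs = sorted(cookies.items())
    return "; ".join("%s=%s" % (name, value) for name, value in pairs)


if __name__ == "__main__":
    # Hypothetical cookie names and values, for demonstration.
    print(cookie_header_value({"sessionid": "abc123", "lang": "en"}))
```

If the error really does mean that a string is expected somewhere, this is the kind of string I would guess at, but I would like confirmation of the intended usage.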