Out of personal interest, I am trying to answer the following question: what is the fastest way to send 100,000 HTTP requests in Python?

This is what I have come up with so far, but I'm seeing something very strange.

When installSignalHandlers is False, it just hangs. I can see the DelayedCall instances in reactor._newTimedCalls, but processResponse never gets called.

When installSignalHandlers is True, it raises an error and then works.

from twisted.internet import reactor
from twisted.web.client import Agent
from threading import Semaphore, Thread
import time

concurrent = 100
s = Semaphore(concurrent)
reactor.suggestThreadPoolSize(concurrent)
t=Thread(
    target=reactor.run,
    kwargs={'installSignalHandlers':True})
t.daemon=True
t.start()


agent = Agent(reactor)


def processResponse(response,url):
    print response.code, url
    s.release()

def processError(response,url):
    print "error", url
    s.release()

def addTask(url):
    req = agent.request('HEAD', url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)


for url in open('urllist.txt'):
    addTask(url.strip())    
    s.acquire()
while s._Semaphore__value!=concurrent:
    time.sleep(0.1)     

reactor.stop()

This is the error raised when installSignalHandlers is True (note: this is the expected behavior! The question is why it doesn't work when installSignalHandlers is False):

Traceback (most recent call last):
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 396, in fireEvent
    DeferredList(beforeResults).addCallback(self._continueFiring)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 224, in addCallback
    callbackKeywords=kw)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 213, in addCallbacks
    self._runCallbacks()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/defer.py", line 371, in _runCallbacks
    self.result = callback(self.result, *args, **kw)
--- <exception caught here> ---
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 409, in _continueFiring
    callable(*args, **kwargs)
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1165, in _reallyStartRunning
    self._handleSignals()
  File "/usr/lib/python2.6/dist-packages/twisted/internet/base.py", line 1105, in _handleSignals
    signal.signal(signal.SIGINT, self.sigInt)
exceptions.ValueError: signal only works in main thread

What am I doing wrong, and what is the right way to do this? I'm new to Twisted.

@moshez: Thanks. It works now:

from twisted.internet import reactor, threads
from urlparse import urlparse
import httplib
import itertools


concurrent = 100
finished=itertools.count(1)
reactor.suggestThreadPoolSize(concurrent)

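def getStatus(ourl):
    # Runs in a worker thread: blocking HEAD request, returns the status code.
    url = urlparse(ourl)
    conn = httplib.HTTPConnection(url.netloc)
    # An empty path (e.g. "http://host") is not a valid request target.
    conn.request("HEAD", url.path or "/")
    res = conn.getresponse()
    conn.close()
    return res.status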

def processResponse(response,url):
    print response, url
    processedOne()

def processError(error,url):
    print "error", url#, error
    processedOne()

def processedOne():
    if finished.next()==added:
        reactor.stop()

def addTask(url):
    req = threads.deferToThread(getStatus, url)
    req.addCallback(processResponse, url)
    req.addErrback(processError, url)   

added=0
for url in open('urllist.txt'):
    added+=1
    addTask(url.strip())

try:
    reactor.run()
except KeyboardInterrupt:
    reactor.stop()

1 Answer

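You are making far too many "reactor calls" from the main thread (for example, there is a good chance agent.request calls into the reactor). I'm not sure whether this is your problem, but it is unsupported either way: the only reactor call you may make from a non-reactor thread is reactor.callFromThread.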
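
To illustrate the point about reactor.callFromThread, here is a minimal sketch of handing a request to the reactor from another thread. It assumes Python 3 and a recent Twisted release (where Agent.request takes bytes), not the Python 2.6 setup in the question. The helper names add_task_from_thread, on_response, on_error and feeder are hypothetical, and the reactor is passed in as a parameter rather than imported globally.

```python
import threading


def add_task(agent, url, on_response, on_error):
    """Start one HEAD request. Must run in the reactor thread."""
    d = agent.request(b"HEAD", url.encode("ascii"))
    d.addCallback(on_response, url)
    d.addErrback(on_error, url)
    return d


def add_task_from_thread(reactor, agent, url, on_response, on_error):
    """Callable from any thread: it only asks the reactor to run add_task."""
    reactor.callFromThread(add_task, agent, url, on_response, on_error)


def main():
    # Imported here so the helpers above can be used without a real reactor.
    from twisted.internet import reactor
    from twisted.web.client import Agent

    agent = Agent(reactor)

    def on_response(response, url):
        print(response.code, url)

    def on_error(failure, url):
        print("error", url)

    def feeder():
        # Non-reactor thread: touches the reactor only via callFromThread.
        with open("urllist.txt") as f:
            for line in f:
                url = line.strip()
                if url:
                    add_task_from_thread(
                        reactor, agent, url, on_response, on_error)

    threading.Thread(target=feeder, daemon=True).start()
    reactor.run()  # the reactor stays on the main thread; stop with Ctrl-C
```

In this sketch agent.request and both callbacks run inside the reactor thread; the feeder thread never calls them directly. It does not cap the number of requests in flight and never stops the reactor on its own.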

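Also, the whole architecture looks odd. Why not run the reactor on the main thread? Reading the entire file of 10,000 requests and splitting it up from within the reactor should not be a problem, even if you do it all in one go.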

You can probably get by with a pure-Twisted solution that doesn't use any threads at all.

Answered on 2010-04-14T01:27:21.443