python - Limiting HTTP requests per second in Python

Question

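I wrote a script that reads URLs from a file and sends HTTP requests to all of them concurrently. I now want to limit the number of HTTP requests per second in a session, as well as the bandwidth per interface (eth0, eth1, etc.). Is there a way to do this in Python?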

score 3 · Accepted Answer

You can use a Semaphore object, which is part of the Python standard library (see the Python documentation for the threading module).

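Alternatively, if you want to work with threads directly, you can use wait([timeout]), which is available on threading.Event and threading.Condition objects.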

No library bundled with Python works at the level of Ethernet or other network interfaces. The lowest level you can reach is socket.
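
Since socket is the lowest level available, bandwidth can only be approximated from inside the application, for example by pacing how fast data is read. The following is a minimal sketch under that assumption; `throttled_read` and its parameters are hypothetical names, and it caps the average read rate of one stream, not the traffic of a whole interface.

```python
import time


def throttled_read(stream, max_bytes_per_sec, chunk_size=1024):
    """Read `stream` to the end, sleeping so the average rate stays under the cap.

    `stream` is any object with a read(n) method that returns bytes,
    for example the file object returned by socket.makefile("rb").
    """
    start = time.monotonic()
    received = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        received.extend(chunk)
        # Earliest moment at which this many bytes may have arrived.
        earliest = start + len(received) / max_bytes_per_sec
        delay = earliest - time.monotonic()
        if delay > 0:
            time.sleep(delay)
    return bytes(received)
```

Per-interface shaping across all processes is usually left to the operating system; this sketch only slows down a single reader.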

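Based on your reply, here is my suggestion. Note active_count: use it only to check that your script really runs just two request threads. In this case it will report three, because the first is your script's main thread, followed by the two URL requests.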

import time
import requests
import threading

# Limit the number of threads.
pool = threading.BoundedSemaphore(2)

def worker(u):
    try:
        # Request the passed URL.
        r = requests.get(u)
        print(r.status_code)
    finally:
        # Release the semaphore slot for other threads, even if the request fails.
        pool.release()
    # Show the number of active threads.
    print(threading.active_count())

def req():
    # Get URLs from a text file, remove white space.
    with open('urllist.txt') as f:
        urls = [line.strip() for line in f if line.strip()]
    for u in urls:
        # Thread pool.
        # Blocks other threads (more than the set limit).
        pool.acquire(blocking=True)
        # Create a new thread.
        # Pass each URL (i.e. u parameter) to the worker function.
        t = threading.Thread(target=worker, args=(u, ))
        # Start the newly created thread.
        t.start()

req()
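
Note that the BoundedSemaphore above caps how many requests run at the same time, not how many start per second. If a per-second limit is what you need, one option is to space out the request starts. The following is a minimal sketch, not part of the original answer; `RateLimiter` and its `wait` method are hypothetical names, and the spacing is approximate because sleep durations are not exact.

```python
import threading
import time


class RateLimiter:
    """Space out calls so that roughly `rate` of them start per second."""

    def __init__(self, rate):
        self.interval = 1.0 / rate
        self.lock = threading.Lock()
        self.next_slot = time.monotonic()

    def wait(self):
        # Reserve the next free time slot under the lock,
        # then sleep outside the lock so other threads can reserve theirs.
        with self.lock:
            now = time.monotonic()
            slot = max(self.next_slot, now)
            self.next_slot = slot + self.interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)


def start_times(limiter, count):
    """Run `count` threads through the limiter and return when each one started."""
    stamps = []

    def worker():
        limiter.wait()                    # a request would be sent right after this
        stamps.append(time.monotonic())   # list.append is atomic in CPython

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(stamps)
```

In the worker function above, calling `limiter.wait()` just before `requests.get(u)` would combine both limits: the semaphore bounds concurrency and the limiter bounds the start rate.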

score 0 · Accepted Answer

You can use the worker concept described in the documentation: https://docs.python.org/3.4/library/queue.html

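Add a wait() call (for example time.sleep()) inside your workers so that they pause between requests (in the documentation's example: inside the "while True" loop, after task_done).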

Example: 5 worker threads with a 1-second wait between requests will perform fewer than 5 fetches per second.
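
The worker idea can be sketched roughly as follows. This is an illustration under assumptions, not the code from the linked documentation: `run_workers`, `handle`, `num_workers`, and `delay` are hypothetical names, and `handle` stands in for whatever performs the actual HTTP request.

```python
import queue
import threading
import time


def run_workers(items, handle, num_workers=5, delay=1.0):
    """Process `items` with worker threads that pause `delay` seconds after each task."""
    tasks = queue.Queue()
    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            item = tasks.get()
            if item is None:          # sentinel: no more work for this worker
                tasks.task_done()
                break
            try:
                outcome = handle(item)
            except Exception as exc:  # keep the worker alive; record the failure
                outcome = exc
            with results_lock:
                results.append((item, outcome))
            tasks.task_done()
            time.sleep(delay)         # the wait between requests

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:
        tasks.put(item)
    for _ in threads:
        tasks.put(None)               # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()
    return results
```

Because each worker sleeps after every task, the overall rate stays below `num_workers / delay` requests per second; the time each request itself takes lowers it further.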

score 0 · Accepted Answer

Note that the following solution still sends requests serially, but limits the TPS (transactions per second).

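TL;DR: there is a class that keeps track of how many calls can still be made in the current second. The count is decremented on every call and refilled once per second.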

import time
from multiprocessing import Process, Value

# Naive TPS regulation

# This class holds a bucket of tokens which are refilled every second based on the expected TPS
class TPSBucket:

    def __init__(self, expected_tps):
        self.number_of_tokens = Value('i', 0)
        self.expected_tps = expected_tps
        self.bucket_refresh_process = Process(target=self.refill_bucket_per_second) # process to constantly refill the TPS bucket

    def refill_bucket_per_second(self):
        while True:
            print("refill")
            self.refill_bucket()
            time.sleep(1)

    def refill_bucket(self):
        self.number_of_tokens.value = self.expected_tps
        print('bucket count after refill', self.number_of_tokens.value)

    def start(self):
        self.bucket_refresh_process.start()

    def stop(self):
        self.bucket_refresh_process.kill()

    def get_token(self):
        response = False
        if self.number_of_tokens.value > 0:
            with self.number_of_tokens.get_lock():
                if self.number_of_tokens.value > 0:
                    self.number_of_tokens.value -= 1
                    response = True

        return response

def test():
    tps_bucket = TPSBucket(expected_tps=1) ## Let's say I want to send requests 1 per second
    tps_bucket.start()
    total_number_of_requests = 60 ## Let's say I want to send 60 requests
    request_number = 0
    t0 = time.time()
    while True:
        if tps_bucket.get_token():
            request_number += 1

            print('Request', request_number) ## This is my request

            if request_number == total_number_of_requests:
                break
        else:
            time.sleep(0.001) ## No token yet: pause briefly instead of spinning at full CPU

    print(time.time() - t0, ' time elapsed') ## Some metrics to tell me how long everything took
    tps_bucket.stop()


if __name__ == "__main__":
    test()
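
If the separate refill process is more machinery than you need, a variation is to refill lazily, based on the time elapsed since the last call. The following is a minimal single-threaded sketch of that idea, not the code above: `LazyTokenBucket`, `capacity`, and the injectable `clock` are hypothetical names, and unlike the class above it tops up continuously instead of resetting once per second.

```python
import time


class LazyTokenBucket:
    """Token bucket that tops itself up whenever a token is requested."""

    def __init__(self, rate, capacity=None, clock=time.monotonic):
        self.rate = float(rate)                       # tokens added per second
        self.capacity = float(rate if capacity is None else capacity)
        self.clock = clock                            # injectable for testing
        self.tokens = self.capacity                   # start with a full bucket
        self.last = clock()

    def get_token(self):
        # Add the tokens earned since the last call, capped at capacity.
        now = self.clock()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

This version is not thread-safe as written; guarding `get_token` with a `threading.Lock` would be needed before sharing one bucket between threads.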
