12

我有一个函数 download_all 遍历硬编码的页面列表以按顺序下载它们。但是如果我想根据页面的结果动态添加到列表中,我该怎么做呢?例如下载第一个页面,解析它,然后根据结果将其他页面添加到事件循环中。

@asyncio.coroutine
def download_all():
    first_page = 1
    last_page = 100
    download_list = [download(page_number) for page_number in range(first_page, last_page)]
    gen = asyncio.wait(download_list)
    return gen

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    futures = loop.run_until_complete(download_all())
4

2 回答 2

10

实现此目的的一种方法是使用队列。

#!/usr/bin/python3

import asyncio

try:  
    # python 3.4
    from asyncio import JoinableQueue as Queue
except:  
    # python 3.5
    from asyncio import Queue

@asyncio.coroutine
def do_work(task_name, work_queue):
    while not work_queue.empty():
        queue_item = work_queue.get_nowait()

        # simulate condition where task is added dynamically
        if queue_item % 2 != 0:
            work_queue.put_nowait(2)
            print('Added additional item to queue')

        print('{0} got item: {1}'.format(task_name, queue_item))
        yield from asyncio.sleep(queue_item)
        print('{0} finished processing item: {1}'.format(task_name, queue_item))

if __name__ == '__main__':

    queue = Queue()

    # Load initial jobs into queue
    [queue.put_nowait(x) for x in range(1, 6)] 

    # use 3 workers to consume tasks
    taskers = [ 
        do_work('task1', queue),
        do_work('task2', queue),
        do_work('task3', queue)
    ]   

    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(taskers))
    loop.close()

使用来自 asyncio 的队列,您可以确保工作的“单元”与最初提供给 asyncio 事件循环的任务/期货是分开的。基本上,这允许在某些条件下添加额外的工作“单元”。

请注意,在上面的示例中,偶数编号的任务是终端,因此如果是这种情况,则不会添加额外的任务。这最终会导致所有任务的完成,但在您的情况下,您可以轻松地使用另一个条件来确定是否将另一个项目添加到队列中。

输出:

Added additional item to queue
task2 got item: 1
task1 got item: 2
Added additional item to queue
task3 got item: 3
task2 finished processing item: 1
task2 got item: 4
task1 finished processing item: 2
Added additional item to queue
task1 got item: 5
task3 finished processing item: 3
task3 got item: 2
task3 finished processing item: 2
task3 got item: 2
task2 finished processing item: 4
task2 got item: 2
task1 finished processing item: 5
task3 finished processing item: 2
task2 finished processing item: 2
于 2016-08-30T13:10:51.490 回答
0

请查看Web Crawler 示例

它使用asyncio.JoinableQueue队列来存储获取任务的 url,但也展示了许多有用的技术。

于 2015-01-24T23:20:37.343 回答