1

我正在学习python中的异步编程,并且知道asyncio是我们应该使用的最新包,所以我尝试编写一个简单的脚本来异步生成一些文件(我使用线程来并行生成文件,并且效果很好)。由于在生成文件时写入 IO 花费大部分时间,所以这是我的脚本。

版本

(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>pip freeze
aiofiles==0.4.0
asn1crypto==0.24.0
certifi==2018.8.24
cffi==1.11.5
chardet==3.0.4
cryptography==2.3.1
idna==2.7
pycparser==2.18
pyOpenSSL==18.0.0
PySocks==1.6.8
requests==2.19.1
six==1.11.0
urllib3==1.23
win-inet-pton==1.0.1
wincertstore==0.2

(py37) C:\Users\Hong\Desktop>python --version
Python 3.7.0

(py37) C:\Users\Hong\Desktop>

异步方式

import os
import asyncio
import aiofiles
import time
import datetime
import urllib

async def produce_content(c):
    return c*1000

async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        async with aiofiles.open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = await rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
            content = await produce_content(content)
            async with aiofiles.open(file_name, mode='w') as wf:
                print('--> write start', datetime.datetime.now(), file_name)
                await wf.write(content)
                await wf.flush()
                print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e
    #return file_name

id = 0
async def my_action(file_name):
    global id
    id += 1
    local_id = id
    print('start to run %s'%local_id, datetime.datetime.now())
    await create_file(file_name)
    print('end to run %s'%local_id, datetime.datetime.now())

def run():
    files = [
        os.path.join(os.path.dirname(os.path.abspath(__file__)), 'f%s.txt'%i) for i in range(0,3)
    ]

    start_ts = datetime.datetime.now()
    print('start', start_ts)

    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(my_action(f)) for f in files]
    try:
        loop.run_until_complete(asyncio.wait(tasks))
    finally:
        loop.close()

    end_ts = datetime.datetime.now()
    print('end', end_ts)
    print('time elapse', end_ts-start_ts)


if __name__=='__main__':
    run()

在我的示例中,tmp.txt 是一个大小为 240K 的文件,我使用它作为基础并创建比它大 1000 倍的目标文件。为了比较异步方式和同步方式的时间成本,这里是同步方式,将create_file body替换为下面的(只需使用常规方法而不是aiofiles)

async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        with open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
            content = produce_content(content)
            with open(file_name, mode='w') as wf:
                print('--> write start', datetime.datetime.now(), file_name)
                wf.write(content)
                print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e

有结果

(py37) C:\Users\Hong\Desktop>python non_async.py
start 2018-09-16 22:33:12.929901
start to run 1 2018-09-16 22:33:12.929901
--> read start 2018-09-16 22:33:12.929901 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:12.945520 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:13.531200 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:33:19.701563 C:\Users\Hong\Desktop\f0.txt
end to run 1 2018-09-16 22:33:19.831177
start to run 2 2018-09-16 22:33:19.831177
--> read start 2018-09-16 22:33:19.831177 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:19.846803 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:33:20.483649 C:\Users\Hong\Desktop\f1.txt
--> write end 2018-09-16 22:33:26.917791 C:\Users\Hong\Desktop\f1.txt
end to run 2 2018-09-16 22:33:27.073904
start to run 3 2018-09-16 22:33:27.073904
--> read start 2018-09-16 22:33:27.075903 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:27.085896 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:27.807891 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:33:34.627992 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:33:34.746507
end 2018-09-16 22:33:34.762129
time elapse 0:00:21.832228

(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>python async.py
start 2018-09-16 22:33:50.945612
start to run 1 2018-09-16 22:33:50.948609
start to run 2 2018-09-16 22:33:50.953824
start to run 3 2018-09-16 22:33:50.953824
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f0.txt
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f1.txt
--> read start 2018-09-16 22:33:50.969449 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f1.txt
end to run 1 2018-09-16 22:34:06.105765
end to run 2 2018-09-16 22:34:06.206030
--> write end 2018-09-16 22:34:07.393667 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:34:07.525176
end 2018-09-16 22:34:07.525176
time elapse 0:00:16.579564

(py37) C:\Users\Hong\Desktop>

异步方法确实比同步方法运行得更快(16.6s vs 21.8s),但我的期望是异步应该运行得更快......当我们查看日志时,我们可以看到读取 tmp 文件实际上非常接近。

--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt

但是写开始没有关闭

--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt

我期望的是每个任务的“写开始”应该非常接近“读结束”,因为生产内容应该只需要很少的时间,但为什么每个任务的“写开始”如此不同?

谢谢,

4

0 回答 0