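I'm learning asynchronous programming in Python and understand that asyncio is the current package to use for it, so I tried writing a simple script that generates some files asynchronously (I previously used threads to generate the files in parallel, and that worked well). Since writing to disk takes most of the time when generating the files, I expected async I/O to help. Here is my script.
Versions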
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>pip freeze
aiofiles==0.4.0
asn1crypto==0.24.0
certifi==2018.8.24
cffi==1.11.5
chardet==3.0.4
cryptography==2.3.1
idna==2.7
pycparser==2.18
pyOpenSSL==18.0.0
PySocks==1.6.8
requests==2.19.1
six==1.11.0
urllib3==1.23
win-inet-pton==1.0.1
wincertstore==0.2
(py37) C:\Users\Hong\Desktop>python --version
Python 3.7.0
(py37) C:\Users\Hong\Desktop>
Async version
import os
import asyncio
import aiofiles
import time
import datetime
import urllib

async def produce_content(c):
    return c*1000

async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        async with aiofiles.open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = await rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
        content = await produce_content(content)
        async with aiofiles.open(file_name, mode='w') as wf:
            print('--> write start', datetime.datetime.now(), file_name)
            await wf.write(content)
            await wf.flush()
            print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e
    #return file_name

id = 0

async def my_action(file_name):
    global id
    id += 1
    local_id = id
    print('start to run %s'%local_id, datetime.datetime.now())
    await create_file(file_name)
    print('end to run %s'%local_id, datetime.datetime.now())

def run():
    files = [
        os.path.join(os.path.dirname(os.path.abspath(__file__)), 'f%s.txt'%i) for i in range(0,3)
    ]
    start_ts = datetime.datetime.now()
    print('start', start_ts)
    loop = asyncio.get_event_loop()
    tasks = [asyncio.ensure_future(my_action(f)) for f in files]
    try:
        loop.run_until_complete(asyncio.wait(tasks))
    finally:
        loop.close()
    end_ts = datetime.datetime.now()
    print('end', end_ts)
    print('time elapse', end_ts-start_ts)

if __name__=='__main__':
    run()
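In my example, tmp.txt is a 240K file that I use as the base, and each target file is 1000 times larger. To compare the time cost of the async and sync approaches, here is the sync version: replace the body of create_file with the following (it just uses the regular built-in open instead of aiofiles).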
async def create_file(file_name):
    tmp_file = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'tmp.txt')
    try:
        with open(tmp_file, mode='r') as rf:
            print('--> read start', datetime.datetime.now(), file_name)
            content = rf.read()
            print('--> read end', datetime.datetime.now(), file_name)
        # produce_content is still a coroutine function, so it has to be awaited here too
        content = await produce_content(content)
        with open(file_name, mode='w') as wf:
            print('--> write start', datetime.datetime.now(), file_name)
            wf.write(content)
            print('--> write end', datetime.datetime.now(), file_name)
    except Exception as e:
        print(e)
        raise e
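A side note on produce_content, which both versions share: it is declared with async def but contains no await of its own, so awaiting it never hands control back to the event loop. The sketch below is only an illustration of that behaviour, not part of the original script. The timed helper, the events list, and the 1000-character stand-in for tmp.txt are assumptions made for the demo.

```python
import asyncio

async def produce_content(c):
    # Same shape as in the script: no await inside, so it never yields.
    return c * 1000

async def timed(label, base, events):
    # Hypothetical helper: records when each task enters and leaves produce_content.
    events.append("start " + label)
    content = await produce_content(base)
    events.append("end " + label)
    return len(content)

async def main():
    base = "x" * 1000  # small stand-in for the 240K tmp.txt (assumption)
    events = []
    await asyncio.gather(*(timed("f%d" % i, base, events) for i in range(3)))
    return events

if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Even though the three tasks are started together with asyncio.gather, the recorded events do not interleave: each task finishes its produce_content call before the next one starts, so that step runs one task at a time.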
Results:
(py37) C:\Users\Hong\Desktop>python non_async.py
start 2018-09-16 22:33:12.929901
start to run 1 2018-09-16 22:33:12.929901
--> read start 2018-09-16 22:33:12.929901 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:12.945520 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:13.531200 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:33:19.701563 C:\Users\Hong\Desktop\f0.txt
end to run 1 2018-09-16 22:33:19.831177
start to run 2 2018-09-16 22:33:19.831177
--> read start 2018-09-16 22:33:19.831177 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:19.846803 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:33:20.483649 C:\Users\Hong\Desktop\f1.txt
--> write end 2018-09-16 22:33:26.917791 C:\Users\Hong\Desktop\f1.txt
end to run 2 2018-09-16 22:33:27.073904
start to run 3 2018-09-16 22:33:27.073904
--> read start 2018-09-16 22:33:27.075903 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:27.085896 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:27.807891 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:33:34.627992 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:33:34.746507
end 2018-09-16 22:33:34.762129
time elapse 0:00:21.832228
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>
(py37) C:\Users\Hong\Desktop>python async.py
start 2018-09-16 22:33:50.945612
start to run 1 2018-09-16 22:33:50.948609
start to run 2 2018-09-16 22:33:50.953824
start to run 3 2018-09-16 22:33:50.953824
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f0.txt
--> read start 2018-09-16 22:33:50.953824 C:\Users\Hong\Desktop\f1.txt
--> read start 2018-09-16 22:33:50.969449 C:\Users\Hong\Desktop\f2.txt
--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt
--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f0.txt
--> write end 2018-09-16 22:34:05.952100 C:\Users\Hong\Desktop\f1.txt
end to run 1 2018-09-16 22:34:06.105765
end to run 2 2018-09-16 22:34:06.206030
--> write end 2018-09-16 22:34:07.393667 C:\Users\Hong\Desktop\f2.txt
end to run 3 2018-09-16 22:34:07.525176
end 2018-09-16 22:34:07.525176
time elapse 0:00:16.579564
(py37) C:\Users\Hong\Desktop>
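The async version does run faster than the sync version (16.6s vs 21.8s), but I expected it to be much faster than that... Looking at the log, we can see that the reads of the tmp file finish very close together: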
--> read end 2018-09-16 22:33:50.985078 C:\Users\Hong\Desktop\f0.txt
--> read end 2018-09-16 22:33:51.525238 C:\Users\Hong\Desktop\f1.txt
--> read end 2018-09-16 22:33:52.057857 C:\Users\Hong\Desktop\f2.txt
But the write starts are not close together:
--> write start 2018-09-16 22:33:52.643887 C:\Users\Hong\Desktop\f0.txt
--> write start 2018-09-16 22:33:57.036816 C:\Users\Hong\Desktop\f1.txt
--> write start 2018-09-16 22:34:01.509756 C:\Users\Hong\Desktop\f2.txt
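What I expected is that each task's "write start" would come very soon after its "read end", since producing the content should take very little time. So why are the "write start" times of the tasks so far apart?
Thanks,
Hong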
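To put numbers on this, here is a small sketch that computes the gap between "read end" and "write start" for each file in both runs. The timestamps are copied from the logs above. The names ASYNC_LOG, SYNC_LOG, and gaps are just labels chosen for this illustration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (read end, write start) per file, copied from the logs in this post
ASYNC_LOG = {
    "f0": ("2018-09-16 22:33:50.985078", "2018-09-16 22:33:52.643887"),
    "f1": ("2018-09-16 22:33:51.525238", "2018-09-16 22:33:57.036816"),
    "f2": ("2018-09-16 22:33:52.057857", "2018-09-16 22:34:01.509756"),
}
SYNC_LOG = {
    "f0": ("2018-09-16 22:33:12.945520", "2018-09-16 22:33:13.531200"),
    "f1": ("2018-09-16 22:33:19.846803", "2018-09-16 22:33:20.483649"),
    "f2": ("2018-09-16 22:33:27.085896", "2018-09-16 22:33:27.807891"),
}

def gaps(log):
    """Seconds between 'read end' and 'write start' for each file."""
    result = {}
    for name, (read_end, write_start) in log.items():
        delta = datetime.strptime(write_start, FMT) - datetime.strptime(read_end, FMT)
        result[name] = delta.total_seconds()
    return result

if __name__ == "__main__":
    print("sync ", gaps(SYNC_LOG))   # f0: 0.58568, f1: 0.636846, f2: 0.721995
    print("async", gaps(ASYNC_LOG))  # f0: 1.658809, f1: 5.511578, f2: 9.451899
```

In the sync run the gap is roughly 0.6 to 0.7 seconds per file, so producing the content and opening the output file already takes measurable time there. In the async run the same gap grows from about 1.7 seconds for f0 to about 9.5 seconds for f2, which is the spread I am asking about.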