I have two functions, and each one runs a for loop.
def f1(df1, df2):
    final_items = []
    for ind, row in df1.iterrows():
        id = row['Id']
        some_num = row['some_num']
        timestamp = row['Timestamp']
        res = f2(df=df2, id=id, some_num=some_num, timestamp=timestamp)
        final_items.append(res)
    return final_items

def f2(df, id, some_num, timestamp):
    for ind, row in df.iterrows():
        filename = row['some_filename']
        dfx = reader(key=filename)  # User defined; object reader
        # Assign variables
        st_ID = dfx["Id"]
        st_some_num = dfx["some_num"]
        st_time_first = dfx['some_first_time_variable']
        st_time_last = dfx['some_last_time_variable']
        # Return the filename whose Id/some_num match and whose time window
        # contains the timestamp; stop at the first Id/some_num match
        if id == st_ID and some_num == st_some_num:
            if st_time_first <= timestamp and st_time_last >= timestamp:
                return filename
            else:
                return None
        else:
            continue
As shown above, the first function calls the second one. The first loop runs 2,000 times, i.e. there are 2,000 rows in the first dataframe. The second function (the one called from f1()) runs 10 million times.
My goal is to speed up f2() using parallel processing. I have tried Python packages such as multiprocessing and Ray, but I am new to parallel processing and have hit a lot of roadblocks due to my lack of experience.
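For reference, below is a minimal sketch of the kind of multiprocessing approach I have been attempting. It assumes f2 and reader are defined at module level (so they can be pickled) and it parallelizes over the 2,000 rows of df1; the names f1_parallel, _init_worker, and _worker are placeholders I made up for this sketch, not code from my real project.

from multiprocessing import Pool

_shared_df2 = None

def _init_worker(df2):
    # Runs once per worker process; keeps a copy of df2 so it is not
    # re-pickled for every single task
    global _shared_df2
    _shared_df2 = df2

def _worker(args):
    # Unpack one row of df1 and run the original f2 against the shared df2
    id, some_num, timestamp = args
    return f2(df=_shared_df2, id=id, some_num=some_num, timestamp=timestamp)

def f1_parallel(df1, df2, processes=4):
    # One task per row of df1
    tasks = [
        (row['Id'], row['some_num'], row['Timestamp'])
        for _, row in df1.iterrows()
    ]
    with Pool(processes=processes, initializer=_init_worker, initargs=(df2,)) as pool:
        # pool.map preserves the row order of df1 in the results
        final_items = pool.map(_worker, tasks)
    return final_items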
Could someone help me speed this function up and reduce the time it takes to work through the 10 million rows?