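A simple question: every tutorial I have read shows how to use IPython.parallel or multiprocessing to collect the results of a parallel computation into a list (or, at best, a dictionary).
Could you point me to a simple example that uses either library to output the results of a computation into a shared pandas DataFrame?
http://gouthamanbalaraman.com/blog/distributed-processing-pandas.html - this tutorial shows how to read the input DataFrame (code below), but how would I output the results of the 4 parallel computations into a single DataFrame?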
import pandas as pd
import multiprocessing as mp

LARGE_FILE = "D:\\my_large_file.txt"
CHUNKSIZE = 100000  # process 100,000 rows at a time


def process_frame(df):
    # process one chunk of the data frame; here we only count its rows
    return len(df)


if __name__ == '__main__':
    # with chunksize set, read_table returns an iterator of DataFrames
    reader = pd.read_table(LARGE_FILE, chunksize=CHUNKSIZE)
    pool = mp.Pool(4)  # use 4 processes

    funclist = []
    for df in reader:
        # submit each chunk to the pool without blocking
        f = pool.apply_async(process_frame, [df])
        funclist.append(f)

    result = 0
    for f in funclist:
        result += f.get(timeout=10)  # wait at most 10 seconds per chunk

    print("There are %d rows of data" % (result))
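
For reference, here is a minimal sketch of one common way to end up with a single DataFrame. It is not taken from the linked tutorial. Worker processes do not share memory with the parent, so rather than writing into one shared DataFrame, each worker returns a small DataFrame and the parent joins them with `pd.concat`. The column name `value`, the helper `split_frame`, the function `process_in_parallel`, and the summary computed in `process_frame` are all hypothetical and stand in for the real per-chunk work.

```python
import multiprocessing as mp

import pandas as pd


def process_frame(df):
    """Hypothetical per-chunk work: return a one-row summary DataFrame."""
    return pd.DataFrame({
        "first_index": [df.index[0]],
        "n_rows": [len(df)],
        "value_sum": [df["value"].sum()],  # "value" is an assumed column name
    })


def split_frame(df, chunksize):
    """Yield consecutive row slices of at most `chunksize` rows."""
    for start in range(0, len(df), chunksize):
        yield df.iloc[start:start + chunksize]


def process_in_parallel(chunks, processes=4):
    """Run process_frame on every chunk in a pool and concatenate the results."""
    with mp.Pool(processes) as pool:
        # Pool.map returns the partial results in the same order as the input.
        partial_frames = pool.map(process_frame, chunks)
    # Combine the per-chunk DataFrames in the parent process.
    return pd.concat(partial_frames, ignore_index=True)


if __name__ == "__main__":
    # Small in-memory stand-in for the large file in the question.
    data = pd.DataFrame({"value": range(10)})
    result = process_in_parallel(split_frame(data, chunksize=3))
    print(result)
```

With the file from the question, the `reader` returned by `pd.read_table(LARGE_FILE, chunksize=CHUNKSIZE)` could be passed in place of `split_frame(...)`. Note that `Pool.map` turns its input into a list first, so all chunks are held in memory at once. The `apply_async` loop from the question can be kept instead, with `process_frame` returning a DataFrame and the final loop replaced by `pd.concat([f.get() for f in funclist])`.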