How can I get the same result in Dask as in pandas?
The goal is to put each group on a uniform time grid, repeating the last known value until a new one arrives.
import pandas as pd
import numpy as np
import datetime
data=pd.DataFrame([["AAAA","2020-01-15",2],
["AAAA","2020-02-15",9],
["AAAA","2020-02-20",2],
["AAAA","2020-02-25",9],
["AAAA","2020-04-18",2],
["BBBB","2020-01-01",5],
["BBBB","2020-02-15",5],
["BBBB","2020-02-20",4],
["BBBB","2020-02-25",4],
["BBBB","2020-04-15",2],
["CCCC","2020-01-01",9],
["CCCC","2020-02-15",5],
["CCCC","2020-03-20",7],
["CCCC","2020-04-25",4],
["CCCC","2020-05-15",2]])
data.columns=['Asset','Date','P']
data['Date']=pd.to_datetime(data['Date'])
data.index=data['Date'].values
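# forward-fill onto a 2-day grid per asset (.pad() was removed in pandas 2.0; .ffill() is the equivalent)
temp=data.groupby('Asset').resample('2D').ffill()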
temp
** This is only a small example; the real-world dataset is very large.
Thanks!
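
For reference, the per-group step can be written as a standalone function, which is the shape a Dask port would most likely need. The following is a minimal pandas-only sketch, not a tested Dask solution: the helper name `resample_group` and the reduced sample data are made up for illustration. Dask's `groupby(...).apply(func, meta=...)` accepts a per-group function like this one, but whether it reproduces the pandas index layout exactly is an assumption that has not been verified here.

```python
import pandas as pd


def resample_group(group: pd.DataFrame, rule: str = "2D") -> pd.DataFrame:
    """Hypothetical helper: put one asset's rows on a regular grid, forward-filling."""
    # resample() needs a sorted DatetimeIndex; ffill() repeats the last known value
    return group.sort_index().resample(rule).ffill()


# Reduced sample data (illustrative only, not the data from the question)
data = pd.DataFrame(
    [
        ["AAAA", "2020-01-15", 2],
        ["AAAA", "2020-01-21", 9],
        ["BBBB", "2020-01-01", 5],
        ["BBBB", "2020-01-05", 4],
    ],
    columns=["Asset", "Date", "P"],
)
data["Date"] = pd.to_datetime(data["Date"])
data = data.set_index("Date")

# Apply the helper to each asset separately, then stitch the pieces together
pieces = {asset: resample_group(g[["P"]]) for asset, g in data.groupby("Asset")}
result = pd.concat(pieces, names=["Asset", "Date"])
print(result)
```

Because each asset is processed independently, the same function could in principle be handed to a parallel framework one group at a time; the explicit loop here just keeps the sketch free of any dependency beyond pandas.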