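I have a DataFrame df with a MultiIndex (Name, Date) that I need to iterate over by Date, in order to assign values based on both the current and the previous Date group.
AFAIK the best way to process DataFrame groups is .apply, e.g. df.groupby('Date').apply(ifunc).
But when ifunc needs to reference the values of the previous Date group, after that previous group has itself been processed, how can I best do this?
Here is an example ifunc that operates on the columns df[['Dollars', 'Weight', 'Return', 'HaveMax']]: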
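To illustrate what I mean by the .apply pattern, here is a minimal sketch on a made-up frame. The names demo and total_dollars are hypothetical and not part of my real code. Each call receives a single Date group and nothing else.

```python
import pandas as pd

# Small illustrative frame (hypothetical values), indexed by (Name, Date) like df
demo = pd.DataFrame(
    {"Dollars": [9.0, 5.0, 0.0, 0.0]},
    index=pd.MultiIndex.from_tuples(
        [("A", "20210101"), ("B", "20210101"),
         ("A", "20210102"), ("B", "20210102")],
        names=["Name", "Date"],
    ),
)

def total_dollars(group):
    # Receives the rows for one Date only; no other group is passed in
    return group["Dollars"].sum()

# One result per Date, each computed independently of the others
per_date = demo.groupby("Date").apply(total_dollars)
```

Since the function is only ever handed the current group, there is no obvious place to pass in the processed result for the previous Date.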
# (This might not be great python; coding improvements welcome!)
# Lambda to add "AddDollars" to Names that don't already "HaveMax" "MaxDollars"
def ifunc(group, previous):  # Arguments are df groups by Date
    group['HaveMax'] = previous['HaveMax']
    # Each Name's Dollars changed from the previous Date
    avgWeights = group['Weight'].mean()
    group['Dollars'] = group['Weight'] * previous['Dollars'] * group['Return'] / avgWeights
    # Now add "AddDollars" to Names that were under
    group.loc[group['HaveMax'] == False, 'Dollars'] = group[group['HaveMax'] == False]['Dollars'] + AddDollars
    # Update HaveMax for any Names that reached MaxDollars on this Date
    group.loc[group['HaveMax'] == False, 'HaveMax'] = group[group['HaveMax'] == False]['Dollars'] >= MaxDollars
    return group
Sample data:
import pandas as pd

AddDollars = 1.0
MaxDollars = 10.0
df = pd.DataFrame(data=[('A', '20210101', 9.0, 1.0, 0, False),
                        ('B', '20210101', 5.0, 1.0, 0, False),
                        ('C', '20210101', 5.0, 1.0, 0, True),
                        ('A', '20210102', 0.0, 1.0, 1.0, False),
                        ('B', '20210102', 0.0, 1.0, 1.0, False),
                        ('C', '20210102', 0.0, 1.0, 1.0, False)],
                  columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax')).set_index(['Name', 'Date'])
Desired output:
               Dollars  Weight  Return  HaveMax
Name Date
A    20210101      9.0     1.0     0.0    False
B    20210101      5.0     1.0     0.0    False
C    20210101      5.0     1.0     0.0     True
A    20210102     10.0     1.0     1.0     True
B    20210102      6.0     1.0     1.0    False
C    20210102      5.0     1.0     1.0     True
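For reference, the behaviour I am after can be sketched with a plain loop over the sorted dates that carries the processed previous group forward. This is only a rough sketch and not necessarily the best way. It assumes every Name appears on every Date, that the Date strings sort chronologically, and that the first Date is kept as given. The names step, frames, and result are hypothetical.

```python
import pandas as pd

AddDollars = 1.0
MaxDollars = 10.0

df = pd.DataFrame(
    data=[('A', '20210101', 9.0, 1.0, 0, False),
          ('B', '20210101', 5.0, 1.0, 0, False),
          ('C', '20210101', 5.0, 1.0, 0, True),
          ('A', '20210102', 0.0, 1.0, 1.0, False),
          ('B', '20210102', 0.0, 1.0, 1.0, False),
          ('C', '20210102', 0.0, 1.0, 1.0, False)],
    columns=('Name', 'Date', 'Dollars', 'Weight', 'Return', 'HaveMax'),
).set_index(['Name', 'Date'])


def step(group, previous):
    # Assumption: both frames are indexed by Name only, so columns align by Name
    out = group.copy()
    out['HaveMax'] = previous['HaveMax']
    avg_weight = out['Weight'].mean()
    out['Dollars'] = out['Weight'] * previous['Dollars'] * out['Return'] / avg_weight
    # Names that had not reached MaxDollars as of the previous Date
    under = ~out['HaveMax']
    out.loc[under, 'Dollars'] += AddDollars
    out.loc[under, 'HaveMax'] = out.loc[under, 'Dollars'] >= MaxDollars
    return out


# Walk the dates in order, feeding each processed group into the next step
dates = sorted(df.index.get_level_values('Date').unique())
previous = df.xs(dates[0], level='Date')  # first Date is taken as given
frames = [previous]
for date in dates[1:]:
    previous = step(df.xs(date, level='Date'), previous)
    frames.append(previous)

# Reassemble with the original (Name, Date) index layout
result = pd.concat(frames, keys=dates, names=['Date']).reorder_levels(['Name', 'Date'])
print(result)
```

On the sample data this reproduces the desired output, but it processes the dates one at a time in Python, which is why I am asking whether there is a better way to do it with groupby.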