I want to take the difference of a column, but with a step given by the value of another column in my dataframe. For example:
col | step_diff |
---|---|
0 | 3 |
13 | 3 |
28 | 3 |
45 | 3 |
45 | 3 |
45 | 1 |
50 | 1 |
The output should be:
col | step_diff | col_diff |
---|---|---|
0 | 3 | NaN |
13 | 3 | NaN |
28 | 3 | NaN |
45 | 3 | 45 |
45 | 3 | 32 |
45 | 1 | 0 |
50 | 1 | 5 |
You can try this:
df['col_diff'] = df['col'] - df.reindex(df.index - df['step_diff'])['col'].to_numpy()
Output:
   col  step_diff  col_diff
0    0          3       NaN
1   13          3       NaN
2   28          3       NaN
3   45          3      45.0
4   45          3      32.0
5   45          1       0.0
6   50          1       5.0
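Details:
Let's use reindex with the difference between the current index and 'step_diff' to build a "shifted" 'col' array, which is then subtracted from the current 'col'. Labels that do not exist in the index (here the negative ones) come back as NaN, and to_numpy() makes the subtraction positional instead of aligning on the reindexed labels.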
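To make the intermediate steps visible, here is a minimal sketch that splits the one-liner into parts, assuming the default integer index from the question. The names `target` and `shifted` are only illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({"col": [0, 13, 28, 45, 45, 45, 50],
                   "step_diff": [3, 3, 3, 3, 3, 1, 1]})

# Label of the row to subtract, per row: current label minus its step.
target = (df.index - df["step_diff"]).to_numpy()
print(target.tolist())  # [-3, -2, -1, 0, 1, 4, 5]

# reindex looks up each target label; labels missing from the index give NaN.
shifted = df.reindex(target)["col"].to_numpy()

# shifted is a plain array, so the subtraction is done position by position.
df["col_diff"] = df["col"] - shifted
print(df)
```

Note that this relies on the index labels being unique, since reindex cannot look up labels in an index that contains duplicates.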
The same approach works with a DatetimeIndex, as long as 'step_diff' is converted to a timedelta first:
import pandas as pd
from pandas import Timestamp

# The freq argument of Timestamp was removed in pandas 2.0, so it is omitted here.
d = {'col': {Timestamp('2021-01-10 00:00:00'): 0,
             Timestamp('2021-01-11 00:00:00'): 13,
             Timestamp('2021-01-12 00:00:00'): 28,
             Timestamp('2021-01-13 00:00:00'): 45,
             Timestamp('2021-01-14 00:00:00'): 45,
             Timestamp('2021-01-15 00:00:00'): 45,
             Timestamp('2021-01-16 00:00:00'): 50},
     'step_diff': {Timestamp('2021-01-10 00:00:00'): 3,
                   Timestamp('2021-01-11 00:00:00'): 3,
                   Timestamp('2021-01-12 00:00:00'): 3,
                   Timestamp('2021-01-13 00:00:00'): 3,
                   Timestamp('2021-01-14 00:00:00'): 3,
                   Timestamp('2021-01-15 00:00:00'): 1,
                   Timestamp('2021-01-16 00:00:00'): 1}}
df = pd.DataFrame(d)
Input df:
            col  step_diff
2021-01-10    0          3
2021-01-11   13          3
2021-01-12   28          3
2021-01-13   45          3
2021-01-14   45          3
2021-01-15   45          1
2021-01-16   50          1
Compute col_diff:
df["col_diff"] = (
    df["col"]
    - df.reindex(df.index - pd.to_timedelta(df["step_diff"], unit="d"))["col"]
    .to_numpy()
)
df
Output:
            col  step_diff  col_diff
2021-01-10    0          3       NaN
2021-01-11   13          3       NaN
2021-01-12   28          3       NaN
2021-01-13   45          3      45.0
2021-01-14   45          3      32.0
2021-01-15   45          1       0.0
2021-01-16   50          1       5.0
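Use:
import numpy as np

# For each position, subtract the value `step` rows earlier; NaN if no such row exists.
df['col_diff'] = [df.col.iloc[pos] - df.col.iloc[pos - step]
                  if pos - step >= 0 else np.nan
                  for pos, step in enumerate(df.step_diff)]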
Output:
>>> df
   col  step_diff  col_diff
0    0          3       NaN
1   13          3       NaN
2   28          3       NaN
3   45          3      45.0
4   45          3      32.0
5   45          1       0.0
6   50          1       5.0
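If the frame is large, the same positional logic can probably be written without a Python-level loop. The following is only a sketch of that idea, not part of the answer above; it assumes every value in 'step_diff' is a non-negative integer, and the names `pos`, `src` and `valid` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [0, 13, 28, 45, 45, 45, 50],
                   "step_diff": [3, 3, 3, 3, 3, 1, 1]})

col = df["col"].to_numpy(dtype=float)
pos = np.arange(len(df))
src = pos - df["step_diff"].to_numpy()  # position of the row to subtract
valid = src >= 0                        # rows whose source position exists

# Clip negative positions to 0 so they do not wrap around to the end of
# the array; np.where then replaces those rows with NaN.
df["col_diff"] = np.where(valid, col - col[np.clip(src, 0, None)], np.nan)
print(df)
```

Because it works on positions rather than labels, this variant does not depend on the index being unique or evenly spaced.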