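I want to difference a column, but with a lag taken from another column of my DataFrame that gives the step for each row. For example:

col step_diff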
0 3
13 3
28 3
45 3
45 3
45 1
50 1

The output should be:

col step_diff col_diff
0 3
13 3
28 3
45 3 45
45 3 32
45 1 0
50 1 5

2 Answers


You can try this:

df['col_diff'] = df['col'] - df.reindex(df.index - df['step_diff'])['col'].to_numpy()

Output:

   col  step_diff  col_diff
0    0          3       NaN
1   13          3       NaN
2   28          3       NaN
3   45          3      45.0
4   45          3      32.0
5   45          1       0.0
6   50          1       5.0

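Details:

We use reindex with the difference between the current index and 'step_diff' to build a "shifted" array of 'col' values, which is then subtracted from the current 'col'. Labels that do not exist in the index (here, the negative ones) come back as NaN, and to_numpy() makes the subtraction positional instead of index-aligned.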
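To make the intermediate values visible, here is the same one-liner split into steps. This is only a sketch on the question's sample data; the names `targets` and `shifted` are mine, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"col": [0, 13, 28, 45, 45, 45, 50],
                   "step_diff": [3, 3, 3, 3, 3, 1, 1]})

# Label of the row each row looks back to: current label minus its step.
targets = df.index - df["step_diff"]   # -3, -2, -1, 0, 1, 4, 5

# reindex returns NaN for labels that are not in the index.
shifted = df["col"].reindex(targets)

# to_numpy() discards the looked-up labels, so the subtraction
# pairs values by position rather than by index label.
df["col_diff"] = df["col"] - shifted.to_numpy()
```

Without `to_numpy()`, pandas would align `df["col"]` and `shifted` on their labels, which is not what we want here.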

With a time series index:

import pandas as pd

# Timestamp no longer accepts freq= in pandas 2.x, so it is omitted here
d = {'col': {pd.Timestamp('2021-01-10 00:00:00'): 0,
  pd.Timestamp('2021-01-11 00:00:00'): 13,
  pd.Timestamp('2021-01-12 00:00:00'): 28,
  pd.Timestamp('2021-01-13 00:00:00'): 45,
  pd.Timestamp('2021-01-14 00:00:00'): 45,
  pd.Timestamp('2021-01-15 00:00:00'): 45,
  pd.Timestamp('2021-01-16 00:00:00'): 50},
 'step_diff': {pd.Timestamp('2021-01-10 00:00:00'): 3,
  pd.Timestamp('2021-01-11 00:00:00'): 3,
  pd.Timestamp('2021-01-12 00:00:00'): 3,
  pd.Timestamp('2021-01-13 00:00:00'): 3,
  pd.Timestamp('2021-01-14 00:00:00'): 3,
  pd.Timestamp('2021-01-15 00:00:00'): 1,
  pd.Timestamp('2021-01-16 00:00:00'): 1}}

df = pd.DataFrame(d)

Input df:

            col  step_diff
2021-01-10    0          3
2021-01-11   13          3
2021-01-12   28          3
2021-01-13   45          3
2021-01-14   45          3
2021-01-15   45          1
2021-01-16   50          1

Compute col_diff:

df["col_diff"] = (
    df["col"]
    - df.reindex(df.index - pd.to_timedelta(df["step_diff"], unit="d"))["col"]
    .to_numpy()
)


df

Output:

            col  step_diff  col_diff
2021-01-10    0          3       NaN
2021-01-11   13          3       NaN
2021-01-12   28          3       NaN
2021-01-13   45          3      45.0
2021-01-14   45          3      32.0
2021-01-15   45          1       0.0
2021-01-16   50          1       5.0
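One consequence of this approach is that the lookup is by label, so with dates the step counts calendar days, not rows. The sketch below uses a made-up index with 2021-01-11 missing (my own example, not from the question) to show that a lookback landing on an absent date gives NaN.

```python
import pandas as pd

# Hypothetical data: 2021-01-11 is absent from the index.
idx = pd.to_datetime(["2021-01-10", "2021-01-12", "2021-01-13", "2021-01-14"])
df = pd.DataFrame({"col": [0, 28, 45, 45], "step_diff": [1, 1, 1, 4]}, index=idx)

# Target dates: 01-09 (absent), 01-11 (absent), 01-12, 01-10.
targets = df.index - pd.to_timedelta(df["step_diff"], unit="D")

# Absent target dates come back as NaN from reindex.
df["col_diff"] = df["col"] - df["col"].reindex(targets).to_numpy()
```

If the step is meant to count rows rather than days, a positional approach is probably the better fit.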
answered 2021-04-08T12:51:14.990

Use:

import numpy as np

# Positional lookback: NaN when the step reaches before the first row
df['col_diff'] = [df.col.iloc[pos] - df.col.iloc[pos - step]
                  if pos - step >= 0 else np.nan
                  for pos, step in enumerate(df.step_diff)]

Output:

>>> df
   col  step_diff  col_diff
0    0          3       NaN
1   13          3       NaN
2   28          3       NaN
3   45          3      45.0
4   45          3      32.0
5   45          1       0.0
6   50          1       5.0
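For larger frames, the same positional logic can be written without a Python-level loop. This is a possible NumPy sketch, not part of the original answer; the names `src`, `valid` and `out` are mine, and it also treats a lookback past the last row (a negative step) as NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": [0, 13, 28, 45, 45, 45, 50],
                   "step_diff": [3, 3, 3, 3, 3, 1, 1]})

col = df["col"].to_numpy()
pos = np.arange(len(df))

# Row position each row looks back to.
src = pos - df["step_diff"].to_numpy()

# Only positions that fall inside the frame can be looked up.
valid = (src >= 0) & (src < len(df))

# Start from all-NaN and fill in the rows that have a valid lookback.
out = np.full(len(df), np.nan)
out[valid] = col[valid] - col[src[valid]]
df["col_diff"] = out
```

Because it works on positions, this version does not depend on the index being a default RangeIndex.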
answered 2021-04-08T12:27:39.920