My pandas DataFrame currently has the following structure:

{
'Temperature': [1,2,3,4,5,6,7,8,9],
'machining': [1,1,1,2,2,2,3,3,3],
'timestamp': [1560770645,1560770646,1560770647,1560770648,1560770649,1560770650,1560770651,1560770652,1560770653]
}

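I want to add a column containing the time relative to the start of each machining process, so that it resets every time the 'machining' column changes its value.
So the desired structure is: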

{
'Temperature': [1,2,3,4,5,6,7,8,9],
'machining': [1,1,1,2,2,2,3,3,3],
'timestamp': [1560770645,1560770646,1560770647,1560770648,1560770649,1560770650,1560770651,1560770652,1560770653],
'timestamp_machining': [1,2,3,1,2,3,1,2,3]
}

I'm struggling to do this in a clean way. Any help would be appreciated, without pandas if necessary.

1 Answer

Subtract the first value of each group, obtained with GroupBy.transform:

#if values are not sorted
df = df.sort_values(['machining','timestamp'])

print (df.groupby('machining')['timestamp'].transform('first'))
0    1560770645
1    1560770645
2    1560770645
3    1560770648
4    1560770648
5    1560770648
6    1560770651
7    1560770651
8    1560770651
Name: timestamp, dtype: int64

df['new'] = df['timestamp'].sub(df.groupby('machining')['timestamp'].transform('first')) + 1
print (df)

   Temperature  machining   timestamp  timestamp_machining  new
0            1          1  1560770645                    1    1
1            2          1  1560770646                    2    2
2            3          1  1560770647                    3    3
3            4          2  1560770648                    1    1
4            5          2  1560770649                    2    2
5            6          2  1560770650                    3    3
6            7          3  1560770651                    1    1
7            8          3  1560770652                    2    2
8            9          3  1560770653                    3    3
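
Note that grouping by 'machining' treats all rows with the same value as one group, even if they are not adjacent. If the counter should restart whenever the value changes (as the question's wording suggests), one possible sketch is to build a run id first. The sample data and the name `run_id` below are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical data: machining value 1 appears in two separate runs
df = pd.DataFrame({
    'Temperature': [1, 2, 3, 4, 5, 6],
    'machining': [1, 1, 2, 2, 1, 1],
    'timestamp': [1560770645, 1560770646, 1560770647,
                  1560770648, 1560770649, 1560770650],
})

# Start a new run id each time 'machining' differs from the previous row
run_id = df['machining'].ne(df['machining'].shift()).cumsum()

# Time since the first timestamp of each run, 1-based as in the question
first_in_run = df.groupby(run_id)['timestamp'].transform('first')
df['timestamp_machining'] = df['timestamp'] - first_in_run + 1

print(df)
```

With this data, grouping by 'machining' alone would measure the last two rows from the very first timestamp, while the run id restarts them at 1.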

If you only need a counter, GroupBy.cumcount is your friend:

df['new'] = df.groupby('machining').cumcount() + 1
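
Since the question also allows a solution without pandas, here is a minimal sketch using only the standard library. It assumes the data is held as a plain dict of lists (called `data` here, a name not in the original) and that rows are already ordered by timestamp:

```python
from itertools import groupby

data = {
    'Temperature': [1, 2, 3, 4, 5, 6, 7, 8, 9],
    'machining': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'timestamp': [1560770645, 1560770646, 1560770647,
                  1560770648, 1560770649, 1560770650,
                  1560770651, 1560770652, 1560770653],
}

relative = []
rows = zip(data['machining'], data['timestamp'])

# itertools.groupby groups consecutive rows sharing the same machining value
for _, group in groupby(rows, key=lambda row: row[0]):
    timestamps = [ts for _, ts in group]
    start = timestamps[0]
    # 1-based offset from the first timestamp of the run
    relative.extend(ts - start + 1 for ts in timestamps)

data['timestamp_machining'] = relative
print(relative)  # [1, 2, 3, 1, 2, 3, 1, 2, 3]
```

Because itertools.groupby only merges consecutive equal keys, this version restarts the counter on every change of the machining value.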
Answered 2019-06-17T11:35:26.443