I am using Python 3 with pandas version '0.19.2'.
I have a pandas df as follows:
chat_id line
1 'Hi.'
1 'Hi, how are you?.'
1 'I'm well, thanks.'
2 'Is it going to rain?.'
2 'No, I don't think so.'
I want to group by 'chat_id' and then do something like a rolling sum on 'line' to get the following:
chat_id line conversation
1 'Hi.' 'Hi.'
1 'Hi, how are you?.' 'Hi. Hi, how are you?.'
1 'I'm well, thanks.' 'Hi. Hi, how are you?. I'm well, thanks.'
2 'Is it going to rain?.' 'Is it going to rain?.'
2 'No, I don't think so.' 'Is it going to rain?. No, I don't think so.'
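I believe df.groupby('chat_id')['line'].cumsum() would only work on a numeric column.
I have also tried df.groupby(by=['chat_id'], as_index=False)['line'].apply(list) to get a list of all the lines in each complete conversation, but then I can't figure out how to unpack that list to create the 'rolling sum' style conversation column.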
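To make the goal concrete, here is a rough sketch of the behaviour I am after, written with a plain loop over the groups and itertools.accumulate from the standard library. It rebuilds the sample frame from above with the surrounding quotes dropped from the values, and it assumes a single space as the separator between lines. There may well be a more idiomatic pandas way to do this.

```python
from itertools import accumulate

import pandas as pd

# Sample data from the question (quotes around the values dropped).
df = pd.DataFrame({
    "chat_id": [1, 1, 1, 2, 2],
    "line": [
        "Hi.",
        "Hi, how are you?.",
        "I'm well, thanks.",
        "Is it going to rain?.",
        "No, I don't think so.",
    ],
})

parts = []
for _, group in df.groupby("chat_id", sort=False):
    # Running concatenation within one chat; the " " separator is an assumption.
    joined = list(accumulate(group["line"], lambda acc, nxt: acc + " " + nxt))
    # Keep the group's original index so the result aligns with df on assignment.
    parts.append(pd.Series(joined, index=group.index))

df["conversation"] = pd.concat(parts)
print(df)
```

Each row's 'conversation' value is the concatenation of every 'line' seen so far within the same 'chat_id', which is the string analogue of a cumulative sum.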