python - Using Python to find records with repeated ids on subsequent days based on the first day's value

Asked 2021-03-09T03:57:14.657

Viewed 46 times

I am trying to find repeated ids based on their value on the first day.

For example, I have 4 days of records:

import pandas as pd
df = pd.DataFrame({'id':['1','2','5','4','2','3','5','4','2','5','2','3','3','4'], 
                   'class':['1','1','0','0','1','1','1','1','0','0','0','0','1','1'],
                   'day':['1','1','1','1','1','1','1','2','2','3','3','3','4','4']})
df

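Given the data above, I want to find the records that meet the following conditions: (1) all records on day 1 with class = 0; (2) on days 2, 3 and 4, keep a record if its id satisfies condition (1), i.e. that id had class = 0 on day 1.

So the result should be: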
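To make condition (1) concrete, here is a minimal sketch that lists which ids qualify on the sample data. The names `day1_class0` and `qualifying_ids` are introduced here for illustration only, and the comparison uses strings because the sample stores every value as a string.

```python
import pandas as pd

df = pd.DataFrame({'id':['1','2','5','4','2','3','5','4','2','5','2','3','3','4'],
                   'class':['1','1','0','0','1','1','1','1','0','0','0','0','1','1'],
                   'day':['1','1','1','1','1','1','1','2','2','3','3','3','4','4']})

# Condition (1): rows on day 1 whose class is '0'
day1_class0 = (df['day'] == '1') & (df['class'] == '0')

# ids that satisfy condition (1); only these ids are kept on days 2, 3 and 4
qualifying_ids = set(df.loc[day1_class0, 'id'])
print(sorted(qualifying_ids))  # ['4', '5']
```

Note that id 5 also has a class 1 record on day 1; that record is not kept, because on day 1 only the class 0 rows themselves are selected.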

df = pd.DataFrame({'id':['5','4','4','5','4'], 
                   'class':['0','0','1','0','1'],
                   'day':['1','1','2','3','4']})
df

This approach works:

# 1. find unique id in day 1 that meet condition (1)
df1 = df[(df['day']=='1') & (df['class']=='0')] 

df1_id = df1.id.unique()

# 2. create a new dataframe for day 2,3,4 
df234=df[df['day']!='1'] 

# 3. create a new dataframe for day2,3,4 that contains the id in the unique list 
df234_new = df234[df234['id'].isin(df1_id)]

# 4. append df234_new at the end of df1 (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df_new = pd.concat([df1, df234_new])

df_new

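However, my full dataset has many more columns and rows, and the approach above feels too tedious. Does anyone know a more efficient way to do this? Many thanks!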
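One possible way to shorten the steps above is to combine both conditions into a single boolean mask, so no intermediate dataframes need to be built or concatenated. This is a sketch rather than a definitive answer: the variable names (`first_day`, `keep_ids`, `mask`) are hypothetical, and it assumes the columns hold strings as in the sample.

```python
import pandas as pd

df = pd.DataFrame({'id':['1','2','5','4','2','3','5','4','2','5','2','3','3','4'],
                   'class':['1','1','0','0','1','1','1','1','0','0','0','0','1','1'],
                   'day':['1','1','1','1','1','1','1','2','2','3','3','3','4','4']})

# Rows belonging to day 1, and the subset of them that meets condition (1)
first_day = df['day'] == '1'
day1_class0 = first_day & (df['class'] == '0')

# ids that had class '0' on day 1
keep_ids = df.loc[day1_class0, 'id'].unique()

# Keep the qualifying day-1 rows, plus any later-day row whose id is in keep_ids
mask = day1_class0 | (~first_day & df['id'].isin(keep_ids))
df_new = df[mask].reset_index(drop=True)
print(df_new)
```

Because the mask selects whole rows, any additional columns in a larger dataset are carried through unchanged, and the rows keep their original order.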
