
I have a sample dataframe, df1:

date           username         cities
2021-03-01     K John           New york
2021-03-01     K John           LA
2021-03-02     Ken Miles        Florida
2021-03-02     Ken Miles        LA

df2 contains:

date          username        planned_cities 
2021-03-01    K John             Alabama
2021-03-02    K John             LA
2021-03-02    Ken Miles          Florida
2021-03-02    Ken Miles          California

Expected result (matching on date and username only; drop the rows of df2 that have no match in df1):

date         username        planned_cities
2021-03-01    K John             Alabama
2021-03-02    Ken Miles          Florida
2021-03-02    Ken Miles          California

Since df1 has no record for 2021-03-02 K John, that row is dropped. How can I do this?


2 Answers


You can use Index.isin on the columns you are interested in, then use boolean indexing:

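import pandas as pd

cols = ['date', 'username']
# Build a MultiIndex of (date, username) pairs from each frame
idx1 = pd.MultiIndex.from_frame(df1[cols])
idx2 = pd.MultiIndex.from_frame(df2[cols])
# Keep only the df2 rows whose pair also appears in df1
out = df2[idx2.isin(idx1)]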

       date   username planned_cities
  2021-03-01     K John        Alabama
  2021-03-02  Ken Miles        Florida
  2021-03-02  Ken Miles     California
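
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the same isin filter. The dates are kept as plain strings, which is an assumption; the question does not say whether the column is a string or a datetime. The name `mask` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings: an assumption)
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

cols = ["date", "username"]
# True for each df2 row whose (date, username) pair occurs anywhere in df1
mask = pd.MultiIndex.from_frame(df2[cols]).isin(pd.MultiIndex.from_frame(df1[cols]))
out = df2[mask]
print(out)
```

The filtered frame keeps df2's original row labels (0, 2 and 3 here); chain `.reset_index(drop=True)` if a fresh 0..n-1 index is wanted.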
answered 2021-03-05T18:21:11.753

Use an inner merge, dropping duplicates from df1's key columns first so that the merge cannot add rows to the left DataFrame.

df2.merge(df1[['date', 'username']].drop_duplicates())

         date   username planned_cities
0  2021-03-01     K John        Alabama
1  2021-03-02  Ken Miles        Florida
2  2021-03-02  Ken Miles     California
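
To see why drop_duplicates matters, here is a small sketch on the question's data that compares the merge with and without it. The names `keys`, `filtered` and `naive` are illustrative only, and the dates are assumed to be plain strings.

```python
import pandas as pd

# Sample data copied from the question (dates kept as strings: an assumption)
df1 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-01", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "cities": ["New york", "LA", "Florida", "LA"],
})
df2 = pd.DataFrame({
    "date": ["2021-03-01", "2021-03-02", "2021-03-02", "2021-03-02"],
    "username": ["K John", "K John", "Ken Miles", "Ken Miles"],
    "planned_cities": ["Alabama", "LA", "Florida", "California"],
})

# One row per distinct (date, username) pair in df1
keys = df1[["date", "username"]].drop_duplicates()

# merge defaults to how="inner" and joins on the shared columns (date, username)
filtered = df2.merge(keys)

# Without drop_duplicates, each matching df2 row is repeated once per
# matching df1 row, so the result grows
naive = df2.merge(df1[["date", "username"]])

print(len(filtered), len(naive))
```

On this data every matching pair appears twice in df1, so the merge without drop_duplicates returns each kept df2 row twice.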
answered 2021-03-05T18:28:25.420