python - Python Pandas - Removing rows from a DataFrame based on a previously obtained subset

Question

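I am running Python 2.7 with the Pandas 0.11.0 library installed.

I have been looking for an answer to this question without success, so I am hoping someone more experienced than me has a solution.

Let's say my data, in df1, looks like the following: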

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for example, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]), I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3

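What I want to do is either:

1) Remove the rows from df1 as they are defined/joined into df2

or

2) After df2 has been created, remove from df1 the rows that df2 is composed of (a difference?)

Hope all of that makes sense. Please let me know if more information is needed.

EDIT:

Ideally, a third data frame would be created that looks like this: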

df3=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 that is not in df2. Thanks!

score 26 · Accepted Answer

Two options come to mind. First, use isin and a mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
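
For readers who want to run the isin approach outside an interactive session, here is a minimal self-contained sketch. It rebuilds the example frame from the question by hand; the variable name mask is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Compute the boolean mask once and reuse it for both halves.
mask = df["zip"].isin(keep)
df_yes = df[mask]    # rows whose zip is in keep
df_no = df[~mask]    # the complement: rows whose zip is not in keep

print(df_no)
```

Storing the mask in a variable avoids evaluating isin twice and makes it clear that the two frames partition df.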

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then any one of

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
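
The transcript above comes from Python 2, where dict.values() returns a list. A hedged sketch of the same groupby idea as a standalone script follows, assuming Python 3 and a reasonably recent pandas; the names parts and frames are illustrative only.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Group by the boolean Series: one group for True, one for False.
grouped = df.groupby(df["zip"].isin(keep))

df_yes = grouped.get_group(True)    # rows whose zip is in keep
df_no = grouped.get_group(False)    # rows whose zip is not in keep

# Collect every group into a dict keyed by the boolean.
parts = dict(list(grouped))

# In Python 3, values() is a view, so wrap it in list() to get a list.
frames = list(parts.values())
```

Note that get_group raises a KeyError if one of the two groups is empty, for example when no zip value matches keep, so the mask approach may be the more forgiving choice in that case.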

Which one makes the most sense depends on the context, but I think you get the idea.
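
If df2 already exists and was produced by selecting rows from df1, a third possibility is to drop its index labels from df1. This is not part of the answer above; it is a sketch that assumes df2 kept the index labels of df1 (as boolean indexing does) and that the index of df1 is unique. The name df3 is used here for the resulting third frame.

```python
import pandas as pd

# Rebuild df1 from the question.
df1 = pd.DataFrame({
    "zip": [123, 123, 133, 145, 167, 167],
    "x": [1, 1, 1, 2, 3, 3],
    "y": [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# A previously obtained subset; boolean indexing keeps df1's index labels.
df2 = df1[(df1["zip"] == 123) | (df1["zip"] == 133)]

# Drop every row of df1 whose index label appears in df2.
df3 = df1.drop(df2.index)

print(df3)
```

If the index of df1 contains duplicate labels, drop removes every row carrying a matching label, so this shortcut is only safe when the index identifies rows uniquely.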
