I have two dataframes. Here is dwpjp.head():

jp_number
0 25146315052147720191
1 57225427599900052634
2 86076681691411639833
3 50491824499499656478
4 95588382889227620465

ct_data.head()

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
2 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
3 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3
4 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c

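I want two new dataframes, cct_data and dct_data, both derived from ct_data. ct_data should be split on one condition: if a row's jp_number exists in dwpjp, the row goes into cct_data; otherwise it goes into dct_data.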

I tried this for the rows whose jp_number exists in dwpjp:

cct_data = ct_data[ct_data.isin(dwpjp).any(1).values]

And for the other one I negated the condition as follows:

dct_data = ct_data[~[ct_data.isin(dwpjp).any(1).values]]

But I don't get the result I expect, which is:

cct_data

imjp_number imct_id
0 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
1 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3

dct_data

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
1 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
2 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c

Note: jp_number in dwpjp corresponds to imjp_number in ct_data.

2 Answers

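Please note the following:

  1. isin expects values, but it was given the entire dataframe: change .isin(dwpjp) to .isin(dwpjp.jp_number)
  2. In the pre-edit version of the question, each row of dwpjp was actually a list containing 1 value rather than just 1 value. If that is really the case, .isin(dwpjp.jp_number) needs one more step to explode the lists into plain values: .isin(dwpjp.jp_number.explode())
  3. Your negation is wrongly applied to a list: change ~[ct_data...] to ~ct_data...

With these fixes, it works on my end: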

cct_data = ct_data[ct_data.isin(dwpjp.jp_number.explode()).any(axis=1).values]

imjp_number imct_id
1 57225427599900052634 aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed
3 50491824499499656478 __gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3

dct_data = ct_data[~ct_data.isin(dwpjp.jp_number.explode()).any(axis=1).values]

imjp_number imct_id
0 23605308039805192764 x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87
2 53733358271401869469 6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26
4 82143248133286027306 __g1114a30c6ea548a2a83d5a51718ff0fd|773840905c
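
One caveat that may be worth checking on real data: DataFrame.isin aligns a Series argument on its index, so the frame-wide call matches the sample only because the shared IDs happen to sit at the same row labels (1 and 3) in both frames. The sketch below is an illustration, not part of the original answer. It assumes the IDs are stored as strings, and it lists the dwpjp rows in reverse order (a made-up ordering) to show the difference between passing a Series and passing a plain list.

```python
import pandas as pd

# Assumption: IDs are strings (20-digit values do not fit in int64).
ct_data = pd.DataFrame({
    "imjp_number": [
        "23605308039805192764",
        "57225427599900052634",
        "53733358271401869469",
        "50491824499499656478",
        "82143248133286027306",
    ],
    "imct_id": [
        "x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87",
        "aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed",
        "6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26",
        "__gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3",
        "__g1114a30c6ea548a2a83d5a51718ff0fd|773840905c",
    ],
})

# Same jp_number values as in the question, each wrapped in a one-element
# list, but in reverse order (hypothetical) so that matching IDs no longer
# share a row label with ct_data.
dwpjp = pd.DataFrame({"jp_number": [
    ["95588382889227620465"],
    ["50491824499499656478"],
    ["86076681691411639833"],
    ["57225427599900052634"],
    ["25146315052147720191"],
]})

# explode() turns each one-element list into a plain value.
known = dwpjp["jp_number"].explode()

# Series argument: values are compared row label by row label.
aligned = ct_data.isin(known).any(axis=1)

# List argument: values are compared regardless of position.
by_value = ct_data.isin(known.tolist()).any(axis=1)

cct_data = ct_data[by_value]
dct_data = ct_data[~by_value]
```

Here `aligned` finds no matches at all, while `by_value` selects rows 1 and 3 as intended, so converting to a list (or calling isin on the imjp_number column directly) looks like the safer choice when row order is not guaranteed.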
answered 2021-03-05T04:26:59.880

Change your expressions as follows:

cct_data = ct_data[ct_data.imjp_number.isin(dwpjp.jp_number)]

dct_data = ct_data[~ct_data.imjp_number.isin(dwpjp.jp_number)]
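
As a minimal end-to-end sketch of this column-wise approach, the snippet below rebuilds the sample data, computes the mask once, and reuses it for both halves. It assumes the IDs are stored as plain strings (not lists), and the reset_index(drop=True) step is an optional addition that reproduces the renumbered 0-based index shown in the question's expected output.

```python
import pandas as pd

# Assumption: IDs are plain strings (20-digit values do not fit in int64).
dwpjp = pd.DataFrame({"jp_number": [
    "25146315052147720191",
    "57225427599900052634",
    "86076681691411639833",
    "50491824499499656478",
    "95588382889227620465",
]})

ct_data = pd.DataFrame({
    "imjp_number": [
        "23605308039805192764",
        "57225427599900052634",
        "53733358271401869469",
        "50491824499499656478",
        "82143248133286027306",
    ],
    "imct_id": [
        "x1E5e3ukRyEFRT6SUAF6lg|d543d3d064da465b8576d87",
        "aa0d2dac654d4154bf7c09f73faeaf62|-vf6738ee3bed",
        "6FfHZRoiWs2VO02Pruk07A|__g3d877adf9d154637be26",
        "__gbe204670ca784a01b7207b42a7e5a5d3|54e2c39cd3",
        "__g1114a30c6ea548a2a83d5a51718ff0fd|773840905c",
    ],
})

# Series.isin compares by value, so row order in dwpjp does not matter.
mask = ct_data["imjp_number"].isin(dwpjp["jp_number"])

# Optional: renumber the rows from 0, as in the expected output.
cct_data = ct_data[mask].reset_index(drop=True)
dct_data = ct_data[~mask].reset_index(drop=True)

print(cct_data)
print(dct_data)
```

Computing the mask once keeps the two halves guaranteed to be complementary: every row of ct_data lands in exactly one of cct_data and dct_data.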
answered 2021-03-05T04:27:50.100