python - Python dfply: cannot mask on multiple conditions

Question

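I am an R user learning dfply, the Python counterpart to R's dplyr. My problem: in dfply, I cannot mask on multiple conditions inside a pipe. I am looking for a solution that uses the dfply pipe rather than multi-line subsetting.

My code: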

# Import
import pandas as pd
import numpy as np
from dfply import *

# Create data frame and mask it
df  = pd.DataFrame({'a':[np.nan,2,3,4,5],'b':[6,7,8,9,np.nan],'c':[5,4,3,2,1]})
df2 = (df >>
        mask((X.a.isnull()) | ~(X.b.isnull())))
print(df)
print(df2)

This is the original data frame df:

       a    b    c
    0  NaN  6.0  5
    1  2.0  7.0  4
    2  3.0  8.0  3
    3  4.0  9.0  2
    4  5.0  NaN  1

This is the result of the piped mask, df2:

         a    b    c
      0  NaN  6.0  5
      4  5.0  NaN  1

However, I expected this:

         a    b    c
      0  NaN  6.0  5
      1  2.0  7.0  4
      2  3.0  8.0  3
      3  4.0  9.0  2

Why don't the "|" and "~" operators return the rows where column "a" is NaN or column "b" is not NaN?

By the way, I also tried np.logical_or():

df  = pd.DataFrame({'a':[np.nan,2,3,4,5],'b':[6,7,8,9,np.nan],'c':[5,4,3,2,1]})
df2 = (df >>
        mask(np.logical_or(X.a.isnull(),~X.b.isnull())))
print(df)
print(df2)

But this raised an error:

mask(np.logical_or(X.a.isnull(),~X.b.isnull())))
ValueError: invalid __array_struct__
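
For comparison, here is a minimal sketch of the same condition in plain pandas, outside the dfply pipe. It assumes only pandas and NumPy and operates on real Series rather than on `X` expressions. Both the `|`/`~` form and `np.logical_or` keep rows 0 to 3 here, which suggests the boolean logic is sound and that the difference lies in how the `X` placeholder is handled inside the pipe.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 2, 3, 4, 5],
                   'b': [6, 7, 8, 9, np.nan],
                   'c': [5, 4, 3, 2, 1]})

# "a is NaN" OR "b is not NaN", evaluated on real pandas Series.
keep = df['a'].isnull() | ~df['b'].isnull()
expected = df[keep]

# np.logical_or accepts Series as well; it only failed on X expressions.
keep_np = np.logical_or(df['a'].isnull(), ~df['b'].isnull())

print(expected)
```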

score 0 · Accepted Answer

Edit: changed the second condition to "df.col2.notnull()". I am not sure why the tilde is ignored after the pipe.

df  = pd.DataFrame({'a':[np.nan,2,3,4,5],'b':[6,7,8,9,np.nan],'c':[5,4,3,2,1]})
df2 = (df >> mask((X.a.isnull()) | (X.b.notnull())))

print(df2)

     a    b  c
0  NaN  6.0  5
1  2.0  7.0  4
2  3.0  8.0  3
3  4.0  9.0  2
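
One plausible explanation for the ignored tilde, offered as an assumption rather than a confirmed reading of dfply's source: a deferred-expression object such as `X` can overload `~` to toggle an internal flag (useful for inverting a column selection) instead of negating the eventual boolean result. The toy class below is hypothetical and is not dfply's real implementation; it only illustrates how such an overload would make `~` a no-op when the expression is evaluated as a row filter.

```python
class Placeholder:
    """Hypothetical stand-in for a deferred expression (not dfply's class)."""

    def __init__(self, func, inverted=False):
        self.func = func
        self.inverted = inverted

    def __invert__(self):
        # Only toggles a flag; the wrapped function is left untouched.
        return Placeholder(self.func, inverted=not self.inverted)

    def evaluate(self, value):
        # The flag is never consulted here, so "~" does not change the result.
        return self.func(value)


is_none = Placeholder(lambda v: v is None)
not_none = ~is_none

print(is_none.evaluate(None))   # True
print(not_none.evaluate(None))  # True: the "~" was recorded but not applied
```

If that is what happens, calling a method that already returns the negated result, such as `notnull()`, sidesteps the issue because no `~` is applied to the placeholder.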

score 0 · Accepted Answer


How about filter_by?

df >> filter_by((X.a.isnull()) | (X.b.notnull()))

Answered on 2021-04-06T06:18:27.807
