python - pandas: replace np.nan based on multiple conditions

Question

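I am trying to add a column to my df that returns "yes" if B < A and "no" if B >= A. However, if either A or B contains a missing value, it should return np.nan.

So my desired output looks like this: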

A	B	is_less
np.nan	10	np.nan
10	np.nan	np.nan
1	5	no
5	1	yes

Problem: my code does not return np.nan where it should.

What I have tried: Option 1:

df['is_less'] = np.where (df['B'] < df['A'], "yes", "no")
df['is_less'] = np.where (df['A'] == np.nan, np.nan,  df['is_less'])
df['is_less'] = np.where (df['B'] == np.nan, np.nan,  df['is_less'])

Unfortunately, the np.nan values in columns A and B are ignored in the resulting 'is_less' column.

Option 2:

def reduced_growth(x):
  if (x['A'] == np.nan or x['B']==np.nan):
    return np.nan
  elif (x['B'] < x['A']):
    return "yes"
  elif (x['B'] >= x['A']): 
    return "no"
  else:
    return "0"

#create new feature using function
df['is_less']= df.apply(reduced_growth, axis=1)

Applying this function produces a mix of "yes", "no", and "0" results, but never returns np.nan.

How can I fix this?

score 1 · Accepted Answer

You can use the new dtypes in pandas (since 1.0) that handle missing values correctly:

import pandas as pd

df = pd.DataFrame({'a': [1, None, 3, 5], 'b': [2, 1, None, 2]})
df = df.convert_dtypes()
df['is_less'] = df['a'] < df['b']

print(df)

See https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html#missing-data-na

Result:

      a     b  is_less
0     1     2     True
1  <NA>     1     <NA>
2     3  <NA>     <NA>
3     5     2    False

You can also create the DataFrame directly with the new dtypes using pd.array:

df = pd.DataFrame({
    'a': pd.array([1, None, 3, 5]),
    'b': pd.array([2, 1, None, 2]),
})

df['is_less'] = df['a'] < df['b']
print(df)

      a     b  is_less
0     1     2     True
1  <NA>     1     <NA>
2     3  <NA>     <NA>
3     5     2    False
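
To get the "yes"/"no" labels the question asks for instead of True/False, the nullable boolean result can be turned into a string column. This is only a sketch under a few assumptions: the column names A and B and the sample values are taken from the question's table, the comparison is B < A as in the question, and the names `less` and `labels` are mine. Note that the missing entries come out as pd.NA rather than np.nan.

```python
import pandas as pd

# Sample data rebuilt from the question's table; None marks missing values.
df = pd.DataFrame({"A": [None, 10, 1, 5], "B": [10, None, 5, 1]})
df = df.convert_dtypes()  # columns become nullable Int64

# Nullable boolean Series: <NA> wherever A or B is missing.
less = df["B"] < df["A"]

# Start from an all-missing string column, then fill in the rows we know.
labels = pd.Series(pd.NA, index=df.index, dtype="string")
labels[less.fillna(False).astype(bool)] = "yes"
labels[(~less).fillna(False).astype(bool)] = "no"

df["is_less"] = labels
print(df)
```

Rows where the comparison is <NA> are never touched by either assignment, so they keep the initial missing value.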

score 0 · Accepted Answer

You can use a simple comparison and a mask:

df['is_less'] = df['A'].gt(df['B']).mask(df[['A', 'B']].isna().any(axis=1)).map({True: 'yes', False: 'no'})

Output:

      A     B is_less
0   NaN  10.0     NaN
1  10.0   NaN     NaN
2   1.0   5.0      no
3   5.0   1.0     yes
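
The one-liner packs three steps together: compare, blank out the rows with a missing input, and translate booleans into labels. Spelled out as a sketch (the sample frame is rebuilt from the question's table, the intermediate names `missing`, `b_less` and `labels` are mine, and here the labels are mapped before the mask is applied):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({"A": [np.nan, 10, 1, 5], "B": [10, np.nan, 5, 1]})

# True for every row where A or B is missing.
missing = df[["A", "B"]].isna().any(axis=1)

# Plain comparison; rows involving NaN simply evaluate to False here.
b_less = df["B"] < df["A"]

# Turn the booleans into labels, then replace the rows flagged above with NaN.
labels = b_less.map({True: "yes", False: "no"})
df["is_less"] = labels.mask(missing)

print(df)
```

Because Series.mask fills with a missing value by default, df['is_less'].isna() is True for exactly the rows where an input was missing.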

score 0 · Accepted Answer

You can do this with np.select, using a list of conditions and a corresponding list of values:

conditions = [ df['A'].isnull() | df['B'].isnull() , 
              df['B'] < df['A'] , 
              df['B'] >= df['A'] 
            ]
values = [np.nan, "yes", "no"]

df['is_less'] = np.select(conditions, values, "0")
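
One caveat: a NumPy array holds a single dtype, so placing np.nan next to "yes" and "no" in values forces NumPy to find a common type. Depending on the NumPy version, this appears to produce either the string 'nan' (which isna() does not recognise as missing) or a dtype promotion error. A possible workaround, sketched on the question's sample data, is to pass the NaN as an object-dtype value so the result stays an object array holding a real float NaN. The np.array(np.nan, dtype=object) wrapper is my assumption about a convenient way to do that, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table.
df = pd.DataFrame({"A": [np.nan, 10, 1, 5], "B": [10, np.nan, 5, 1]})

conditions = [
    df["A"].isna() | df["B"].isna(),
    df["B"] < df["A"],
    df["B"] >= df["A"],
]

# Wrapping NaN in an object-dtype array keeps NumPy from promoting it to a string.
values = [np.array(np.nan, dtype=object), "yes", "no"]

df["is_less"] = np.select(conditions, values, default="0")
print(df)
```

With the missing-value condition listed first, the rows with a NaN input are claimed before the two comparisons are considered, and the "0" default is only a fallback that should not be reached.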

score 0 · Accepted Answer

Try rewriting your np.where statement:

df['is_less'] = np.where( (df['A'].isnull()) | (df['B'].isnull() ),np.nan, # check if A or B are np.nan
                         np.where(df['B'].ge(df['A']),'no','yes'))        # check if B >= A

Prints:

      A     B is_less
0   NaN  10.0     nan
1  10.0   NaN     nan
2   1.0   5.0      no
3   5.0   1.0     yes

Greater than or equal to:

pandas.ge
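
The isnull() checks matter because NaN never compares equal to anything, including itself, which is why the == np.nan tests in the question never match. A small illustration (the Series s is made-up sample data):

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 10.0, 1.0])  # hypothetical sample values

eq_mask = s == np.nan  # element-wise equality: never True, even for NaN
na_mask = s.isna()     # isna() / isnull() is the intended missing-value test

print(np.nan == np.nan)  # False
print(eq_mask.tolist())  # [False, False, False]
print(na_mask.tolist())  # [True, False, False]
```

The same rule explains the "0" results from the apply() attempt: every comparison involving NaN evaluates to False, so those rows fall through to the else branch.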
