You can roll your own function to solve this with a concise, vectorized approach:
import pandas as pd

def na_randomfill(series):
    na_mask = pd.isnull(series)   # boolean mask for null values
    n_null = na_mask.sum()        # number of nulls in the Series

    if n_null == 0:
        return series             # if there are no nulls, no need to resample

    # Randomly sample the non-null values from our series,
    # only as many times as we have nulls.
    # random_state=0 is fixed so that the result is reproducible.
    fill_values = series[~na_mask].sample(n=n_null, replace=True, random_state=0)

    # This ensures our new values will replace NaNs in the correct locations
    fill_values.index = series.index[na_mask]

    return series.fillna(fill_values)
This solution works on one Series at a time and can be called like so:
out = na_randomfill(df["Apple_cat"])
print(out)
0    cat_1
1    cat_2
2    cat_3
3    cat_3
4    cat_2
5    cat_2
Name: Apple_cat, dtype: object
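If you want to try the function without the original DataFrame, here is a small self-contained check. The Series s and its values are made up for illustration and are not the data from the question; the function body is the same as above. It only verifies the properties the approach is meant to have: no nulls remain, every filled value comes from the observed values, and the originally non-null entries are untouched.

```python
import numpy as np
import pandas as pd

def na_randomfill(series):
    na_mask = pd.isnull(series)   # boolean mask for null values
    n_null = na_mask.sum()        # number of nulls in the Series
    if n_null == 0:
        return series
    # Sample (with replacement) from the non-null values, once per null
    fill_values = series[~na_mask].sample(n=n_null, replace=True, random_state=0)
    # Re-label the sampled values so they line up with the null positions
    fill_values.index = series.index[na_mask]
    return series.fillna(fill_values)

# Hypothetical data (an assumption, not the asker's DataFrame)
s = pd.Series(["cat_1", np.nan, "cat_3", np.nan, "cat_2", "cat_2"], name="Apple_cat")
filled = na_randomfill(s)

print(filled.notna().all())                      # no nulls remain
print(filled.isin(s.dropna()).all())             # fills come from observed values
print(filled[s.notna()].equals(s[s.notna()]))    # non-null entries are unchanged
```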
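Alternatively, you can use apply to call it on each column. Note that because of the if statement in our function, we don't need to specify the null-containing columns in advance before calling apply: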
out = df.apply(na_randomfill)
print(out)
   ClientId  Apple_cat  Region  Price
0        21      cat_1   Reg_A      5
1        15      cat_2   Reg_A      6
2         6      cat_3   Reg_B      7
3        91      cat_3   Reg_A      3
4        45      cat_2   Reg_C      7
5        89      cat_2   Reg_C      6
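One thing to be aware of when using apply: because random_state=0 is hard-coded, every column starts from the same seed, so the fills are reproducible but not drawn independently across columns. The function would also fail on a column that is entirely null, since there is nothing to sample from. The sketch below is one possible variant, not part of the original answer: the name na_randomfill_rng, the rng parameter, and the df_demo frame are all hypothetical. It takes a shared np.random.RandomState so that successive columns continue the same random stream, and it returns all-null columns unchanged.

```python
import numpy as np
import pandas as pd

def na_randomfill_rng(series, rng):
    """Hypothetical variant of na_randomfill that draws from a shared RandomState."""
    na_mask = series.isna()
    n_null = int(na_mask.sum())
    observed = series[~na_mask]
    # Nothing to fill, or nothing to sample from (all-null column): return as-is
    if n_null == 0 or observed.empty:
        return series
    fill_values = observed.sample(n=n_null, replace=True, random_state=rng)
    # Align the sampled values with the positions of the nulls
    fill_values.index = series.index[na_mask]
    return series.fillna(fill_values)

# Hypothetical frame, including one column with no observed values
df_demo = pd.DataFrame({
    "Apple_cat": ["cat_1", np.nan, "cat_3", np.nan],
    "Region": ["Reg_A", np.nan, "Reg_B", np.nan],
    "Empty": [np.nan, np.nan, np.nan, np.nan],
})

rng = np.random.RandomState(0)  # one generator shared by all columns
out_demo = df_demo.apply(na_randomfill_rng, rng=rng)
print(out_demo)
```

Extra keyword arguments given to apply are forwarded to the function, which is how rng reaches each column here. If you prefer identical draws on every run and do not care about independence between columns, the original fixed-seed version is simpler.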