
I have an array representing the state of an object, where 0 means the object is off and 1 means it is on.

import pandas as pd
import numpy as np

s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]
df = pd.DataFrame(s, columns=["s"])
df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN

I need to forward-fill only the 0 values, as shown below.

>>> df_wanted
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

After browsing similar questions here, I simply compared the ffill-ed and bfill-ed values and assigned through a mask:

mask = (df.ffill() == 0) & (df.bfill() == 1)
df[mask] = 0
df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN

However, this does not help when a 0 is not followed by a 1. Is there a more elegant solution that also covers that case?
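
To make the limitation concrete, here is a minimal sketch on a shortened, made-up series (not the data above) whose last 0 is never followed by a 1. The name out is only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical shortened series: the final 0 has no 1 after it.
df = pd.DataFrame({"s": [np.nan, 0, np.nan, 1, np.nan, 0, np.nan, np.nan]})

# Same mask as above: the last valid value is 0 AND the next valid value is 1.
mask = (df.ffill() == 0) & (df.bfill() == 1)

# Non-mutating equivalent of df[mask] = 0.
out = df.mask(mask, 0)
print(out)
```

The NaN between the first 0 and the 1 is filled, but the two trailing NaNs stay NaN because bfill() finds no 1 after them.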


2 Answers


mask = (df.ffill() == 0) on its own should be enough for your use case.

df.ffill() propagates the last valid observation forward, so NaN rows that come after a 0 are filled with 0 and NaN rows that come after a 1 are filled with 1. Comparing the filled frame to 0 selects only the rows that hold or inherit a 0, and that boolean result is the mask used to produce the final df.

Example (I added a 0 and a few NaNs to the end of your df):

>>> s = [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan, np.nan, 0, np.nan, np.nan, np.nan]
>>> df = pd.DataFrame(s, columns=["s"])
>>> df
      s
0   NaN
1   0.0
2   NaN
3   NaN
4   1.0
5   NaN
6   NaN
7   0.0
8   NaN
9   1.0
10  NaN
11  NaN
12  0.0
13  NaN
14  NaN
15  NaN
>>> 
>>> 
>>> df[df.ffill() == 0] = 0
>>> df
      s
0   NaN
1   0.0
2   0.0
3   0.0
4   1.0
5   NaN
6   NaN
7   0.0
8   0.0
9   1.0
10  NaN
11  NaN
12  0.0
13  0.0
14  0.0
15  0.0
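
If modifying df in place is not wanted, the same idea can be sketched with Series.mask, which returns a new object. This is only an illustration on a shortened, made-up series; the names filled, mask and result are not from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical shortened series, ending with a 0 that is not followed by a 1.
s = pd.Series([np.nan, 0, np.nan, 1, np.nan, 0, np.nan])

filled = s.ffill()        # last valid value carried forward
mask = filled == 0        # True where the carried value is 0
result = s.mask(mask, 0)  # write 0 where mask is True, keep s elsewhere
print(result)
```

The leading NaN and the NaN after the 1 are left alone, while every NaN that follows a 0 becomes 0, including the trailing one.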
answered 2021-05-22T10:04:27.673

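One approach, perhaps not very elegant but workable, is to forward-fill everything and then take the filled value only where your original series is NaN and the filled series is 0.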

sf = df.ffill().values[:, 0]
desired = np.where(np.isnan(s) & (sf==0), sf, s)

pandas also has a where function; I just prefer numpy's because it is more general.
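
For comparison, a pandas-only version might look like the sketch below. It assumes the single column s from the question; the names filled, keep_original and desired_pd are only illustrative. Note that Series.where keeps the original value where the condition is True and substitutes the other argument where it is False.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"s": [np.nan, 0, np.nan, np.nan, 1, np.nan, np.nan, 0, np.nan, 1, np.nan]}
)

filled = df["s"].ffill()

# Keep the original value unless it is NaN and the forward-filled value is 0.
keep_original = df["s"].notna() | (filled != 0)
desired_pd = df["s"].where(keep_original, filled)
print(desired_pd)
```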

answered 2021-05-22T08:23:31.493