python - How to efficiently fillna(0) if a Series is all-NaN, or else the remaining non-NaN entries are zero?

Question

Given a pandas Series, I want to fill the NaNs with zero if either all of the values are NaN, or all of the values are either zero or NaN.

For example, I would want to fill the NaNs in the following Series with zeros.

0       0
1       0
2       NaN
3       NaN
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN

However, I would not want to fill the following Series:

0       0
1       0
2       2
3       0
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN

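I was looking through the documentation, and it seems I could use pandas.Series.value_counts to make sure the values are only 0 and NaN, and then simply call fillna(0). In other words, I want to check whether set(s.unique().astype(str)).issubset(['0.0','nan']), and if so call fillna(0), otherwise leave the Series alone.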

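Considering how powerful pandas is, it seems there should be a better way to do this. Does anyone have suggestions for doing this cleanly and efficiently?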

Potential solution, thanks to cᴏʟᴅsᴘᴇᴇᴅ:

if s.dropna().eq(0).all():
    s = s.fillna(0)
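
As a minimal sketch, the check above can be wrapped in a small function and tried on the two example Series from the question. The function name `fill_if_only_zeros_or_nan` is hypothetical and is not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_if_only_zeros_or_nan(s: pd.Series) -> pd.Series:
    """Fill NaN with 0 only when every non-NaN value is zero.

    Hypothetical helper; otherwise the Series is returned unchanged.
    """
    if s.dropna().eq(0).all():
        return s.fillna(0)
    return s


# First example: only zeros and NaN, so the NaNs should be filled.
first = pd.Series([0, 0] + [np.nan] * 7)
# Second example: contains a 2, so the NaNs should be left alone.
second = pd.Series([0, 0, 2, 0] + [np.nan] * 5)

filled_first = fill_if_only_zeros_or_nan(first)
filled_second = fill_if_only_zeros_or_nan(second)
```

Returning a new Series rather than modifying the input keeps the helper free of side effects, which matches the `s = s.fillna(0)` style used above.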

score 8 · Accepted Answer

You can compare against 0 and use isna to test whether the Series holds only NaNs and 0, and then fill:

if ((s == 0) | (s.isna())).all():
    s = pd.Series(0, index=s.index)

Or compare the unique values:

if pd.Series(s.unique()).fillna(0).eq(0).all():
    s = pd.Series(0, index=s.index)

@cᴏʟᴅsᴘᴇᴇᴅ's solution, thanks: compare the Series with the NaNs removed by dropna:

if s.dropna().eq(0).all():
    s = pd.Series(0, index=s.index)

The solution from the question, which needs the conversion to strings because NaNs cannot be compared by equality:

if set(s.unique().astype(str)).issubset(['0.0','nan']):
    s = pd.Series(0, index=s.index)
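
A short sketch of why the string conversion is involved: NaN never compares equal to anything, including itself, so an equality test cannot find it among the unique values. The variable names here are illustrative only. The last lines show a caveat of the string approach: it assumes a float dtype, since an integer zero is rendered as '0' rather than '0.0'.

```python
import numpy as np
import pandas as pd

s = pd.Series([0, 0, np.nan, np.nan])
uniques = s.unique()  # float64 array holding 0.0 and NaN

# NaN != NaN, so comparing by equality never matches the NaN entry.
nan_found_by_equality = bool((uniques == np.nan).any())

# After conversion to strings, NaN becomes the ordinary string 'nan'.
as_strings = set(uniques.astype(str))
only_zero_or_nan = as_strings.issubset({'0.0', 'nan'})

# Caveat: with an integer dtype the zero is rendered as '0', not '0.0'.
int_strings = set(pd.Series([0, 0]).unique().astype(str))
int_matches = int_strings.issubset({'0.0', 'nan'})
```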

Timings:

s = pd.Series(np.random.choice([0,np.nan], size=10000))

In [68]: %timeit ((s == 0) | (s.isna())).all()
The slowest run took 4.85 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 574 µs per loop

In [69]: %timeit pd.Series(s.unique()).fillna(0).eq(0).all()
1000 loops, best of 3: 587 µs per loop

In [70]: %timeit s.dropna().eq(0).all()
The slowest run took 4.65 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 774 µs per loop

In [71]: %timeit set(s.unique().astype(str)).issubset(['0.0','nan'])
The slowest run took 5.78 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 157 µs per loop
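
Before comparing speed, it may be worth confirming that the four conditions agree. The following is a sketch under the assumption of float input; the `check_*` function names are hypothetical labels for the expressions timed above.

```python
import numpy as np
import pandas as pd


def check_mask(s):
    # Every value is either 0 or NaN.
    return bool(((s == 0) | s.isna()).all())


def check_unique(s):
    # Unique values, with NaN replaced by 0, are all 0.
    return bool(pd.Series(s.unique()).fillna(0).eq(0).all())


def check_dropna(s):
    # Values remaining after dropping NaN are all 0.
    return bool(s.dropna().eq(0).all())


def check_strings(s):
    # Relies on a float dtype, where 0 is rendered as '0.0'.
    return set(s.unique().astype(str)).issubset({'0.0', 'nan'})


checks = [check_mask, check_unique, check_dropna, check_strings]

fillable = pd.Series([0, 0, np.nan, np.nan], dtype=float)
not_fillable = pd.Series([0, 2, 0, np.nan], dtype=float)

results_fillable = [check(fillable) for check in checks]
results_not_fillable = [check(not_fillable) for check in checks]
```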

score 2 · Accepted Answer

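Create a mask for the null values. Check whether the number of nulls in the mask equals the length of the Series (in which case the Series is either all nulls or empty), or whether the non-null values are all zero. If so, create a new Series of zeros using the original index of the Series.

nulls = s.isnull()
# nulls.sum() counts the null entries; len(nulls) would always equal len(s)
if nulls.sum() == len(s) or s[~nulls].eq(0).all():
    s = pd.Series(0, index=s.index)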
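
A small sketch of how the all-null case behaves: selecting the non-null values of an all-NaN Series gives an empty Series, and `.all()` on an empty Series returns True, so the zero test alone already accepts that case. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

all_nan = pd.Series([np.nan, np.nan, np.nan])
nulls = all_nan.isnull()

# Nothing survives the mask, so the selection is empty.
non_null = all_nan[~nulls]

# .all() over an empty Series is vacuously True.
empty_passes_zero_test = bool(non_null.eq(0).all())
```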

Timings

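%%timeit s_ = pd.concat([s] * 100000)
# Caveat: as timed, len(nulls) == len(s_) is always True (the mask has the
# same length as the Series), so `or` short-circuits and
# s_[~nulls].eq(0).all() is not evaluated.
nulls = s_.isnull()
if len(nulls) == len(s_) or s_[~nulls].eq(0).all():
    s_ = pd.Series(0, index=s_.index)
# 100 loops, best of 3: 2.33 ms per loop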

# OP's solution:
%%timeit s_ = pd.concat([s] * 100000)
if s_.dropna().eq(0).all():
    s_ = s_.fillna(0)
# 10 loops, best of 3: 19.7 ms per loop

# @Jezrael's fastest solution:
%%timeit s_ = pd.concat([s] * 100000)
if set(s_.unique().astype(str)).issubset(['0.0','nan']):
    s_ = pd.Series(0, index=s_.index)
# 1000 loops, best of 3: 4.58 ms per loop
