
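I have tried several ways to replace the NaNs in one row with the values from another row, but none of them worked as expected. Here is my DataFrame: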

import numpy as np
import pandas as pd

test = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [4, 5, 6, np.nan, np.nan],
        "c": [7, 8, 9, np.nan, np.nan],
        "d": [7, 8, 9, np.nan, np.nan],
    }
)

   a    b    c    d
0  1   4.0  7.0  7.0
1  2   5.0  8.0  8.0
2  3   6.0  9.0  9.0
3  4   NaN  NaN  NaN
4  5   NaN  NaN  NaN

I need to replace the NaNs in the 4th row with the values from the first row, i.e.:

   a     b     c     d
0  1   **4.0   7.0   7.0**
1  2    5.0   8.0   8.0
2  3    6.0   9.0   9.0
3  4   **4.0   7.0   7.0**
4  5    NaN   NaN   NaN

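The second question is how to multiply some of the values in a row by a number. For example, I need to double the values in the second row for the columns ['b', 'c', 'd'], so that the result is: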

   a     b     c     d
0  1    4.0   7.0   7.0
1  2   **10.0  16.0  16.0**
2  3    6.0   9.0   9.0
3  4    NaN   NaN   NaN
4  5    NaN   NaN   NaN

2 Answers


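First, I suggest you read about indexing and selecting data in pandas. For the first question, you can use .loc[] together with isnull() to perform boolean indexing on the column values: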

mask_nans = test.loc[3,:].isnull()
test.loc[3, mask_nans] = test.loc[0, mask_nans]

To double the values, you can select the slice with .loc[] and multiply it by 2 in place:

test.loc[1,'b':] *= 2

   a     b     c     d
0  1   4.0   7.0   7.0
1  2  10.0  16.0  16.0
2  3   6.0   9.0   9.0
3  4   4.0   7.0   7.0
4  5   NaN   NaN   NaN
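
For reference, a self-contained sketch of the two steps above. It assumes the `test` frame from the question with its default integer index; the imports and comments are additions for illustration.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [4, 5, 6, np.nan, np.nan],
        "c": [7, 8, 9, np.nan, np.nan],
        "d": [7, 8, 9, np.nan, np.nan],
    }
)

# Boolean Series indexed by column name: True where row 3 holds NaN.
mask_nans = test.loc[3, :].isnull()

# Overwrite only the NaN cells of row 3 with the matching cells of row 0.
test.loc[3, mask_nans] = test.loc[0, mask_nans]

# Double columns 'b' through the last column in row 1; 'a' is left alone.
test.loc[1, "b":] *= 2
```

Because the mask is built from row 3 itself, any value that is already present in that row is kept rather than overwritten.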
answered 2018-12-24T19:44:58.163

Indexing with labels

If you wish to filter by a, and a values are unique, consider making it your index to simplify your logic and make it more efficient:

test = test.set_index('a')
test.loc[4] = test.loc[4].fillna(test.loc[1])
test.loc[2] *= 2
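
A runnable sketch of the index-based approach, again assuming the question's `test` frame. The trailing `reset_index()` and the `result` name are additions, in case `a` is wanted back as a regular column.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "b": [4, 5, 6, np.nan, np.nan],
        "c": [7, 8, 9, np.nan, np.nan],
        "d": [7, 8, 9, np.nan, np.nan],
    }
)

# Use the unique values of 'a' as row labels.
test = test.set_index("a")

# Fill the NaNs in the row labelled 4 from the row labelled 1.
test.loc[4] = test.loc[4].fillna(test.loc[1])

# Double the row labelled 2; 'a' is the index now, so it is not touched.
test.loc[2] *= 2

# Optional: turn 'a' back into an ordinary column.
result = test.reset_index()
```

Note that the labels here are the values of `a` (1 to 5), not the original positional labels (0 to 4).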

Boolean masks

If a is not unique and Boolean masks are required, you can still use fillna with an additional step:

mask = test['a'].eq(4)
test.loc[mask] = test.loc[mask].fillna(test.loc[test['a'].eq(1).idxmax()])
test.loc[test['a'].eq(2), ['b', 'c', 'd']] *= 2
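
To illustrate the non-unique case, here is a minimal sketch on a hypothetical frame `df` in which `a` repeats. The data, the `source_label` name and the two-column layout are assumptions made for the example, not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which 'a' contains a repeated value (4).
df = pd.DataFrame(
    {
        "a": [1, 2, 4, 4],
        "b": [4.0, 5.0, np.nan, np.nan],
        "c": [7.0, 8.0, np.nan, 9.0],
    }
)

# Label of the first row where a == 1 (idxmax returns the first True).
source_label = df["a"].eq(1).idxmax()

# Fill NaNs in every row where a == 4 from that source row, column by column.
mask = df["a"].eq(4)
df.loc[mask] = df.loc[mask].fillna(df.loc[source_label])

# Double only the value columns for rows where a == 2, leaving 'a' unchanged.
df.loc[df["a"].eq(2), ["b", "c"]] *= 2
```

One caveat: if no row satisfies `a == 1`, `idxmax()` on the all-False mask still returns the first label, so it may be worth checking `df["a"].eq(1).any()` first.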
answered 2018-12-24T19:56:28.593