python - How to replace missing/masked data with row means in numpy

Question

How can I replace the missing (masked) values in the array "b" below with the corresponding row means from "c"?

import numpy
a = numpy.arange(24).reshape(4, -1)
b = numpy.ma.masked_where(numpy.remainder(a, 5) == 0, a); b
Out[46]: 
 masked_array(data =
 [[-- 1 2 3 4 --]
 [6 7 8 9 -- 11]
 [12 13 14 -- 16 17]
 [18 19 -- 21 22 23]],
         mask =
 [[ True False False False False  True]
 [False False False False  True False]
 [False False False  True False False]
 [False False  True False False False]],
       fill_value = 999999)

c=b.mean(axis=1);c
Out[47]: 
masked_array(data = [2.5 8.2 14.4 20.6],
         mask = [False False False False],
   fill_value = 1e+20)

score 2 · Accepted Answer

You can use where and take:

inds = np.where(b.mask)

b[inds] = np.take(c,inds[0])

b
masked_array(data =
 [[2 1 2 3 4 2]
 [6 7 8 9 8 11]
 [12 13 14 14 16 17]
 [18 19 20 21 22 23]],
             mask =
 [[False False False False False False]
 [False False False False False False]
 [False False False False False False]
 [False False False False False False]],
       fill_value = 999999)

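In this particular example you run into a problem with the dtype of a: it is an integer array, so the float row means get truncated when they are assigned back (2.5 becomes 2, 8.2 becomes 8, and so on). If you add a = a.astype(float) before you create b, it works. There may also be a faster way to build the indices than np.where.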
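A self-contained sketch that combines the where/take approach with the float conversion, so the row means are stored without truncation:

```python
import numpy as np

# Convert to float *before* masking so the float row means are stored as-is
a = np.arange(24).reshape(4, -1).astype(float)
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = b.mean(axis=1)  # one mean per row, computed over unmasked entries only

inds = np.where(b.mask)        # (row indices, column indices) of masked cells
b[inds] = np.take(c, inds[0])  # each masked cell receives its own row's mean

# Assigning into a masked cell also unmasks it, so the mask is now all False
print(b)
```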

score 2 · Accepted Answer

Try this:

np.copyto(b, c[...,None], where=b.mask)

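You have to add an extra axis to c (turning its shape (4,) into (4, 1)) so that it is applied to each row. (If np.mean had a keepdims option like np.sum, this wouldn't be necessary :P)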
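To see why the extra axis matters, compare the shapes: c holds one value per row, and indexing it with [..., None] turns it into a column that NumPy can broadcast against the (4, 6) mask row by row. A small sketch (the name per_cell is only for illustration):

```python
import numpy as np

a = np.arange(24, dtype=float).reshape(4, -1)
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = b.mean(axis=1)

print(c.shape)             # (4,)   one mean per row
print(c[..., None].shape)  # (4, 1) the same means as a column

# Broadcasting the (4, 1) column to (4, 6) repeats each row's mean across
# that row, which is the value copyto needs at every masked cell
per_cell = np.broadcast_to(c[..., None], b.shape)
print(per_cell[1])         # all six entries equal the mean of row 1
```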

import numpy as np

a = np.arange(24).reshape(4,-1).astype(float)   # I changed your example to be a float
b = np.ma.masked_where(np.remainder(a,5)==0,a)
c = b.mean(1)

np.copyto(b, c[...,None], where=b.mask)

In [189]: b.data
Out[189]: 
array([[  2.5,   1. ,   2. ,   3. ,   4. ,   2.5],
       [  6. ,   7. ,   8. ,   9. ,   8.2,  11. ],
       [ 12. ,  13. ,  14. ,  14.4,  16. ,  17. ],
       [ 18. ,  19. ,  20.6,  21. ,  22. ,  23. ]])
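On the keepdims remark: recent NumPy releases do accept keepdims in MaskedArray.mean (this sketch assumes your installed version is one of them; very old versions may reject the argument). With it, the reduced axis is kept and the [..., None] step goes away. Note also that np.copyto writes into the array's underlying data and leaves the mask as it was, which is why the output is inspected through b.data:

```python
import numpy as np

a = np.arange(24, dtype=float).reshape(4, -1)
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)

# keepdims=True keeps the reduced axis: shape (4, 1) instead of (4,)
c = b.mean(axis=1, keepdims=True)

np.copyto(b, c, where=b.mask)

print(b.data)        # masked cells now hold their row means
print(b.mask.sum())  # the mask itself is unchanged: still 5 masked cells
```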

This is faster than building the inds array:

In [169]: %%timeit
   .....: inds = np.where(b.mask)
   .....: b[inds] = np.take(c, inds[0])
   .....: 
10000 loops, best of 3: 81.2 µs per loop


In [173]: %%timeit
   .....: np.copyto(b, c[...,None], where=b.mask)
   .....: 
10000 loops, best of 3: 45.1 µs per loop
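A plain-Python sketch for rerunning this comparison outside IPython with the timeit module. Each call works on a fresh copy of the masked array, because the where/take version clears the mask of every cell it fills and would otherwise have nothing left to do after the first call; both versions pay the same copy cost. Absolute numbers depend on the machine and NumPy version, so don't expect to reproduce the figures above exactly. The helper names fill_where_take and fill_copyto are illustrative.

```python
import timeit

import numpy as np

a = np.arange(24, dtype=float).reshape(4, -1)
template = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = template.mean(axis=1)


def fill_where_take(b):
    # Approach from the first answer: explicit index arrays plus take
    inds = np.where(b.mask)
    b[inds] = np.take(c, inds[0])
    return b


def fill_copyto(b):
    # Approach from this answer: broadcast the row means and copy where masked
    np.copyto(b, c[..., None], where=b.mask)
    return b


N = 1000
for fn in (fill_where_take, fill_copyto):
    # template.copy() duplicates both data and mask, so every call starts fresh
    seconds = timeit.timeit(lambda: fn(template.copy()), number=N)
    print(f"{fn.__name__}: {seconds / N * 1e6:.1f} µs per call")
```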

Another advantage is that it alerts you to the dtype problem by raising an error instead of silently truncating:

a = np.arange(24).reshape(4,-1)    # still an int
b = np.ma.masked_where(np.remainder(a,5)==0,a)
c = b.mean(1)

In [193]: np.copyto(b, c[...,None], where=b.mask)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-193-edc7f01f3f89> in <module>()
----> 1 np.copyto(b, c[...,None], where=b.mask)

TypeError: Can not cast scalar from dtype('float64') to dtype('int64') according to the rule 'same_kind'
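By contrast, the where/take assignment in the other answer silently truncates in the integer case. If truncation is actually what you want, np.copyto lets you opt in explicitly through its casting argument. A sketch (the exact wording of the error message differs between NumPy versions, so only the exception type is relied on):

```python
import numpy as np

a = np.arange(24).reshape(4, -1)  # integer dtype, as in the question
b = np.ma.masked_where(np.remainder(a, 5) == 0, a)
c = b.mean(axis=1)                # float64 row means

refused = False
try:
    # Default casting='same_kind' refuses float64 -> integer
    np.copyto(b, c[..., None], where=b.mask)
except TypeError as exc:
    refused = True
    print("refused:", exc)

# casting='unsafe' opts in to truncation explicitly (2.5 -> 2, 8.2 -> 8, ...)
np.copyto(b, c[..., None], where=b.mask, casting="unsafe")
print(b.data)
```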

By the way, there is a family of functions for this kind of task, depending on what form your source data is in, for example:

np.put
puts the input values, in order, into the output array at the positions given by the indices, and would work like @Ophion's answer.

np.place
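assigns each element of the input (a list or 1-D array), in turn, to the positions in the output array where the mask is True (the values are not aligned with the input array, since the shapes need not match).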

np.copyto
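always puts the values from the input array into the same (broadcast) positions in the output array. The shapes must match (or be broadcastable). It effectively supersedes the older np.putmask.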
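A sketch of the same row-mean fill written with each of the three functions. To keep it simple it works on a plain float ndarray plus a separate boolean mask rather than a masked array (a simplification, so np.put and np.place see an ordinary array); the names out_put, out_place and out_copyto are illustrative only:

```python
import numpy as np

a = np.arange(24, dtype=float).reshape(4, -1)
mask = np.remainder(a, 5) == 0
# Row means over unmasked cells; no row is fully masked, so nothing becomes NaN
row_means = np.ma.masked_array(a, mask=mask).mean(axis=1).filled(np.nan)

rows = np.nonzero(mask)[0]  # row index of each masked cell, in C order

# np.put: flat (1-D) indices plus one value per index
out_put = a.copy()
np.put(out_put, np.flatnonzero(mask), row_means[rows])

# np.place: values are consumed in order, one per True cell of the mask
out_place = a.copy()
np.place(out_place, mask, row_means[rows])

# np.copyto: values are broadcast to the destination shape, no index list
out_copyto = a.copy()
np.copyto(out_copyto, row_means[:, None], where=mask)

print(out_copyto)
```

np.put and np.place both need the per-cell values spelled out in the right order, while np.copyto only needs the per-row column and lets broadcasting do the rest.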
