python - Slicing a numpy recarray at "empty" rows

Question

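I created a numpy.recarray from a .csv input file using the csv2rec() method. The input file, and therefore the recarray, contains empty rows with no data (i.e. nan values). I want to slice this recarray at the nan rows into multiple sub-arrays, excluding the nan rows from the final arrays, as shown below.

Original recarray with 2 columns: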

[(1,2)
(2,2)
(nan,nan)
(nan,nan)
(4,4)
(4,3)]

2 sub-arrays without nan values:

[(1,2)
(2,2)]

and

[(4,4)
(4,3)]

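I know this could be managed with a loop, but maybe there is a simpler, more elegant way? Additionally: is it possible to keep the header information of each column, so that after slicing I can refer to columns by parameter name and not only by column index?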

score 1 · Accepted Answer

For a 2D array:

a[~np.all(np.isnan(a),axis=1)]
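
As a small illustration of the one-liner above, here is a sketch on a plain float array. The sample values mirror the question and the variable names are only illustrative. Note that this drops the all-nan rows but does not split the array into groups.

```python
import numpy as np

a = np.array([[1, 2], [2, 2], [np.nan, np.nan], [np.nan, np.nan], [4, 4], [4, 3]],
             dtype=float)

# True for rows in which every entry is nan
empty_rows = np.all(np.isnan(a), axis=1)

# Keep only the rows that contain at least one real value
cleaned = a[~empty_rows]
print(cleaned)
```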

For a structured array (recarray), you can do this:

import numpy as np

def remove_nan(a, split=True):
    # The first field decides whether a row counts as "empty" (nan).
    col = a.dtype.names[0]
    test = ~np.isnan(a[col])
    if not split:
        # Boolean indexing keeps the structured dtype and its field names.
        return a[test]
    # Split positions: every index at which the nan/non-nan state changes.
    indices = np.flatnonzero(test[1:] != test[:-1]) + 1
    # Keep only the runs that start with a non-nan row.
    return [s for s in np.split(a, indices) if len(s) and not np.isnan(s[col][0])]

To get only the non-nan rows as a single array, use split=False. Example:

nan = np.nan
a = np.array([(1,2),(2,2),(nan,nan),(nan,nan),(4,4),(4,3)], dtype=[('test',float),('col2',float)])

remove_nan(a)

#[array([(1.0, 2.0), (2.0, 2.0)],
#      dtype=[('test', '<f8'), ('col2', '<f8')]),
# array([(4.0, 4.0), (4.0, 3.0)],
#      dtype=[('test', '<f8'), ('col2', '<f8')])]
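
On the second part of the question: slicing or splitting a structured array keeps its dtype, so the column names stay available on every piece. A minimal sketch, with the slice positions hard-coded purely for illustration and the sample data rebuilt here so that it runs on its own:

```python
import numpy as np

nan = np.nan
a = np.array([(1, 2), (2, 2), (nan, nan), (nan, nan), (4, 4), (4, 3)],
             dtype=[('test', float), ('col2', float)])

# Hard-coded positions of the two non-nan runs, for illustration only
first, second = a[0:2], a[4:6]

# Both pieces still carry the field names of the original array
print(first.dtype.names)
print(second['col2'])

# Viewing as a recarray gives attribute-style access to the columns
rec = second.view(np.recarray)
print(rec.col2)
```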

score 0 · Accepted Answer

You can use scipy.ndimage.label to get the regions in an array of 0s and 1s:

>>> import numpy as np
>>> from scipy import ndimage
>>> nan = np.nan
>>> a = np.array([(1,2),(2,2),(nan,nan),(nan,nan),(4,4),(4,3)], dtype=[('test',float),('col2',float)])
>>> non_nan = np.logical_not(np.isnan(a['test'])).astype(int)
>>> labeled_array, num_features = ndimage.label(non_nan)
>>> for region in range(1,num_features+1):
...     #m = a[np.where(labeled_array==region)]
...     m = a[labeled_array==region]
...     print(region)
...     print(m)
...     print(m['col2'])
...
1
[(1.0, 2.0) (2.0, 2.0)]
[ 2.  2.]
2
[(4.0, 4.0) (4.0, 3.0)]
[ 4.  3.]

If you know that you will always have exactly two regions, you don't need the loop; just refer to them directly:

m1 = a[labeled_array==1]
m2 = a[labeled_array==2]
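
If the number of regions is not known in advance, one option is to turn the labels into slices with scipy.ndimage.find_objects and collect the pieces in a list. This is a sketch rather than part of the original answer; the list name `parts` is illustrative.

```python
import numpy as np
from scipy import ndimage

nan = np.nan
a = np.array([(1, 2), (2, 2), (nan, nan), (nan, nan), (4, 4), (4, 3)],
             dtype=[('test', float), ('col2', float)])

# 1 where the row holds data, 0 where it is nan
non_nan = (~np.isnan(a['test'])).astype(int)
labeled_array, num_features = ndimage.label(non_nan)

# find_objects returns one tuple of slices per labeled region
parts = [a[region] for region in ndimage.find_objects(labeled_array)]

for part in parts:
    print(part['test'], part['col2'])
```

Because each region of a 1D array is contiguous, the slices select the same rows as the boolean masks above.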

score 0 · Accepted Answer

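If you just want to get rid of the blank rows rather than slice at them, simply compress your array, using a not-nan check as the selection criterion. Hint: nan != nan.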
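
A minimal sketch of that idea, assuming the first field ('test', taken from the other answers' sample data) is enough to identify an empty row. It relies on the fact that nan never compares equal to itself:

```python
import numpy as np

nan = np.nan
a = np.array([(1, 2), (2, 2), (nan, nan), (nan, nan), (4, 4), (4, 3)],
             dtype=[('test', float), ('col2', float)])

# x == x is False exactly where x is nan
keep = a['test'] == a['test']

# np.compress keeps the elements whose condition is True
cleaned = np.compress(keep, a)
print(cleaned)
print(cleaned['col2'])
```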

If you really do want to slice at the nans, use a loop to build lists of the non-nan indices, then use choose to generate the sub-arrays; they should keep the column names that way.
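
The loop described above might look like the following sketch. It uses plain integer-list indexing on the collected indices instead of choose, which is an assumption about what the answer intended; the names `groups` and `current` are illustrative.

```python
import numpy as np

nan = np.nan
a = np.array([(1, 2), (2, 2), (nan, nan), (nan, nan), (4, 4), (4, 3)],
             dtype=[('test', float), ('col2', float)])

groups = []   # one list of row indices per run of non-nan rows
current = []  # indices of the run currently being collected
for i, value in enumerate(a['test']):
    if np.isnan(value):
        # A nan row closes the current run, if there is one
        if current:
            groups.append(current)
            current = []
    else:
        current.append(i)
if current:
    groups.append(current)

# Index with each list of row numbers; the structured dtype is preserved
sub_arrays = [a[idx] for idx in groups]
for sub in sub_arrays:
    print(sub['test'], sub['col2'])
```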
