python - Return the indices of a subarray within an array

Question

I'm using Python with numpy.

I have a numpy array may_a:

may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])

I have a numpy array may_b:

may_b = numpy.array([False,True,True,False])

I need to find the array may_b within the array may_a.

In the output, I need the indices at which it occurs.

out_index=[2,7]

Can someone suggest how I can get out_index?

score 5 · Accepted Answer

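EDIT: The following code does allow a convolution-based equality check. It maps True to 1 and False to -1. It also reverses b, which is needed for it to work properly: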

import numpy as np
from scipy.signal import fftconvolve

def search(a, b):
    return np.where(np.round(fftconvolve(a * 2 - 1, (b * 2 - 1)[::-1],
                                         mode='valid') - len(b)) == 0)[0]

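I have checked that it gives the same output as the as_strided method for various random inputs, and it does. I have also timed both approaches, and convolution only starts to pay off for largish search tokens of around 256 items.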
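Why the mapping to 1 and -1 matters can be seen in a small sketch. It uses np.convolve rather than fftconvolve to stay NumPy-only, and the names search_pm1 and search_01 are made up for illustration:

```python
import numpy as np

def search_pm1(a, b):
    """Exact subarray search: correlate +1/-1 encoded booleans."""
    a_pm = a.astype(int) * 2 - 1
    b_pm = b.astype(int) * 2 - 1
    # Reversing b turns the convolution into a correlation; a window
    # sums to len(b) only when every position agrees.
    corr = np.convolve(a_pm, b_pm[::-1], mode='valid')
    return np.where(corr == len(b))[0]

def search_01(a, b):
    """0/1 variant: only counts True entries that line up with (reversed) b."""
    conv = np.convolve(a.astype(int), b.astype(int), mode='valid')
    return np.where(conv == b.sum())[0]

may_a = np.array([False, True, False, True, True, False,
                  True, False, True, True, False])
may_b = np.array([False, True, True, False])
print(search_pm1(may_a, may_b).tolist())  # -> [2, 7]

# An input with no exact occurrence of may_b:
tricky = np.array([True, True, True, True, False])
print(search_pm1(tricky, may_b).tolist())  # -> []
print(search_01(tricky, may_b).tolist())   # -> [0, 1]
```

With the 0/1 encoding the False positions of b contribute nothing to the sum, so a window that has extra True values around the pattern is still reported.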

It seems like a bit of overkill, but with boolean data you can use (abuse?) convolution:

In [8]: np.where(np.convolve(may_a, may_b.astype(int),
   ...:                      mode='valid') == may_b.sum())[0]
Out[8]: array([2, 7])

For larger datasets it may be faster to use scipy.signal.fftconvolve:

In [13]: np.where(scipy.signal.fftconvolve(may_a, may_b,
   ....:                                   mode='valid') == may_b.sum())[0]
Out[13]: array([2, 7])

You have to be careful though, because the output is now floating point, and rounding errors may spoil the equality check:

In [14]: scipy.signal.fftconvolve(may_a, may_b, mode='valid')
Out[14]: array([ 1.,  1.,  2.,  1.,  1.,  1.,  1.,  2.])

So you may be better off with something along these lines:

In [15]: np.where(np.round(scipy.signal.fftconvolve(may_a, may_b, mode='valid') -
   ....:                   may_b.sum()) == 0)[0]
Out[15]: array([2, 7])
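
To make the rounding step concrete, here is a rough NumPy-only sketch of the same idea that calls np.fft directly instead of scipy.signal.fftconvolve. The function name fft_search is hypothetical, and the sketch combines the 1/-1 mapping from the edit at the top with np.rint:

```python
import numpy as np

def fft_search(a, b):
    """Find exact occurrences of boolean b in boolean a via an FFT correlation."""
    a_pm = np.where(a, 1.0, -1.0)
    b_pm = np.where(b, 1.0, -1.0)
    n = len(a) + len(b) - 1  # length of the full linear convolution
    full = np.fft.irfft(np.fft.rfft(a_pm, n) * np.fft.rfft(b_pm[::-1], n), n)
    # Keep only the positions where b overlaps a completely ('valid' mode).
    valid = full[len(b) - 1:len(a)]
    # The FFT result is floating point, so round before comparing.
    return np.where(np.rint(valid) == len(b))[0]

may_a = np.array([False, True, False, True, True, False,
                  True, False, True, True, False])
may_b = np.array([False, True, True, False])
print(fft_search(may_a, may_b).tolist())  # -> [2, 7]
```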

score 5 · Accepted Answer

A cooler approach, which may not perform as well but works for any dtype, is to use as_strided:

In [2]: from numpy.lib.stride_tricks import as_strided

In [3]: may_a = numpy.array([False, True, False, True, True, False,
   ...:                      True, False, True, True, False])

In [4]: may_b = numpy.array([False,True,True,False])

In [5]: a = len(may_a)

In [6]: b = len(may_b)

In [7]: a_view = as_strided(may_a, shape=(a - b + 1, b),
   ...:                     strides=(may_a.dtype.itemsize,) * 2)

In [8]: a_view
Out[8]: 
array([[False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False],
       [ True,  True, False,  True],
       [ True, False,  True, False],
       [False,  True, False,  True],
       [ True, False,  True,  True],
       [False,  True,  True, False]], dtype=bool)

In [9]: numpy.where(numpy.all(a_view == may_b, axis=1))[0]
Out[9]: array([2, 7])

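You have to be careful though, because even though a_view is a view of may_a's data, comparing it with may_b creates a temporary array of (a - b + 1) * b elements, which may be a problem for large a and b.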
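
One way to keep that temporary bounded is to compare the windows in chunks. The sketch below assumes NumPy 1.20 or later for sliding_window_view (a safer wrapper around the same striding trick); the function name find_subarray and the chunk parameter are made up for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def find_subarray(a, b, chunk=65536):
    """Return start indices of b in a, comparing at most `chunk` windows at once."""
    # Read-only view of shape (len(a) - len(b) + 1, len(b)); no data is copied.
    # Raises ValueError if b is longer than a.
    windows = sliding_window_view(a, len(b))
    hits = []
    for start in range(0, len(windows), chunk):
        block = windows[start:start + chunk]
        # The comparison temporary holds at most chunk * len(b) booleans.
        match = np.all(block == b, axis=1)
        hits.append(np.flatnonzero(match) + start)
    return np.concatenate(hits) if hits else np.array([], dtype=int)

may_a = np.array([False, True, False, True, True, False,
                  True, False, True, True, False])
may_b = np.array([False, True, True, False])
print(find_subarray(may_a, may_b, chunk=3).tolist())  # -> [2, 7]
```

A small chunk trades some speed for lower peak memory; the default here is an arbitrary choice.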

score 3 · Accepted Answer

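This looks very similar to a string search problem. If you want to avoid implementing one of those string search algorithms yourself, you can abuse Python's built-in string search, which is very fast, by doing something like this: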

import numpy

# I've added [True, True, True] at the end.
may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False, True, True, True])
may_b = numpy.array([False,True,True,False])

# tobytes() replaces the deprecated tostring(); each boolean occupies one byte.
may_a_str = may_a.tobytes()
may_b_str = may_b.tobytes()

idx = may_a_str.find(may_b_str)
out_index = []
while idx >= 0:
    out_index.append(idx)
    idx = may_a_str.find(may_b_str, idx+1)

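This should work for boolean arrays. If you want to use this approach for another array type, you will need to make sure the strides of the two arrays match, and divide out_index by that stride.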
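
As a hedged sketch of that generalization: the hypothetical helper below (find_bytes is not from the answer) casts both arrays to the same dtype, searches the raw bytes, discards matches that do not start on an element boundary, and divides the byte offset by the item size. Note that this compares bytes rather than values, so for floats 0.0 and -0.0 would not match each other.

```python
import numpy as np

def find_bytes(a, b):
    """Return element indices where b occurs in a, using a byte-level search."""
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)  # same dtype, hence same item size
    size = a.dtype.itemsize
    hay, needle = a.tobytes(), b.tobytes()
    out = []
    idx = hay.find(needle)
    while idx >= 0:
        # Keep only matches aligned to an element boundary.
        if idx % size == 0:
            out.append(idx // size)
        idx = hay.find(needle, idx + 1)
    return out

a = np.array([1, 2, 3, 1, 2, 3], dtype=np.int32)
b = np.array([2, 3], dtype=np.int32)
print(find_bytes(a, b))  # -> [1, 4]
```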

You could also use the regular expression module instead of the loop to do the string search.
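
A minimal sketch of the regular expression variant, assuming boolean arrays (one byte per element). The lookahead (?=...) is my own choice here so that overlapping occurrences are also reported:

```python
import re
import numpy as np

may_a = np.array([False, True, False, True, True, False,
                  True, False, True, True, False])
may_b = np.array([False, True, True, False])

# re.escape guards against bytes that are regex metacharacters.
pattern = re.compile(b'(?=' + re.escape(may_b.tobytes()) + b')')
out_index = [m.start() for m in pattern.finditer(may_a.tobytes())]
print(out_index)  # -> [2, 7]
```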

score 2 · Accepted Answer

This should also work with data other than booleans:

In [1]: import numpy as np

In [2]: a = np.array([False, True, False, True, True, False, True, False, True, True, False])

In [3]: b = np.array([False,True,True,False])

In [4]: def get_indices(a, b):
   ...:     window = len(b)
   ...:     shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
   ...:     strides = a.strides + (a.strides[-1],)
   ...:     w = np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
   ...:     return np.where(np.all(np.equal(w, b), axis=1))[0]

In [5]: get_indices(a,b)
Out[5]: array([2, 7])

score 1 · Accepted Answer

I'm not sure whether numpy provides a function for this. If it doesn't, here is a solution:

import numpy

def searchListIndexs(array, target):
    ret = []
    iLimit = len(array)-len(target)+1
    jLimit = len(target)
    for i in range(iLimit):
        for j in range(jLimit):
            if array[i+j] != target[j]:
                break
        else:
            ret.append(i)
    return ret


may_a = numpy.array([False, True, False, True, True, False, True, False, True, True, False])
may_b = numpy.array([False,True,True,False])
out_index = searchListIndexs(may_a, may_b)
print(out_index)  # The parenthesized form works on both Python 2 and Python 3.
