numpy - Optimal way to find index of first occurrence of subarray in each frame of batch data without for loop

Question

I have to find the index of the first occurrence of a subarray in each frame. The data has shape (batch_size, 400). I need to find the index of the first occurrence of three consecutive ones in each frame of size 400. Example data -> [0 0 0 1 1 1 0 1 1 1 1 1] [0 0 0 0 1 1 1 0 0 1 1 1] [0 1 1 1 0 0 0 1 1 1 1 1]

The output should be [3 4 1].

The naive solution uses a for loop, but since the data is large it is very time-consuming.

Is there an implementation in numpy or tensorflow that is fast and efficient?

score 0 · Accepted Answer

There is no single built-in numpy function for this. However, if you really need it to be fast, you can use numba as follows.

The function find_first does essentially what you would do with the for loop, but because it is compiled by numba it runs much faster. You then apply it to each frame (row) of the batch using np.apply_along_axis:

import numpy as np
from numba import jit


@jit(nopython=True)
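def find_first(seq, arr):
    """Return the index of the first occurrence of seq in arr, or -1 if absent."""
    n = len(seq)
    # slide a window of len(seq) over arr and stop at the first match
    for i in range(len(arr) - n + 1):
        if np.all(seq == arr[i:i+n]):
            return i
    return -1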

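# construct a random 0/1 test array: 64 frames of length 400
test = np.round(np.random.random((64, 400)))

# the pattern to search for: three consecutive ones
pattern = np.array([1, 1, 1])

# this gives the array of first-match indices, one per frame (-1 where there is no match)
np.apply_along_axis(lambda m: find_first(pattern, m), axis=1, arr=test)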

I modified the method from this answer
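To sanity-check the indices on the small example from the question, here is a minimal pure-NumPy sketch. It is not the numba method above but a different technique: it assumes NumPy 1.20 or newer (for sliding_window_view), and the names first_match_per_row, frames and result are made up for illustration. It builds a boolean array covering every window of every row, so it uses more memory than the loop and may not be faster on large batches.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def first_match_per_row(data, pattern):
    """For each row of data, return the index of the first window equal to
    pattern, or -1 if the row contains no match. (Hypothetical helper.)"""
    data = np.asarray(data)
    pattern = np.asarray(pattern)
    # windows has shape (n_rows, n_cols - len(pattern) + 1, len(pattern))
    windows = sliding_window_view(data, pattern.shape[0], axis=1)
    # hits[r, i] is True when data[r, i:i+len(pattern)] equals pattern
    hits = (windows == pattern).all(axis=2)
    # argmax returns the position of the first True in each row;
    # rows without any True also yield 0, so those are replaced by -1
    first = hits.argmax(axis=1)
    return np.where(hits.any(axis=1), first, -1)


# the three example frames from the question
frames = np.array([
    [0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1],
])
result = first_match_per_row(frames, [1, 1, 1])
print(result)  # [3 4 1]
```

On the question's example this reproduces the expected [3 4 1], which makes it a convenient reference to compare the numba output against on small inputs.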
