python - Implementation of numpy in1d for 2D arrays?

Question

I have a 2D numpy array S representing a state space, with 80000000 rows (as states) and 5 columns (as state variables).

I initialize K0 with S, and at each iteration, I apply a state transition function f(x) on all of the states in Ki, and delete states whose f(x) is not in Ki, resulting Ki+1. Until it converges i.e. Ki+1 = Ki.

Going like this would take ages:

K = S
to_delete = [0]
While to_delete:
    to_delete = []
    for i in xrange(len(K)):
        if not f(i) in K:
        to_delete.append(K(i))
    K = delete(K,to_delete,0)

So I wanted to make a vectorized implementation :

slice K in columns, apply f and, join them once again, thus obtaining f(K) somehow.

The question now is how to get an array of length len(K), say Sel, where each row Sel[i] determine whether f(K[i]) is in K. Exactly like the function in1d works.

Then it would be simple to make

K=K[Sel]]

score 5 · Accepted Answer

您的问题很难理解，因为它包含无关信息并包含拼写错误。K如果我理解正确，您只需要一种有效的方法来对 2D 数组的行（在本例中为和的行的交集f(K)）执行集合操作。

如果您创建结构化数组视图，则可以使用numpy.in1d执行此操作。

代码：

如果这是K：

In [50]: k
Out[50]:
array([[6, 6],
       [3, 7],
       [7, 5],
       [7, 3],
       [1, 3],
       [1, 5],
       [7, 6],
       [3, 8],
       [6, 1],
       [6, 0]])

这是f(K)（对于这个例子，我从第一个 col 中减去 1，然后在第二个 col 中加 1）：

In [51]: k2
Out[51]:
array([[5, 7],
       [2, 8],
       [6, 6],
       [6, 4],
       [0, 4],
       [0, 6],
       [6, 7],
       [2, 9],
       [5, 2],
       [5, 1]])

K然后您可以通过执行以下操作找到所有行f(K)：

In [55]: k[np.in1d(k.view(dtype='i,i').reshape(k.shape[0]),k2.view(dtype='i,i').
reshape(k2.shape[0]))]
Out[55]: array([[6, 6]])

view并reshape创建平面结构化视图，以便每一行显示为单个元素in1d。 in1d创建k匹配项的布尔索引，用于指定索引k并返回过滤后的数组。

score 1 · Accepted Answer

不确定我是否完全理解您的问题，但如果 Paul 的解释是正确的，则可以使用numpy_indexed包在单个可读行中有效地解决并完全矢量化：

import numpy_indexed as npi
K = npi.intersection(K, f(K))

此外，这适用于任何类型或形状的行。

score 0 · Accepted Answer

上面的答案很棒。

但是，如果不想与结构化数组混合并想要一个不关心数组类型或数组元素尺寸的解决方案，我想出了这个：

k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]

基本上，list(map(np.ndarray.dumps, k))而不是k.view(dtype='f8,f8').reshape(k.shape[0]).

考虑到这个解决方案要慢约 50 倍。

k = np.array([[6.5, 6.5],
       [3.5, 7.5],
       [7.5, 5.5],
       [7.5, 3.5],
       [1.5, 3.5],
       [1.5, 5.5],
       [7.5, 6.5],
       [3.5, 8.5],
       [6.5, 1.5],
       [6.5, 0.5]])
k = np.tile(k, (1000, 1))

k2 = np.c_[k[:, 0] - 1, k[:, 1] + 1]


In [132]: k.shape, k2.shape
Out[132]: ((10000, 2), (10000, 2))

In [133]: timeit k[np.in1d(k.view(dtype='f8,f8').reshape(k.shape[0]),k2.view(dtype='f8,f8').reshape(k2.shape[0]))]
10 loops, best of 3: 22.2 ms per loop

In [134]: timeit k[np.in1d(list(map(np.ndarray.dumps, k)), list(map(np.ndarray.dumps, k2)))]
1 loop, best of 3: 892 ms per loop

对于小输入来说它可能是微不足道的，但对于操作来说，它需要 1 小时 20 分钟而不是 2 分钟。

python - Implementation of numpy in1d for 2D arrays?

3 回答 3

Related

Reference