python - Clipping tensor data to a bounding volume

Question

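I have 2 questions about TensorFlow 2.0, both about how TensorFlow handles combined conditional tests in its operation graph.

Task: cut a large set of data points into blocks and store the indices of the samples that belong to the volume (not the samples themselves).

My initial approach: loop over all elements and collect the indices of the data points that lie inside the "bounding volume". This was very slow, no matter how I reordered the comparisons on the coordinates.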

    import tensorflow as tf

    # X.shape == [elements, features]
    # xmin.shape == xmax.shape == [features]

    def getIndices(X, xmin, xmax):
        i = 0
        indices = tf.zeros([0], dtype=tf.int32)  # start with an empty index list
        for x in X:
            if x[0] > xmin[0]:
                if x[1] > xmin[1]:
                    if x[2] <= xmax[2]:

                        # ...and so on...

                        # append the index of the current element
                        indices = tf.concat([indices, [i]], axis=0)
            i = i + 1
        return indices

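Then I came up with the idea of producing boolean tensors and logically "and"-ing them to get the indices of the elements I need. This is a lot faster, as the next example shows: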

    # X.shape == [elements,features]
    # xmin.shape == xmax.shape == [features]

    def getIndices(X, xmin, xmax):
        # example of 3 different conditions to clip to (a part of) the bounding volume 
        # X is the data and xmin and xmax are tensors containing the bounding volume

        c0 = (X[:,0] >   xmin[0])
        c1 = (X[:,1] >   xmin[1]) # processing all elements
        c2 = (X[:,2] <=  xmax[2]) # idem

        # ... there could be many more conditions, you get the idea..

        # tf.where returns shape [n, 1]; take column 0 to get a flat index vector
        indices = tf.where(tf.math.logical_and(c0, tf.math.logical_and(c1, c2)))[:, 0]

        return indices

    #    ...

    indices = getIndices(X, xmin, xmax)
    trimmedX = tf.gather(X, indices)

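This code produces the correct result, but I wonder whether it is optimal.

The first question is about scheduling:

Will the TensorFlow graph that holds the operations cull (blocks of) conditional tests if it knows that some (blocks of) elements have already tested False? Because logical_and combines the conditions, no subsequent conditional test on those elements can ever yield True.

Indeed, in the example above c1 and c2 are asking questions about elements that c0 may already have excluded from the set. Especially when you have a large number of elements to test, this can be a waste of time, even on parallel hardware platforms.

So what if we cascade the tests, each one based on the result of the previous test? Although this looks like a solved problem, this solution is incorrect, because the final indices tensor refers to a subset _X, not to the full set X: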

    # X.shape == [elements,features]
    # xmin.shape == xmax.shape == [features]

    def getIndices(X, xmin, xmax):
        c0 = (X[:,0] >   xmin[0])
        indices = tf.where(c0)[:, 0]  # flatten [n, 1] to [n]
        _X = tf.gather(X, indices)

        c1 = (_X[:,1] >   xmin[1]) # processing only trimmed elements
        indices = tf.where(c1)[:, 0]
        _X = tf.gather(_X, indices)

        c2 = (_X[:,2] <=  xmax[2]) # idem
        indices = tf.where(c2)[:, 0]
        return indices

    ...
    indices = getIndices(X, xmin, xmax)
    trimmedX = tf.gather(X, indices)  # wrong rows: indices refer to a trimmed subset, not X

I could of course "solve" this by simply extending X so that each element also carries its own index in the original list, and then proceed as before.
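A minimal sketch of that workaround, for illustration only. Rather than widening X, it keeps a separate index tensor and filters it with the same mask at every step. The helper name `get_indices_cascaded`, the `idx` tensor, and the sample data are hypothetical and not part of the original code; `tf.boolean_mask` is used here in place of the `tf.where` / `tf.gather` pair.

```python
import tensorflow as tf

def get_indices_cascaded(X, xmin, xmax):
    # idx[k] is the row of the ORIGINAL X that the k-th surviving row came from
    idx = tf.range(tf.shape(X)[0])

    keep = X[:, 0] > xmin[0]
    X, idx = tf.boolean_mask(X, keep), tf.boolean_mask(idx, keep)

    keep = X[:, 1] > xmin[1]  # only tests rows that survived the first test
    X, idx = tf.boolean_mask(X, keep), tf.boolean_mask(idx, keep)

    keep = X[:, 2] <= xmax[2]  # idem
    return tf.boolean_mask(idx, keep)

# Made-up sample data: rows 0 and 4 pass all three tests
X = tf.constant([[0.5, 0.5, 0.5],
                 [0.1, 0.5, 0.5],
                 [0.5, 0.1, 0.5],
                 [0.5, 0.5, 0.9],
                 [0.9, 0.9, 0.2]])
xmin = tf.constant([0.2, 0.2, 0.2])
xmax = tf.constant([0.8, 0.8, 0.8])

indices = get_indices_cascaded(X, xmin, xmax)
trimmedX = tf.gather(X, indices)  # indices now refer to the original X
```

The price is one extra integer per element plus a gather-style copy at every stage, which is exactly the bookkeeping cost the question is about.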

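So my second question is about functionality:

Does tf have a way to let the GPU/tensor infrastructure provide this bookkeeping, without spending memory/time on this seemingly simple problem?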

score 0 · Accepted Answer

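This returns all indices of the entries of X that are greater than minimum and less than maximum, provided that both bounds have the same number of features as X: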

    import tensorflow as tf

    minimum = tf.random.uniform((1, 5), 0., 0.5)
    maximum = tf.random.uniform((1, 5), 0.5, 1.)

    x = tf.random.uniform((10, 5))

    indices = tf.where(
        tf.logical_and(
            tf.greater(x, minimum),
            tf.less(x, maximum)
            )
        )

    <tf.Tensor: shape=(22, 2), dtype=int64, numpy=
    array([[0, 3],
           [0, 4],
           [1, 1],
           [1, 2],
           [1, 3],
           [1, 4],
           [3, 1],
           [3, 3],
           [3, 4],
           [4, 0],
           [4, 4],
           [5, 3],
           [6, 2],
           [6, 3],
           [7, 1],
           [7, 4],
           [8, 2],
           [8, 3],
           [8, 4],
           [9, 1],
           [9, 3],
           [9, 4]], dtype=int64)>
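The result above lists (row, feature) pairs, one for every individual entry that lies within the bounds. If what is needed is the index of each row whose features are all inside the volume, as in the question, one possible follow-up is to reduce the boolean mask along the feature axis first. The sketch below assumes the same strict bounds as the answer; the helper name `rows_inside` and the sample values are made up for illustration.

```python
import tensorflow as tf

def rows_inside(x, minimum, maximum):
    # Per-entry test, shape [elements, features]; the bounds broadcast over rows
    inside = tf.logical_and(tf.greater(x, minimum), tf.less(x, maximum))
    # A row counts only if every one of its features passed
    row_ok = tf.reduce_all(inside, axis=1)
    # tf.where gives shape [n, 1]; column 0 is the flat list of row indices
    return tf.where(row_ok)[:, 0]

# Made-up sample data: rows 0 and 3 lie fully inside the bounds
x = tf.constant([[0.3, 0.4],
                 [0.1, 0.4],
                 [0.3, 0.9],
                 [0.6, 0.6]])
minimum = tf.constant([[0.2, 0.2]])
maximum = tf.constant([[0.7, 0.7]])

row_indices = rows_inside(x, minimum, maximum)
trimmed = tf.gather(x, row_indices)  # row indices refer to the original x
```

To match the question's `<=` test on the upper bound, `tf.less` could be swapped for `tf.less_equal`.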
