python - Using TensorFlow Transform apply_buckets correctly

Question

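This is on TensorFlow 1.11.0. The documentation for tft.apply_buckets is not very descriptive. Specifically, I read: "bucket_boundaries: The bucket boundaries represented as a rank 2 Tensor."

I assumed this meant each row has to hold a bucket index and a bucket boundary?

When I try that with the toy example below: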

import tensorflow as tf
import tensorflow_transform as tft
import numpy as np

tf.enable_eager_execution()

x = np.array([-1,9,19, 29, 39])
xt = tf.cast(
        tf.convert_to_tensor(x),
        tf.float32
        )

boundaries = tf.cast(
                tf.transpose(
                    tf.convert_to_tensor([[0, 1, 2, 3], [10, 20, 30, 40]])
                    ),
                tf.float32
                )

buckets = tft.apply_buckets(xt, boundaries)

I get:

InvalidArgumentError: Expected sorted boundaries [Op:BucketizeWithInputBoundaries] name: assign_buckets

Note that in this case x and the bucket_boundaries argument are:

tf.Tensor([-1.  9. 19. 29. 39.], shape=(5,), dtype=float32)
tf.Tensor(
[[ 0. 10.]
 [ 1. 20.]
 [ 2. 30.]
 [ 3. 40.]], shape=(4, 2), dtype=float32)

So it seems bucket_boundaries is not supposed to be pairs of index and boundary. Does anyone know how to use this method correctly?
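A quick check in plain NumPy (not TFT) is consistent with the error message. Assuming the boundaries are read in row-major, flattened order, which is an assumption and not something the documentation states, the transposed array from the question interleaves indices and boundaries and is therefore not sorted:

```python
import numpy as np

# Same values as the `boundaries` tensor in the question, shape (4, 2).
boundaries = np.array([[0, 1, 2, 3], [10, 20, 30, 40]], dtype=np.float32).T

# Row-major flattening (assumed order) interleaves indices and boundaries.
flat = boundaries.ravel()
print(flat.tolist())  # [0.0, 10.0, 1.0, 20.0, 2.0, 30.0, 3.0, 40.0]

# True only if every value is >= the one before it.
is_sorted = bool(np.all(np.diff(flat) >= 0))
print(is_sorted)  # False
```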

score 2 · Accepted Answer

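After playing around a bit, I found that bucket_boundaries should be a 2-D array whose entries are all bucket boundaries, with the array wrapped so that it has two columns. See the example below: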

import tensorflow as tf
import tensorflow_transform as tft
import numpy as np

tf.enable_eager_execution()

x = np.array([-1,9,19, 29, 39])
xt = tf.cast(
        tf.convert_to_tensor(x),
        tf.float32
        )

boundaries = tf.cast(
                tf.transpose(
                    tf.convert_to_tensor([[0, 20, 40, 60], [10, 30, 50, 70]])
                    ),
                tf.float32
                )

buckets = tft.apply_buckets(xt, boundaries)

So the inputs and the resulting buckets are:

print (xt)
print (buckets)
print (boundaries)

tf.Tensor([-1.  9. 19. 29. 39.], shape=(5,), dtype=float32)
tf.Tensor([0 1 2 3 4], shape=(5,), dtype=int64)
tf.Tensor(
[[ 0. 10.]
 [20. 30.]
 [40. 50.]
 [60. 70.]], shape=(4, 2), dtype=float32)
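The bucket indices above can be reproduced outside TFT. The sketch below uses np.digitize as a stand-in, not TFT's own implementation, and assumes the 2-D boundaries are flattened row-major into a single sorted list:

```python
import numpy as np

x = np.array([-1, 9, 19, 29, 39], dtype=np.float32)

# Same values as `boundaries` in the answer, shape (4, 2).
boundaries = np.array([[0, 20, 40, 60], [10, 30, 50, 70]], dtype=np.float32).T

# Flattened row-major (assumed order): 0, 10, 20, ..., 70, which is sorted.
flat = boundaries.ravel()

# np.digitize returns i such that flat[i-1] <= x < flat[i].
buckets = np.digitize(x, flat)
print(buckets.tolist())  # [0, 1, 2, 3, 4]
```

Under that assumption, each value simply receives the number of boundaries less than or equal to it, which matches the printed buckets.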

score 1 · Accepted Answer

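Wanted to add to this, since it is the only Google result for "tft.apply_buckets" :)

The example above did not work for me in the latest version of TFT. The following code did work for me.

Note that the buckets are specified as a rank 2 tensor, but with only one element in the inner dimension.

(I'm probably using the wrong words, but hopefully my example below clarifies it.)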

import tensorflow as tf
import tensorflow_transform as tft
import numpy as np

tf.enable_eager_execution()

xt = tf.cast(tf.convert_to_tensor(np.array([-1,9,19, 29, 39])),tf.float32)
bds = [[0],[10],[20],[30],[40]]
boundaries = tf.cast(tf.convert_to_tensor(bds),tf.float32)
buckets = tft.apply_buckets(xt, boundaries)

Thanks for the help, as this answer got me most of the way there!

The rest I found in the TFT source code: https://github.com/tensorflow/transform/blob/deb198d59f09624984622f7249944cdd8c3b733f/tensorflow_transform/mappers.py#L1697-L1698
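For anyone starting from a flat list of boundaries, the "one element in the inner dimension" shape used above can be built with a reshape. This is a minimal NumPy sketch; flat_boundaries is a hypothetical name used only for illustration:

```python
import numpy as np

# Hypothetical flat list of sorted bucket boundaries.
flat_boundaries = [0, 10, 20, 30, 40]

# reshape(-1, 1) puts each boundary in its own row: rank 2, inner dimension 1.
bds = np.array(flat_boundaries, dtype=np.float32).reshape(-1, 1)
print(bds.shape)     # (5, 1)
print(bds.tolist())  # [[0.0], [10.0], [20.0], [30.0], [40.0]]
```

The resulting array can then be passed to tf.convert_to_tensor in place of the hand-written nested list.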
