python - How to access the tensor shape inside a .map function?

Question

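I have an audio dataset with clips of different lengths, and I want to crop all of them to a 5-second window (which means 240000 samples at a 48000 Hz sample rate). So, after loading the .tfrecord, I'm doing this: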

audio, sr = tf.audio.decode_wav(image_data)
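For reference, here is a small sketch of what `decode_wav` returns. It assumes the record holds raw WAV bytes; a synthetic one-second mono clip built with `tf.audio.encode_wav` stands in for `image_data`. The decoded tensor is 2-D (`[length, channels]`), so the channel axis usually has to be dropped before tiling along the length with a one-element multiples list.

```python
import tensorflow as tf

# Synthetic 1 s mono clip at 48 kHz (assumption: stands in for image_data).
samples = tf.random.uniform([48_000, 1], minval=-1.0, maxval=1.0)
wav_bytes = tf.audio.encode_wav(samples, sample_rate=48_000)

# decode_wav returns a float32 tensor of shape [length, channels] and a scalar sample rate.
audio, sr = tf.audio.decode_wav(wav_bytes, desired_channels=1)
print(audio.shape, int(sr))

# Drop the channel axis so later ops see a 1-D signal of shape [length].
audio_1d = tf.squeeze(audio, axis=-1)
print(audio_1d.shape)
```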

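It returns a tensor whose length is the length of the audio. If that length is less than 240000, I want to repeat the audio content until it reaches 240000. So I'm processing all the audio with the following tf.data.Dataset.map() function: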

audio = tf.tile(audio, [5])

because that is the number of repetitions needed to pad my shortest audio to the desired length.
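To make the fixed-multiple approach concrete, the sketch below shows how `tf.tile` repeats a 1-D tensor end to end. The 48 000-sample case assumes the shortest clip is exactly 1 second, which is inferred from the multiple of 5 and not stated in the question.

```python
import tensorflow as tf

clip = tf.constant([1.0, 2.0, 3.0])
# tf.tile repeats the whole tensor; multiples has one entry per axis (one here, since it is 1-D).
tiled = tf.tile(clip, [3])
print(tiled.numpy())

# With a fixed multiple of 5, a 48 000-sample (1 s) clip reaches exactly 240 000 samples;
# longer clips overshoot and would still need trimming.
short = tf.zeros([48_000])
print(tf.tile(short, [5]).shape)
```

Every clip is tiled 5 times regardless of its length, which is the inefficiency the question wants to avoid.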

But for efficiency, I want to apply it only to the elements that need it:

if audio.shape[0] < 240000:
  pad_num = tf.math.ceil(240000 / audio.shape[0]) #i.e. if the audio is 120000 long, the audio will repeat 2 times
  audio = tf.tile(audio, [pad_num])

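But I can't access the shape attribute, because it is dynamic and varies from clip to clip. I've tried tf.shape(audio), audio.shape and audio.get_shape(), but I get values like None for the shape, which doesn't let me do the comparison.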
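The `None` comes from the difference between the static shape (known when `map` traces the function) and the dynamic shape (computed per element at run time). A minimal sketch, assuming a dataset of 1-D float32 clips; `inspect_length` and `static_lengths` are hypothetical names used only for illustration:

```python
import numpy as np
import tensorflow as tf

static_lengths = []  # filled in while tf.data traces the map function

def inspect_length(audio):
    # Static shape: fixed at trace time, so a variable-length axis shows up as None.
    static_lengths.append(audio.shape[0])
    # Dynamic shape: a scalar int32 tensor computed for each element at run time.
    return tf.shape(audio)[0]

ds = tf.data.Dataset.from_generator(
    lambda: iter([np.zeros(100_000, np.float32), np.zeros(300_000, np.float32)]),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.float32),
)
lengths = [int(n) for n in ds.map(inspect_length)]
print(set(static_lengths))
print(lengths)
```

Comparisons on the length therefore have to use `tf.shape(audio)[0]` (a tensor) rather than `audio.shape[0]` (a Python value that is `None` here).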

Is it possible to do this?

score 3 · Accepted Answer

You can use a function like this:

import tensorflow as tf

def enforce_length(audio):
    # Target shape
    AUDIO_LEN = 240_000
    # Current shape
    current_len = tf.shape(audio)[0]
    # Compute number of necessary repetitions
    num_reps = AUDIO_LEN // current_len
    num_reps += tf.dtypes.cast((AUDIO_LEN % current_len) > 0, num_reps.dtype)
    # Do repetitions
    audio_rep = tf.tile(audio, [num_reps])
    # Trim to required size
    return audio_rep[:AUDIO_LEN]

# Test
examples = tf.data.Dataset.from_generator(lambda: iter([
    tf.zeros([100_000], tf.float32),
    tf.zeros([300_000], tf.float32),
    tf.zeros([123_456], tf.float32),
]), output_types=tf.float32, output_shapes=[None])
result = examples.map(enforce_length)
for item in result:
    print(item.shape)
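The `num_reps` lines compute an integer ceiling of `AUDIO_LEN / current_len`: floor division, plus one extra copy when there is a remainder. The plain-Python mirror below checks that arithmetic; `num_reps_for` is a hypothetical helper written only for this check.

```python
import math

AUDIO_LEN = 240_000

def num_reps_for(current_len):
    # Mirrors enforce_length: floor division, plus one extra copy when there is a remainder.
    reps = AUDIO_LEN // current_len
    reps += int(AUDIO_LEN % current_len > 0)
    return reps

for n in (100_000, 300_000, 123_456, 48_000):
    reps = num_reps_for(n)
    # The result equals ceil(AUDIO_LEN / n), so reps * n is always >= AUDIO_LEN.
    assert reps == math.ceil(AUDIO_LEN / n)
    print(n, reps, reps * n)
```

For the 123 456-sample clip this gives 2 repetitions (246 912 samples), which the final slice trims back to 240 000.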

Output:

(240000,)
(240000,)
(240000,)
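In `enforce_length`, clips that are already long enough still go through `tf.tile` with `num_reps = 1`, which is effectively a copy. If you want to tile only the short clips, as the question asks, one option (not part of the accepted answer) is an explicit `tf.cond` on the dynamic length. Because `Dataset.map` traces its function with AutoGraph, a plain Python `if` on `tf.shape(audio)[0]` would likely be converted as well. The sketch below assumes 1-D float32 clips; `enforce_length_cond` is a hypothetical name.

```python
import numpy as np
import tensorflow as tf

AUDIO_LEN = 240_000

def enforce_length_cond(audio):
    # Hypothetical variant of enforce_length; assumes a 1-D float32 signal.
    current_len = tf.shape(audio)[0]

    def tile_to_length():
        # Integer ceiling division: copies needed to reach AUDIO_LEN.
        num_reps = (AUDIO_LEN + current_len - 1) // current_len
        return tf.tile(audio, [num_reps])

    # Only clips shorter than AUDIO_LEN are tiled; longer ones pass through unchanged.
    padded = tf.cond(current_len < AUDIO_LEN, tile_to_length, lambda: audio)
    # Either way, trim to exactly AUDIO_LEN samples.
    return padded[:AUDIO_LEN]

examples = tf.data.Dataset.from_generator(
    lambda: iter([np.arange(100_000, dtype=np.float32), np.zeros(300_000, np.float32)]),
    output_signature=tf.TensorSpec(shape=[None], dtype=tf.float32),
)
results = [item.numpy() for item in examples.map(enforce_length_cond)]
for r in results:
    print(r.shape)
```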
