
I am trying to run tf.data.Dataset.group_by_window() on a dataset that I get from petastorm's make_tf_dataset, and I keep getting this error:

ValueError: Invalid `key_func`. `key_func` must return a single `tf.int64` scalar tensor but its return type is TensorSpec(shape=(None,), dtype=tf.int64, name=None).

Code:

with test_converter.make_tf_dataset(batch_size=BATCH_SIZE, num_epochs=1) as test_dataset:

  tf_test = test_dataset.map(row_generator, num_parallel_calls=tf.data.AUTOTUNE, deterministic=False)  

  key_func = lambda x: x["my_id_int"]
  reduce_func = lambda key, dataset: dataset.batch(100)
  tf_test_grp = tf_test.group_by_window(
      key_func=key_func, reduce_func=reduce_func, window_size=100)

The row generator is:

def row_generator(x):
  d = {'my_id_int':x.my_id_int, ...}
  return d

and test_converter is:

test_converter = make_spark_converter(df_test.select(all_fields))

Does anyone know how to fix this? Should the key function or the map return different values?
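One possible cause, offered as a guess rather than a confirmed diagnosis: make_tf_dataset is called with batch_size, so each element is likely a whole batch and x["my_id_int"] has shape (None,) instead of being a scalar. The sketch below reproduces that situation with a hypothetical in-memory dataset (the "value" field and all the data are made up for illustration) and calls unbatch() before group_by_window so that key_func sees one row at a time.

```python
import tensorflow as tf

# Hypothetical stand-in for the batched dataset that make_tf_dataset yields.
batched = tf.data.Dataset.from_tensor_slices(
    {
        "my_id_int": tf.constant([1, 2, 1, 2, 1, 3], dtype=tf.int64),
        "value": tf.constant([10.0, 20.0, 11.0, 21.0, 12.0, 30.0]),
    }
).batch(4)

# Each element of `batched` is a batch, so x["my_id_int"] has shape (None,).
# unbatch() splits the batches back into single rows with scalar fields.
rows = batched.unbatch()

grouped = rows.group_by_window(
    key_func=lambda x: x["my_id_int"],          # now a scalar tf.int64
    reduce_func=lambda key, ds: ds.batch(100),  # one batch per key window
    window_size=100,
)

# Collect the ids in each emitted group; every group holds a single id.
groups = [g["my_id_int"].numpy().tolist() for g in grouped]
```

If this is indeed the cause, the equivalent change in the code above would be to call .unbatch() on tf_test (or on test_dataset before the map) ahead of group_by_window. This has not been tested against petastorm itself.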
