tensorflow2.0 - TensorFlow Extended: Specifying the valency of features in the Schema

Question

I am currently trying to feed a dataset with some multivalent feature columns through a TensorFlow Extended (TFX) pipeline. Here is one row of my sample data:

user_id                     29601
product_id                     28
touched_product_id     [2435, 28]
liked_product_id       [2435, 28]
disliked_product_id            []
target                          1

As you can see, the columns (features) touched_product_id, liked_product_id, and disliked_product_id are multivalent.

Now, to pass this data through TFX's validation layers, I am following this guide:

https://www.tensorflow.org/tfx/tutorials/tfx/components_keras

Following the guide, I generate some TFRecord files using an instance of CsvExampleGen, and then go on to generate the statistics and the schema as follows:

# create train and eval records
c = CsvExampleGen(input_base='sample_train')
context.run(c)

# generate statistics
statistics_gen = StatisticsGen(
    examples=c.outputs['examples']
)
context.run(statistics_gen)

# generate schema
schema_gen = SchemaGen(
    statistics=statistics_gen.outputs['statistics'],
    infer_feature_shape=False)
context.run(schema_gen)
context.show(schema_gen.outputs['schema'])

The final schema displayed by the code above is:

                        Type  Presence Valency Domain
Feature name                                         
'disliked_product_id'  BYTES  required  single      -
'liked_product_id'     BYTES  required  single      -
'product_id'             INT  required  single      -
'target'                 INT  required  single      -
'touched_product_id'   BYTES  required  single      -
'user_id'                INT  required  single      -

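Clearly, the multivalent features have been wrongly inferred as single-valued. To fix this, I loaded the Schema proto manually and tried to adjust a valence attribute.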

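import os
import tensorflow_data_validation as tfdv
from google.protobuf import text_format
from tensorflow.python.lib.io import file_io
from tensorflow_metadata.proto.v0 import schema_pb2

# Load the inferred schema from the SchemaGen output artifact
schema_path = os.path.join(schema_gen.outputs['schema'].get()[0].uri, 'schema.pbtxt')
schema = schema_pb2.Schema()
contents = file_io.read_file_to_string(schema_path)
schema = text_format.Parse(contents, schema)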

# THIS LINE DOES NOT WORK
tfdv.get_feature(schema, 'user_id').valence = 'multiple'

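Obviously, the last line does not work because, to my surprise, there is no valence attribute. I looked through the specification of the Schema proto but could not find a valence attribute there either. Does anyone know how I can solve this? Any guidance would be greatly appreciated.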

score 0 · Accepted Answer


Try setting feature.value_count.min or feature.value_count.max to a value greater than 1.
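
A minimal sketch of that suggestion follows. The helper name set_value_count and its parameters are hypothetical, not part of TFX or TFDV. It only assumes that the schema object exposes a feature sequence whose items have a name and a value_count with min and max fields, as the Schema proto loaded in the question does. Whether the schema display then reports the feature as multivalent depends on the TFDV version, so treat this as an illustration rather than a verified fix.

```python
def set_value_count(schema, feature_names, min_count=None, max_count=None):
    """Set value_count bounds on the named features of a Schema-like object.

    Hypothetical helper: `schema` is assumed to expose `feature`, a sequence
    of items with `name` and `value_count.min` / `value_count.max`.
    Bounds left as None are not touched. Returns the names that were updated.
    """
    wanted = set(feature_names)
    updated = []
    for feature in schema.feature:
        if feature.name not in wanted:
            continue
        if min_count is not None:
            feature.value_count.min = min_count
        if max_count is not None:
            feature.value_count.max = max_count
        updated.append(feature.name)
    return updated
```

With the schema loaded as in the question, a call such as set_value_count(schema, ['touched_product_id', 'liked_product_id', 'disliked_product_id'], max_count=10) would raise the upper bound for the three list-like columns; the bound of 10 is an arbitrary illustrative value. The returned list makes it easy to spot a misspelled feature name, since missing names are silently skipped.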

Answered on 2020-11-02T18:20:05.833
