I'm trying to use forward_features to get instance keys for Cloud ML, but I keep getting errors that I'm not sure how to fix. The preprocessing step uses tf.Transform and is a modification of https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/reddit_tft; in my case the instance key is a string and all the other features are floats.
def gzip_reader_fn():
  return tf.TFRecordReader(options=tf.python_io.TFRecordOptions(
      compression_type=tf.python_io.TFRecordCompressionType.GZIP))

def get_transformed_reader_input_fn(transformed_metadata,
                                    transformed_data_paths,
                                    batch_size,
                                    mode):
  """Wrap the get input features function to provide the runtime arguments."""
  return input_fn_maker.build_training_input_fn(
      metadata=transformed_metadata,
      file_pattern=(
          transformed_data_paths[0] if len(transformed_data_paths) == 1
          else transformed_data_paths),
      training_batch_size=batch_size,
      label_keys=[],
      #feature_keys=FEATURE_COLUMNS,
      #key_feature_name='example_id',
      reader=gzip_reader_fn,
      reader_num_threads=4,
      queue_capacity=batch_size * 2,
      randomize_input=(mode != tf.contrib.learn.ModeKeys.EVAL),
      num_epochs=(1 if mode == tf.contrib.learn.ModeKeys.EVAL else None))

estimator = KMeansClustering(
    num_clusters=8,
    initial_clusters=KMeansClustering.KMEANS_PLUS_PLUS_INIT,
    kmeans_plus_plus_num_retries=32,
    relative_tolerance=0.0001)

estimator = tf.contrib.estimator.forward_features(
    estimator,
    'example_id')

train_input_fn = get_transformed_reader_input_fn(
    transformed_metadata, args.train_data_paths, args.batch_size,
    tf.contrib.learn.ModeKeys.TRAIN)

estimator.train(input_fn=train_input_fn)

If I were to pass in the keys column along side the training features, then I get the error Tensors in list passed to 'values' of 'ConcatV2' Op have types [float32, float32, string, float32, float32, float32, float32, float32, float32, f
loat32, float32, float32, float32, float32, float32, float32, float32, float32, float32, float32, float32, float32, float32, float32] that don't all match.
However, if I do not pass the instance keys in during training, I get a ValueError saying that the key doesn't exist in the features. Also, if I change the key name passed to forward_features from 'example_id' to some arbitrary name that isn't a column, I still get the former error (the ConcatV2 type mismatch) rather than the latter. Can anyone help me make sense of this?
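
To make clear what behaviour I am expecting, here is a rough sketch in plain Python rather than TensorFlow. The helpers split_key and concat_columns are hypothetical names of my own and are not part of any library. I am also assuming, without having confirmed it, that the estimator concatenates every feature column it receives, which would explain the ConcatV2 error when the string key is among them.

```python
def split_key(features, key_name):
    """Return (key_values, remaining_features) without mutating the input."""
    if key_name not in features:
        raise ValueError("key %r not found in features" % key_name)
    remaining = {name: values for name, values in features.items()
                 if name != key_name}
    return features[key_name], remaining


def concat_columns(features):
    """Stand-in for a concat that needs one dtype: one row per example."""
    columns = [features[name] for name in sorted(features)]
    kinds = {type(value) for column in columns for value in column}
    if len(kinds) > 1:
        raise TypeError("columns have types that don't all match: %s"
                        % sorted(kind.__name__ for kind in kinds))
    return [list(row) for row in zip(*columns)]


# Toy batch: one string key column plus two float feature columns.
batch = {
    "example_id": ["a", "b"],
    "x0": [0.1, 0.2],
    "x1": [1.5, 2.5],
}

# What I would like to happen: the key is set aside for the predictions,
# and only the float columns are concatenated for the model.
keys, numeric = split_key(batch, "example_id")
rows = concat_columns(numeric)
```

In this sketch, calling concat_columns(batch) directly fails with a type mismatch, much like the first error above, and calling split_key on features that lack the key raises a ValueError, much like the second.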