python - How to correctly use tft.compute_and_apply_vocabulary and tft.tfidf?

Question

I am trying to compute tf-idf in my Jupyter notebook with tft.compute_and_apply_vocabulary and tft.tfidf, but I keep getting the following error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'compute_and_apply_vocabulary/vocabulary/Placeholder' with dtype string
     [[node compute_and_apply_vocabulary/vocabulary/Placeholder (defined at C:\Users\secsi\Anaconda3\envs\tf2\lib\site-packages\tensorflow_

But the placeholder's type actually is string.

Here is my code:

import tensorflow as tf
import tensorflow_transform as tft

with tf.Session() as sess:
    documents = [
        "a b c d e",
        "f g h i j",
        "k l m n o",
        "p q r s t",
    ]
    documents_tensor = tf.placeholder(tf.string)
    tokens = tf.compat.v1.string_split(documents_tensor)
    compute_vocab = tft.compute_and_apply_vocabulary(tokens, vocab_filename='vocab.txt')

    global_vars_init = tf.global_variables_initializer()
    table_init = tf.tables_initializer()

    sess.run([global_vars_init, table_init])
    token2ids = sess.run(compute_vocab, feed_dict={documents_tensor: documents})
    print(f"token2ids: {token2ids}")

Versions:

TensorFlow: 1.14
TensorFlow Transform: 0.14

Thanks in advance!

score 2 · Accepted Answer

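We cannot use Tensorflow Transform operations such as tft.compute_and_apply_vocabulary the way we use ordinary TensorFlow operations, which can be run directly in a Session. Analyzers like this one need a full pass over the dataset before they can produce a value.

To use Tensorflow Transform operations, we must call them inside a preprocessing_fn, which is then passed to tft_beam.AnalyzeAndTransformDataset.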
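The two-phase idea can be sketched without TensorFlow at all. The placeholder named in the question's error (compute_and_apply_vocabulary/vocabulary/Placeholder) appears to be the slot tft reserves for the vocabulary analyzer's result rather than documents_tensor, and only an analysis pass over the data fills it. The toy driver below is hypothetical (the names analyze_and_transform, build_vocab, to_ids and the "tokens"/"ids" keys are ours, not the tft_beam API): it first reduces the whole dataset to constants, then applies a per-row function with those constants frozen, and returns that function so it can be reused on new data, much like the transform_fn that AnalyzeAndTransformDataset returns.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, List[str]]


def analyze_and_transform(
    rows: List[Row],
    analyze_fn: Callable[[List[Row]], dict],
    row_fn: Callable[[Row, dict], dict],
) -> Tuple[List[dict], Callable[[Row], dict]]:
    """Toy two-phase driver (hypothetical; NOT the tft_beam implementation)."""
    # Phase 1 ("analyze"): one full pass that reduces every row to constants.
    # A plain tf.Session never runs this step on its own.
    stats = analyze_fn(rows)

    # Phase 2 ("transform"): a per-row function with the constants frozen in.
    def transform_fn(row: Row) -> dict:
        return row_fn(row, stats)

    return [transform_fn(row) for row in rows], transform_fn


def build_vocab(rows: List[Row]) -> dict:
    # Hypothetical analyzer: ids are assigned alphabetically for simplicity
    # (tft's vocabulary orders tokens by frequency instead).
    tokens = sorted({tok for row in rows for tok in row["tokens"]})
    return {"vocab": {tok: i for i, tok in enumerate(tokens)}}


def to_ids(row: Row, stats: dict) -> dict:
    # Unknown tokens map to -1, like compute_and_apply_vocabulary's default_value.
    return {"ids": [stats["vocab"].get(tok, -1) for tok in row["tokens"]]}


documents = ["a b c d e", "f g h i j", "k l m n o", "p q r s t"]
rows = [{"tokens": doc.split()} for doc in documents]
transformed, transform_fn = analyze_and_transform(rows, build_vocab, to_ids)
print(transformed[1])                          # {'ids': [5, 6, 7, 8, 9]}
print(transform_fn({"tokens": ["a", "zzz"]}))  # {'ids': [0, -1]}
```

Reusing transform_fn on unseen rows is the point of the design: the vocabulary is computed once from the training data and never recomputed per batch.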

In your case, since you have text data, your code can be modified as follows:

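import tempfile

import tensorflow as tf
import tensorflow_transform as tft
import tensorflow_transform.beam as tft_beam

# VOCAB_SIZE, REVIEW_KEY, REVIEW_WEIGHT_KEY, LABEL_KEY, train_data and
# RAW_DATA_METADATA are assumed to be defined as in the TFT sentiment example.

def preprocessing_fn(inputs):
    """inputs is a dict of batched feature tensors from our dataset."""
    documents = inputs['documents']

    tokens = tf.compat.v1.string_split(documents)
    # top_k caps the vocabulary so that VOCAB_SIZE below is a valid bound.
    compute_vocab = tft.compute_and_apply_vocabulary(tokens, top_k=VOCAB_SIZE)
    # Add one for the oov bucket created by compute_and_apply_vocabulary.
    review_bow_indices, review_weight = tft.tfidf(compute_vocab,
                                                  VOCAB_SIZE + 1)
    return {
        REVIEW_KEY: review_bow_indices,
        REVIEW_WEIGHT_KEY: review_weight,
        LABEL_KEY: inputs[LABEL_KEY]
    }

# tft_beam needs a temp directory for intermediate analyzer outputs.
with tft_beam.Context(temp_dir=tempfile.mkdtemp()):
    (transformed_train_data, transformed_metadata), transform_fn = (
        (train_data, RAW_DATA_METADATA)
        | 'AnalyzeAndTransform' >> tft_beam.AnalyzeAndTransformDataset(preprocessing_fn))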
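To make the weights returned by tft.tfidf less opaque, here is a plain-Python sketch of the formula as we read it in the tft.tfidf documentation: term frequency is the term's count divided by the document length, and the smoothed inverse document frequency is 1 + log((N + 1) / (df + 1)). It also shows why vocab_size has to cover every id that compute_and_apply_vocabulary can emit, which is what the VOCAB_SIZE + 1 argument is for. This is an illustration, not the library's implementation; the example ids (pie=0, I=1, like=2, yum=3) are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf(id_docs: List[List[int]], vocab_size: int,
          smooth: bool = True) -> List[Tuple[List[int], List[float]]]:
    """Return (unique term ids, tf-idf weights) for each document.

    tf  = count of term in document / document length
    idf = 1 + log((N + 1) / (df + 1))   if smooth
    idf = 1 + log(N / df)               otherwise
    where N is the number of documents and df the number containing the term.
    """
    n_docs = len(id_docs)
    # Analyze phase: document frequencies need a pass over the whole corpus.
    df = [0] * vocab_size
    for doc in id_docs:
        for term in set(doc):
            if not 0 <= term < vocab_size:
                raise ValueError(f"id {term} outside [0, {vocab_size})")
            df[term] += 1

    results = []
    for doc in id_docs:
        counts = Counter(doc)  # unique ids, in order of first appearance
        ids, weights = [], []
        for term, count in counts.items():
            if smooth:
                idf = 1 + math.log((n_docs + 1) / (df[term] + 1))
            else:
                idf = 1 + math.log(n_docs / df[term])
            ids.append(term)
            weights.append(count / len(doc) * idf)
        results.append((ids, weights))
    return results


# Hypothetical ids for ["I like pie pie pie", "yum yum pie"].
example = tfidf([[1, 2, 0, 0, 0], [3, 3, 0]], vocab_size=4)
ids, weights = example[0]
print(ids)         # [1, 2, 0]
print(weights[2])  # 0.6 ("pie" is in every document, so its idf is 1)
```

Passing a vocab_size that is too small raises a ValueError here; in the real pipeline an undersized vocab_size is the kind of mismatch the "+ 1" for the OOV id is meant to avoid.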

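You can refer to this Tensorflow Transform link for an example of how to preprocess a text dataset (sentiment analysis).

If you found this answer useful, please accept and/or upvote it. Thanks.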
