python - Textsum (TensorFlow): assertion error when using a vocabulary file generated from the dataset

Question

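I ran into a small problem when running on the CNN data. The vocabulary file generated with the code above gives an assertion error, and I can't figure out what is causing it.

This is the error I get: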

Traceback (most recent call last):
  File "/home/umair/sumModel/bazel-bin/textsum/seq2seq_attention.runfiles/__main__/textsum/seq2seq_attention.py", line 213, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 30, in run
    sys.exit(main(sys.argv))
  File "/home/umair/sumModel/bazel-bin/textsum/seq2seq_attention.runfiles/__main__/textsum/seq2seq_attention.py", line 165, in main
    assert vocab.CheckVocab(data.SENTENCE_START) > 0
AssertionError

The function in seq2seq_attention.py:

def main(unused_argv):
  vocab = data.Vocab(FLAGS.vocab_path, 10000000)
  # Check for presence of required special tokens.
  assert vocab.CheckVocab(data.PAD_TOKEN) > 0
  assert vocab.CheckVocab(data.UNKNOWN_TOKEN) >= 0
  assert vocab.CheckVocab(data.SENTENCE_START) > 0
  assert vocab.CheckVocab(data.SENTENCE_END) > 0
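
The checks in main can be approximated outside the model to see which tokens are absent. The following is a minimal sketch, assuming the vocabulary file holds one "token count" pair per line; the helper name find_missing_tokens and the sample lines are hypothetical, and the token strings are the ones defined in textsum's data.py.

```python
# Token strings as defined in textsum's data.py.
REQUIRED_TOKENS = ['<PAD>', '<UNK>', '<s>', '</s>']


def find_missing_tokens(vocab_lines, required=REQUIRED_TOKENS):
    """Return the required tokens that never appear as the first field of a line.

    Assumes each vocabulary line looks like "token count" (an assumption
    about the file format, not something this sketch verifies).
    """
    seen = set()
    for line in vocab_lines:
        pieces = line.split()
        if pieces:
            seen.add(pieces[0])
    return [tok for tok in required if tok not in seen]


# Hypothetical vocabulary content that lacks the sentence markers.
sample = ['the 1000', 'of 500', '<UNK> 10', '<PAD> 5']
print(find_missing_tokens(sample))
```

To run it against a real file, pass the open file object instead of the sample list. Since three of the asserts use `> 0`, a required token that ends up with id 0 would presumably fail as well; this sketch only covers tokens that are missing entirely.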

score 0 · Accepted Answer

What about these? Your vocabulary is missing some of them, namely SENTENCE_START (the '<s>' token).

# Special tokens
PARAGRAPH_START = '<p>'
PARAGRAPH_END = '</p>'
SENTENCE_START = '<s>'
SENTENCE_END = '</s>'
UNKNOWN_TOKEN = '<UNK>'
PAD_TOKEN = '<PAD>'
DOCUMENT_START = '<d>'
DOCUMENT_END = '</d>'

Source: https://github.com/tensorflow/models/blob/master/textsum/data.py
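
One way to act on this is to append whichever required tokens are missing to the generated vocabulary. This is only a sketch under assumptions: the file is taken to contain one "token count" pair per line, add_missing_tokens is a hypothetical helper, and the count of 1 is a placeholder rather than anything textsum prescribes.

```python
def add_missing_tokens(vocab_lines,
                       required=('<PAD>', '<UNK>', '<s>', '</s>'),
                       count=1):
    """Return the vocab lines with any missing required token appended.

    Assumes one "token count" pair per line. `count` is a placeholder
    frequency for tokens that were absent (an assumption of this sketch).
    """
    # Drop blank lines and trailing newlines so the result is uniform.
    lines = [line.rstrip('\n') for line in vocab_lines if line.strip()]
    seen = {line.split()[0] for line in lines}
    for tok in required:
        if tok not in seen:
            # Appending keeps existing line order, and therefore ids, unchanged.
            lines.append('%s %d' % (tok, count))
    return lines


# Hypothetical vocabulary that lacks the sentence markers.
patched = add_missing_tokens(['the 1000\n', 'of 500\n', '<UNK> 10\n', '<PAD> 5\n'])
print('\n'.join(patched))
```

The returned lines can be written back to the path passed as vocab_path. Appending only makes the assertions pass; if the tokens are absent because the input text itself has no sentence markers, the data preparation step may need attention too.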
