python - How to generate a meaningful sentence from words only?

Question

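I want to generate a sentence from a list of words. I tried an n-gram model, but it only generates text from sentences that already exist: we input a sentence, and it outputs the next generated word based on the value of n. Which model would help generate a meaningful sentence from just a list of words, and which dataset should be used to train it?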

score 1 · Accepted Answer

You can use GPT-J. It is a free GPT model whose performance is comparable to GPT-3. The model takes the input you give it and tries to complete it.

Here is how I use GPT-J to generate sentences from a set of keywords:

Input:

Make a sentence with the following words: earth, dirt, alligator
Sentence: While the alligator is a species which mainly lives in the water, the earth is not uncommon territory and they like to dig through the dirt.

Make a sentence with the following words: shape, lantern, hair
Sentence:

Output:

Make a sentence with the following words: earth, dirt, alligator
Sentence: While the alligator is a species which mainly lives in the water, the earth is not uncommon territory and they like to dig through the dirt.

Make a sentence with the following words: shape, lantern, hair
Sentence: The hair is so thick on the lantern that it is almost like a shape.
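Because the model simply keeps completing text, its raw output echoes the prompt and may continue past the sentence you want (for example, into further invented keyword/sentence pairs). A minimal post-processing sketch, assuming the full text (prompt plus continuation) comes back as one string: keep only the first line after the prompt, then check that every keyword made it into the sentence. `extract_sentence` and `uses_all_keywords` are hypothetical helper names, and the keyword check is a plain whole-word match with no stemming, so "hairs" would not count as "hair".

```python
import re


def extract_sentence(full_text, prompt):
    """Return the sentence the model generated for the final 'Sentence:' slot."""
    if not full_text.startswith(prompt):
        raise ValueError("expected the output to start with the prompt")
    # Everything after the prompt is the model's continuation.
    continuation = full_text[len(prompt):]
    # Keep only the first non-empty line; later lines may be extra invented examples.
    return continuation.strip().split("\n")[0].strip()


def uses_all_keywords(sentence, keywords):
    """True if every keyword appears as a whole word (case-insensitive, no stemming)."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return all(keyword.lower() in words for keyword in keywords)


prompt = (
    "Make a sentence with the following words: shape, lantern, hair\n"
    "Sentence:"
)
output = prompt + " The hair is so thick on the lantern that it is almost like a shape."
sentence = extract_sentence(output, prompt)
print(sentence)
print(uses_all_keywords(sentence, ["shape", "lantern", "hair"]))
```

If the check fails, one option is simply to sample again with the same prompt.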

How to adapt it to a specific use case?

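Giving examples of what you want in the input (example keywords + sentence) can help GPT understand the structure of the desired output. In my experience, explicitly stating in the input which task GPT is expected to perform (make a sentence...) also helps it understand the task.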

You can change the complexity of the output sentence by changing the example sentence, for instance to: An alligator likes to dig dirt out of the earth.
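The prompt structure described above (an explicit task line, one or more keyword/sentence demonstrations, and a final empty `Sentence:` slot for the model to fill) can be assembled programmatically. This is only a sketch: `build_prompt` is a hypothetical helper, not part of GPT-J or its repo. Swapping the demonstration sentence, as suggested above, is how you would steer the complexity of the output.

```python
def build_prompt(examples, keywords):
    """Build a few-shot prompt in the keyword -> sentence format used above.

    examples: list of (keywords, sentence) pairs used as demonstrations.
    keywords: the new keywords the model should turn into a sentence.
    """
    blocks = []
    for words, example_sentence in examples:
        # Each demonstration states the task explicitly, then gives the answer.
        blocks.append(
            f"Make a sentence with the following words: {', '.join(words)}\n"
            f"Sentence: {example_sentence}"
        )
    # The last block stops right after "Sentence:" so the model completes it.
    blocks.append(
        f"Make a sentence with the following words: {', '.join(keywords)}\n"
        "Sentence:"
    )
    # Blocks are separated by a blank line, as in the example input.
    return "\n\n".join(blocks)


# A simpler demonstration sentence, to nudge the model toward simpler output.
examples = [
    (["earth", "dirt", "alligator"],
     "An alligator likes to dig dirt out of the earth."),
]
print(build_prompt(examples, ["shape", "lantern", "hair"]))
```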

How to use it?

Git repo: https://github.com/kingoflolz/mesh-transformer-jax

As shown in the repo, you can test the model with its web demo or use the Colab implementation.

Web demo: https://6b.eleuther.ai/

Colab notebook: http://colab.research.google.com/github/kingoflolz/mesh-transformer-jax/blob/master/colab_demo.ipynb

I don't recommend trying to run it locally.

score 1 · Accepted Answer

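Dataset: Just take a dataset made of sentences. Tokenize each sentence and shuffle its tokens. The shuffled tokens are your input, and the original sentence is your output. This way you can generate as many samples as you need: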

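import random
import nltk  # word_tokenize requires NLTK's Punkt tokenizer data (install it via nltk.download)

def create_input(sentence):
    # Split the sentence into word tokens, then shuffle them in place
    tokens = nltk.word_tokenize(sentence)
    random.shuffle(tokens)
    return tokens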
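To get "as many samples as you need", each sentence can be shuffled several times, giving several (input, output) pairs per sentence. A sketch of that, assuming plain whitespace splitting in place of `nltk.word_tokenize` so it runs without NLTK data (punctuation therefore stays attached to words, e.g. "earth."); `make_pairs` and its `samples_per_sentence` and `seed` parameters are hypothetical.

```python
import random


def make_pairs(sentences, samples_per_sentence=3, seed=0):
    """Turn each sentence into several (shuffled tokens, sentence) training pairs."""
    rng = random.Random(seed)  # seeded so the generated dataset is reproducible
    pairs = []
    for sentence in sentences:
        # Assumption: whitespace splitting stands in for nltk.word_tokenize here.
        tokens = sentence.split()
        for _ in range(samples_per_sentence):
            shuffled = tokens[:]  # copy so each sample shuffles the original order
            rng.shuffle(shuffled)
            pairs.append((shuffled, sentence))
    return pairs


pairs = make_pairs(["An alligator likes to dig dirt out of the earth."])
for tokens, target in pairs:
    print(tokens, "->", target)
```

Nothing here prevents two samples of a short sentence from coming out in the same order; deduplicating pairs would be a reasonable addition.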

The model is the harder part: you could try fine-tuning a BERT model; I think it might work.
