python - Lists in Python (using NLTK)

Question

I am trying to build a list of the form [[(the, cat), (cat, with), (with, fur)], [(the, dog), (dog, with), (with, ball)], ... etc.] from a text file that contains one sentence per line, like this:

the cat with fur\n the dog with ball\n

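The problem is that when I read the lines of the file word by word, build the tuples (variable label), and create the final list (variable connection), the entries in connection end up empty. Well, not literally 0, but the list is displayed as [[], [], []]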

Here is the code for that part of the program:

with open('corpus.txt', 'r') as f:
    for line in f:
        cnt = 0
        sa = nltk.word_tokenize(line)
        label[:] = []

        for i in sa:
            words.append(i)
            if cnt>0:
                try: label +=[(prev , i)]
                except: NameError
            prev = i 
            cnt = cnt + 1

        if label != []:
            connection += [label]
            print connection

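I hope someone can make sense of my problem, because it is driving me crazy and I am running out of time. I just want to know what I am doing wrong here, so that I can update my connection list on every loop iteration without losing what I saved earlier.

Thanks for your help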

score 2 · Accepted Answer

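You can use nltk.bigrams to get the tuples without having to worry about getting the boundary conditions right. If words is a list of the words in one sentence, you get all of its bigrams with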

bigrams = nltk.bigrams(words)
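Applied to the question, the idea is to build one fresh list of pairs per line. The following is a minimal sketch, not the answerer's code. It uses `zip(words, words[1:])`, which produces the same consecutive pairs as `nltk.bigrams(words)`, and `str.split` as a stand-in tokenizer so that it runs without NLTK. The function name `sentence_bigrams` and its `tokenize` parameter are hypothetical. With NLTK installed, passing `nltk.word_tokenize` as `tokenize` should work the same way, and in recent NLTK versions `nltk.bigrams` returns an iterator, so it would need wrapping in `list(...)`.

```python
def sentence_bigrams(lines, tokenize=str.split):
    """Return one list of (previous, current) word pairs per line.

    `tokenize` is a hypothetical hook; str.split is only a stand-in
    for a real tokenizer such as nltk.word_tokenize.
    """
    connection = []
    for line in lines:
        words = tokenize(line)
        # Pair each word with its successor; same pairs as nltk.bigrams(words).
        pairs = list(zip(words, words[1:]))
        if pairs:  # skip blank and single-word lines
            connection.append(pairs)
    return connection


# Sample lines taken from the question; a file object can be passed the same way.
corpus = ["the cat with fur\n", "the dog with ball\n"]
result = sentence_bigrams(corpus)
print(result)
```

Because `pairs` is a new list on every iteration, earlier entries in `connection` are never modified by later lines.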

score 1 · Accepted Answer

I don't have NLTK installed, but see whether this works for you:

import nltk

answer = []
with open('corpus.txt', 'r') as f:
    for line in f:
        sa = nltk.word_tokenize(line)
        # pair each token with the one that follows it
        answer.append([(word, sa[i + 1]) for i, word in enumerate(sa[:-1])])
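Neither answer spells out why the original code prints [[], [], []]. A likely cause is aliasing: `label[:] = []` empties the existing list in place, and `connection += [label]` stores a reference to that same list rather than a copy, so every entry in connection is one object that gets cleared again on the next line. The sketch below reproduces this without NLTK. The token lists are made up for illustration, and the trailing empty list assumes the file ends with a blank line.

```python
# Pre-tokenized sentences; the last, empty one mimics a trailing blank line.
sentences = [["the", "cat", "with"], ["the", "dog", "with"], []]

# Aliased version: one list object is reused for every sentence.
label = []
aliased = []
for words in sentences:
    label[:] = []                         # empties the shared list in place
    label += list(zip(words, words[1:]))
    if label:
        aliased += [label]                # stores a reference, not a copy

# Fixed version: bind a brand-new list on each iteration.
fixed = []
for words in sentences:
    label = list(zip(words, words[1:]))
    if label:
        fixed += [label]

print(aliased)  # both entries are the same list, emptied by the last iteration
print(fixed)    # each entry keeps its own pairs
```

Replacing `label[:] = []` with `label = []` in the original loop has the same effect as the fixed version here.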
