Given a string like this:

 'velvet evening purse bags'

How can I get all the word pairs from it? In other words, every two-word combination:

'velvet evening'
'velvet purse'
'velvet bags'
'evening purse'
'evening bags'
'purse bags'

I know Python's nltk package can produce bigrams, but I'm looking for something beyond that. Or do I have to write my own custom function in Python?

3 Answers

You can use itertools.combinations for this:

from itertools import combinations

from nltk import word_tokenize

s = 'velvet evening purse bags'

# Tokenize, then join every unordered pair of words (original order is kept)
words = word_tokenize(s)
pairs = [' '.join(comb) for comb in combinations(words, 2)]

print(pairs)

Output:

['velvet evening', 'velvet purse', 'velvet bags', 'evening purse', 'evening bags', 'purse bags']
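
If pulling in NLTK's tokenizer is not an option, a minimal standard-library sketch could look like the following. It assumes the words are separated by whitespace, so `str.split` stands in for `word_tokenize`; the helper name `word_pairs` is made up for illustration.

```python
from itertools import combinations


def word_pairs(text):
    """Return every two-word combination, keeping the original word order."""
    words = text.split()  # assumption: words are whitespace-separated
    return [' '.join(pair) for pair in combinations(words, 2)]


pairs = word_pairs('velvet evening purse bags')
print(pairs)
```

For n words this yields n * (n - 1) / 2 pairs, so 6 for the four-word example.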
answered 2019-08-18T16:48:09.097

This should be fun =)

If the input is velvet evening purse bags and the desired output is what @MrGeek produced with itertools.combinations, then that is really the definition of skipgrams, see https://tedboy.github.io/nlps/generated/generated/nltk.skipgrams.html

So you can get the same result with:

from nltk import skipgrams, word_tokenize

s = 'velvet evening purse bags'
tokens = word_tokenize(s)
# n=2 gives pairs; k is the maximum number of tokens that may be skipped
list(skipgrams(tokens, n=2, k=len(tokens)-1))

[out]:

[('velvet', 'evening'),
 ('velvet', 'purse'),
 ('velvet', 'bags'),
 ('evening', 'purse'),
 ('evening', 'bags'),
 ('purse', 'bags')]

In this case each word can only be combined with another word to its right, which more or less matches how English is read.
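
To see why a large enough k reproduces the combinations output, here is a rough pure-Python sketch of skip-bigrams. It is not NLTK's implementation; `skip_bigrams` is a hypothetical helper, and k is taken to mean the maximum number of tokens that may be skipped between the two words.

```python
def skip_bigrams(tokens, k):
    """Pair each token with the tokens up to k + 1 positions to its right."""
    pairs = []
    for i, left in enumerate(tokens):
        # tokens[i + 1] is adjacent (0 skipped); tokens[i + 1 + k] skips k tokens
        for right in tokens[i + 1:i + 2 + k]:
            pairs.append((left, right))
    return pairs


tokens = 'velvet evening purse bags'.split()
adjacent = skip_bigrams(tokens, 0)                  # plain bigrams
all_pairs = skip_bigrams(tokens, len(tokens) - 2)   # every pair to the right
print(adjacent)
print(all_pairs)
```

With four tokens, allowing at most len(tokens) - 2 = 2 skipped words already covers every pair, so any larger k behaves as a safe upper bound.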

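In the following case, every ordered pairing of the words is produced (the Cartesian product rather than true permutations), so each word is even paired with itself: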

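from itertools import product
from nltk import word_tokenize

s = 'velvet evening purse bags'
# De-duplicate while keeping word order (a plain set has no guaranteed order)
tokens = list(dict.fromkeys(word_tokenize(s)))
list(product(tokens, tokens))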

[out]:

[('velvet', 'velvet'),
 ('velvet', 'evening'),
 ('velvet', 'purse'),
 ('velvet', 'bags'),
 ('evening', 'velvet'),
 ('evening', 'evening'),
 ('evening', 'purse'),
 ('evening', 'bags'),
 ('purse', 'velvet'),
 ('purse', 'evening'),
 ('purse', 'purse'),
 ('purse', 'bags'),
 ('bags', 'velvet'),
 ('bags', 'evening'),
 ('bags', 'purse'),
 ('bags', 'bags')]
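
To put the three notions side by side, the sketch below counts what each itertools function returns for the same four words. It uses `str.split` instead of `word_tokenize` to stay dependency-free, which is an assumption about the input.

```python
from itertools import combinations, permutations, product

tokens = 'velvet evening purse bags'.split()

unordered = list(combinations(tokens, 2))     # order ignored, no self-pairs
ordered = list(permutations(tokens, 2))       # both orders, no self-pairs
with_self = list(product(tokens, repeat=2))   # both orders plus self-pairs

print(len(unordered), len(ordered), len(with_self))
```

If reversed pairs are wanted but self-pairs are not, `permutations` is probably the closer fit than `product`.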
answered 2019-08-29T08:33:45.693

You can also go old school...

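text = 'velvet evening purse bags'

words = text.split()
seen = set()  # pairs already emitted, stored in both orders
ans = []
for i in words:
    for j in words:
        # skip self-pairs and pairs already seen in either order
        if j != i and (i, j) not in seen:
            ans.append((i, j))
            seen.add((i, j))
            seen.add((j, i))

print(ans)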

Output:

[('velvet', 'evening'),
 ('velvet', 'purse'),
 ('velvet', 'bags'),
 ('evening', 'purse'),
 ('evening', 'bags'),
 ('purse', 'bags')]
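
A slightly leaner variant of the same idea, as a sketch: start the inner loop one position after the outer one, so each pair is visited exactly once and no bookkeeping is needed. Note that this pairs words by position, so a repeated word would be paired with its other occurrence; whether that is wanted depends on the input.

```python
text = 'velvet evening purse bags'
words = text.split()

ans = []
for i in range(len(words)):
    # only look at words to the right of position i
    for j in range(i + 1, len(words)):
        ans.append((words[i], words[j]))

print(ans)
```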
answered 2019-08-18T17:10:17.780