-1

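Given two lists of words, dictionary and sentence, I'm trying to create a binary representation based on which words of dictionary occur in sentence, such as [1,0,0,0,0,0,1,...,0], where a 1 at position i means that the i-th word in the dictionary shows up in the sentence.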

What is the fastest way to do this?

Example data:

dictionary =  ['aardvark', 'apple','eat','I','like','maize','man','to','zebra', 'zed']
sentence = ['I', 'like', 'to', 'eat', 'apples']
result = [0,0,1,1,1,0,0,1,0,0]

Considering that I'm working with very large lists of approximately 56,000 elements, is there a faster way than the following?

x = [int(i in sentence) for i in dictionary]

3 Answers

1
set2 = set(list2)
x = [int(i in set2) for i in list1]
answered 2013-05-06T07:21:35.940
0

Use sets. Membership tests on a set take O(1) time on average, so the total time complexity is O(N):

>>> sentence = ['I', 'like', 'to', 'eat', 'apples']
>>> dictionary =  ['aardvark', 'apple','eat','I','like','maize','man','to','zebra', 'zed']
>>> s= set(sentence)
>>> [int(word in s) for word in dictionary]
[0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
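If the same dictionary is reused for many sentences, one possible variation (not part of the answer above) is to precompute a word-to-position mapping once and then set only the positions of the words that actually occur in each sentence. This is a minimal sketch; build_index and encode are hypothetical names, and it assumes the dictionary contains no duplicate words.

```python
def build_index(dictionary):
    """Map each dictionary word to its position (assumes unique words)."""
    return {word: i for i, word in enumerate(dictionary)}


def encode(index, sentence, size):
    """Return a 0/1 list with a 1 at the position of each dictionary word found in sentence."""
    result = [0] * size
    for word in sentence:
        pos = index.get(word)  # None if the word is not in the dictionary
        if pos is not None:
            result[pos] = 1
    return result


dictionary = ['aardvark', 'apple', 'eat', 'I', 'like', 'maize', 'man', 'to', 'zebra', 'zed']
index = build_index(dictionary)  # built once, reused for every sentence

encoded = encode(index, ['I', 'like', 'to', 'eat', 'apples'], len(dictionary))
print(encoded)  # [0, 0, 1, 1, 1, 0, 0, 1, 0, 0]
```

After the index is built, each call loops over the sentence rather than the whole dictionary, although allocating the result list is still proportional to the dictionary size.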

If your sentence list contains actual sentences rather than single words, try this:

>>> sentences= ["foobar foo", "spam eggs" ,"monty python"]
>>> words=["foo", "oof", "bar", "pyth" ,"spam"]
>>> from itertools import chain

# fetch words from each sentence and create a flattened set of all words
>>> s = set(chain(*(x.split() for x in sentences)))

>>> [int(x in s) for x in words]
[1, 0, 0, 0, 1]
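The same idea can be sketched as a small script. This version uses chain.from_iterable instead of argument unpacking, which avoids building the full argument tuple first; words_in_sentences is a hypothetical helper name. It assumes words are separated by whitespace only, so punctuation would stay attached to words.

```python
from itertools import chain


def words_in_sentences(sentences):
    """Split every sentence on whitespace and collect all words into one set."""
    return set(chain.from_iterable(s.split() for s in sentences))


sentences = ["foobar foo", "spam eggs", "monty python"]
words = ["foo", "oof", "bar", "pyth", "spam"]

seen = words_in_sentences(sentences)
flags = [int(w in seen) for w in words]
print(flags)  # [1, 0, 0, 0, 1]
```

Note that "bar" and "pyth" are not matched: the check is for whole words, not substrings of "foobar" or "python".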
answered 2013-05-06T07:06:58.390
0

I would suggest something like this:

words = set(['hello','there']) #have the words available as a set
sentance = ['hello','monkey','theres','there']
rep = [ 1 if w in words else 0 for w in sentance ]
>>> 
[1, 0, 0, 1]

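I would take this approach because sets have O(1) lookup time on average, so checking whether w is in words takes constant time. That makes the list comprehension O(n), since it has to visit each word exactly once. I believe this is close to, or as efficient as, anything you are going to get.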
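To see the difference on your own machine, a rough timing sketch along these lines may help. The synthetic word lists, their sizes, and the names with_list and with_set are assumptions made for illustration; actual timings depend on your data and hardware, so none are shown here.

```python
import timeit

# Synthetic data (an assumption for illustration): 2000 dictionary words,
# half of which appear in the sentence.
dictionary = [f"word{i}" for i in range(2000)]
sentence = [f"word{i}" for i in range(0, 2000, 2)]


def with_list():
    # Each membership test scans the sentence list: O(len(sentence)) per word.
    return [int(w in sentence) for w in dictionary]


def with_set():
    # Build the set once, then each membership test is O(1) on average.
    lookup = set(sentence)
    return [int(w in lookup) for w in dictionary]


# Both approaches must produce the same encoding.
assert with_list() == with_set()

print("list lookup:", timeit.timeit(with_list, number=3))
print("set lookup: ", timeit.timeit(with_set, number=3))
```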

You also mentioned creating a "Boolean" array, in which case you can simply use the following:

rep = [ w in words for w in sentance ]
>>> 
[True, False, False, True]
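Since bool is a subclass of int in Python, the Boolean list can still be used numerically. A short sketch, reusing the names from above (flags is a hypothetical variable name):

```python
words = {'hello', 'there'}
sentance = ['hello', 'monkey', 'theres', 'there']

flags = [w in words for w in sentance]
print(flags)                    # [True, False, False, True]
print(sum(flags))               # 2, because True counts as 1
print([int(f) for f in flags])  # [1, 0, 0, 1]
```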
answered 2013-05-06T07:12:07.400