python - Performing a set difference operation on a list of tuples

Question

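I am trying to find the difference between 2 containers, but the containers have an odd structure, so I don't know the best way to perform a difference on them. One container's type and structure I cannot change, but the other's I can (the variable delims).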

delims = ['on','with','to','and','in','the','from','or']
words = collections.Counter(s.split()).most_common()
# words results in [("the",2), ("a",9), ("diplomacy", 1)]

#I want to perform a 'difference' operation on words to remove all the delims words
descriptive_words = set(words) - set(delims)

# because of the unique structure of words (a list of tuples) it's hard to perform a difference
# on it. What would be the best way to perform a difference? Maybe...

delims = [('on',0),('with',0),('to',0),('and',0),('in',0),('the',0),('from',0),('or',0)]
words = collections.Counter(s.split()).most_common()
descriptive_words = set(words) - set(delims)

# Or maybe
words = collections.Counter(s.split()).most_common()
n_words = []
for w in words:
   n_words.append(w[0])
delims = ['on','with','to','and','in','the','from','or']
descriptive_words = set(n_words) - set(delims)

score 3 · Accepted Answer

How about just modifying words by deleting all the delimiters from it?

words = collections.Counter(s.split())
for delim in delims:
    del words[delim]
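For reference, here is a minimal runnable sketch of the deletion approach above. The sample string `s` is an assumption made up for illustration. The loop relies on the fact that `Counter` does not raise `KeyError` when deleting a key that is absent, so delimiters that never occur in the text are harmless.

```python
import collections

# Sample input, assumed for illustration only.
s = "the a a a a the a a a a a diplomacy"
delims = ['on', 'with', 'to', 'and', 'in', 'the', 'from', 'or']

words = collections.Counter(s.split())
for delim in delims:
    # Counter.__delitem__ silently ignores keys that are not present.
    del words[delim]

# Convert back to the list-of-tuples form used in the question.
descriptive_words = words.most_common()
print(descriptive_words)
```

Calling `most_common()` after the deletions gives back the same list-of-tuples shape the question started with.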

score 1 · Accepted Answer

Here is how I would do it:

delims = set(['on','with','to','and','in','the','from','or'])
# ...
descriptive_words = filter(lambda x: x[0] not in delims, words)

This uses the built-in filter function. A workable alternative is a list comprehension:

delims = set(['on','with','to','and','in','the','from','or'])
# ...
descriptive_words = [(word, count) for word, count in words if word not in delims]

Make sure delims is a set so that each membership test is an O(1) lookup.
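A small sketch comparing the two variants, assuming a made-up sample string `s`. One thing to keep in mind: on Python 3, `filter` returns a lazy iterator rather than a list, so it is wrapped in `list()` here.

```python
import collections

# Sample input, assumed for illustration only.
s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}

words = collections.Counter(s.split()).most_common()

# filter() is lazy on Python 3, so materialize it with list().
via_filter = list(filter(lambda pair: pair[0] not in delims, words))

# Equivalent list comprehension with tuple unpacking.
via_comprehension = [(word, count) for word, count in words if word not in delims]

print(via_filter)
print(via_comprehension)
```

Both produce the same list, so the choice is mostly a matter of taste.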

score 1 · Accepted Answer

The simplest answer is:

import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
# For older versions of Python without set literals:
# delims = set(['on','with','to','and','in','the','from','or'])
words = collections.Counter(s.split())

not_delims = {key: value for (key, value) in words.items() if key not in delims}
# For older versions of Python without dict comprehensions:
# not_delims = dict(((key, value) for (key, value) in words.items() if key not in delims))

This gives us:

{'a': 9, 'diplomacy': 1}

Another option is to filter pre-emptively:

import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on','with','to','and','in','the','from','or'}
counted_words = collections.Counter((word for word in s.split() if word not in delims))

Here you filter the list of words before feeding it to the Counter, which produces the same result.
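To check that the two orderings really agree, here is a quick sketch that builds both and compares them. The variable names `filtered_after` and `filtered_before` are introduced just for this illustration.

```python
import collections

s = "the a a a a the a a a a a diplomacy"
delims = {'on', 'with', 'to', 'and', 'in', 'the', 'from', 'or'}

# Count everything first, then drop the delimiters.
all_counts = collections.Counter(s.split())
filtered_after = {key: value for key, value in all_counts.items() if key not in delims}

# Drop the delimiters first, then count what is left.
filtered_before = collections.Counter(word for word in s.split() if word not in delims)

# A Counter is a dict subclass, so it compares equal to a plain dict with the same items.
print(filtered_after == dict(filtered_before))
```

Filtering first avoids counting words that are thrown away anyway, though for short inputs the difference is unlikely to matter.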

score 0 · Accepted Answer

If you are iterating over it anyway, why bother converting them to sets?

dwords = [delim[0] for delim in delims]
words  = [word for word in words if word[0] not in dwords]
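A runnable sketch of this variant, assuming `delims` is in the list-of-tuples form from the question and `s` is a made-up sample string. As a hedged side note: `word[0] not in dwords` scans the whole list on every test, so collecting the names into a set instead (a small change from the snippet above, not part of the original answer) keeps the same result with constant-time lookups.

```python
import collections

# Sample input, assumed for illustration only.
s = "the a a a a the a a a a a diplomacy"
delims = [('on', 0), ('with', 0), ('to', 0), ('and', 0),
          ('in', 0), ('the', 0), ('from', 0), ('or', 0)]

words = collections.Counter(s.split()).most_common()

# Set comprehension instead of a list: same membership semantics, O(1) lookups.
dwords = {delim[0] for delim in delims}
words = [word for word in words if word[0] not in dwords]

print(words)
```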

score 0 · Accepted Answer

To improve performance, you can use filter with a lambda function:

filter(lambda word: word[0] not in delims, words)

answered 2012-03-29T09:51:16.523
