python - Finding common items between a nested list and another list by converting them to sets

Question

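I have the following two lists and I am trying to find the words they have in common. I am trying to extract the words from l2 (ignoring the numbers) and store them in l3, but I keep getting this error:

list indices must be integers or slices, not tuple

I am interested in a fix, or in whether there is a better solution.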

l1=['the', 'and', 'to', 'of', 'a', 'in', 'is', 'that']
l2=[('the', 637), ('of', 252), ('a', 208), ('to', 207), ('in', 147), 
    ('and', 134), ('that', 134), ('was', 133)]


l3= list(map(lambda x: set(l2[x][x]), l2[0:6]))

print(set(l1 & l3))

score 2 · Accepted Answer

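You can use a list comprehension and keep each tuple whose first element is contained in l1. You can reduce the cost of the membership test by constructing a set from l1: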

s1 = set(l1)

l3 = [s for s,*_ in l2 if s in s1]
# ['the', 'of', 'a', 'to', 'in', 'and', 'that']

Alternatively, we can use zip and take the first element of the result:

set(l1).intersection(list(zip(*l2))[0])

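Note that your approach does not work because you are trying to index with tuples. The x in your lambda receives a tuple each time, because you are iterating over l2 directly. If your sublists all have length 2, you could also consider using a dictionary, which can be accessed by key. Given your data structure, it looks like this might be a good option for you: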

d = dict(l2)

[i for i in l1 if i in d]
# ['the', 'and', 'to', 'of', 'a', 'in', 'that']
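Because dict(l2) maps each word to its count, the same membership test can also return the counts for the common words. A minimal sketch, assuming the l1 and l2 from the question; the name common_counts is only illustrative:

```python
l1 = ['the', 'and', 'to', 'of', 'a', 'in', 'is', 'that']
l2 = [('the', 637), ('of', 252), ('a', 208), ('to', 207), ('in', 147),
      ('and', 134), ('that', 134), ('was', 133)]

d = dict(l2)  # maps word -> count

# Walk l1 in order and keep the count of every word that also appears in l2.
common_counts = {word: d[word] for word in l1 if word in d}
print(common_counts)
```

The result follows the order of l1, since dictionaries preserve insertion order in Python 3.7 and later.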

score 2 · Accepted Answer

Use set intersection:

s1 = set(l1)

i = s1.intersection( e[0] for e in l2 )

print(i)  # {'a', 'and', 'that', 'of', 'to', 'in', 'the'} (set order may vary)

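The set intersection method accepts any iterable and finds its intersection with the set you call it on.

Your error stems from using the lambda incorrectly: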
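To illustrate the difference, here is a small sketch assuming the l1 and l2 from the question: the intersection method accepts a plain list, while the & operator requires both operands to be sets. The names words, common and operator_error are illustrative only.

```python
l1 = ['the', 'and', 'to', 'of', 'a', 'in', 'is', 'that']
l2 = [('the', 637), ('of', 252), ('a', 208), ('to', 207), ('in', 147),
      ('and', 134), ('that', 134), ('was', 133)]

s1 = set(l1)
words = [word for word, _count in l2]  # a plain list, not a set

# The method form works with any iterable.
common = s1.intersection(words)
print(sorted(common))

# The operator form only works between two sets, so this raises TypeError.
operator_error = None
try:
    s1 & words
except TypeError as exc:
    operator_error = exc
    print("TypeError:", exc)
```

This is also why print(set(l1 & l3)) in the question fails even once l3 is built correctly: both operands must be converted to sets before applying &.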

map(lambda x: set(l2[x][x]), l2[0:6])

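Each x is an element of l2 (you only take the first six elements of l2). map takes each element of the input iterable and applies the function you provide to it. For the first element of l2 this becomes: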

set(l2[('the', 637)][('the', 637)])

which is clearly wrong.
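The failure can be reproduced in isolation. A minimal sketch, assuming the l2 from the question and Python 3; the names first and message are illustrative:

```python
l2 = [('the', 637), ('of', 252), ('a', 208), ('to', 207), ('in', 147),
      ('and', 134), ('that', 134), ('was', 133)]

# map hands the lambda the elements themselves, not their positions.
first = l2[0]
print(first)

# Using that tuple as a list index is what raises the error.
message = None
try:
    l2[first]
except TypeError as exc:
    message = str(exc)
    print(message)
```

Inside the lambda, x is already the (word, count) pair, so indexing l2 again is unnecessary.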

score 1 · Accepted Answer

To fix your own approach:

l3 = set(map(lambda x: x[0], l2))  # first element from each pair in l2

print(set(l1) & l3)  # must intersect set and set, not list and set

score 0 · Accepted Answer

You can convert the list l1 to a set and then use a list comprehension:

l1= ['the', 'and', 'to', 'of', 'a', 'in', 'is', 'that']
l1 = set(l1)

l2=[('the', 637), ('of', 252), ('a', 208), ('to', 207), ('in', 147), ('and', 134), ('that', 134), ('was', 133)]

l3 = [t[0] for t in l2 if t[0] in l1]
