python - Iterate and compare the first item with all items in a dictionary

Question

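Please help, I can't seem to find a way to do this. I'm working on a network science project, and this is my third Python project.

I need to compare the first item in a dictionary with all the other items in the same dictionary, but those items are themselves dictionaries.

For example, I have a dictionary with the following values: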

{'25': {'Return of the Jedi (1983)': 5.0},
 '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
 '8': {'Return of the Jedi (1983)': 5.0},
 '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
 '7': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0}}

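So I need to see whether '25' and '42' contain the same movie, in this case "Return of the Jedi", then whether '25' and '8' have the same movie, and so on. If they do, I need to know how many movies overlap.

This is only a sample; the whole dictionary contains 1000 keys, and the sub-dictionaries are also much larger.

I have tried iterating, comparing dictionaries, making copies, merging, and joining, but I can't seem to grasp how to do this.

Please help!

The problem is that I still can't compare two sub-dictionaries, because I need to find the keys that have at least 2 movies in common.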

score 2 · Accepted Answer

You can use collections.Counter:

>>> dic={'25': {'Return of the Jedi (1983)': 5.0}, '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0}, '8': {'Return of the Jedi (1983)': 5.0 }}
>>> from collections import Counter
>>> c=Counter(movie  for v in dic.values() for movie in v)

>>> [k for k,v in c.items() if v>1] #returns the name of movies repeated more than once
['Return of the Jedi (1983)']
>>> c
Counter({'Return of the Jedi (1983)': 2,
         'Batman (1989)': 1,
         'E.T. the Extra-Terrestrial (1982)': 1})

To get the keys associated with each movie, you can use collections.defaultdict:

>>> from collections import defaultdict
>>> movie_keys=defaultdict(list)
>>> for k,v in dic.items():
...     for movie in v:
...         movie_keys[movie].append(k)
...
>>> movie_keys
defaultdict(<type 'list'>, {'Batman (1989)': ['42'], 'Return of the Jedi (1983)': ['25', '8'], 'E.T. the Extra-Terrestrial (1982)': ['42']})
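
The question also asks which keys share at least 2 movies. One way to get there from the movie-to-keys index is to count, for every movie, each pair of keys that both contain it. The following is only a sketch under that reading of the question; the names pair_counts and at_least_two are made up for illustration, and the threshold of 2 is taken from the question text.

```python
from collections import Counter, defaultdict
from itertools import combinations

dic = {
    '25': {'Return of the Jedi (1983)': 5.0},
    '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
    '8': {'Return of the Jedi (1983)': 5.0},
    '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
    '7': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
}

# Inverted index: movie title -> list of keys whose sub-dictionary contains it.
movie_keys = defaultdict(list)
for k, v in dic.items():
    for movie in v:
        movie_keys[movie].append(k)

# Every pair of keys that shares a movie gets one point for that movie.
# Sorting the keys makes ('25', '8') and ('8', '25') count as the same pair.
pair_counts = Counter()
for keys in movie_keys.values():
    for pair in combinations(sorted(keys), 2):
        pair_counts[pair] += 1

# Keep only the pairs with at least two movies in common (assumed threshold).
at_least_two = {pair: n for pair, n in pair_counts.items() if n >= 2}
print(at_least_two)  # {('542', '7'): 2}
```

Pairs that share no movie never appear in pair_counts, so the work done depends on how many keys share each title rather than on the total number of key pairs.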

score 0 · Accepted Answer

There isn't really a "first" item in a dictionary, but you can find all the keys that contain a given movie like this:

movies = {}
for k in data:
    for movie in data[k]:
        movies.setdefault(movie, []).append(k)

The resulting movies dictionary looks like this:

{'Return of the Jedi (1983)': ['25', '8'], 'Batman (1989)': ['42'], ...}
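
If the aim is to compare one particular key against all the others, as the question describes, a sketch along these lines may help. It assumes Python 3, where dictionary key views support set intersection with & and, since 3.7, dictionaries keep insertion order, so next(iter(data)) gives the first inserted key. The function name shared_with is hypothetical.

```python
def shared_with(data, target):
    """Return {other_key: set of titles shared with target}, skipping empty overlaps."""
    target_movies = data[target].keys()
    result = {}
    for other, ratings in data.items():
        if other == target:
            continue  # do not compare the target with itself
        common = target_movies & ratings.keys()  # key views intersect like sets
        if common:
            result[other] = common
    return result


data = {
    '25': {'Return of the Jedi (1983)': 5.0},
    '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
    '8': {'Return of the Jedi (1983)': 5.0},
    '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
    '7': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
}

first = next(iter(data))          # '25' on Python 3.7+ (insertion order)
print(shared_with(data, first))   # {'8': {'Return of the Jedi (1983)'}}
```

The number of overlapping movies for each key is then len() of the corresponding set.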

score 0 · Accepted Answer

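My answer only returns a dictionary of 'title', ['offender1', ...] pairs for movies that appear under more than one id, i.e. 'E.T. the Extra-Terrestrial (1982)' will not be reported but 'Return of the Jedi (1983)' will. This can be changed by simply returning overlaps itself instead of the filtered dictionary.

Where d is: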

d = {'25': {'Return of the Jedi (1983)': 5.0},
     '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
     '8': {'Return of the Jedi (1983)': 5.0 },
     '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
     '7': {'Alice in Wonderland (1951)': 3.0,'Blade Runner (1982)': 4.0}
     }

the following:

from collections import defaultdict
import itertools
def findOverlaps(d):
    overlaps = defaultdict(list)
    for (parentKey,children) in d.items(): #children is the dictionary containing movie_title,rating pairs
        for childKey in children.keys(): #we're only interested in the titles not the ratings, hence keys() not items()
            overlaps[childKey].append(parentKey) #add the parent 'id' where the movie_title came from
    return dict(((overlap,offenders) for (overlap,offenders) in overlaps.items() if len(offenders) > 1)) #return a dictionary, only if the movie title had more than one 'id' associated with it
print(findOverlaps(d))

produces:

>>> 
{'Blade Runner (1982)': ['7', '542'], 'Return of the Jedi (1983)': ['25', '8'], 'Alice in Wonderland (1951)': ['7', '542']}

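Reasoning behind the code:

Each entry in d represents id : { movie_title1: rating, movie_title2: rating }. Now say movie_title1 occurs in the values associated with two or more separate id keys. We want to store the movie_title of each movie that was watched two or more times, together with the id keys associated with the values in which that movie was watched.

Therefore we want a resulting dictionary like this

{ movie_title1: {'id1','id2'}, movie_title2: {'id2','id5'} }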
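
The title-to-ids mapping does not by itself say which ids have several titles in common. As a rough sketch of that follow-up step, itertools.combinations can enumerate every pair of ids and the key views can be intersected directly. The function name pairs_sharing and its minimum parameter are assumptions for illustration, not part of the code above.

```python
from itertools import combinations

d = {'25': {'Return of the Jedi (1983)': 5.0},
     '42': {'Batman (1989)': 3.0, 'E.T. the Extra-Terrestrial (1982)': 5.0},
     '8': {'Return of the Jedi (1983)': 5.0},
     '542': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0},
     '7': {'Alice in Wonderland (1951)': 3.0, 'Blade Runner (1982)': 4.0}
     }

def pairs_sharing(d, minimum=2):
    """Return {(id_a, id_b): shared titles} for pairs with at least `minimum` titles in common."""
    result = {}
    for a, b in combinations(d, 2):          # every unordered pair of ids, each visited once
        common = d[a].keys() & d[b].keys()   # titles present under both ids
        if len(common) >= minimum:
            result[(a, b)] = common
    return result

print(pairs_sharing(d))
# {('542', '7'): {'Alice in Wonderland (1951)', 'Blade Runner (1982)'}}  (set order may vary)
```

With 1000 ids this makes 499,500 pairwise comparisons, which is usually acceptable; calling pairs_sharing(d, minimum=1) also reports the ('25', '8') pair.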
