python - Filter a dictionary based on an existing list

Question

I'm still new to Python, so please go easy on me...

I have set up a dictionary:

new_dict

I want to filter it to return the keys for which any of the attached values matches a value in an existing list I have set up:

list(data.Mapped_gene)

Any ideas?

Edit: I still can't get this to work.

If it helps, the csv table and the keys are all strings.

Here is the full code, for context:

import csv    
new_dict = {}
with open(raw_input("Enter csv file (including path)"), 'rb') as f:
  reader = csv.reader(f)
  for row in reader:
    if row[0] in new_dict:
      new_dict[row[0]].append(row[1:])
    else:
      new_dict[row[0]] = row[1:]
print new_dict

#modified from: http://bit.ly/1iOS7Gu
import pandas
colnames = ['Date Added to Catalog',    'PUBMEDID', 'First Author', 'Date',     'Journal',  'Link', 'Study',    'DT',   'Initial Sample Size',  'Replication Sample Size',  'Region',   'Chr_id',   'Chr_pos',  'Reported Gene(s)', 'Mapped_gene',  'p-Value',  'Pvalue_mlog',  'p-Value (text)',   'OR or beta',   '95% CI (text)',    'Platform [SNPs passing QC]',   'CNV']
data = pandas.read_csv('C:\Users\Chris\Desktop\gwascatalog.csv', names=colnames)


my_list = list(data.Mapped_gene)
my_set = set(my_list)

[k for k, v in new_dict.items() if any(x in my_set for x in v)]

Error message: "TypeError: unhashable type: 'list'"

score 3 · Accepted Answer

Use any with a list comprehension:

my_list = list(data.Mapped_gene)
keys = [k for k, v in new_dict.items() if any(x in my_list for x in v)]

If my_list is large, convert it to a set first, since a set provides O(1) average-case lookups.
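As a rough, self-contained sketch of this approach: the gene names and the small new_dict below are made-up stand-ins for the question's data, and each value is assumed to be a flat list of strings.

```python
# Toy stand-ins for the question's data (hypothetical values).
new_dict = {
    "rs001": ["BRCA1", "TP53"],
    "rs002": ["EGFR"],
    "rs003": ["KRAS", "MYC"],
}
my_list = ["TP53", "MYC", "APOE"]  # stand-in for list(data.Mapped_gene)

# Convert once, so each membership test is O(1) on average.
my_set = set(my_list)

# Keep a key if at least one of its values appears in my_set.
keys = [k for k, v in new_dict.items() if any(x in my_set for x in v)]
print(sorted(keys))
```

any stops at the first matching value, so keys with an early match are cheap to check.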

score 2 · Accepted Answer


geneset = set(data.Mapped_gene)
[k for k, v in new_dict.items() if geneset.intersection(v)]
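Note that intersection, like the "x in my_set" test, has to hash every element of v, so it raises the same TypeError if v contains a nested list. Judging from the code in the question, that probably happens when a key repeats in the csv: append adds row[1:] as a single list element. The sketch below shows one possible fix, using extend while building the dictionary. The rows and gene names are hypothetical stand-ins for the csv file and data.Mapped_gene.

```python
# Hypothetical rows standing in for the csv file's contents.
rows = [
    ["rs001", "BRCA1", "TP53"],
    ["rs002", "EGFR"],
    ["rs001", "MYC"],  # repeated key
]

new_dict = {}
for row in rows:
    # extend() adds the strings themselves; append() would nest a whole list.
    new_dict.setdefault(row[0], []).extend(row[1:])

geneset = {"MYC", "APOE"}  # stand-in for set(data.Mapped_gene)

# A non-empty intersection is truthy, so the key is kept.
keys = [k for k, v in new_dict.items() if geneset.intersection(v)]
print(keys)
```

With flat lists of strings as values, both the any version and the intersection version should work unchanged.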

answered 2014-02-12T16:15:55.123

score 0 · Accepted Answer

To improve lookup performance, convert the list to a set.

gene_set = set(data.Mapped_gene)

Then use a list comprehension as shown in the other answers, or a dict comprehension if you are also interested in the values.

{k:v for k, v in my_dict.iteritems() if v in gene_set}

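The iteritems() method is especially useful if my_dict is huge. To make your approach even more memory-efficient, you can use a generator instead of a list or dict comprehension: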

(k for k, v in my_dict.iteritems() if v in gene_set)
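Two caveats apply to the snippets above. The test "v in gene_set" only works when v is hashable, for example a single string. Also, iteritems() exists only in Python 2; in Python 3, items() returns a lightweight view instead. The following sketch adapts both snippets for Python 3 and for list values, as in the question. The data is made up for illustration.

```python
# Hypothetical data: each key maps to a list of gene names.
my_dict = {"rs001": ["BRCA1", "TP53"], "rs002": ["EGFR"]}
gene_set = {"TP53"}

# Dict comprehension: keep key and values when any value is in gene_set.
matches = {k: v for k, v in my_dict.items() if any(x in gene_set for x in v)}

# Generator expression: yields matching keys lazily, one at a time.
lazy_keys = (k for k, v in my_dict.items() if any(x in gene_set for x in v))
first = next(lazy_keys)

print(matches, first)
```

The generator never builds a full result list, which helps when only the first few matches are needed or when the results are consumed in a loop.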
