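I need help merging two dictionaries, where the keys of one are looked up among the values of the other. When a match is found, it should append its own values to the other dictionary (updating it, but without overwriting the values that are already there).

Code (sorry, this is my first custom script):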

import re

otuid2clusteridlist = dict()
finallist = otuid2clusteridlist  # same dict object, not a copy
clusterid2denoiseidlist = dict()

# First block; finallist is this same dict, and all other blocks get appended into it.
with open('cluster_97.ucm', 'r') as cluster_file:
    for line in cluster_file:
        lineArray = re.split(r'\s+', line)
        otuid = lineArray[0]
        clusterid = lineArray[3]
        if otuid in otuid2clusteridlist:
            otuid2clusteridlist[otuid].append(clusterid)
        else:
            otuid2clusteridlist[otuid] = list()
            otuid2clusteridlist[otuid].append(clusterid)

# Second block; this higher tier needs to expand the previous block's dict.
with open('denoise.ucm_test', 'r') as denoise_file:
    for line in denoise_file:
        lineArray = re.split(r'\s+', line)
        clusterid = lineArray[4]
        denoiseid = lineArray[3]
        if clusterid in clusterid2denoiseidlist:
            clusterid2denoiseidlist[clusterid].append(denoiseid)
        else:
            clusterid2denoiseidlist[clusterid] = list()
            clusterid2denoiseidlist[clusterid].append(denoiseid)

# Print for testing (will be converted to write to a file later).
for key in finallist:
    print("OTU: %s has %d sequence(s) which = %s" % (key, len(finallist[key]), finallist[key]))

Block one returns:

OTU: 3 has 3 sequence(s) which = ['5PLAS.R2.h_35336', 'GG13_52054', 'GG13_798']
OTU: 5 has 1 sequence(s) which = ['DEX1.h_14175']
OTU: 4 has 1 sequence(s) which = ['PLAS.h_34150']
OTU: 7 has 1 sequence(s) which = ['DEX12.13.h_545']
OTU: 6 has 1 sequence(s) which = ['GG13_45705']

Block two returns:

OTU: GG13_45705 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']

So the goal is to add the output of block two into block one. I would like it to be added like this:

...
 OTU: 6 has 4 sequence(s) which = ['GG13_45705', 'GG13_6312', 'GG13_32148', 'GG13_35246']

I tried dict.update, but it simply adds block two's entries to block one as new entries, because block two's keys do not exist in block one.
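To illustrate what I mean, here is a minimal sketch with toy dictionaries that mirror the two outputs above. The names block_one, block_two and merged are made up for this example and are not part of my script.

```python
# Toy data mirroring the two outputs above (only the relevant entries).
block_one = {"6": ["GG13_45705"], "5": ["DEX1.h_14175"]}
block_two = {"GG13_45705": ["GG13_45705", "GG13_6312", "GG13_32148", "GG13_35246"]}

merged = dict(block_one)   # shallow copy so block_one itself is left alone
merged.update(block_two)   # update() matches on keys only

# "GG13_45705" is not a key of block_one, so it is added as a new entry
# and the list stored under "6" is left unchanged.
print(sorted(merged))
print(merged["6"])
```

The entry for OTU 6 still holds a single sequence, which is not what I am after.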

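I think my problem is more complicated: I need block two to look up each of its keys among the values of block one, and append its values to the list where the key was found.

I have been experimenting with for loops and .append (similar to the code already written), but I lack the overall Python knowledge to solve this.

Thoughts?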

Addendum:

Some subsets of the data:

cluster_97.ucm (block one's file):

5 376 * DEX1.h_14175 DEX1.h_14175
6 294 * GG13_45705 GG13_45705
0 447 98.7 DEX22.h_37221 DEX29.h_4583
1 367 98.9 DEX14.15.h_35477 DEX27.h_779
1 443 98.4 DEX27.h_3794 DEX27.h_779
0 478 97.9 DEX22.h_7519 DEX29.h_4583

denoise.ucm_test (block two's file):

11 294 * GG13_45705 GG13_45705
11 278 99.6 GG13_6312 GG13_45705
11 285 99.6 GG13_32148 GG13_45705
11 275 99.6 GG13_35246 GG13_45705

I chose these subsets because the second line of file one is the entry that file two will update.

In case anyone wants to give it a try.

1 Answer

Updated to reflect matching on values...

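I think the solution to your problem lies in the fact that a list in Python is a mutable object, and a variable holding a mutable value is just a reference to it. So we can use a second dictionary that maps each value to the list it belongs to.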
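As a small illustration of that reference behaviour (the names by_otu and by_cluster are made up for this sketch and do not appear in your script):

```python
# Two dictionaries whose values are the very same list object.
by_otu = {"6": ["GG13_45705"]}
by_cluster = {"GG13_45705": by_otu["6"]}   # stores a reference, not a copy

# Appending through one dictionary is visible through the other.
by_cluster["GG13_45705"].append("GG13_6312")

print(by_otu["6"])
print(by_cluster["GG13_45705"] is by_otu["6"])
```

The full script applies the same idea to your two files: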

import re

otuid2clusteridlist = dict()
finallist = otuid2clusteridlist  # same dict object, not a copy
clusterid2denoiseidlist = dict()
known_clusters = dict()  # cluster id -> the list it was placed in

# First block; finallist is this same dict, and all other blocks get appended into it.
with open('cluster_97.ucm', 'r') as cluster_file:
    for line in cluster_file:
        lineArray = re.split(r'\s+', line)
        otuid = lineArray[0]
        clusterid = lineArray[3]
        if otuid in otuid2clusteridlist:
            otuid2clusteridlist[otuid].append(clusterid)
        else:
            otuid2clusteridlist[otuid] = list()
            otuid2clusteridlist[otuid].append(clusterid)

        # Remember which list this cluster id lives in (a reference to the same list).
        known_clusters[clusterid] = otuid2clusteridlist[otuid]

# Second block; this higher tier needs to expand the previous block's dict.
with open('denoise.ucm_test', 'r') as denoise_file:
    for line in denoise_file:
        lineArray = re.split(r'\s+', line)
        clusterid = lineArray[4]
        denoiseid = lineArray[3]
        if clusterid in clusterid2denoiseidlist:
            clusterid2denoiseidlist[clusterid].append(denoiseid)
        else:
            clusterid2denoiseidlist[clusterid] = list()
            clusterid2denoiseidlist[clusterid].append(denoiseid)

        # Match the cluster and update its list as needed, skipping duplicates.
        matched_cluster = known_clusters.setdefault(clusterid, [])
        if denoiseid not in matched_cluster:
            matched_cluster.append(denoiseid)

# Print for testing (will be converted to write to a file later).
for key in finallist:
    print("OTU: %s has %d sequence(s) which = %s" % (key, len(finallist[key]), finallist[key]))

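I wasn't sure whether you still need clusterid2denoiseidlist, so I added a new dictionary, known_clusters, to hold the mapping from each value to the list that contains it.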

I'm not sure I covered all the edge cases in your real problem, but this generates the desired output given the supplied test inputs.
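If you want to check the logic without the files, here is a minimal sketch of the same approach wrapped in a function and fed with a few of the sample rows from the question. The function name merge_clusters and the variable names are hypothetical, and there is one deliberate difference from the script above: it uses dict.get instead of setdefault, so denoise rows whose cluster id never appeared in the first file are skipped rather than collected in a list nothing else refers to. That is an assumption about what you want, not something your data confirms.

```python
def merge_clusters(cluster_lines, denoise_lines):
    """Return {otuid: [ids]} with denoise ids appended to the matching OTU list."""
    otu_to_ids = {}
    known_clusters = {}  # cluster id -> the same list object stored in otu_to_ids

    for line in cluster_lines:
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        otuid, clusterid = fields[0], fields[3]
        members = otu_to_ids.setdefault(otuid, [])
        members.append(clusterid)
        known_clusters[clusterid] = members

    for line in denoise_lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        denoiseid, clusterid = fields[3], fields[4]
        members = known_clusters.get(clusterid)
        # Assumption: unknown cluster ids are ignored instead of creating new lists.
        if members is not None and denoiseid not in members:
            members.append(denoiseid)

    return otu_to_ids


# Sample rows copied from the question.
cluster_rows = [
    "5 376 * DEX1.h_14175 DEX1.h_14175",
    "6 294 * GG13_45705 GG13_45705",
]
denoise_rows = [
    "11 294 * GG13_45705 GG13_45705",
    "11 278 99.6 GG13_6312 GG13_45705",
    "11 285 99.6 GG13_32148 GG13_45705",
    "11 275 99.6 GG13_35246 GG13_45705",
]

result = merge_clusters(cluster_rows, denoise_rows)
print(result["6"])
```

Because the function takes any iterable of lines, open file objects can be passed in directly in place of the lists.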

Answered 2012-12-02T02:40:44.563