
I have a list of dictionaries (called "primer names") that contains the following information:

{'part number': 1, 'notes': 'Fw Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'}
{'part number': 1, 'notes': 'Re Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 're primer', 'construct': '24', 'source': 'pEM113'}
{'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'}
{'part number': 2, 'notes': 'Re Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 're primer', 'construct': '24', 'source': 'BBa_K274100'}
{'part number': 1, 'notes': 'Fw Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 'fw primer', 'construct': '25', 'source': 'pEM114'}
{'part number': 1, 'notes': 'Re Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 're primer', 'construct': '25', 'source': 'pEM114'}

I have another list of dictionaries (called "primer sequences") that contains the following information:

{'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'}
{'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'}
{'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'}
{'Part Number': '2', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'}
{'Part Number': '1', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGATAGTTATTACTCTAGAAGCGGCCGCg'}
{'Part Number': '1', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'}
{'Part Number': '2', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'TAACTATCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'}
{'Part Number': '2', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'}

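My goal is to combine the information from both lists, so that the output contains the part number, construct number, direction, primer sequence, notes, construct and source for each primer (fw or re) in the bottom list. To match an entry in "primer names" with one in "primer sequences", I have to check that their part number, construct number and direction are all the same.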

I tried the following code to check this, but it doesn't seem to work:

for row in primers_names_list: #recall that primers_names_list is a list of dictionaries
    if any({x['Part Number'], x['Construct Number'], x['Direction']} == {row['part number'], row['construct number'], row['direction']} for x in primers_without_names):
        primers_with_names.append({'part number':row['part number'], 'construct number':row['construct number'], 'notes':row['notes'], 'primer sequence':x['Primer Sequence']})

Can anyone give me some tips on how to do this?

Thanks a lot!


2 Answers


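Two problems:

  1. part number is an int in "primer names" but a str in "primer sequences", so the two values never compare equal. For the comparison to yield True, convert the int to a str (with str(val)) or the str to an int (with int(val)).

  2. The key names you use in the loop raise a KeyError because they are wrong (note that "primer sequences" uses Construct Number, while "primer names" uses construct).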
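Both problems can be reproduced in isolation. This minimal sketch uses one record from each list, trimmed to the key fields, and shows that 1 == '1' is False until one side is converted, and that looking up 'construct number' in a "primer names" row raises KeyError.

```python
# One record from each list, trimmed to the fields used for matching
name_row = {'part number': 1, 'direction': 'fw primer', 'construct': '24'}
seq_row = {'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer'}

# Problem 1: an int never compares equal to a str in Python
print(name_row['part number'] == seq_row['Part Number'])       # False
print(name_row['part number'] == int(seq_row['Part Number']))  # True
print(str(name_row['part number']) == seq_row['Part Number'])  # True

# Problem 2: "primer names" rows have no 'construct number' key
try:
    name_row['construct number']
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'construct number'
```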

Here is a working code example:

primers_names_list = [
{'part number': 1, 'notes': 'Fw Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'},
{'part number': 1, 'notes': 'Re Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.', 'direction': 're primer', 'construct': '24', 'source': 'pEM113'},
{'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'},
{'part number': 2, 'notes': 'Re Gibson primer on BBa_K274100 to extract crtEBI operon', 'direction': 're primer', 'construct': '24', 'source': 'BBa_K274100'},
{'part number': 1, 'notes': 'Fw Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 'fw primer', 'construct': '25', 'source': 'pEM114'},
{'part number': 1, 'notes': 'Re Gibson primer on pEM114 to extract CmR resistance and pSC101 backbone and K1F promoter and term.', 'direction': 're primer', 'construct': '25', 'source': 'pEM114'},
]

primers_without_names = [
{'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'},
{'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
{'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer', 'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
{'Part Number': '2', 'Construct Number': '24', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'},
{'Part Number': '1', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGATAGTTATTACTCTAGAAGCGGCCGCg'},
{'Part Number': '1', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
{'Part Number': '2', 'Construct Number': '25', 'Direction': 'fw primer', 'Primer Sequence': 'TAACTATCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
{'Part Number': '2', 'Construct Number': '25', 'Direction': 're primer', 'Primer Sequence': 'GGCCCCAAGGGGTTATGCTAttattatatcagatcctccagcatcaaacctgctgtcgct'},
]


primers_with_names = []
for row in primers_names_list: #recall that primers_names_list is a list of dictionaries
    for x in primers_without_names:
        if (
            int(x['Part Number']) == row['part number'] and
            x['Construct Number'] == row['construct'] and
            x['Direction'] == row['direction']
        ):
            primers_with_names.append(
                {
                    'part number': row['part number'], 
                    'construct number': row['construct'], 
                    'notes': row['notes'], 
                    'primer sequence':x['Primer Sequence']
                }
            )
            # If you are only expecting one match from the primers_without_names
            # collection, or wish to enforce that, you can add a break statement after
            # the insertion here to break out of the inner comparison loop and move on
            # to the next row item


for p in primers_with_names:
    print(p)

print()
print(len(primers_with_names))
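If you expect at most one sequence per primer, the break suggested in the comment can also be expressed with next() and a default value, which stops at the first match and returns None when there is none. This is a minimal sketch on one trimmed record from each list; find_sequence is a hypothetical helper name, not part of the original code.

```python
# Trimmed sample data: one record from each list in the question
primers_names_list = [
    {'part number': 2, 'notes': 'Fw Gibson primer on BBa_K274100 to extract crtEBI operon',
     'direction': 'fw primer', 'construct': '24', 'source': 'BBa_K274100'},
]
primers_without_names = [
    {'Part Number': '2', 'Construct Number': '24', 'Direction': 'fw primer',
     'Primer Sequence': 'TACGACTCACTATAGGGAGAgaggtactagatgacggtctgcgcaaaaaaacacgttcat'},
]


def find_sequence(row, sequences):
    """Hypothetical helper: return the first sequence row whose part number,
    construct and direction match `row`, or None if nothing matches."""
    return next(
        (x for x in sequences
         if int(x['Part Number']) == row['part number']
         and x['Construct Number'] == row['construct']
         and x['Direction'] == row['direction']),
        None,  # default instead of StopIteration when there is no match
    )


primers_with_names = []
for row in primers_names_list:
    x = find_sequence(row, primers_without_names)
    if x is not None:
        primers_with_names.append({
            'part number': row['part number'],
            'construct number': row['construct'],
            'notes': row['notes'],
            'primer sequence': x['Primer Sequence'],
        })

print(len(primers_with_names))  # 1
```

This is still a nested scan, so it stays O(N^2) in the worst case; it only avoids checking the remaining sequences once a match is found.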

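Edit: another option, if the compared values are unique for every row in each collection and you have enough memory and don't mind preprocessing the lists, is to convert both collections to dictionaries keyed by a (part number, construct number, direction) tuple. That reduces each later lookup to an amortized O(1) dictionary access per row. Overall you make about three linear passes (two to build the dictionaries, one to match), which is O(N) instead of O(N^2) and pays off for large collections. If the keys are not unique, a later row silently overwrites an earlier one with the same key.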

# convert both lists to dictionaries
primers_names_dict = { 
    (str(p['part number']), str(p['construct']), str(p['direction'])): p
    for p in primers_names_list 
}
primers_sequence_dict = {
    (str(p['Part Number']), str(p['Construct Number']), str(p['Direction'])): p
    for p in primers_without_names
}


# now that we have two dicts, we can do a key<->key match between them, so each
# comparison op is just a dictionary key lookup, which is O(1) on average
matches = []
for key in primers_names_dict.keys():
    if key in primers_sequence_dict: # amortized O(1) lookup
        matches.append( {
            'part number': primers_names_dict[key]['part number'], 
            'construct number': primers_names_dict[key]['construct'], 
            'notes': primers_names_dict[key]['notes'], 
            'primer sequence': primers_sequence_dict[key]['Primer Sequence']
        } )

for m in matches:
    print(m)
print(len(matches))
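A variant of the dictionary approach that keeps every field the question asks for (including direction and source) and fails loudly on duplicate keys instead of silently overwriting them. It is a sketch under the assumption that the (part number, construct, direction) key should be unique; index_by is a hypothetical helper, and only two records from each list are used.

```python
# Trimmed sample data: two records from each list in the question
primers_names_list = [
    {'part number': 1,
     'notes': 'Fw Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.',
     'direction': 'fw primer', 'construct': '24', 'source': 'pEM113'},
    {'part number': 1,
     'notes': 'Re Gibson primer on pEM113 to extract CmR resistance and pSC101 backbone and T7 promoter and term.',
     'direction': 're primer', 'construct': '24', 'source': 'pEM113'},
]
primers_without_names = [
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 're primer',
     'Primer Sequence': 'agaccgtcatctagtacctcTCTCCCTATAGTGAGTCGTATTACTCTAGAAGCGGCCGCg'},
    {'Part Number': '1', 'Construct Number': '24', 'Direction': 'fw primer',
     'Primer Sequence': 'tggaggatctgatataataaTAGCATAACCCCTTGGGGCCTCTAAACGGGTCTTGAGGGG'},
]


def index_by(rows, key_func):
    """Hypothetical helper: map key_func(row) -> row, raising ValueError on a
    duplicate key instead of silently keeping only the last row."""
    index = {}
    for row in rows:
        key = key_func(row)
        if key in index:
            raise ValueError('duplicate key: %r' % (key,))
        index[key] = row
    return index


# Normalise both key tuples to strings so that 1 and '1' line up
names_by_key = index_by(primers_names_list,
                        lambda p: (str(p['part number']), p['construct'], p['direction']))
seqs_by_key = index_by(primers_without_names,
                       lambda p: (p['Part Number'], p['Construct Number'], p['Direction']))

merged = []
for key, name in names_by_key.items():
    seq = seqs_by_key.get(key)
    if seq is None:
        continue  # this primer has no sequence entry; skip it
    merged.append({
        'part number': name['part number'],
        'construct number': name['construct'],
        'direction': name['direction'],
        'primer sequence': seq['Primer Sequence'],
        'notes': name['notes'],
        'source': name['source'],
    })

for m in merged:
    print(m)
```

Raising on duplicates is a design choice: if a construct can legitimately have two primers with the same key, you would need a dict of lists instead.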
answered 2013-01-23T20:24:28.973

I see two problems here.

  1. Part Number is an integer in one dictionary and a string in the other.

  2. You wrote row['construct number'] where I think it should be row['construct'].

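Here it is fixed (comparing tuples rather than sets, so each value must match its corresponding field):

primers_with_names = []
for row in primers_names_list: #recall that primers_names_list is a list of dictionaries
    for x in primers_without_names:
        # tuples compare position by position; sets would ignore which field each value came from
        if (x['Part Number'], x['Construct Number'], x['Direction']) == (str(row['part number']), row['construct'], row['direction']):
            primers_with_names.append({'part number':row['part number'], 'construct number':row['construct'], 'notes':row['notes'], 'primer sequence':x['Primer Sequence']})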
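One caveat with the {...} == {...} pattern from the question: a set only records which values are present, not which field each came from. This sketch uses two hypothetical key tuples whose part and construct numbers are swapped; the sets compare equal while the tuples do not.

```python
# Key values from two hypothetical rows with part and construct numbers swapped
a = ('1', '24', 'fw primer')
b = ('24', '1', 'fw primer')

print(set(a) == set(b))  # True: sets only check membership, not position
print(a == b)            # False: tuples compare element by element
```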
answered 2013-01-23T20:25:30.003