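I tried this double loop, which does not work. (See below.)

Basically, I have a list of constructs and a list of primers. Primers are associated with constructs by a "construct number" and a "part number". (Each construct is made up of multiple parts.) For each part there is a "forward" and a "reverse" primer. For SO members who are inclined towards molecular biology, I am basically writing a script to help me do my PCRs.

What I want to do: I want to search the primer list for the primers that should be associated with a part of a construct, and join them into a master list. For example, if I have a list containing EMP792 (fw) and EMP793 (re) (they are on different rows), and they are associated with part #2 of construct #1 in my construct list, I want to be able to search "primers_list" for the corresponding fw and re primers. If a part of a construct has no associated primers in the list, I want to skip that construct for now.

The strategy I used: I wrote a nested for loop. For each construct in the construct list, I want it to search the primer list for the fw and re primers. I know this is inefficient, but as a beginner programmer it is the only way I could think of. I included some conditionals that check whether primers exist for a construct by comparing the construct number and part number associated with each primer.

The problem I am facing: for each construct in the list, the loop does not search the whole primers_list. It seems to automatically skip all the primers that were compared previously and only compares the next primer that has not been compared yet. This causes problems in processing: if you run the code with the associated data sets (also pasted below the code), you will see that constructs that should have their associated primers printed out do not have them. Trying to figure out what has gone wrong is giving me a headache (ha ha, ha ha...)!

I would appreciate any help!

Code: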

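import csv

# Assumed initial values: the original snippet uses these names without
# showing where they are defined.
master_list = []
construct_counter = 0
part_number = 1

with open('constructs-to-make-shortened2.csv', 'rU') as constructs: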
    construct_list = csv.DictReader(constructs)

    with open('primers-with-notes-names.csv', 'rU') as primers:
        primers_list = csv.DictReader(primers)

        #make list of constructs for checking later on#
##        construct_numbers_list = []
##        for row in primers_list:
##            construct_numbers_list.append(row['construct number'])
##
##        print(construct_numbers_list)


        for construct in construct_list:
##            print('Currently at construct number ' + construct['Construct'])
##            print('Construct counter at ' + str(construct_counter))
##            print('Part number counter is at ' + str(part_number))
            master_row = {}
            master_row['construct'] = construct['Construct']
            master_row['strategy'] = construct['Strategy']
            master_row['construct name'] = construct['Construct Name']
            master_row['sequence'] = construct['Sequence']
            master_row['source'] = construct['Source']
            master_row['content'] = construct['Content']


            print('We are at construct number ' + str(construct['Construct']))
            print('Construct counter is at ' + str(construct_counter))
            is_next_construct = (int(construct['Construct']) > construct_counter)
            print('Are we at the next construct?')
            print(is_next_construct)

            if is_next_construct:
                part_number = 1
                construct_counter = int(construct['Construct'])
            print('Part number is now ' + str(part_number))

            for primer in primers_list:
                print(primer)


##                    print('Is primer ' + str(primer['name']) + ' associated with the construct?')
                is_associated_with_construct = bool(primer['construct number'] == construct['Construct'] and str(primer['part number']) == str(part_number))
##                    print(is_associated_with_construct)
                if(is_associated_with_construct == False):
                    break

                is_forward = bool(primer['construct number'] == construct['Construct'] and str(primer['part number']) == str(part_number) and primer['direction'] == 'fw primer')

                print('Primer ' + str(primer['name']) + ' is a forward primer?')
                print(is_forward)

                is_reverse = bool(primer['construct number'] == construct['Construct'] and str(primer['part number']) == str(part_number) and primer['direction'] == 're primer')

                print('Primer ' + str(primer['name']) + ' is a reverse primer?')
                print(is_reverse)

                if is_forward:
                    master_row['primer1'] = primer['name']
                    master_row['primer1 sequence'] = primer['primer sequence']
                    master_row['primer1 description'] = primer['notes']
                    master_row['primer1 length'] = primer['length']
##                        print(master_row)
                    continue

                elif is_reverse:
                    master_row['primer2'] = primer['name']
                    master_row['primer2 sequence'] = primer['primer sequence']
                    master_row['primer2 description'] = primer['notes']
                    master_row['primer2 length'] = primer['length']
##                        print(master_row)
                    part_number += 1
                    print('Part number now = ' + str(part_number) + '\n')
                    master_list.append(master_row)
                    break

DATA SUBSET (constructs) (exact sequences removed to stay within the SO character limit):

{'Sequence': '', 'Construct': '12', 'Strategy': 'Gibson', 'Content': 'Amp resistance marker', 'Source': 'pEM096', 'Construct Name': 'T7 RNAP core on BAC ori only with AmpR'}
{'Sequence': '', 'Construct': '12', 'Strategy': 'Gibson', 'Content': 'BAC origin and T7 RNAP core', 'Source': 'THSS301', 'Construct Name': 'T7 RNAP core on BAC ori only with AmpR'}
{'Sequence': '', 'Construct': '13', 'Strategy': 'Cut Gibson', 'Content': 'lycopene pathway (crtE.B.I.dxs.idi)', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid'}
{'Sequence': '', 'Construct': '13', 'Strategy': 'Cut Gibson', 'Content': 'vioABE pathway and pSC101 ori and CmR;  digest with EcoRI and XbaI', 'Source': 'KT-587', 'Construct Name': 'Combined vio and lyc plasmid'}
{'Sequence': '', 'Construct': '14', 'Strategy': 'Cut Gibson', 'Content': 'lycopene pathway (crtE.B.I.dxs.idi)', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid, with lyc in reverse direction'}
{'Sequence': '', 'Construct': '14', 'Strategy': 'Cut Gibson', 'Content': 'vioABE pathway and pSC101 ori and CmR;  digest with EcoRI and XbaI', 'Source': 'KT-587', 'Construct Name': 'Combined vio and lyc plasmid, with lyc in reverse direction'}
{'Sequence': '', 'Construct': '15', 'Strategy': 'Gibson', 'Content': 'vioABE pathway with random nucleotide spacers', 'Source': 'KT-587', 'Construct Name': 'Combined vio and lyc plasmid made by high GC polymerase'}
{'Sequence': '', 'Construct': '15', 'Strategy': 'Gibson', 'Content': 'lycopene pathway (crtE.B.I.dxs.idi)', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid made by high GC polymerase'}
{'Sequence': '', 'Construct': '15', 'Strategy': 'Gibson', 'Content': 'pSC101 origin of replication and CmR resistance marker', 'Source': 'KT-537', 'Construct Name': 'Combined vio and lyc plasmid made by high GC polymerase'}
{'Sequence': '', 'Construct': '16', 'Strategy': 'Gibson', 'Content': 'P(tac)-SynZip18-T7 fragment', 'Source': 'THSS303', 'Construct Name': 'P(tac)-T7 fragment controller'}
{'Sequence': '', 'Construct': '16', 'Strategy': 'Gibson', 'Content': 'IncW backbone and TpR resistance and lacIq', 'Source': 'pEM103', 'Construct Name': 'P(tac)-T7 fragment controller'}
{'Sequence': '', 'Construct': '17', 'Strategy': 'Gibson', 'Content': 'P(tac)-SynZip18-T3 fragment', 'Source': 'THSS304', 'Construct Name': 'P(tac)-T3 fragment controller'}
{'Sequence': '', 'Construct': '17', 'Strategy': 'Gibson', 'Content': 'IncW backbone and TpR resistance and lacIq', 'Source': 'pEM103', 'Construct Name': 'P(tac)-T3 fragment controller'}

DATA SUBSET (primers):

{'part number': '1', 'direction': 'fw primer', 'name': 'EMP790', 'primer sequence': 'gtttgtcggtgaactaattCttattaccaatgcttaatcagggaggcacctatctcagcg', 'notes': 'Fw Gibson primer on pEM096 to extract Amp resistance marker', 'length': '60', 'construct number': '12'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP787', 'primer sequence': 'gatgaggatcgtttcgcatgctaaatacattcaaatatctatccgctcatgagacaataa', 'notes': 'Re Gibson primer on pEM096 to extract Amp resistance marker', 'length': '60', 'construct number': '12'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP788', 'primer sequence': 'agatatttgaatgtatttagcatgcgaaacgatcctcatcctgtctcttgatcagatctt', 'notes': 'Fw Gibson primer on THSS301 to extract BAC and R6K origins and T7 RNAP core', 'length': '60', 'construct number': '12'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP791', 'primer sequence': 'tgattaagcattggtaataaGaattagttcaccgacaaacaacagataaaacgaaaggcc', 'notes': 'Re Gibson primer on THSS301 to extract BAC origin and T7 RNAP core', 'length': '60', 'construct number': '12'}
{'part number': '1', 'direction': 'fw primer', 'name': 'EMP792', 'primer sequence': 'aaggaatattcagcaatttgGTTGGGGATAGCGCTAGCTATAATAactaTCACTATAGGG', 'notes': 'Fw Gibson primer on KT-587 to extract vioABE pathway with random nucleotide spacers', 'length': '60', 'construct number': '15'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP793', 'primer sequence': 'gggcctttcttcggcacgggGTTGTAGCAGGCGTCTTTGTCAAAAAACCCCTCAAGACCC', 'notes': 'Re Gibson primer on KT-587 to extract vioABE pathway with random nucleotide spacers', 'length': '60', 'construct number': '15'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP794', 'primer sequence': 'ACAAAGACGCCTGCTACAACcccgtgccgaagaaaggcccacccgtgaaggtgagccagt', 'notes': 'Fw Gibson primer on KT-537 to extract lycopene pathway (crtE.B.I.dxs.idi)', 'length': '60', 'construct number': '15'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP795', 'primer sequence': 'gaggtcattactggatctaTcccgtgccgaagaaaggcccacccgtgaaggtgagccagt', 'notes': 'Re Gibson primer on KT-537 to extract lycopene pathway (crtE.B.I.dxs.idi)', 'length': '60', 'construct number': '15'}
{'part number': '3', 'direction': 'fw primer', 'name': 'EMP796', 'primer sequence': 'gggcctttcttcggcacgggAtagatccagtaatgacctcagaactccatctggatttgt', 'notes': 'Fw Gibson primer on KT-537 to extract pSC101 origin of replication and CmR resistance marker', 'length': '60', 'construct number': '15'}
{'part number': '3', 'direction': 're primer', 'name': 'EMP797', 'primer sequence': 'TAGCTAGCGCTATCCCCAACcaaattgctgaatattccttttcttagacgtcaggtggca', 'notes': 'Re Gibson primer on KT-537 to extract pSC101 origin of replication and CmR resistance marker', 'length': '60', 'construct number': '15'}
{'part number': '1', 'direction': 'fw primer', 'name': 'EMP798', 'primer sequence': 'aaatattctgaaatgagctgttgacaattaatcatcggctcgtataatgtgtggaattgt', 'notes': 'Fw Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '16'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP799', 'primer sequence': 'attaccgcctttgagtgagccccaatgataaccccaagggaagttttagtcaaaagcctc', 'notes': 'Re Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '16'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP800', 'primer sequence': 'cccttggggttatcattggggctcactcaaaggcggtaatcagataaaaaaaatccttag', 'notes': 'Fw Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '16'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP801', 'primer sequence': 'agccgatgattaattgtcaacagctcatttcagaatatttgccagaaccgttatgatgtc', 'notes': 'Re Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '16'}
{'part number': '1', 'direction': 'fw primer', 'name': 'EMP798', 'primer sequence': 'aaatattctgaaatgagctgttgacaattaatcatcggctcgtataatgtgtggaattgt', 'notes': 'Fw Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '17'}
{'part number': '1', 'direction': 're primer', 'name': 'EMP799', 'primer sequence': 'attaccgcctttgagtgagccccaatgataaccccaagggaagttttagtcaaaagcctc', 'notes': 'Re Gibson primer on THSS303 to extract P(tac)-SynZip18-T7 fragment', 'length': '60', 'construct number': '17'}
{'part number': '2', 'direction': 'fw primer', 'name': 'EMP800', 'primer sequence': 'cccttggggttatcattggggctcactcaaaggcggtaatcagataaaaaaaatccttag', 'notes': 'Fw Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '17'}
{'part number': '2', 'direction': 're primer', 'name': 'EMP801', 'primer sequence': 'agccgatgattaattgtcaacagctcatttcagaatatttgccagaaccgttatgatgtc', 'notes': 'Re Gibson primer on pEM103 to extract IncW backbone and TpR resistance and lacIq', 'length': '60', 'construct number': '17'}

1 Answer

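The problem is that you are iterating over a csv.DictReader object, which is not a list but an iterator.

The difference between the two is that with an iterator you cannot "go back to the beginning". Each time the inner loop runs, your iteration over primers_list starts from where it last stopped.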
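To make that behaviour concrete, here is a minimal sketch. The in-memory `primers_csv` buffer and its three rows are a hypothetical stand-in for the real primers file, trimmed to two columns; only `csv.DictReader` comes from the question.

```python
import csv
import io

# Hypothetical in-memory stand-in for 'primers-with-notes-names.csv'.
primers_csv = io.StringIO(
    "name,construct number\n"
    "EMP790,12\n"
    "EMP787,12\n"
    "EMP788,12\n"
)

reader = csv.DictReader(primers_csv)

# First loop: look at one row, then break out (as the inner loop does).
seen_before_break = []
for row in reader:
    seen_before_break.append(row["name"])
    break

# Second loop over the SAME reader: it resumes after the consumed row.
resumed = [row["name"] for row in reader]

# Third loop: the reader is exhausted, so nothing is left to iterate.
again = [row["name"] for row in reader]

print(seen_before_break, resumed, again)
```

The second loop never sees the first row again, which matches the "skips all the primers that were compared previously" symptom described in the question.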

If you want to be able to iterate over all the items several times, and you have enough memory, store them in a list:

primers_list = list(csv.DictReader(primers))
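A small sketch of how the nested loop behaves once the primers are held in a list. The two in-memory buffers and the `matches` dictionary are hypothetical and only illustrate the idea; the column names follow the data subsets in the question.

```python
import csv
import io

# Hypothetical, heavily trimmed stand-ins for the two CSV files.
constructs_csv = io.StringIO("Construct\n12\n15\n")
primers_csv = io.StringIO(
    "name,construct number\n"
    "EMP790,12\n"
    "EMP792,15\n"
    "EMP787,12\n"
)

# Read every primer row once; a list can be iterated any number of times.
primers_list = list(csv.DictReader(primers_csv))

matches = {}
for construct in csv.DictReader(constructs_csv):
    number = construct["Construct"]
    # The inner iteration starts from the first primer for every construct.
    matches[number] = [
        primer["name"]
        for primer in primers_list
        if primer["construct number"] == number
    ]

print(matches)
```

The outer reader can stay an iterator here because it is only traversed once.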

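If you want to keep memory usage low, you can create the DictReader object from scratch inside the loop each time. However, this adds some (probably small) overhead in execution time, and you should make sure the file gets closed by moving the with statement inside the loop.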
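One way this could look, as a sketch: the temporary `primers.csv` file, its contents, and the hard-coded construct numbers are assumptions made so that the example runs on its own.

```python
import csv
import os
import tempfile

matches = {}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical primers file, written here only to keep the sketch runnable.
    primers_path = os.path.join(tmp_dir, "primers.csv")
    with open(primers_path, "w", newline="") as handle:
        handle.write(
            "name,construct number\n"
            "EMP790,12\n"
            "EMP792,15\n"
            "EMP787,12\n"
        )

    for number in ["12", "15"]:
        # Opening the file inside the loop gives a fresh reader on every pass,
        # and the with statement closes the file again after each pass.
        with open(primers_path, newline="") as primers:
            matches[number] = [
                row["name"]
                for row in csv.DictReader(primers)
                if row["construct number"] == number
            ]

print(matches)
```

Only one primer row is held in memory at a time, at the cost of re-reading the file once per construct.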

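Another way would be to call primers.seek(0) at the end of the loop body, so that reading starts from the beginning of the file on the next iteration, but I am not sure whether this is a good hack.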
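A sketch of the rewind approach, using a hypothetical `io.StringIO` buffer in place of the open file. As an extra assumption beyond the answer, the sketch builds a new `csv.DictReader` on each pass, which sidesteps any question of how an existing reader would treat the header line after the rewind.

```python
import csv
import io

# Hypothetical in-memory stand-in for the open primers file.
primers = io.StringIO(
    "name,construct number\n"
    "EMP790,12\n"
    "EMP792,15\n"
    "EMP787,12\n"
)

matches = {}
for number in ["12", "15"]:
    # A new reader per pass re-reads the header line as field names.
    reader = csv.DictReader(primers)
    matches[number] = [
        row["name"] for row in reader if row["construct number"] == number
    ]
    # Rewind so the next pass starts from the beginning of the file.
    primers.seek(0)

print(matches)
```

The file stays open for the whole outer loop, so this avoids reopening it, but it still re-reads every row once per construct.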

Answered 2013-01-26T19:42:48.893