
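I'm trying to write a script that takes 10 data files, reads them line by line, builds a dictionary from the first (gene) and third (value) items on each line, and then combines all of these into one output file whose 12th column shows the average of the 10 replicates. (I'll be doing this for several different samples, each with 10 replicates.) I'm trying to use a for loop and the range function to generate the file names, dictionary names, etc. automatically, because I'm fairly sure that is a cleaner way to do this than writing out the dictionary-building code 10 times. However, I'm new to coding, and I keep getting "SyntaxError: can't assign to function call" on line 11 of my code ('gene{0}key'.format(i) = []), and I'm sure the same will happen on several other lines with the same structure. If anyone knows what I'm doing wrong, I'd love to know what it is! Thanks in advance!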

#!/usr/bin/env ipython

import re
import itertools
import numpy

sample = raw_input('sample: ')

for i in range(10):
    filename = sample+'_rand{0}.genes.tajD'.format(i)
    'gene{0}key'.format(i) = []
    'taj{0}value'.format(i) = []
    with open(filename, 'r') as data:
        for line in data:
            line = line.strip().split('\t')
            gene, taj = line[0], line[3]
            'gene{0}key'.format(i).append(gene)
            'taj{0}value'.format(i).append(taj)
    'dic{0}'.format(i) = dict(itertools.izip('gene{0}key'.format(i),'taj{0}value'.format(i)))

outfilename = sample+'_tajD.genes.all'

with open(sample+'_rand0.genes.tajD', 'r') as genes, open(outfilename, 'w') as outfile :
    for line in genes :
        line = line.strip().split('\t')
        mastergene = line[0]
        for i in range(10):
            'value{0}'.format(i) = 'dic{0}'.format(i)[mastergene]
        allrand = [value0, value1, value2, value3, value4, value5, value6, value7, value8, value9]
        avg = numpy.mean(allrand)
        outfile.write(mastergene + '\t' + value0 + '\t' + value1 + '\t' + value2 +'\t' + value3 + '\t' + value4 + '\t' + value5 + '\t' + value6 + '\t' + value7 + '\t' + value8 + '\t' + value9 + '\t' + avg + '\n')

1 Answer

'gene{0}key'.format(i) = []

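This is a typical anti-pattern that new coders reach for. You want to create ten variables named gene0key, gene1key, gene2key, and so on. But 'gene{0}key'.format(i) is a method call that returns a string, and Python cannot assign to the result of a call, hence the SyntaxError. The right way to do this is to use a list or a dictionary and index it with i as the key. The simplest way to rewrite the code so that it is syntactically correct is: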

genekeys = {}
tajvalues = {}
dictionaries = {}

for i in range(10):
    filename = sample+'_rand{0}.genes.tajD'.format(i)
    genekeys[i] = []
    tajvalues[i] = []
    with open(filename, 'r') as data:
        for line in data:
            line = line.strip().split('\t')
            gene, taj = line[0], line[3]
            genekeys[i].append(gene)
            tajvalues[i].append(taj)
    dictionaries[i] = dict(itertools.izip(genekeys[i],tajvalues[i]))


with open(sample+'_rand0.genes.tajD', 'r') as genes, open(outfilename, 'w') as outfile :
    for line in genes :
        values = {}
        line = line.strip().split('\t')
        mastergene = line[0]
        for i in range(10):
            values[i] = dictionaries[i][mastergene]
        allrand = [values[0], values[1], values[2], values[3], values[4], values[5], values[6], values[7], values[8], values[9]]
        # The values were read from the files as strings, so convert them before averaging.
        avg = numpy.mean([float(v) for v in allrand])
        outfile.write(mastergene + '\t' + values[0] + '\t' + values[1] + '\t' + values[2] + '\t' + values[3] + '\t' + values[4] + '\t' + values[5] + '\t' + values[6] + '\t' + values[7] + '\t' + values[8] + '\t' + values[9] + '\t' + str(avg) + '\n')
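
As a small illustration of the pattern above, here is a sketch that keeps one list per replicate in a dictionary keyed by the loop index, instead of trying to build variable names from strings. The name gene_lists and the gene IDs are made up for the example; they are not from the original script.

```python
# Hypothetical data: one list of gene names per replicate index.
gene_lists = {}
for i in range(3):
    gene_lists[i] = []                     # takes the place of 'gene{0}key'.format(i) = []
    gene_lists[i].append('gene_%d_a' % i)
    gene_lists[i].append('gene_%d_b' % i)

# Look up replicate 1 by its index rather than by a generated variable name.
print(gene_lists[1])
```

Because the keys here are just 0, 1, 2, a plain list would work equally well; a dictionary is shown only because it mirrors the rewrite above.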

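The code can be simplified further. For example, since genekeys and tajvalues are only used to build dictionaries, you can populate that dict directly without storing the intermediate values. You can also make values a list, so that you don't have to address each element explicitly in the numpy.mean and outfile.write calls.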

dictionaries = {}
for i in range(10):
    filename = sample+'_rand{0}.genes.tajD'.format(i)
    dictionaries[i] = {}
    with open(filename, 'r') as data:
        for line in data:
            line = line.strip().split('\t')
            gene, taj = line[0], line[3]
            dictionaries[i][gene] = taj

with open(sample+'_rand0.genes.tajD', 'r') as genes, open(outfilename, 'w') as outfile :
    for line in genes :
        line = line.strip().split('\t')
        mastergene = line[0]
        values = []
        for i in range(10):
            values.append(dictionaries[i][mastergene])
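        #alternative to the above for loop: values = [dictionaries[i][mastergene] for i in range(10)]
        # The values are strings read from the files: convert them for the mean, and turn the mean back into a string for writing.
        avg = numpy.mean([float(v) for v in values])
        outfile.write(mastergene + '\t' + '\t'.join(values) + '\t' + str(avg) + '\n')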
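
To make the simplified version easier to try without the real data files, here is a minimal sketch that works on in-memory lines. The helper names load_replicate and format_row, the value_column parameter, and the sample rows are all assumptions made for illustration. It takes the value from column index 3, as the code above does, and computes the mean with plain sum/len rather than numpy.mean so that it has no dependencies.

```python
def load_replicate(lines, value_column=3):
    """Map gene name -> value (as float) for one replicate's tab-separated lines."""
    table = {}
    for line in lines:
        fields = line.strip().split('\t')
        table[fields[0]] = float(fields[value_column])
    return table


def format_row(gene, replicates):
    """Build one output line: the gene, each replicate's value, then the mean."""
    values = [rep[gene] for rep in replicates]
    avg = sum(values) / len(values)        # stand-in for numpy.mean(values)
    cells = [gene] + [str(v) for v in values] + [str(avg)]
    return '\t'.join(cells) + '\n'


# Made-up rows standing in for two replicate files.
rep0 = load_replicate(['geneA\tx\ty\t1.0\n', 'geneB\tx\ty\t3.0\n'])
rep1 = load_replicate(['geneA\tx\ty\t2.0\n', 'geneB\tx\ty\t5.0\n'])
print(format_row('geneA', [rep0, rep1]))
```

Converting to float while loading means the averaging step never sees strings, and only the final row needs str(). With the real files, each open file object could be passed to load_replicate in place of the lists.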
answered 2013-10-04T17:43:28.630