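I am trying to create lists from several opened files, but I have run into a problem. I need two separate lists for each file, and right now my code only ends up with the two lists for the last file in the loop. Can anyone suggest a fix that creates unique "sample_genes" and "sample_values" lists for every file in "file_list"?
Alternatively, a single combined list of "gene_names" across all files, plus one of "sample_values" across all files, would also work.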
import csv

# Parse csv files for samples, creating lists of gene names and expression values.
file_list = ['CRPC_278.csv', 'PCaP_470.csv', 'CRPC_543.csv', 'PCaN_5934.csv', 'PCaN_6102.csv', 'PCaP_17163.csv']
des_list = ['a', 'b', 'c', 'd', 'e', 'f']
for idx, (f_in, des) in enumerate(zip(file_list, des_list)):
    with open(f_in) as des:
        cread = list(csv.reader(des, delimiter = '\t'))
        sample_genes = [i for i, j in (sorted([x for x in {i: float(j)
                        for i, j in cread}.items()], key = lambda v: v[1]))]
        sample_values = [j for i, j in (sorted([x for x in {i: float(j)
                         for i, j in cread}.items()], key = lambda v: v[1]))]
# Compute row means.
mean_values = [((a + b + c + d + e + f)/len(file_list)) for i, (a, b, c, d, e, f) in enumerate(zip(sample_1_values, sample_2_values, sample_3_values, sample_4_values, sample_5_values, sample_6_values))]
# Provide proper gene names for mean values and replace original data values by corresponding means.
sample_genes_list = [i for i in sample_1_genes, sample_2_genes, sample_3_genes, sample_4_genes, sample_5_genes, sample_6_genes]
sample_final_list = [sorted(zip(sg, mean_values)) for sg in sample_genes_list]
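The mean computation above depends on six numbered variables that the loop never creates. A minimal sketch of the same two steps for any number of samples is shown here, assuming the intent is to average the values that sit at the same sorted position in each sample. The names `rank_means`, `value_lists` and `gene_lists` are hypothetical, and the data is made up for illustration.

```python
def rank_means(value_lists):
    """Return the mean of the values found at each position across all samples."""
    # zip(*value_lists) groups the first values together, then the second, etc.
    return [sum(row) / float(len(row)) for row in zip(*value_lists)]

# Made-up stand-ins for the per-file lists, each already sorted by value.
gene_lists = [['A', 'B', 'C'], ['B', 'A', 'C'], ['C', 'A', 'B']]
value_lists = [[1.0, 4.0, 7.0], [2.0, 5.0, 9.0], [3.0, 6.0, 8.0]]

mean_values = rank_means(value_lists)

# Pair each sample's gene names with the means, then sort by gene name.
sample_final_list = [sorted(zip(genes, mean_values)) for genes in gene_lists]
print(sample_final_list)
```

Because `zip(*value_lists)` accepts any number of lists, nothing needs to change when a file is added to or removed from `file_list`.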
New code below:
# Parse csv files for samples, creating lists of gene names and expression values.
file_list = ['CRPC_278.csv', 'PCaP_470.csv', 'CRPC_543.csv', 'PCaN_5934.csv', 'PCaN_6102.csv', 'PCaP_17163.csv']
full_dict = {}
for path in file_list:
    with open(path) as stream:
        data = list(csv.reader(stream, delimiter = '\t'))
    data = sorted([(i, float(j)) for i, j in data], key = lambda v: v[1])
    sample_genes = [i for i, j in data]
    sample_values = [j for i, j in data]
    full_dict[path] = (sample_genes, sample_values)
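If one combined list per kind is preferred over per-file lists, `full_dict` can be flattened afterwards. The sketch below is one possible way to do that, using a small made-up `full_dict` and hypothetical file names in place of the real data. The names `all_genes` and `all_values` are also assumptions.

```python
# Hypothetical stand-ins for the real file_list and full_dict.
file_list = ['a.csv', 'b.csv']
full_dict = {
    'a.csv': (['G1', 'G2'], [0.5, 1.5]),
    'b.csv': (['G2', 'G1'], [0.7, 2.0]),
}

all_genes = []
all_values = []
# Walk file_list rather than the dict so the order of files is predictable.
for path in file_list:
    genes, values = full_dict[path]
    all_genes.extend(genes)
    all_values.extend(values)

print(all_genes)
print(all_values)
```

Iterating over `file_list` instead of `full_dict` keeps the combined lists in the same order as the files, which a plain dict does not guarantee on older Python versions.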
Unpacking the resulting dictionary shows a deeply nested structure:
for key in full_dict:
    value = full_dict[key]
    for key in full_dict[key]:
        for idx, items in enumerate(key):
            print(idx)
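Each value in `full_dict` is a two-item tuple of lists, so the inner loop above visits the gene list first and the value list second, and it also rebinds `key` along the way. A possible way to flatten this is to unpack the tuple directly in the `for` statement. The helper name `iter_rows` and the sample data are hypothetical.

```python
def iter_rows(full_dict):
    """Yield (path, idx, gene, value) for every row of every file."""
    # sorted() is used only to make the file order predictable.
    for path, (genes, values) in sorted(full_dict.items()):
        for idx, (gene, value) in enumerate(zip(genes, values)):
            yield path, idx, gene, value

# Made-up stand-in for the real full_dict.
full_dict = {
    'b.csv': (['G2', 'G1'], [0.7, 2.0]),
    'a.csv': (['G1', 'G2'], [0.5, 1.5]),
}

rows = list(iter_rows(full_dict))
for row in rows:
    print(row)
```

Unpacking `(genes, values)` in the loop header gives each level its own name, so no index arithmetic or reused loop variable is needed.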