
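I wrote a script in Python that works on a single file. I could not find an answer on how to make it run on multiple files and produce a separate output for each one.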

import re

out = open('/home/directory/a.out', 'w')
infile = open('/home/directory/a.sam', 'r')

for line in infile:
    # Skip SAM header lines
    if not line.startswith('@'):
        samlist = line.strip().split()
        # samlist[5] is the CIGAR string
        if 'I' in samlist[5] or 'D' in samlist[5]:
            match = re.findall(r'(\d+)I', samlist[5])  # remember to change I and D here as well
            # intlist contains the lengths of the insertions in each read
            intlist = [int(x) for x in match]
            for indel in intlist:
                if indel >= 10:
                    read_aln_start = int(samlist[3])
                    indel_positions = []
                    for num1, i_or_d, num2, m in re.findall(r'(\d+)([ID])(\d+)?([A-Za-z])?', samlist[5]):
                        if num1:
                            read_aln_start += int(num1)
                        if num2:
                            read_aln_start += int(num2)
                        indel_positions.append(read_aln_start)
                    out.write(str(read_aln_start) + '\t' + str(i_or_d) + '\t' + str(samlist[2]) + '\t' + str(indel) + '\n')
infile.close()
out.close()

I would like my script to take multiple files named a.sam, b.sam, c.sam and produce an output for each of them: aout.sam, bout.sam, cout.sam

Could you give me a solution or a hint?

Regards, Eric


3 Answers


Loop over the filenames.

input_filenames = ['a.sam', 'b.sam', 'c.sam']
output_filenames = ['aout.sam', 'bout.sam', 'cout.sam']
for infn, outfn in zip(input_filenames, output_filenames):
    out = open('/home/directory/{}'.format(outfn), 'w')
    infile = open('/home/directory/{}'.format(infn), 'r')
    ...
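
As a rough end-to-end sketch of that loop: `process_file` below is a hypothetical stand-in for the body of the original script (here it only copies non-header lines), and the temporary directory with its sample files is an assumption made so the example runs anywhere. The `with` statement closes both files after each iteration.

```python
import os
import tempfile

def process_file(in_path, out_path):
    """Hypothetical stand-in for the script body: copy non-header lines."""
    with open(in_path, 'r') as infile, open(out_path, 'w') as out:
        for line in infile:
            if not line.startswith('@'):
                out.write(line)

# Demo setup (assumption): a temporary directory with three tiny SAM-like files.
workdir = tempfile.mkdtemp()
input_filenames = ['a.sam', 'b.sam', 'c.sam']
output_filenames = ['aout.sam', 'bout.sam', 'cout.sam']
for fn in input_filenames:
    with open(os.path.join(workdir, fn), 'w') as f:
        f.write('@HD\theader\nread1\t0\tchr1\t100\t60\t5M12I5M\n')

# One output file per input file.
for infn, outfn in zip(input_filenames, output_filenames):
    process_file(os.path.join(workdir, infn), os.path.join(workdir, outfn))

print(sorted(os.listdir(workdir)))
```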

Update

The following code generates output_filenames from the given input_filenames.

import os

def get_output_filename(fn):
    filename, ext = os.path.splitext(fn)
    return filename + 'out' + ext

input_filenames = ['a.sam', 'b.sam', 'c.sam'] # or glob.glob('*.sam')
# A list comprehension gives a real list under both Python 2 and Python 3
output_filenames = [get_output_filename(fn) for fn in input_filenames]
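
A small usage sketch, assuming the same `get_output_filename` helper: because `os.path.splitext` only splits off the final extension, the helper also works on full paths, so the directory does not have to be added separately. The `pairs` name is introduced here purely for illustration.

```python
import os

def get_output_filename(fn):
    # Split 'a.sam' into ('a', '.sam') and insert 'out' before the extension.
    filename, ext = os.path.splitext(fn)
    return filename + 'out' + ext

input_filenames = ['a.sam', 'b.sam', 'c.sam']
pairs = [(fn, get_output_filename(fn)) for fn in input_filenames]
print(pairs)

# The directory part of a full path is left untouched.
print(get_output_filename('/home/directory/a.sam'))
```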
answered 2013-07-18T09:30:02.763

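I suggest wrapping the script in a function, using the def keyword, and passing the names of the input and output files to that function as parameters.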

def do_stuff_with_files(infile, outfile):
    # Open the output file for writing and the input file for reading
    out = open(outfile, 'w')
    infile = open(infile, 'r')
    # the rest of your script

Now you can call this function for any combination of input and output filenames.

do_stuff_with_files('/home/directory/a.sam', '/home/directory/a.out')

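If you want to do this for all files in some directory, use the glob library. To generate the output filenames, just replace the last three characters ("sam") with "out".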

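import glob
import os
indir, outdir = '/home/directory/', '/home/directory/out/'
# glob.glob returns full paths, so keep only the base names
# (glob.glob1 is an undocumented helper)
files = [os.path.basename(p) for p in glob.glob(indir + '*.sam')]
infiles  = [indir  + f              for f in files]
outfiles = [outdir + f[:-3] + "out" for f in files]
for infile, outfile in zip(infiles, outfiles):
    do_stuff_with_files(infile, outfile)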
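
As an alternative to slicing off the last three characters, the same renaming can be sketched with `pathlib` (Python 3.4+). `output_path_for` is a hypothetical helper name, not part of the answer above; `with_suffix` replaces only the final extension, so it does not depend on the extension being exactly three characters long.

```python
from pathlib import Path

def output_path_for(infile, outdir):
    # Hypothetical helper: same base name, '.out' suffix, placed in outdir.
    return Path(outdir) / Path(infile).with_suffix('.out').name

# Example paths taken from the answer; nothing is read from or written to disk.
for name in ['/home/directory/a.sam', '/home/directory/b.sam']:
    print(output_path_for(name, '/home/directory/out'))
```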
answered 2013-07-18T09:36:41.793

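The following script works with separate input and output files. It loops over all files with the ".sam" extension in the given directory, performs the specified operations on them, and writes the results to separate files.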

import os
# Define the directory containing the files you are working with
path = '/home/directory'
# Get all the files in that directory with the desired
# extension (in this case ".sam")
files = [f for f in os.listdir(path) if f.endswith('.sam')]
# Loop over the files with that extension
for fname in files:
    # Open the input file
    with open(os.path.join(path, fname), 'r') as infile:
        # Build the output name, e.g. 'a.sam' -> 'aout.sam'
        base, ext = os.path.splitext(fname)
        # Open the output file
        with open(os.path.join(path, base + 'out' + ext), 'a') as outfile:
            # Loop over the lines in the input file
            for line in infile:
                # If a line in the input file can be characterized in a
                # certain way, write a different line to the output file.
                # Otherwise write the original line (from the input file)
                # to the output file
                if line.startswith('Something'):
                    outfile.write('A different kind of something')
                else:
                    outfile.write(line)
    # Note the absence of either a infile.close() or an outfile.close()
    # statement. The with-statement handles that for you
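
One thing to watch with this naming scheme: outputs such as aout.sam also end in ".sam", so running the loop a second time in the same directory would pick them up as inputs. A minimal sketch of one way around that, assuming no genuine input file has a name ending in "out.sam"; `list_inputs` and `out_tag` are hypothetical names, and the temporary directory is only there to make the example runnable.

```python
import os
import tempfile

def list_inputs(path, ext='.sam', out_tag='out'):
    # Keep files ending in ext, but skip earlier outputs ending in out_tag + ext.
    return sorted(f for f in os.listdir(path)
                  if f.endswith(ext) and not f.endswith(out_tag + ext))

# Demo setup (assumption): two inputs, one earlier output, one unrelated file.
workdir = tempfile.mkdtemp()
for name in ['a.sam', 'b.sam', 'aout.sam', 'notes.txt']:
    open(os.path.join(workdir, name), 'w').close()

print(list_inputs(workdir))  # ['a.sam', 'b.sam']
```

Writing the outputs to a separate directory avoids the problem without any filtering.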
answered 2013-07-18T12:55:53.673