
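I have a fastq file containing all of my sequences stacked together, the result of paired-end sequencing. I need to split them into two files, with all the reverse sequences in one file and the forward sequences in the other. So I need to read the first four lines and write them to file "R", then read the next four lines and write them to file "F". After that, I need to read and save the following lines in the same way. I thought of something like this (below), but it didn't work. Any help, please?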

R = open("R.fastq","w+")
F = open("F.fastq","w+")

x = raw_input('type the name of the file you wanna split: ')   
with open (x, 'rt') as myfile:   
    for line in myfile:
        R.write (line)
        R.write (line)
        R.write (line)
        R.write (line)
        F.write (line)
        F.write (line)
        F.write (line)
        F.write (line)

R.close()
F.close()
4 Answers


This should do it:

r = [] # List for the lines to be written into R
f = [] # List for the lines to be written into F

with open('text.txt','r') as myfile: # Open the original file 
    lines = myfile.readlines() # and store each line inside a list called lines

index = 0 # Index of the line

while index <= len(lines)-1:

    for n in range(4):
        if index <= len(lines)-1:
            r.append(lines[index]) # Append line to r
            index+=1

    for n in range(4):
        if index <= len(lines)-1:
            f.append(lines[index]) # Append line to f
            index+=1


with open('file1.txt','w') as R:
    for line in r:
        R.write(line) # Write each line from r into R

with open('file2.txt','w') as F:
    for line in f:
        F.write(line) # Write each line from f into F
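The version above reads the whole input into memory with readlines(), which can be a lot for a large FASTQ. Here is a minimal sketch of the same idea that streams the file one record at a time instead. The function name split_alternating and its parameters are made up for illustration, and it assumes every record is exactly four lines.

```python
from itertools import islice


def split_alternating(in_path, r_path, f_path, record_lines=4):
    """Stream in_path, sending alternate 4-line records to r_path and f_path."""
    with open(in_path) as src, \
         open(r_path, "w") as r_out, \
         open(f_path, "w") as f_out:
        targets = (r_out, f_out)
        record_index = 0
        while True:
            # Pull at most one record (record_lines lines) from the file.
            record = list(islice(src, record_lines))
            if not record:
                break  # end of file reached
            # Even-numbered records go to R, odd-numbered records go to F.
            targets[record_index % 2].writelines(record)
            record_index += 1
```

It would be called as split_alternating('text.txt', 'file1.txt', 'file2.txt') to match the file names used above.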
answered 2020-05-24T00:54:41.940

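This is called "deinterleaving" an interleaved FASTQ. If you google that, you'll find any number of ready-made solutions, including the reformat command of the BBmap/BBtools package: http://seqanswers.com/forums/showthread.php?t=46174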

answered 2020-05-24T21:50:37.567

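I think this will do what you want. At least it seems to with a test file I created myself.

It uses a generator function I named grouper() to split the lines of the input file into groups of 4, and then writes each group to one of the 2 output files. To decide which output file to use, it counts the groups being processed with the built-in enumerate() function and takes that counter modulo 2 (% 2) to select one file or the other.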

from itertools import zip_longest


def grouper(n, iterable):
    """ s -> (s0,s1,...sn-1), (sn,sn+1,...s2n-1), (s2n,s2n+1,...s3n-1), ... """
    FILLER = object()  # Value that couldn't be in data.
    for result in zip_longest(*[iter(iterable)]*n, fillvalue=FILLER):
        yield tuple(v for v in result if v is not FILLER)


input_filename = 'sequences.txt'
output_filename1 = 'R.fastq'
output_filename2 = 'F.fastq'

with open(input_filename) as inp, \
     open(output_filename1, 'w') as outp1, \
     open(output_filename2, 'w') as outp2:

    output_files = outp1, outp2
    for i, group in enumerate(grouper(4, inp)):
        outp = output_files[i % 2]
        for line in group:
            outp.write(line)

print('done')
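To see why the zip_longest(*[iter(iterable)]*n) idiom groups consecutive lines, here is a small in-memory sketch. The sample records are invented for illustration. The key point is that [it] * 4 holds four references to the same iterator, so each output tuple consumes four consecutive items.

```python
from itertools import zip_longest

# Made-up sample: two complete 4-line records plus a truncated third one.
lines = ["@r1\n", "ACGT\n", "+\n", "IIII\n",
         "@f1\n", "TGCA\n", "+\n", "IIII\n",
         "@r2\n", "GG\n"]

it = iter(lines)
FILLER = object()  # sentinel that cannot occur in the data

# zip_longest pads the last tuple with FILLER; the padding is filtered out again.
groups = [tuple(v for v in group if v is not FILLER)
          for group in zip_longest(*[it] * 4, fillvalue=FILLER)]

for i, group in enumerate(groups):
    print(i % 2, len(group), group[0].strip())
```

A final group shorter than 4 lines still gets written out, so a short last group is a hint that the input FASTQ may be truncated.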
answered 2020-05-24T01:18:19.663

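Your problem is that you write the same line four times to each of the two files on every iteration of the loop, so the program has no way of deciding which line should go to which file. Try this code. I can't test it without the file, but in theory it should work.

This keeps track of which line it is on. Each time the line count reaches a multiple of four, it increments q. If q is even, the line is written to file R; if q is odd, it is written to file F.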

R = open("R.fastq","w+") # open file R with write permissions
F = open("F.fastq","w+") # open file F with write permissions

x = raw_input('type the name of the file you wanna split: ')   #input file name
p = 0 #variable to increment, tracking which line you're at
q = 0 #variable to track when to switch files
with open (x, 'rt') as myfile:   #open input file with read permissions
    for line in myfile: # loop through file
        if q%2 == 0: #if q is even
            R.write (line) #write to file R
        elif q%2 == 1: #if q is odd
            F.write (line) #write to file F
        p+=1 #increment tracker to next line
        if p%4 == 0: # if line is a multiple of 4
            q+=1 #increment q to switch files

R.close() #close file R
F.close() #close file F
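The two counters p and q can also be folded into a single expression on the line number. Below is a minimal sketch on made-up in-memory lines; the helper name target_index is hypothetical. Separately, note that raw_input() only exists in Python 2; in Python 3 the equivalent is input().

```python
def target_index(line_number, record_lines=4):
    """Return 0 (file R) or 1 (file F) for a zero-based line number."""
    # Integer division gives the record number; its parity picks the file.
    return (line_number // record_lines) % 2


# Invented sample standing in for the lines of the input file.
sample = ["line%d\n" % i for i in range(12)]

r_lines, f_lines = [], []
for line_number, line in enumerate(sample):
    (r_lines, f_lines)[target_index(line_number)].append(line)

print(len(r_lines), len(f_lines))
```

With a real file, the loop body would call R.write(line) or F.write(line) instead of appending to a list.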
answered 2020-05-24T00:45:16.003