我有一个fasta文件。从该文件中,我需要获取在序列的末尾和/或开头包含GTACAGTAGG
and的唯一序列,并将它们放入新的 fasta 文件中。CAACGGTTTTGCC
所以这里有一个例子:
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/2516_3269
***GTACAGTAGG***GTACACACAGAACGCGACAAGGCCAGGCGCTGGAGGAACTCCAGCAGCTAGATGCAAGCGACTA
TCAGAGCGTTGGGTCCAGAACGAAGAACAGTCACTCAAGACTGCTTT***CAACGGTTTTGCC***
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/3312_3597
CGCGGCATCGAATTAATACGACTCACTATAGGTTTTTTTATTGGTATTTTCAGTTAGATTCTTTCTTCTTAGAGGGTACA
GAGAAAGGGAGAAAATAGCTACAGACATGGGAGTGAAAGGTAGGAAGAAGAGCGAAGCAGACATTATTCA
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/3708_4657
***CAACGGTTTTGCC***ACAAGATCAGGAACATAAGTCACCAGACTCAATTCATCCCCATAAGACCTCGGACCTCTCA
ATCCTCGAATTAGGATGTTCTCCCCATGGCGTACGGTCTATCAGTATATAAACCTGACATACTATAAAAAAGTATACCAT
TCTTATCATGTACAGTAGG***GTACAGTAGG***
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/7/4704_5021
***GTACAGTAGG***GTGGGAGAGATGGCAGAAAGGCAGAAAGGAGAAAGATTCAGGATAACTCTCCTGGAGGGGCGAG
GTGCCATTCCCTGTGGTCACTTATTCTAAAGGCCCCAACCCTTCAAC***CAACGGTTTTGCC***
>m121012_054644_42133_c100390582550000001523038311021245_s1_p0/8/4223_4358
AAATATTGGGTCAAAGAACCGTTACTTTTCTTATATATGCGGCGCGAGGTTTTATATACTGATAAGAACCTACGCCATGG
GACATCTAATTCAGAGGGAAGAAGGTCCATGTCTGTTTGGATGAAATTGAGTCTG
(*
添加用于突出显示)
我需要一些方法来获取在序列的末尾和/或开头包含 GTACAGTAGG 和 CAACGGTTTTGCC 的唯一序列,并将它们放在新的 fasta 文件中。我对此很陌生。我什至不确定是否可以做到。提前感谢您提供的任何帮助。