我在一个 .fastq 文件中包含 Illumina 双端读取,表示正向读取的“/1”和反向读取的“/2”。
我正在使用 grep 提取单个读取并将它们放入 2 个各自的文件中(一个用于正向读取,一个用于反向读取。
grep -A 3 "/1$" sample21_pe.unmapped.fq > sample21_1_rfa.fq
grep -A 3 "/2$" sample21_pe.unmapped.fq > sample21_2_rfa.fq
但是,当我尝试使用文件(fastqc、程序集等)时,它们不起作用。运行 fastqc 时出现以下错误:
Failed to process file sample21_1_rfa.fq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:134)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:105)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Thread.java:662)
但是,如果您查看文件,它们的标识符确实以“@”开头。关于为什么这些文件不起作用的任何建议?我最初将 .bam 文件转换为 .fastq 文件
samtools bam2fq
以下是每个单独文件的示例:
- 合并.fastq
@HISEQ:534:CB14TANXX:4:1101:1091:2161/1
GAGAAGCTCGTCCGGCTGGAGAATGTTGCGCTTGCGGTCCGGAGAGGACAGAAATTCGTTGATGTTAACGGTGCGCTCGCCGCGGACGCTCTTGATGGTGACGTCGGCGTTGAGCGTGACGCACG
+
B/</<//B<BFF<FFFFFF/BFFFFFFB<BFFF<B/7FFF7B/B/FF/F/<<F/FFBFFFBBFFFBFB/FF<BBB<B/B//BBFFFFFFF/B/FF/B77B//B7B7F/7F###############
@HISEQ:534:CB14TANXX:4:1101:1091:2161/2
TGACGCCTGCCGTCAGGTAGGTTCTCCGCAGATCCGAAATCTCGCGACGCTCGGCGGCAACATCTGCCAGTCGTCCGTGGCGGGCGACGGTCTCGCGGCGTGCGTCACGCTCAACGCCGACGTAC
+
/B<B//F/F//B<///<FB/</F<<FFFFF<FFBF/FF<//FB/F//F7FBFFFF/B</7<F//<BB7/7BB7/B<F7BF<BFFFB7B#####################################
@HISEQ:534:CB14TANXX:4:1101:1637:2053/1
NGTTTACCATACAACAATCTTGCGACCTATTCAAATCATCTATATGCCTTATCAAGTTTTCATAGCTTTCAAGATTCTCAATTTCCTCACGTCTCGCTTTGCTCAACCTACAAAAACTCTCTTCT
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFB/BFBBFBB<<<<FFFFFFBB<FBFFBFF
@HISEQ:534:CB14TANXX:4:1101:1637:2053/2
TCGGTCGTTGGGAAAAGACCTGTGGTAAACATCCTACGCAAAAGCCATTGCGGTTACTCGTTCGTATGATTCTTGCATCAACTAATCAAGGCGATTGGGTTCTCGACCCATTTTGTGGAAGTTCG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFBFFFFFB<FF<<BBFB
@HISEQ:534:CB14TANXX:4:1101:1792:2218/1
TCTATCGGCTGACCGATAAGCTGTCGCCTGCCGACCGTCCTGCCATGGGACGGCGCATCGCACAGCTCACCCTGGACTAACTCTCCAACACCATGATGCTGACACGCTCGGCAAAAACACCCGAT
+
<<B/<B</FF/<B/<//F<//FF<<<FF//</7/F<</FFF####################################################################################
@HISEQ:534:CB14TANXX:4:1101:1792:2218/2
TGCCGGAGGGCGTCGATGGTGGCATCGAGCTTTTTTGCCGAGCGTGTCAGCATGATGGTGTTGTAGAGATAGTCCATGGTGAGCTGTGCGATGCGCCGTTCCATGGCAGGACGGTCGGCAGGCGC
+
BBBBBFFFFFFFFBFFFBBFFFFFFFFFFFBBFFFF/FF<F7FF//F/FBB/FFBFFF/F7BFF<F/FFFFFFFFB/7BB<7BFFFFFFFFFFFFF<B///B/7B/7/B//77BB//7B/B7/B#
@HISEQ:534:CB14TANXX:4:1101:1903:2238/1
TATTCCAGCGACCGTTATAATCAAACTCAACTACATAGTCATTGCGGATTGCTTCAAGAAATTTTTTCCAGACTATTTCATCAATATTTATTTTGGGAACTGGTGCAACAGCAATTCTTTTTAAA
+
BBBBBFFFFFFFFFFFFFBFF/FFBFFBFFFFFFFF/FFFFFF<<FFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFBF/B/<B<B/FBF7/<FFFFFFF/BB/7///7FF<BFFF//B/FFF###
@HISEQ:534:CB14TANXX:4:1101:1903:2238/2
TAAGGTTGGAGAAGCAACAATTTACCGTGATATTGATTTGCTCCGAACATATTTTCATGCGCCACTCGAGTTTGACAGGGAGAAAGGCGGGTATTATTATTTTAATGAAAAATGGGATTTTGCCC
+
B<BBBFFFFFFF<FFFFFFFFFFFFFFFFFF/BFFFFFFF<<FF<F<FFF/FF/FFFFBFB</<//<B/////<<FFFFB/<F<BFF/7/</7/7FB/B/BFF<//7BFF###############
@HISEQ:534:CB14TANXX:4:1101:2107:2125/1
TGTAGTATTTATTACATCATATAGAATGTAGATATAAAAGATGAAAAAGCTATAATTTCTTTGATAATATAAGGAGGGAATAACACTATGAGGATTGATAGAGCAGGAATCGAGGATTTGCCGAT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBFFFFFFFBB<FBB7BFF#
@HISEQ:534:CB14TANXX:4:1101:2107:2125/2
TACCACTATCGGCAAATCCTCGATTCCTGCTCTATCAATCCTCATAGTGTTATTCCCTCCTTATATTATCAAAGAAATTATAGCTTTTTCATCTTTTATATCTACATTCTATATGATGTAATAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFBFBFFFFFFFBBFFFFFFFBF7F/B/BBF7/</FF/77F/77BB#
@HISEQ:534:CB14TANXX:4:1101:2023:2224/1
TCACCAGCTCGGCACGCTTGTCCTTGACCTCCTGCTCGATCTGACCGTCCATCTTGGCTGCCACGGTGTTCTCCTCGGCGGAGTAGGCAAAGCAGCCCAGACGGTCGAACTGTATCTCCTTGACA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFB<<B7BBFBFFF<FFBBFFFBF/7B/<B<
@HISEQ:534:CB14TANXX:4:1101:2023:2224/2
TCGAGGATCTGTGCAACTTTGTCAAGGAGATACAGTTCGACCGTCTGGGCTGCTTTGCCTACTCCGCCGAGGAGAACACCGTGGCAGCCAAGATGGACGGTCAGATCGAGCAGGAGGTCAAGGAC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFBFBFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFFFFFFFFFFF<7BF/<<BB###
@HISEQ:534:CB14TANXX:4:1101:2038:2235/1
TTTATGCGAATGTAGAGTGGCTTCTCCACTGCCTCGGTGAAGCCCACGCGCGAGATGAGCGAATTAAGCTGCTTTGCAGTGAATTGCATTGCATATACACCTGCGTCGGCTTGAATACTTGTGCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFF//BFFFFFFFFFFFFF<B<BB###
@HISEQ:534:CB14TANXX:4:1101:2038:2235/2
AATCCGCTCGTGAAAGCTCCCGATAACGCCACAGTGAACACCGTGGAGTTCTCTGATACCGAAGATTTCGCACGCAGCACAAGTATTCAAGCCGACGCAGGTGTATATGCAATGCAATTCACTGC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFFFFFFFFFFFFFFFFFFFFF
@HISEQ:534:CB14TANXX:4:1101:2271:2041/1
NACACTTGTCGATGATCTTGCCAAGCTGCTTCTTGCCCACCAGGAAGCCGATCTCCAGATCAAACTCGTGGCCGGGAACACTCCGGTCCACAAAGCCCAGGTCCTGGGGAATGGGCTCATCGTAG
+
#<</BB/F/BB/F<FFFFFFFFF/<BFFFFFFFF<<FFBFFFFFFBFBFBBB<<FFFFBFFF/<B/FFFFFFFFFFFFFFFFF<FB<<BFF77BFFF/<BFFFB<</BB</7BFFFB########
@HISEQ:534:CB14TANXX:4:1101:2271:2041/2
GACTCATCTACAATGAGCCCATTCCCCAGGACCTGGGCTTTGTGGACCGGAGTGTTCCCGGCCACGAGTTTGATCTGGAGATCGGCTTCCTGGTGGGCAAGAAGCAGCTTGGCAAGATCATCGCC
+
<<BBBFFF<F/BFFFBFBF<BFF<<F/FFFBFFFF<<FFFFBFFFFFFBFFF/<B<F/<</<FFF//FFFFF/<<F/B/B/7/FF<<FF/7B/BBB/7///7////<B/B/BB/B/B/B/7BB##
- 被拉出并放入自己的 .fastq 文件后的正向读取示例:
@HISEQ:534:CB14TANXX:4:1101:1091:2161/1
GAGAAGCTCGTCCGGCTGGAGAATGTTGCGCTTGCGGTCCGGAGAGGACAGAAATTCGTTGATGTTAACGGTGCGCTCGCCGCGGACGCTCTTGATGGTGACGTCGGCGTTGAGCGTGACGCACG
+
B/</<//B<BFF<FFFFFF/BFFFFFFB<BFFF<B/7FFF7B/B/FF/F/<<F/FFBFFFBBFFFBFB/FF<BBB<B/B//BBFFFFFFF/B/FF/B77B//B7B7F/7F###############
--
@HISEQ:534:CB14TANXX:4:1101:1637:2053/1
NGTTTACCATACAACAATCTTGCGACCTATTCAAATCATCTATATGCCTTATCAAGTTTTCATAGCTTTCAAGATTCTCAATTTCCTCACGTCTCGCTTTGCTCAACCTACAAAAACTCTCTTCT
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFB/BFBBFBB<<<<FFFFFFBB<FBFFBFF
--
@HISEQ:534:CB14TANXX:4:1101:1792:2218/1
TCTATCGGCTGACCGATAAGCTGTCGCCTGCCGACCGTCCTGCCATGGGACGGCGCATCGCACAGCTCACCCTGGACTAACTCTCCAACACCATGATGCTGACACGCTCGGCAAAAACACCCGAT
+
<<B/<B</FF/<B/<//F<//FF<<<FF//</7/F<</FFF####################################################################################
--
@HISEQ:534:CB14TANXX:4:1101:1903:2238/1
TATTCCAGCGACCGTTATAATCAAACTCAACTACATAGTCATTGCGGATTGCTTCAAGAAATTTTTTCCAGACTATTTCATCAATATTTATTTTGGGAACTGGTGCAACAGCAATTCTTTTTAAA
+
BBBBBFFFFFFFFFFFFFBFF/FFBFFBFFFFFFFF/FFFFFF<<FFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFBF/B/<B<B/FBF7/<FFFFFFF/BB/7///7FF<BFFF//B/FFF###
--
@HISEQ:534:CB14TANXX:4:1101:2107:2125/1
TGTAGTATTTATTACATCATATAGAATGTAGATATAAAAGATGAAAAAGCTATAATTTCTTTGATAATATAAGGAGGGAATAACACTATGAGGATTGATAGAGCAGGAATCGAGGATTTGCCGAT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBFFFFFFFBB<FBB7BFF#
--
@HISEQ:534:CB14TANXX:4:1101:2023:2224/1
TCACCAGCTCGGCACGCTTGTCCTTGACCTCCTGCTCGATCTGACCGTCCATCTTGGCTGCCACGGTGTTCTCCTCGGCGGAGTAGGCAAAGCAGCCCAGACGGTCGAACTGTATCTCCTTGACA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFB<<B7BBFBFFF<FFBBFFFBF/7B/<B<
--
@HISEQ:534:CB14TANXX:4:1101:2038:2235/1
TTTATGCGAATGTAGAGTGGCTTCTCCACTGCCTCGGTGAAGCCCACGCGCGAGATGAGCGAATTAAGCTGCTTTGCAGTGAATTGCATTGCATATACACCTGCGTCGGCTTGAATACTTGTGCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFF//BFFFFFFFFFFFFF<B<BB###
--
@HISEQ:534:CB14TANXX:4:1101:2271:2041/1
NACACTTGTCGATGATCTTGCCAAGCTGCTTCTTGCCCACCAGGAAGCCGATCTCCAGATCAAACTCGTGGCCGGGAACACTCCGGTCCACAAAGCCCAGGTCCTGGGGAATGGGCTCATCGTAG
+
#<</BB/F/BB/F<FFFFFFFFF/<BFFFFFFFF<<FFBFFFFFFBFBFBBB<<FFFFBFFF/<B/FFFFFFFFFFFFFFFFF<FB<<BFF77BFFF/<BFFFB<</BB</7BFFFB########
--
@HISEQ:534:CB14TANXX:4:1101:2678:2145/1
CTGTACATAGTACGTATTTGACGCCTGCGTCGATGTAGCGTTTGAGGAAGGGAAGCAGCGGTTCTGCAGAGTCCTCTTTCCATCCGTTGATGCTAATCATTCCGTTGCGTACATCCGCTCCGAGA
+
BBBBBFFFFFFF<FFF<FFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFF7BFFFFFFFF<BBFFFFFFFFBBFBBB<FFBFFFFFFFFFFFFB<BFFFFFFBFB/BFFF####
--
@HISEQ:534:CB14TANXX:4:1101:2972:2114/1
CTCTGTGCCGATCCCTTTGCCTTTGCGTTTTGAGGAAAGGAAACCACCTTCTGGGTCGGTGAGGATAGTTCCGGTGAAGGTGTTGTCCACCGCCAGGCATAGGGAATAGCTGTCAGCCTTTGCTC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFBFFFF<FFFFFFFFFF<BFFFFF
--
@HISEQ:534:CB14TANXX:4:1101:2940:2222/1
CTAATTTTTTCATTATATTACTAATTTTGTAATTGGTAAAATATTATAATATCCTTGTACATTAAGACCCCAATAATCAGAAGAAGTAAAATTAATTCCTGCAACAGTTCTTAAATATCCATTAG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FBFFFFFFFFFFFFFFF/FBFBFFBFFFFF/<F<FFFFFFFFFF<FFFFFFBFFFFFFFFF</FBFBBF<F/7//FFBFBBFFF/<7BF#
--
@HISEQ:534:CB14TANXX:4:1101:3037:2180/1
CGTCAGTTCCGCAACGATAAAGAGTTCCGCATTGCAGTCACCTGTACGCTGGTAGCCACCGGAACCGATGTCAAGCCGTTGGAGGTGGTGATGTTCATGCGCGACGTAGCTTCCGAGCCGTTATA
+
B/BBBBBFFFFFFF<FFBFFFFFF<FFFFBFFFFFFF<BBFFFFFFFFFFFFFFFFFBFF/FFFFBFFBFFFFBFF/7F/BFB/BBFFFFFFFFBFF<BBF<7BBFFFFFFBBFFF/B#######
--
@HISEQ:534:CB14TANXX:4:1101:3334:2171/1
ACCGATGTACATACCCGGACGGGTACGCACATGCTCCATATCGCTCAAGTGGCGGATATTGTCATCTGTATATTCTACAGGTTGCTCCTGAGGGGTATTTGCCAGTTCTTCGGCAGCACCCTTTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFBFFFFFFFFFFFBFFFFFFFFFF</<BFFFFFFFFBBFFFFFFBF</BB///BF<FFFFF<</<B
--
@HISEQ:534:CB14TANXX:4:1101:3452:2185/1
CGCAGACGGATTTGCTTGAAGTCCGTCTCATCGTATTCCGACAACTCATCGAGGAACACACGCTTGTATTGACTGATACCCTTGATTTTCTCCGGGTCGTCAAGACCACTGAAATCAATCTTGCC
+
BBBBBFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFBFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFBFFFFFBFFBFFFFFFFFFB/77B/FBBFFF/<FFF/77BBFFFBFFBBB
--
任何意见,将不胜感激。谢谢!