I am processing a paired-end BAM file and getting many warnings like these:
WARNING: Could not find pair for HWI-ST430:177:2:1:4979:15503#0
WARNING: Could not find pair for HWI-ST430:177:2:1:5127:13427#0
WARNING: Could not find pair for HWI-ST430:177:2:1:6521:21452#0
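I looked up the reads named in the warnings in the BAM file and found that each of them has three records with the same read name. For example: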
HWI-ST430:177:2:1:4979:15503#0 65 chr32 26100696 60 79M21S chr5 36697147 0 ACTTTGCAATTTAAGTTTTACTTACTTTTTAACTAATATACATGCCTAAAATTTACAAAAACAATAATAAAAACAACAGAACACTGGAAACATTTTTAAA >;=<>=<<=======<====;===;=======<=>>>>>><=>>==>>>>=>>>>==>?>=<<==>?>>>?>?==><=?>><=<>>>?>?=>??>?===> BD:Z:FFHFCIKKIHG@EEEHF??DGGEDGGE???DEEGGEFFFFGDHHHHGGE??FF?DGDG???EDGFGFGGF@@@FEHFEIEGFEEIJJIHBHGLJDD@EF@ MD:Z:79 PG:Z:MarkDuplicates RG:Z:Basenji BI:Z:FFIECHGIHFEAFEEHEAAFFHDFFHDAAAFEEIHFGGHGGGHHGHHHFBBGFBGGGHBBBFGHGGFGGFBBBGHIGHJGHGHFKJJJJEIKLJGHBGFB NM:i:0 AS:i:79 XS:i:19
HWI-ST430:177:2:1:4979:15503#0 129 chr5 36697147 60 72M28S chr32 26100696 0 ATTTGCCCCTGGGCTATTTTTTTCCTNCCATGTAAGATTCCGTTTTAAAAATGTTTCCAGTGTTCTGTTGTTTTTATTATTGTTTTTGTAAATTTTAGGC ===<=<<<<====<=>========<<!<<<=><<=>>>>>=5=>>>>>>>>>>=>>>==>=>=>>>>=?>=>>>>>>>>=?>=>>>?>>>??>??>;<=> SA:Z:chr32,26100739,-,36M64S,60,0; BD:Z:FFG@JKKFFHIIEHIGFF?????EGGEEEGHHEGEEDGFEGEGF??DE???FHEF?EGGHIFFGFEIFGGFG@@@EGGEGGGFHAAAHGJHBJJDDEHHI MD:Z:26T37T7 PG:Z:MarkDuplicates RG:Z:Basenji BI:Z:FFFBHHHFFHGGDGHGGEAAAAADFGEEEIHHGHFFFGFEGHHFBBGFBBBGHGFBEGIIIFGFEFHGFHHGCCCHIGHIGHHGDDDIIKIFKJGHGHGH NM:i:2 AS:i:65 XS:i:21
HWI-ST430:177:2:1:4979:15503#0 401 chr32 26100739 60 36M64H = 26100696 -79 GCCTAAAATTTACAAAAACAATAATAAAAACAACAG ===<=>>=>>===>===<=>===========>;=== SA:Z:chr5,36697147,+,72M28S,60,2; BD:Z:IHHE??FF?EGEF???FEFFFDFGE@@AHHIJFIFF MD:Z:36 PG:Z:MarkDuplicates RG:Z:Basenji BI:Z:HGHGBBFFAEGFFAAAEFFEGFEGFABBFGHGGHFF NM:i:0 AS:i:36 XS:i:22
The BAM file was produced by aligning HiSeq reads to the reference genome with bwa and removing duplicates with picard. Base realignment was done with gatk.
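In case it helps anyone looking at the three records above, here is a minimal sketch that decodes their FLAG values (the second column: 65, 129 and 401) using the bit definitions from the SAM specification. The function name `decode_flag` and the label strings are my own, chosen for illustration only; they do not come from any tool.

```python
# Bit definitions from the SAM specification (FLAG field).
# The label strings are illustrative names, not tool output.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# FLAG values of the three records sharing the same read name.
for flag in (65, 129, 401):
    print(flag, decode_flag(flag))
```

Decoded this way, 65 is the first read of the pair, 129 is the second read of the pair, and 401 is also flagged as second in pair but with the secondary bit (0x100) set, which matches the `SA:Z:` tags linking the second and third records.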
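My questions are:
1. Why do three records share the same read name, yet no pair is found for them?
2. Perhaps the first two are treated as a mate pair and the third as a single read. If so, can I ignore it?
Can anyone help me? Thanks a lot for your help!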