I am processing paired-end BAM files and getting many warnings like these:

WARNING: Could not find pair for HWI-ST430:177:2:1:4979:15503#0
WARNING: Could not find pair for HWI-ST430:177:2:1:5127:13427#0
WARNING: Could not find pair for HWI-ST430:177:2:1:6521:21452#0

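I looked up the reads named in the warnings in the BAM file and found that each of them has three records with the same read name. For example: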

HWI-ST430:177:2:1:4979:15503#0  65  chr32   26100696    60  79M21S  chr5    36697147    0   ACTTTGCAATTTAAGTTTTACTTACTTTTTAACTAATATACATGCCTAAAATTTACAAAAACAATAATAAAAACAACAGAACACTGGAAACATTTTTAAA    >;=<>=<<=======<====;===;=======<=>>>>>><=>>==>>>>=>>>>==>?>=<<==>?>>>?>?==><=?>><=<>>>?>?=>??>?===>    BD:Z:FFHFCIKKIHG@EEEHF??DGGEDGGE???DEEGGEFFFFGDHHHHGGE??FF?DGDG???EDGFGFGGF@@@FEHFEIEGFEEIJJIHBHGLJDD@EF@   MD:Z:79 PG:Z:MarkDuplicates RG:Z:Basenji    BI:Z:FFIECHGIHFEAFEEHEAAFFHDFFHDAAAFEEIHFGGHGGGHHGHHHFBBGFBGGGHBBBFGHGGFGGFBBBGHIGHJGHGHFKJJJJEIKLJGHBGFB   NM:i:0  AS:i:79 XS:i:19
HWI-ST430:177:2:1:4979:15503#0  129 chr5    36697147    60  72M28S  chr32   26100696    0   ATTTGCCCCTGGGCTATTTTTTTCCTNCCATGTAAGATTCCGTTTTAAAAATGTTTCCAGTGTTCTGTTGTTTTTATTATTGTTTTTGTAAATTTTAGGC    ===<=<<<<====<=>========<<!<<<=><<=>>>>>=5=>>>>>>>>>>=>>>==>=>=>>>>=?>=>>>>>>>>=?>=>>>?>>>??>??>;<=>    SA:Z:chr32,26100739,-,36M64S,60,0;  BD:Z:FFG@JKKFFHIIEHIGFF?????EGGEEEGHHEGEEDGFEGEGF??DE???FHEF?EGGHIFFGFEIFGGFG@@@EGGEGGGFHAAAHGJHBJJDDEHHI   MD:Z:26T37T7    PG:Z:MarkDuplicates RG:Z:Basenji    BI:Z:FFFBHHHFFHGGDGHGGEAAAAADFGEEEIHHGHFFFGFEGHHFBBGFBBBGHGFBEGIIIFGFEFHGFHHGCCCHIGHIGHHGDDDIIKIFKJGHGHGH   NM:i:2  AS:i:65 XS:i:21
HWI-ST430:177:2:1:4979:15503#0  401 chr32   26100739    60  36M64H  =   26100696    -79 GCCTAAAATTTACAAAAACAATAATAAAAACAACAG    ===<=>>=>>===>===<=>===========>;===    SA:Z:chr5,36697147,+,72M28S,60,2;   BD:Z:IHHE??FF?EGEF???FEFFFDFGE@@AHHIJFIFF   MD:Z:36 PG:Z:MarkDuplicates RG:Z:Basenji    BI:Z:HGHGBBFFAEGFFAAAEFFEGFEGFABBFGHGGHFF   NM:i:0  AS:i:36 XS:i:22
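
To make the FLAG column of these three records (65, 129, 401) easier to compare, here is a minimal Python sketch that decodes them. The bit meanings are taken from the SAM specification; the names `SAM_FLAGS` and `decode_flag` are only illustrative and do not come from any tool used above.

```python
# Bit values of the SAM FLAG field, as defined in the SAM specification.
# The dictionary and function names are hypothetical, chosen for this sketch.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary_alignment",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary_alignment",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


# The three FLAG values from the records shown above.
for flag in (65, 129, 401):
    print(flag, decode_flag(flag))
```

Decoded this way, 65 is the first read of the pair, 129 is the second read, and 401 has the second-in-pair, reverse-strand, and secondary-alignment (0x100) bits set. So the third record appears to be an additional alignment of the second read, which is consistent with the SA tags linking the second and third records.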

The BAM file was produced by aligning HiSeq reads to the reference genome with bwa and removing duplicates with Picard. Base realignment was done with GATK.

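My questions are:

1. Why do three records share the same read name and yet are not treated as related?

2. Perhaps the first two are treated as a mate pair and the third as a single read. If so, can I ignore it?

Could anyone help me? Thank you very much!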
