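I am using the GenomicRanges R package to find overlaps between two sets of genomic ranges. The output of the findOverlaps function provides two pieces of information: 1. the row numbers of the ranges in list A that overlap a range in list B (queryHits) 2. the row numbers of the ranges in list B that overlap a range in list A (subjectHits).
I am interested in the overlaps for list A, and I would like to add a column to list A that gives the number of overlaps for each row.
Here is a reproducible example that can be run directly in R: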
#Define SetA
chrA = c(7,3,22)
startA = c(127991052,37327681,50117297)
stopA = c(127991052,37327681,50117297)
SetA = data.frame(chrA,startA,stopA)
#Define SetB
chrB = c(1,3,22,22)
startB = c(105278917,37236502,46384621,49214228)
stopB = c(105451039,37411958,50796976,50727239)
SetB = data.frame(chrB,startB,stopB)
#Find Overlaps between SetA and SetB
library(GenomicRanges)
gr0 = with(SetA, GRanges(chrA, IRanges(start=startA, end=stopA)))
gr1 = with(SetB, GRanges(chrB, IRanges(start=startB, end=stopB)))
hits = findOverlaps(gr0, gr1)
hits = data.frame(hits) #the first col of hits (queryHits) holds the row numbers (from SetA) of genomic ranges that overlap with SetB
hits
I would like to add a column to SetA that gives how many times each row overlaps with SetB. Here is my attempt, followed by the output I need:
#Calculate frequencies:
OverlapFreq = data.frame(table(hits$queryHits)) #calculate frequencies for the first col in hits
OverlapFreq
#expected output:
SetA$OverlapFreq = c(0,1,2)
SetA
Any suggestions on how to achieve this would be greatly appreciated!
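For reference, here is a minimal sketch of two ways the column might be built. The table() attempt above only lists rows that have at least one hit, so the row with zero overlaps goes missing. The sketch assumes countOverlaps() from GenomicRanges is acceptable, and alternatively uses base R tabulate() with one bin per row of SetA. The column name OverlapFreq2 and the as.character() conversion of the chromosome columns are my own additions for illustration.

```r
library(GenomicRanges)

# Same toy data as in the example above
SetA <- data.frame(chrA   = c(7, 3, 22),
                   startA = c(127991052, 37327681, 50117297),
                   stopA  = c(127991052, 37327681, 50117297))
SetB <- data.frame(chrB   = c(1, 3, 22, 22),
                   startB = c(105278917, 37236502, 46384621, 49214228),
                   stopB  = c(105451039, 37411958, 50796976, 50727239))

# Chromosome names are passed as character (an assumption made here for safety)
gr0 <- with(SetA, GRanges(as.character(chrA), IRanges(start = startA, end = stopA)))
gr1 <- with(SetB, GRanges(as.character(chrB), IRanges(start = startB, end = stopB)))

# Option 1: countOverlaps() returns one count per query range, zeros included
SetA$OverlapFreq <- countOverlaps(gr0, gr1)

# Option 2: tabulate the query indices, forcing one bin per row of SetA
# so that rows without any hit still get a 0
hits <- findOverlaps(gr0, gr1)
SetA$OverlapFreq2 <- tabulate(queryHits(hits), nbins = nrow(SetA))

SetA
```

Both columns should agree with the expected output of 0, 1, 2 for this example. Option 1 is shorter; option 2 may be useful when the hits object is needed for something else anyway.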