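I am using the GenomicRanges R package to find overlaps between two sets of genomic ranges. The output of the findOverlaps function provides two pieces of information: 1. the row numbers of the ranges in list A that overlap list B, and 2. the row numbers of the ranges in list B that overlap list A.

I am interested in the overlaps for list A, and I would like to add a column to list A giving the number of overlaps for each row.

Here is a reproducible example that can be run directly in R: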

    # Define SetA
    chrA = c(7,3,22)
    startA = c(127991052,37327681,50117297)
    stopA = c(127991052,37327681,50117297)
    SetA = data.frame(chrA,startA,stopA)

    # Define SetB
    chrB = c(1,3,22,22)
    startB = c(105278917,37236502,46384621,49214228)
    stopB = c(105451039,37411958,50796976,50727239)
    SetB = data.frame(chrB,startB,stopB)

    # Find overlaps between SetA and SetB
    library(GenomicRanges)
    gr0 = with(SetA, GRanges(chrA, IRanges(start=startA, end=stopA)))
    gr1 = with(SetB, GRanges(chrB, IRanges(start=startB, end=stopB)))

    hits = findOverlaps(gr0, gr1)
    hits = data.frame(hits) # the first column of hits (queryHits) holds the row numbers of the SetA ranges that overlap SetB
    hits

I would like to add a column to SetA indicating how many times each row overlaps SetB. Here is my attempt, followed by the output I need:

    # Calculate frequencies:
    OverlapFreq = data.frame(table(hits$queryHits)) # calculate frequencies for the first column of hits
    OverlapFreq

    # Expected output:
    SetA$OverlapFreq = c(0,1,2)
    SetA
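
The table() attempt above only lists the SetA rows that actually appear in hits, so rows without any overlap are missing instead of being counted as 0. One possible way to keep those zeros, as a minimal sketch, is to tabulate the query indices over all rows of SetA. The block rebuilds the example data so that it runs on its own; writing the chromosome names as character strings is my own choice here and not part of the original post.

```r
library(GenomicRanges)

# Example data from the question (chromosomes as character: an assumption)
SetA <- data.frame(chrA   = c("7", "3", "22"),
                   startA = c(127991052, 37327681, 50117297),
                   stopA  = c(127991052, 37327681, 50117297))
SetB <- data.frame(chrB   = c("1", "3", "22", "22"),
                   startB = c(105278917, 37236502, 46384621, 49214228),
                   stopB  = c(105451039, 37411958, 50796976, 50727239))

gr0 <- with(SetA, GRanges(chrA, IRanges(start = startA, end = stopA)))
gr1 <- with(SetB, GRanges(chrB, IRanges(start = startB, end = stopB)))

hits <- findOverlaps(gr0, gr1)

# queryHits() gives the index into gr0 for every overlap that was found;
# tabulate() counts each index from 1 to nbins and yields 0 where none occur
SetA$OverlapFreq <- tabulate(queryHits(hits), nbins = length(gr0))
SetA
```

Because nbins is set to length(gr0), the result always has one entry per row of SetA, whether or not that row has any overlap.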

Any suggestions on how to achieve this would be greatly appreciated!

2 Answers

I figured out the answer: simply use the countOverlaps function from the same package:

    OverlapFreq = countOverlaps(gr0,gr1)

answered 2018-08-22T11:51:18.403
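
To get from this answer to the column asked for in the question, the counts can be assigned straight to SetA, since countOverlaps returns one count per query range, in the order of gr0. A minimal sketch, rebuilding the example data so that it runs on its own (the chromosome names as character strings are my assumption, not part of the original post):

```r
library(GenomicRanges)

# Example data from the question (chromosomes as character: an assumption)
SetA <- data.frame(chrA   = c("7", "3", "22"),
                   startA = c(127991052, 37327681, 50117297),
                   stopA  = c(127991052, 37327681, 50117297))
SetB <- data.frame(chrB   = c("1", "3", "22", "22"),
                   startB = c(105278917, 37236502, 46384621, 49214228),
                   stopB  = c(105451039, 37411958, 50796976, 50727239))

gr0 <- with(SetA, GRanges(chrA, IRanges(start = startA, end = stopA)))
gr1 <- with(SetB, GRanges(chrB, IRanges(start = startB, end = stopB)))

# One count per range in gr0, including 0 for ranges with no overlap
SetA$OverlapFreq <- countOverlaps(gr0, gr1)
SetA
```

For the example data this should reproduce the expected column c(0,1,2) from the question.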

You can also use the plyranges version of the function:

    library(plyranges)

    # direct
    gr0$n_overlaps <- count_overlaps(gr0, gr1)

    # dplyr style
    overlaps <- gr0 %>% mutate(n_overlaps = count_overlaps(., gr1))

I also recommend plyranges for join operations.

    # return overlapping ranges
    find_overlaps(gr0,gr1)
answered 2018-10-05T18:26:50.437