r - Combine IRanges objects and maintain mcols

Question

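I'll start with an example and then describe the logic I'm trying to use.

I have two normal IRanges objects that span the same total range, but may do so in a different number of ranges. Each IRanges object has one mcol, and that mcol differs between the objects.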

a
#IRanges object with 1 range and 1 metadata column:
#          start       end     width | on_betalac
#      <integer> <integer> <integer> |  <logical>
#  [1]         1       167       167 |      FALSE
b
#IRanges object with 3 ranges and 1 metadata column:
#          start       end     width |  on_other
#      <integer> <integer> <integer> | <logical>
#  [1]         1       107       107 |     FALSE
#  [2]       108       112         5 |      TRUE
#  [3]       113       167        55 |     FALSE

You can see that both IRanges objects span from 1 to 167, but a has one range and b has three. I would like to combine them to get output like this:

my_great_function(a, b)
#IRanges object with 3 ranges and 2 metadata columns:
#          start       end     width | on_betalac  on_other
#      <integer> <integer> <integer> |  <logical> <logical>
#  [1]         1       107       107 |     FALSE     FALSE
#  [2]       108       112         5 |     FALSE      TRUE
#  [3]       113       167        55 |     FALSE     FALSE

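The output is similar to calling disjoin on the inputs, but it keeps the mcols and even spreads them, so that each output range has the same mcol value as the input range that gave rise to it.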

score 2 · Accepted Answer

Option 1: Using `IRanges::findOverlaps`

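# For each range in b, find the range(s) in a that it overlaps
m <- findOverlaps(b, a)
# Keep the ranges of b (one per hit) and append the matching mcols of a
c <- b[queryHits(m)]
mcols(c) <- cbind(mcols(c), mcols(a[subjectHits(m)]))
c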
#IRanges object with 3 ranges and 2 metadata columns:
#          start       end     width |  on_other on_betacalc
#      <integer> <integer> <integer> | <logical>   <logical>
#  [1]         1       107       107 |     FALSE       FALSE
#  [2]       108       112         5 |      TRUE       FALSE
#  [3]       113       167        55 |     FALSE       FALSE

The resulting object c is an IRanges object with two metadata columns.
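Option 1 works here because every range of b lies inside a single range of a. As a rough sketch of how the same idea could be applied when neither input is a refinement of the other, one could disjoin the bare ranges first and then look up the mcols of each input per piece. The helper name `combine_mcols` is hypothetical (it is not part of IRanges), and the sketch assumes both inputs are normal (no overlaps within an input) and cover the same total range.

```r
library(IRanges)

# Sample inputs, matching the sample data of this answer
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

# Hypothetical helper, not an IRanges function
combine_mcols <- function(x, y) {
  # Drop the mcols before combining so only the ranges are disjoined
  x0 <- x; mcols(x0) <- NULL
  y0 <- y; mcols(y0) <- NULL
  pieces <- disjoin(c(x0, y0))
  # Index of the (single) range of each input that contains each piece
  ix <- findOverlaps(pieces, x, select = "first")
  iy <- findOverlaps(pieces, y, select = "first")
  # Assumption: both inputs cover every piece
  stopifnot(!anyNA(ix), !anyNA(iy))
  mcols(pieces) <- cbind(mcols(x)[ix, , drop = FALSE],
                         mcols(y)[iy, , drop = FALSE])
  pieces
}

res <- combine_mcols(a, b)
res
```

With the sample data this should give three ranges carrying both on_betacalc and on_other, in the column order requested in the question.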

Option 2: Using `IRanges::mergeByOverlaps`

c <- mergeByOverlaps(b, a)
c
#DataFrame with 3 rows and 4 columns
#          b  on_other         a on_betacalc
#  <IRanges> <logical> <IRanges>   <logical>
#1     1-107     FALSE     1-167       FALSE
#2   108-112      TRUE     1-167       FALSE
#3   113-167     FALSE     1-167       FALSE

The resulting output is a DataFrame that holds the two IRanges as columns, with the original metadata columns as additional columns.
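If an IRanges object is wanted rather than a DataFrame, the merged table can be turned back into one. The following is a minimal sketch that assumes the column names shown in the output above (`b`, `on_other`, `on_betacalc`), which depend on the variable names passed to `mergeByOverlaps`.

```r
library(IRanges)

# Sample inputs, matching the sample data of this answer
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE
b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

m <- mergeByOverlaps(b, a)

# Take the ranges of b from the merged table ...
res <- m$b
# ... and attach both metadata columns (this overwrites any existing mcols)
mcols(res) <- m[, c("on_other", "on_betacalc")]
res
```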

Option 3: Using `data.table::foverlaps`

library(data.table)
a.dt <- as.data.table(cbind.data.frame(a, mcols(a)))[, width := NULL]
b.dt <- as.data.table(cbind.data.frame(b, mcols(b)))[, width := NULL]

setkey(b.dt, start, end)
foverlaps(a.dt, b.dt, type = "any")[, `:=`(i.start = NULL, i.end = NULL)][]
#   start end on_other on_betacalc
#1:     1 107    FALSE       FALSE
#2:   108 112     TRUE       FALSE
#3:   113 167    FALSE       FALSE

The resulting object is a data.table.

Option 4: Using `fuzzyjoin::interval_left_join`

library(fuzzyjoin)
a.df <- cbind.data.frame(a, mcols(a))
b.df <- cbind.data.frame(b, mcols(b))
interval_left_join(b.df, a.df, by = c("start", "end"))
#  start.x end.x width.x on_other start.y end.y width.y on_betacalc
#1       1   107     107    FALSE       1   167     167       FALSE
#2     108   112       5     TRUE       1   167     167       FALSE
#3     113   167      55    FALSE       1   167     167       FALSE

The resulting object is a data.frame.

Sample data
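Options 3 and 4 return plain tables, so one more step is needed if an IRanges object with mcols is the goal. As a small sketch, the table can be rebuilt into ranges like this. The data.frame `out` below is typed in by hand from the join output above (only the columns that are needed), so the example runs without fuzzyjoin.

```r
library(IRanges)

# Join result copied from the printed output (subset of its columns)
out <- data.frame(
  start.x     = c(1L, 108L, 113L),
  end.x       = c(107L, 112L, 167L),
  on_other    = c(FALSE, TRUE, FALSE),
  on_betacalc = c(FALSE, FALSE, FALSE)
)

# Rebuild the ranges from the start/end columns of b ...
res <- IRanges(start = out$start.x, end = out$end.x)
# ... and attach the remaining columns as metadata
mcols(res) <- DataFrame(out[c("on_other", "on_betacalc")])
res
```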

library(IRanges)
a <- IRanges(1, 167)
mcols(a)$on_betacalc <- FALSE

b <- IRanges(c(1, 108, 113), c(107, 112, 167))
mcols(b)$on_other <- c(FALSE, TRUE, FALSE)

score 0 · Accepted Answer

This is what I came up with. It is not as elegant as MauritsEvers's answer, but it may be useful to others in some way.

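combine_exposures <- function(...) {

  # Concatenate all inputs; columns missing from an input are filled with NA
  cd <- c(...)
  mc <- mcols(cd)
  # Split into disjoint pieces; revmap lists the input ranges behind each piece
  dj <- disjoin(x = cd, with.revmap = TRUE)
  r <- mcols(dj)$revmap

  d <- as.data.frame(matrix(nrow = length(dj), ncol = ncol(mc)))
  names(d) <- names(mc)

  # Assumes the j-th contributing range of each piece carries column j
  for (i in seq_along(dj)) {
    d[i,] <- sapply(X = seq_len(ncol(mc)), FUN = function(j) { mc[r[[i]][j], j] })
  }

  mcols(dj) <- d
  return(dj)
}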

Here is dput(c(e1, e2, e3, e4)) (e1, e2, e3 and e4 are some example IRanges objects that all span 1 to 167):

new("IRanges", start = c(1L, 1L, 108L, 113L, 1L, 1L), width = c(167L, 
107L, 5L, 55L, 167L, 167L), NAMES = NULL, elementType = "ANY", 
    elementMetadata = new("DataFrame", rownames = NULL, nrows = 6L, 
        listData = list(on_betalac = c(FALSE, NA, NA, NA, NA, 
        NA), on_other = c(NA, FALSE, TRUE, FALSE, NA, NA), on_pen = c(NA, 
        NA, NA, NA, FALSE, NA), on_quin = c(NA, NA, NA, NA, NA, 
        FALSE)), elementType = "ANY", elementMetadata = NULL, 
        metadata = list()), metadata = list())
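The loop in combine_exposures relies on the j-th range listed in each revmap entry being the one that carries column j, which holds when every input contributes exactly one range to every piece and the inputs are passed in column order. As a hedged alternative, the sketch below takes the first non-NA value per column among the contributing ranges instead. The name `fill_mcols` is hypothetical, the function takes the already combined object rather than `...`, and it assumes all metadata columns are logical. The input is rebuilt by hand from the dput() output above.

```r
library(IRanges)

# Combined ranges, rebuilt from the dput() output
cd <- IRanges(start = c(1L, 1L, 108L, 113L, 1L, 1L),
              width = c(167L, 107L, 5L, 55L, 167L, 167L))
mcols(cd) <- DataFrame(
  on_betalac = c(FALSE, NA, NA, NA, NA, NA),
  on_other   = c(NA, FALSE, TRUE, FALSE, NA, NA),
  on_pen     = c(NA, NA, NA, NA, FALSE, NA),
  on_quin    = c(NA, NA, NA, NA, NA, FALSE)
)

# Hypothetical variant of combine_exposures(); logical columns assumed
fill_mcols <- function(cd) {
  mc <- mcols(cd)
  dj <- disjoin(cd, with.revmap = TRUE)
  # One integer vector per disjoint piece: indices of the contributing ranges
  r <- as.list(mcols(dj)$revmap)
  first_non_na <- function(v) {
    v <- v[!is.na(v)]
    if (length(v) > 0) v[1] else NA
  }
  cols <- lapply(seq_len(ncol(mc)), function(j) {
    col <- mc[[j]]
    vapply(r, function(idx) first_non_na(col[idx]), logical(1))
  })
  names(cols) <- colnames(mc)
  # Replace the revmap column with the filled-in metadata
  mcols(dj) <- DataFrame(as.data.frame(cols))
  dj
}

res <- fill_mcols(cd)
res
```

A piece that no input annotates for a given column keeps NA in that column, so gaps remain visible instead of being silently filled.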
