How do I merge data.frames that have overlapping intervals?

Data frame 1

df1 <- read.table(text = "
   from  to Lith Form
1   0.0 1.2  GRN  BCM
2   1.2 5.0  GDI  BDI
", header = TRUE)

Data frame 2

df2 <- read.table(text = "
   from  to Weath Str
1   0.0 1.1    HW  ES
2   1.1 2.9    SW  VS
3   2.9 5.0    HW  ST
", header = TRUE)

Resulting data frame

  from  to Weath Str Lith Form
1  0.0 1.1    HW  ES  GRN  BCM
2  1.1 1.2    SW  VS  GRN  BCM
3  1.2 2.9    SW  VS  GDI  BDI
4  2.9 5.0    HW  ST  GDI  BDI
1 Answer

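Here is one approach. It is similar to eddi's answer (to the question about cutting two data.frames by interval and merging them in R), but it lets the data.frames have any number of columns.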

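library(data.table)

# convert the data.frames to data.tables keyed on the interval start
dt1 <- data.table(df1, key='from')
dt2 <- data.table(df2, key='from')
# skeleton for the joined data.table: one row per elementary interval
# (assumes the unique 'from' and 'to' values pair up one-to-one)
dt <- data.table(from=sort(unique(c(dt1[,from], dt2[,from]))), 
                 to=sort(unique(c(dt1[,to], dt2[,to]))), 
                 key='from')
# function to join the skeleton with a data.table via a rolling join
j1 <- function(dt, dt1){
  dt3 <- dt1[dt, roll=TRUE]
  # keep the skeleton's 'to'; current data.table releases name the
  # clashing column from 'dt' i.to (older releases used to.1)
  dt3[,':='(to=i.to, i.to=NULL)]
  setkey(dt3, from, to)
  return(dt3)
}
# merge the two data.tables on (from, to)
j1(dt, dt2)[j1(dt, dt1)]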
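
For readers who want to see the idea without data.table, the skeleton-plus-lookup logic above can be sketched in base R. This is only an illustration, not the answer's method: `merge_intervals` is a hypothetical helper name, `findInterval` stands in for the rolling join, and the sketch assumes both data frames are sorted by `from`, are contiguous, and cover the same overall range.

```r
# Example data copied from the question
df1 <- data.frame(from = c(0, 1.2), to = c(1.2, 5.0),
                  Lith = c("GRN", "GDI"), Form = c("BCM", "BDI"))
df2 <- data.frame(from = c(0, 1.1, 2.9), to = c(1.1, 2.9, 5.0),
                  Weath = c("HW", "SW", "HW"), Str = c("ES", "VS", "ST"))

# Hypothetical helper: split the range at every breakpoint, then attach the
# attributes of the row that covers each elementary interval.
merge_intervals <- function(a, b) {
  breaks <- sort(unique(c(a$from, a$to, b$from, b$to)))
  out <- data.frame(from = head(breaks, -1), to = tail(breaks, -1))
  # findInterval() gives, for each segment start, the index of the last row
  # whose 'from' is <= that start (assumes rows sorted by 'from', no gaps)
  ia <- findInterval(out$from, a$from)
  ib <- findInterval(out$from, b$from)
  res <- cbind(out,
               b[ib, setdiff(names(b), c("from", "to")), drop = FALSE],
               a[ia, setdiff(names(a), c("from", "to")), drop = FALSE])
  rownames(res) <- NULL
  res
}

res <- merge_intervals(df1, df2)
print(res)
```

On the question's data this yields four rows with the same values as the resulting data frame shown in the question. Intervals with gaps or unsorted rows would need extra handling that this sketch does not attempt.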

Overlap joins (also called interval joins) were recently implemented in v1.9.3. With them, I think your task can be done as follows (assuming your data.frames are df1 and df2):

require(data.table) ## 1.9.3+
setDT(df1)  ## convert to data.table without copy
setDT(df2)

setkey(df2, from, to)
ans = foverlaps(df1, df2, type="any")
ans = ans[, `:=`(from = pmax(from, i.from), to = pmin(to, i.to))]
ans = ans[, `:=`(i.from=NULL, i.to=NULL)][from <= to]
#    from  to Weath Str Lith Form
# 1:  0.0 1.1    HW  ES  GRN  BCM
# 2:  1.1 1.2    SW  VS  GRN  BCM
# 3:  1.2 2.9    SW  VS  GDI  BDI
# 4:  2.9 5.0    HW  ST  GDI  BDI
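
The `pmax`/`pmin` step is what trims each matched pair of rows down to the part the two intervals share: the intersection of two overlapping intervals runs from the larger start to the smaller end. A small base R sketch of just that step, using hand-written pairs from the example data (the `pairs` data frame and its column names are assumptions for illustration, not output of `foverlaps`):

```r
# Each row pairs one interval from df2 (y_*) with one from df1 (x_*)
pairs <- data.frame(
  y_from = c(0.0, 1.1, 1.1, 2.9), y_to = c(1.1, 2.9, 2.9, 5.0),
  x_from = c(0.0, 0.0, 1.2, 1.2), x_to = c(1.2, 1.2, 5.0, 5.0)
)

# Intersection of each pair: larger start, smaller end
clipped <- data.frame(
  from = pmax(pairs$y_from, pairs$x_from),
  to   = pmin(pairs$y_to,   pairs$x_to)
)
print(clipped)
```

The clipped intervals are 0 to 1.1, 1.1 to 1.2, 1.2 to 2.9 and 2.9 to 5, matching the `from` and `to` columns in the output above.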
answered 2013-10-22T07:47:38.433