
I'm working with a dataset that looks something like this, except there are many more columns with data like "serial" and "loc":

start  <- c(1, 8, 16, 24, 28, 32)
end    <- c(4, 9, 20, 27, 30, 45)
serial <- c(1, 2, 3, 4, 5, 6)
loc    <- c(8, 63, 90, 32, 89, 75)
dataset <- data.frame(start, end, serial, loc)

Here each row represents a run of consecutive integers: "start" is the first integer in the run and "end" is the last. I'd like to turn each of those consecutive integers into its own row while keeping the other attributes of the original row. For example, the first row of "dataset" should be split into four rows: one each for 1, 2, 3, and 4. Likewise, the second row should be split into two rows, one for 8 and one for 9, and so on.

Thus the output for just the first two rows of "dataset" would look like:

split serial loc
    1      1   8
    2      1   8
    3      1   8
    4      1   8
    8      2  63
    9      2  63

3 Answers


A data.table solution, assuming serial is a unique row identifier:

library(data.table)
DA <- as.data.table(dataset)
DB <- DA[,list(index = seq(start,end, by = 1), loc),by = serial]

If serial is not a unique row identifier, then group by a generated row id instead:

DB <- DA[, list(index = seq(start,end, by = 1), loc, serial), by = list(rowid = seq_len(nrow(DA)))]
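The grouping matters because seq() expects a single from and a single to, so it has to be called once per row. A minimal base R sketch of that point, using only the first two rows of the question's data (the variable names whole_column and per_row are just for illustration):

```r
start <- c(1, 8)
end   <- c(4, 9)

# seq() is not vectorised over from/to, so passing whole columns fails;
# tryCatch() turns the error into a marker value for illustration
whole_column <- tryCatch(seq(start, end), error = function(e) "error")

# One seq() call per row, which is what grouping by a unique key achieves
per_row <- Map(seq, start, end)
```

Here per_row is a list with one integer run per original row, which is roughly what each by-group produces in the data.table calls above.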
answered 2012-12-04T00:34:51.873

Here is one way to do it that sticks with base R.

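# SIMPLIFY = FALSE keeps a list even if every run has the same length
temp <- mapply(seq, dataset$start, dataset$end, SIMPLIFY = FALSE)
run_lengths <- lengths(temp)  # number of integers in each run
dataset2 <- data.frame(serial = rep(dataset$serial, run_lengths),
                       index = unlist(temp),
                       loc = rep(dataset$loc, run_lengths))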
list(head(dataset2), tail(dataset2))
# [[1]]
#   serial index loc
# 1      1     1   8
# 2      1     2   8
# 3      1     3   8
# 4      1     4   8
# 5      2     8  63
# 6      2     9  63
# 
# [[2]]
#    serial index loc
# 27      6    40  75
# 28      6    41  75
# 29      6    42  75
# 30      6    43  75
# 31      6    44  75
# 32      6    45  75
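Since the question mentions many more columns than "serial" and "loc", a variant that repeats row indices rather than naming each column may be easier to maintain. This is only a sketch: it assumes end >= start in every row, and the names n, keep, and expanded are made up for illustration.

```r
# Example data from the question
dataset <- data.frame(start  = c(1, 8, 16, 24, 28, 32),
                      end    = c(4, 9, 20, 27, 30, 45),
                      serial = 1:6,
                      loc    = c(8, 63, 90, 32, 89, 75))

# Number of integers in each run (assumes end >= start in every row)
n <- dataset$end - dataset$start + 1

# Repeat each row index n[i] times and keep every column except start/end
keep <- setdiff(names(dataset), c("start", "end"))
expanded <- dataset[rep(seq_len(nrow(dataset)), n), keep, drop = FALSE]

# sequence(n) counts 1..n[i] within each run; shift it by the run's start
expanded$index <- rep(dataset$start, n) + sequence(n) - 1
rownames(expanded) <- NULL

head(expanded)
```

Any extra columns in dataset are carried along automatically, because the rows are selected by index instead of being rebuilt column by column.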
answered 2012-12-04T02:49:11.927
# create the ranges (SIMPLIFY = FALSE always returns a list)
ranges <- mapply(seq, dataset$start, dataset$end, SIMPLIFY = FALSE)

# create one table per original row
tables <- lapply(seq_along(ranges), function(i)
             cbind(split = ranges[[i]], dataset[i, c("serial", "loc")]))

# combine all the tables into one data frame
do.call(rbind, tables)
answered 2012-12-04T00:39:13.350