r - Optimizing a for loop in data.table

Question

I am using the data.table solution found here: Pool duplicate entries while averaging values in neighbouring columns

dt.out <- dt[, lapply(.SD, function(x) paste(x, collapse=",")), 
          by=c("ID2", "chrom", "strand", "txStart", "txEnd")]

dt.out <- dt.out[ ,list(ID=paste(ID, collapse=","), ID2=paste(ID2, collapse=","), 
                       txStart=min(txStart), txEnd=max(txEnd)), 
                       by=c("probe", "chrom", "strand", "newCol")]

Dataset:

ID      ID2         probe       chrom   strand txStart  txEnd  newCol
Rest_3  uc001aah.4  8044649     chr1    0      14361    29370  1.02
Rest_4  uc001aah.4  7911309     chr1    0      14361    29370  1.30  
Rest_5  uc001aah.4  8171066     chr1    0      14361    29370  2.80         
Rest_6  uc001aah.4  8159790     chr1    0      14361    29370  4.12 

Rest_17 uc001abw.1  7896761     chr1    0      861120   879961 1.11
Rest_18 uc001abx.1  7896761     chr1    0      871151   879961 3.12

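I added this for loop so that newCol holds the average of the collapsed values in each cell (from the first dt.out). However, running this loop takes a very long time. Is there a faster way to do this?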

for(i in 1:NROW(dt.out)){
  con <- textConnection(dt.out[i,grep("newCol", colnames(dt.out))])
  data <- read.csv(con, sep=",", header=FALSE)
  close(con)
  dt.out[i,grep("newCol", colnames(dt.out))]<- as.numeric(rowMeans(data)) 

}

score 2 · Accepted Answer

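newCol seems to be an extra column compared with the data in the other question. I take it that, after obtaining the first dt.out, you want to take the mean of the collapsed values in newCol?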

You can do this by replacing newCol directly using sapply(strsplit(.)). Basically, after obtaining the first dt.out, do this:

dt.out[ , newCol := sapply(strsplit(newCol, ","), function(x) mean(as.numeric(x)))]
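To see the one-liner in context, here is a minimal sketch that rebuilds the first four rows of the question's dataset as a toy data.table, runs the first collapse step, and then applies the strsplit/sapply replacement. The toy table is an assumption made for illustration; the real data has more rows and groups.

```r
library(data.table)

# Toy data copied from the first four sample rows in the question
# (assumption: the real table has the same column names and types).
dt <- data.table(
  ID      = c("Rest_3", "Rest_4", "Rest_5", "Rest_6"),
  ID2     = "uc001aah.4",
  probe   = c(8044649, 7911309, 8171066, 8159790),
  chrom   = "chr1",
  strand  = 0,
  txStart = 14361,
  txEnd   = 29370,
  newCol  = c(1.02, 1.30, 2.80, 4.12)
)

# First step from the question: collapse every non-grouping column
# into a single comma-separated string per group.
dt.out <- dt[, lapply(.SD, function(x) paste(x, collapse = ",")),
             by = c("ID2", "chrom", "strand", "txStart", "txEnd")]

# Split each collapsed string on commas, convert the pieces to numeric,
# and replace the whole column with the per-cell mean in one call.
dt.out[, newCol := sapply(strsplit(newCol, ","),
                          function(x) mean(as.numeric(x)))]

print(dt.out$newCol)
```

On this toy input the four rows collapse into a single group, so newCol ends up as one numeric value, the mean of 1.02, 1.30, 2.80 and 4.12. Because the whole column is processed in a single call, there is no per-row textConnection/read.csv round trip, which is the likely source of the slowdown in the loop.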
