r - Variable-length core name identification

Question

I have a dataset with the following row-naming scheme:

a.X.V
where:
a is a fixed-length core ID
X is a variable-length string that subsets a, which means I should keep X
V is a variable-length ID which specifies the individual elements of a.X to be averaged
. is one of {-,_}

What I want to do is average the elements belonging to each a.X. A sample:

sampleList <- list("a.12.1"=c(1,2,3,4,5), "b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9), "b.1.555"=c(6,8,9,0,6))
sampleList
$a.12.1
[1] 1 2 3 4 5

$b.1.23
[1] 3 4 1 4 5

$a.12.21
[1] 5 7 2 8 9

$b.1.555
[1] 6 8 9 0 6

Currently I am manually gsubbing out the .Vs to get the general list:

sampleList <- t(as.data.frame(sampleList))
y <- rownames(sampleList)
# Keep the a.X prefix and drop the trailing .V
y <- gsub("(\\w\\.\\d+)\\.\\d+", "\\1", y)

Is there a faster way to do this?

This is one half of two problems I ran into in my workflow. The other half was answered here.

score 2 · Accepted Answer

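You can use a vector of patterns to find the positions of the columns you want to group. I have included a pattern that I know will not match anything, to show that the solution is robust to that case.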

# A *named* vector of patterns you want to group by
patterns <- c(a.12="^a.12",b.12="^b.12",c.12="^c.12")
# Find the locations of those patterns in your list
inds <- lapply(patterns, grep, x=names(sampleList))
# Calculate the mean of each list element that matches the pattern
out <- lapply(inds, function(i) 
  if(l <- length(i)) Reduce("+",sampleList[i])/l else NULL)
# Set the names of the output
names(out) <- names(patterns)
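
If writing the patterns by hand is the slow part, the group keys can also be derived from the names themselves. The following is a minimal sketch, not part of the answer above. It assumes that V never contains a separator character (`.`, `-` or `_`) and that all vectors within a group have the same length; `keys` and `groups` are illustrative names.

```r
sampleList <- list("a.12.1" = c(1, 2, 3, 4, 5), "b.1.23" = c(3, 4, 1, 4, 5),
                   "a.12.21" = c(5, 7, 2, 8, 9), "b.1.555" = c(6, 8, 9, 0, 6))

# Drop the last separator and everything after it (the V part),
# leaving the a.X prefix. Assumption: V contains no '.', '-' or '_'.
keys <- sub("[._-][^._-]*$", "", names(sampleList))

# Group the list elements by their a.X key
groups <- split(sampleList, keys)

# Element-wise mean of the vectors within each group
out <- lapply(groups, function(g) Reduce(`+`, g) / length(g))
```

Because the keys come from the data, a group with no members cannot occur, so the empty-match case handled above does not arise here.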

score 2 · Accepted Answer

Perhaps you could consider rearranging your data structure so that some standard tools are easier to apply:

sampleList <- list("a.12.1"=c(1,2,3,4,5), 
  "b.1.23"=c(3,4,1,4,5), "a.12.21"=c(5,7,2,8,9), 
   "b.1.555"=c(6,8,9,0,6))
library(reshape2)
m1 <- melt(do.call(cbind,sampleList))
m2 <- cbind(m1,colsplit(m1$Var2,"\\.",c("coreID","val1","val2")))

The result looks like this:

head(m2)
  Var1    Var2 value coreID val1 val2
1     1  a.12.1     1      a   12    1
2     2  a.12.1     2      a   12    1
3     3  a.12.1     3      a   12    1

Then you can more easily do something like this:

aggregate(value~val1,mean,data=subset(m2,coreID=="a"))

score 1 · Accepted Answer

R is well suited to this kind of task if you move to data.frames instead of lists. Make your 'a', 'X', and 'V' into their own columns. Then you can use ave, by, aggregate, subset, etc.

data.frame(do.call(rbind, sampleList), 
           do.call(rbind, strsplit(names(sampleList), '\\.')))

#         X1 X2 X3 X4 X5 X1.1 X2.1 X3.1
# a.12.1   1  2  3  4  5    a   12    1
# b.1.23   3  4  1  4  5    b    1   23
# a.12.21  5  7  2  8  9    a   12   21
# b.1.555  6  8  9  0  6    b    1  555
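
Once the name parts are columns, the group means follow from a single `aggregate` call. This is a sketch under a few assumptions: the column names `core`, `x` and `v` are made up for illustration, the separator is any of `.`, `-` or `_`, and every name splits into exactly three parts.

```r
sampleList <- list("a.12.1" = c(1, 2, 3, 4, 5), "b.1.23" = c(3, 4, 1, 4, 5),
                   "a.12.21" = c(5, 7, 2, 8, 9), "b.1.555" = c(6, 8, 9, 0, 6))

# Split each name into its three parts (assumed: exactly three per name)
parts <- do.call(rbind, strsplit(names(sampleList), "[._-]"))

# One row per list element; the unnamed matrix columns become X1..X5
df <- data.frame(core = parts[, 1], x = parts[, 2], v = parts[, 3],
                 do.call(rbind, sampleList))

# Average every value column within each core/x group
value_cols <- setdiff(names(df), c("core", "x", "v"))
means <- aggregate(df[value_cols], by = df[c("core", "x")], FUN = mean)
```

The result has one row per a.X combination, with the grouping columns first and the averaged value columns after them.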
