r - Pass a character vector as arguments to a function in plyr

Question

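I suspect I'm doing it wrong, but I'd like to pass a character vector as arguments to a function in ddply. There are plenty of Q&As about removing quotes and the like, but none of them seem to work for me (e.g. "Remove quotes from a character vector in R" and http://r.789695.n4.nabble.com/Pass-character-vector-to-function-argument-td3045226.html).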

library(plyr)

# reproducible data
df1<-data.frame(a=sample(1:50,10),b=sample(1:50,10),c=sample(1:50,10),d=(c("a","b","c","a","a","b","b","a","c","d")))
df2<-data.frame(a=sample(1:50,9),b=sample(1:50,9),c=sample(1:50,9),d=(c("e","f","g","e","e","f","f","e","g")))
df3<-data.frame(a=sample(1:50,8),b=sample(1:50,8),c=sample(1:50,8),d=(c("h","i","j","h","h","i","i","h")))

#make a list
list.1<-list(df1=df1,df2=df2,df3=df3)

# desired output
lapply(list.1, function(x)   ddply(x, .(d), function(x)  data.frame(am=mean(x$a), bm=mean(x$b), cm=mean(x$c))))

$df1
  d       am       bm       cm
1 a 31.00000 29.25000 18.50000
2 b 31.66667 24.33333 34.66667
3 c 18.50000  5.50000 24.50000
4 d 36.00000 39.00000 43.00000

$df2
  d       am       bm cm
1 e 18.25000 32.50000 18
2 f 27.66667 41.33333 24
3 g 25.00000  7.50000 42

$df3
  d       am       bm       cm
1 h 36.00000 25.00000 20.50000
2 i 25.33333 37.33333 24.33333
3 j 32.00000 32.00000 46.00000

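But my actual use case has many new columns and different kinds of calculations that I want to compute inside the ddply function. So I'd like to do something like this: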

# here's a simple version of a function that I want to send to ddply    
func <- "am=mean(x$a), bm=mean(x$b), cm=mean(x$c)"

# here's how I imagine it might work
lapply(list.1, function(x)   ddply(x, .(d), function(x)  data.frame(func)) )

# not the desired outcome... 
$df1
  d                                     func
1 a am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 b am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 c am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
4 d am=mean(x$a), bm=mean(x$b), cm=mean(x$c)

$df2
  d                                     func
1 e am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 f am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 g am=mean(x$a), bm=mean(x$b), cm=mean(x$c)

$df3
  d                                     func
1 h am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
2 i am=mean(x$a), bm=mean(x$b), cm=mean(x$c)
3 j am=mean(x$a), bm=mean(x$b), cm=mean(x$c)

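I've tried noquote, deparse, eval(as.symbol()), do.call(data.frame, ...) and some of the methods described here: https://github.com/hadley/devtools/wiki/Evaluation, all to no avail. At this point the solution may be obvious (i.e. melt all the things!), but if not, here's a longer example that is closer to my use case: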

# sample data
s <- 23 # number of samples
r <- 10 # number of runs per sample
el <- 17 # number of elements
mydata <- data.frame(ID = unlist(lapply(LETTERS[1:s], function(x) rep(x, r))),
                     run = rep(1:r, s))
# insert fake element data
mydata[letters[1:el]] <- lapply(1:el, function(i) rnorm(s*r, runif(1)*i^2))

# generate all combinations of 5 runs from  ten runs
su <- 5 # number of runs to sample from ten runs
idx <- combn(unique(mydata$run), su)

# RSE function
RSE <- function(x) {100*( (sd(x)/sqrt(length(x)))/mean(x) )}

# make a list of dfs for all samples for each combination of five runs
# to prepare to calculate RSEs
combys1 <- lapply(1:ncol(idx), function(i) mydata[mydata$run %in% idx[,i],] )

# make a list of dfs with RSE for each ID, for each combination of runs
combys2 <- lapply(1:length(combys1), function(i) ddply(combys1[[i]], "ID", summarise, RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c)))

In the last line above, I'd like to replace RSEa=RSE(a), RSEb=RSE(b), RSEc=RSE(c), meana=mean(a), meanb=mean(b), meanc=mean(c) with the object doRSE built here, to save a lot of typing:

# prepare to calculate new columns with RSE and means
RSEs <- sapply(3:ncol(mydata), function(j) paste0("RSE",names(mydata[j]))) 
RSExs <- sapply(3:ncol(mydata), function(j) paste0("RSE(",names(mydata[j]),")")) 
doRSE <- paste0(sapply(1:length(RSEs), function(x) paste0(RSEs[x],"=",RSExs[x])), collapse=",", sep="")
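
As a side note, paste0 is vectorised, so a string of the same shape can be built without the sapply calls. The following is only a minimal sketch: the cols vector is a made-up stand-in for names(mydata)[3:ncol(mydata)], and only base R is used.

```r
# Hypothetical column names; in the example above these would be
# names(mydata)[3:ncol(mydata)]
cols <- c("a", "b", "c")

# paste0 recycles over cols, then collapse joins the pieces with commas
doRSE <- paste0("RSE", cols, "=RSE(", cols, ")", collapse = ",")
print(doRSE)
```

The result is a single string, which still has to be parsed before ddply can use it as code.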

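I'm open to solutions involving base, data.table and dirty tricks. These questions seem close to what I want, but I can't quite translate them to my problem: "Passing character argument and evaluating", "Force evaluation of multiple variables using a character vector", "Using a character vector corresponding to an expression as an argument to a function".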

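UPDATE Here's the catch: I want to be able to modify func in the simple example (or doRSE in my use case) to create a bunch of new columns that are various calculations on the existing columns, in order to explore the data. I'd like a workflow that lets the resulting data frame have new columns that were not in the original data frame. Sorry this wasn't clearer in the original question. I can't see how to adapt @Marius's answer to do this, but @mnel's was helpful (see the update below).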

With @mnel's excellent dirty tricks and a few minor fixes, I can get the desired result in my use case:

# @mnel's solution, adapted (no period before eval)
combys2 <- lapply(combys1, function(x) do.call(ddply,c(.data = quote(x), 
                           .variables = quote(.(ID)), .fun = quote(summarize),
                           eval(parse(text = sprintf('.(%s)', doRSE ))))))
head(combys2)

[[1]]
   ID       RSEa      RSEb     RSEc      RSEd     RSEe      RSEf     RSEg      RSEh      RSEi
1   A  168.30658  21.68632 5.657228  5.048057 4.162017 2.9581874 1.849009 0.6925148 0.4393491
2   B   26.55071  26.20427 4.782578  4.385409 2.342764 2.1813874 2.719625 1.1576681 0.6427935
3   C   73.83165  14.47216 8.154435  6.273202 3.046978 1.2179457 2.811405 1.1401837 0.8167067
4   D   31.96170  57.89260 9.438220  7.388410 3.755772 0.8601780 3.724875 0.8358204 0.9939387
5   E   63.22537  60.35532 5.839690 11.691304 3.828430 0.9217787 4.204300 0.8217187 0.7876634
6   F   56.37635  65.37907 4.149568  5.496308 2.227544 2.1548455 2.847291 1.1956212 0.2506518
7   G   69.32232  23.63214 4.255847  7.979225 4.917660 1.6185960 3.156521 0.3265555 0.8133279
8   H   29.82015  40.74184 7.372100  7.464792 2.749862 0.6054420 4.061368 0.9973909 1.3807720
9   I   50.58114  19.53732 2.989920  9.767678 4.000249 1.7451322 1.175397 0.9952093 0.9095086
10  J   92.96462  39.77475 6.140688 10.295668 3.407726 2.4663758 3.030444 0.5743419 0.9296482
11  K   90.72381  42.25092 2.483069  6.781054 3.142082 1.8080633 2.891740 1.1996176 0.8525290
12  L -385.24547  40.81267 4.506087  8.148382 2.976488 0.8304432 2.234134 0.2108664 0.4979777
13  M   22.77743  33.98332 2.913926  8.764639 2.307293 0.8366635 3.229944 1.0003125 0.3878567
14  N   66.75163  34.16087 6.611326 13.865377 1.285522 1.3863958 4.165575 0.7379386 0.4515194
15  O   37.37188 100.57479 5.738877  5.724862 2.839638 1.1366610 3.186332 0.7383855 0.3954544
16  P   17.08913  26.62210 6.060130  4.110893 2.688908 2.6970727 1.609043 1.3860834 0.8780010
17  Q   13.96392  74.92279 5.469304  8.467638 2.974131 1.2135436 3.284564 0.6232778 1.0759226
18  R   42.59899  30.75952 4.842832  8.764158 1.874020 1.5791048 3.427342 1.4479638 0.2964455
19  S   26.03307  15.56352 6.968717  7.783876 4.439733 2.0764179 4.683080 0.7459654 1.1268772
20  T   71.57945  33.81362 7.147049 11.201551 2.128315 2.2051611 2.419805 0.2688807 1.1559635
21  U   73.93002  11.77155 7.738910  7.207041 1.478491 1.4409844 4.042419 0.5883490 0.5585716
22  V   67.93166  39.54994 5.701551  8.636122 2.472963 1.6514199 2.627965 1.0359048 0.8747136
23  W   11.23057  12.51272 7.003448  7.424559 4.102693 0.6614847 2.246305 1.3422405 0.2665246
        RSEj      RSEk      RSEl      RSEm      RSEn      RSEo      RSEp      RSEq
1  0.6366733 0.3713819 2.1993487 0.3865293 0.5436581 0.9187585 0.4344699 0.8915868
2  0.3445095 0.2932025 1.8563179 0.5397595 1.0433388 0.3533622 0.1942316 0.1941072
3  0.2720344 0.5507595 2.0305726 0.4377259 0.8589854 0.5690906 0.1397337 0.4043247
4  0.6606667 0.6769112 3.4737352 0.5674656 1.2519256 0.8718298 0.1162969 0.8287504
5  0.4620774 0.5598069 1.9236112 0.7990046 0.9832732 0.6847352 0.4070675 0.9005185
6  0.7981610 0.4005493 0.9721068 0.2770989 1.7054674 0.3110139 0.4521183 0.8740444
7  0.3969116 0.4717575 4.1341106 0.7510628 0.9998299 0.5342292 0.4319642 1.1861705
8  0.2963956 0.2652221 0.4775827 0.2617120 0.8261874 0.5266087 0.1900943 0.2350553
9  0.2609359 0.5431035 2.6478440 0.1606919 0.7407281 0.6802262 0.1802069 0.7438792
10 0.4239787 0.8753544 3.4218030 0.5467869 0.7404017 0.5581173 0.3682014 0.6361436
11 0.4188502 0.8629862 4.4181479 0.1623873 0.8018811 0.5873609 0.3592134 0.5357984
12 0.5790265 0.5009210 3.7534287 0.1933726 0.5809601 0.5777868 0.3400925 0.4783890
13 0.3562582 0.2552756 2.1393219 0.1849345 0.5796194 0.6129469 0.3363311 0.4382125
14 0.7921502 0.6147990 2.9054634 0.5852325 1.4954072 0.9983203 0.2937837 0.7654504
15 0.5840424 0.2757707 1.5695675 0.3305385 0.8712636 0.5816490 0.1985457 0.7213289
16 0.3301280 0.3008273 2.9014987 0.4540833 0.5966479 0.9042004 0.1631630 0.7262141
17 0.5882511 0.2820978 3.0652666 0.4518936 1.3168151 0.4749311 0.2244693 0.6583083
18 0.4048816 0.3708787 3.2207478 0.2603412 1.3168318 0.3318745 0.3120436 0.6210711
19 0.4425123 0.3602076 3.7609863 0.5399527 0.8302572 0.3246904 0.1952143 0.2915325
20 0.5877835 0.6339015 1.6908570 0.3223056 0.5239339 0.6607198 0.2808094 0.3697380
21 0.4454056 0.7733354 4.3433420 0.4391075 0.5503594 0.5893406 0.2262403 0.2361512
22 0.9583940 0.6365843 3.0033951 0.6507968 0.8610046 0.6363198 0.2866719 0.5736855
23 0.4969730 0.3895182 2.0021608 0.3354475 1.4398250 0.7386870 0.2458906 0.3414804
...
...

score 4 · Accepted Answer

You can use quote and plyr's .() function.

Reading https://github.com/hadley/devtools/wiki/Computing-on-the-language may help you decide whether you really want to do this.

Anyway, one approach is to use .() to create the list of arguments, relying on how summarise works, e.g.

.(am=mean(a), bm=mean(b), cm=mean(c))

If you really want to use a character string:

foo<- "am=mean(a), bm=mean(b), cm=mean(c)"
eval(parse(text = sprintf('.(%s)', foo )))
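
To make the string-to-expression step concrete, here is a rough base R sketch that does not need plyr. It assumes alist() as a stand-in for .() (both return unevaluated expressions), and the data frame df is made up for the illustration.

```r
# Toy data, invented for this sketch
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30), c = c(5, 5, 5))

foo <- "am=mean(a), bm=mean(b), cm=mean(c)"

# parse() turns the text into a call to alist(); eval() runs that call,
# which returns a named list of unevaluated expressions
exprs <- eval(parse(text = sprintf("alist(%s)", foo)))

# Evaluate each expression with the columns of df in scope
result <- as.data.frame(lapply(exprs, function(e) eval(e, df)))
print(result)
```

summarise does essentially the last step for each group, which is why it can accept the expressions produced by .().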

Use quote liberally to create the list of arguments to pass to do.call

For example

lapply(list.1, function(x) do.call(ddply,c(.data = quote(x), 
    .variables = quote(.(d)), .fun = quote(summarize),
      .(am=mean(a), bm=mean(b), cm=mean(c)))))

Oh boy, is that ugly.

Alternatively, you could use data.table

library(data.table)

listDT <- lapply(list.1, data.table)

lapply(listDT, function(x) x[,lapply(.SD, mean), by = 'd'])

or

mystuff <- sprintf('list(%s)', foo)
lapply(listDT, function(x) x[, eval(parse(text = mystuff)), by = 'd'])

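However, if all of your data.tables have the same columns, it would be more efficient to create one large data.table (with an identifier for each element of the list) and work on that.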
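
A minimal sketch of that single-table idea, assuming the list elements share their columns. The small list.1 below and the identifier column name src are invented for the example; rbindlist() and its idcol argument come from data.table.

```r
library(data.table)

# Toy stand-in for list.1, invented for this sketch
list.1 <- list(
  df1 = data.frame(a = c(1, 3, 5), b = c(2, 4, 6), d = c("x", "x", "y")),
  df2 = data.frame(a = c(10, 20), b = c(30, 50), d = c("z", "z"))
)

# Stack the list into one data.table; src records which element each row came from
big <- rbindlist(list.1, idcol = "src")

# Mean of every value column, per list element and per level of d
res <- big[, lapply(.SD, mean), by = .(src, d), .SDcols = c("a", "b")]

# The string-based variant works on the combined table too
foo <- "am=mean(a), bm=mean(b)"
res2 <- big[, eval(parse(text = sprintf("list(%s)", foo))), by = .(src, d)]
print(res2)
```

One grouped call then replaces the lapply over separate tables, and src keeps the results traceable to their source.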

score 2 · Accepted Answer

Here is a ddply call that calculates the mean of every column in the data frame other than d:

lapply(list.1,
       function(x) {
         ddply(
           x,
           .(d),
           function(df_part) {
             result_df <- data.frame(d=df_part$d[1])
             non_d_cols <- colnames(df_part)[! colnames(df_part) == "d"]
             for (col in non_d_cols) {
               col_mean <- mean(df_part[[col]])
               col_name <- paste0(col, "_mean")
               result_df[[col_name]] <- col_mean
             }
             return(result_df)
           })
       })

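This seems to me like the simplest approach, and it should generalise well to other calculations you might want to run on these columns. You could perhaps pass in a character vector naming the columns you want means for, and use it in place of non_d_cols.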
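
One way that suggestion might look is sketched below. The helper name mean_cols, its arguments and the toy data are all hypothetical; only ddply itself comes from plyr.

```r
library(plyr)

# Hypothetical helper: mean of the columns named in `cols`, per level of `by`
mean_cols <- function(df, cols, by = "d") {
  ddply(df, by, function(df_part) {
    means <- lapply(df_part[cols], mean)
    # Name the new columns after the originals, e.g. a -> a_mean
    as.data.frame(setNames(means, paste0(cols, "_mean")))
  })
}

# Toy data, invented for this sketch
df <- data.frame(a = c(1, 3, 5), b = c(2, 4, 6), d = c("x", "x", "y"))
res <- mean_cols(df, c("a", "b"))
print(res)
```

Applied to the list from the question, this would be lapply(list.1, mean_cols, cols = c("a", "b", "c")).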
