r - Combine frequencies and summary statistics in one table?

Question

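I have just discovered how powerful plyr is for building frequency tables with several variables in R, and I am still struggling to understand how it works. I hope someone here can help me.

I would like to create a table (data frame) that combines frequencies and summary statistics, but without hard-coding the values.

Here is a sample dataset: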

require(datasets)

d1 <- sleep
# I classify the variable extra to calculate the frequencies 
extraClassified <- cut(d1$extra, breaks = 3, labels = c('low', 'medium', 'high') )
d1 <- data.frame(d1, extraClassified)

The result I am looking for should look like this:

  require(plyr)

  ddply(d1, "group", summarise,  
  All = length(ID), 

  nLow    = sum(extraClassified  == "low"),
  nMedium = sum(extraClassified  == "medium"),      
  nHigh =  sum(extraClassified  == "high"),

  PctLow     = round(sum(extraClassified  == "low")/ length(ID), digits = 1),
  PctMedium  = round(sum(extraClassified  == "medium")/ length(ID), digits = 1),      
  PctHigh    = round(sum(extraClassified  == "high")/ length(ID), digits = 1),

  xmean    = round(mean(extra), digits = 1),
  xsd    =   round(sd(extra), digits = 1))

My question: how can I do this without hard-coding the values?

For the record: I tried this code, but it does not work:

ddply (d1, "group", 
   function(i) c(table(i$extraClassified),     
   prop.table(as.character(i$extraClassified))),
   )

Thanks in advance.

score 2 · Accepted Answer

Here is an example to get you started:

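foo <- function(x,colfac,colval){
    # counts per level of the factor column
    tbl <- table(x[,colfac])
    # one row: group size, then counts, then proportions
    res <- cbind(n = nrow(x),t(tbl),t(prop.table(tbl)))
    # columns 5:7 hold the proportions (this assumes three factor levels)
    colnames(res)[5:7] <- paste(colnames(res)[5:7],"Pct",sep = "")
    res <- as.data.frame(res)
    # summary statistics of the numeric column
    res$mn <- mean(x[,colval])
    res$sd <- sd(x[,colval])
    res
}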

ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")

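Don't take anything in that function foo as gospel. I just wrote it off the top of my head. Improvements and modifications are certainly possible, but at least it's something to start with.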
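To see what foo builds for each group, the steps can be run by hand on a single group. This is only a base-R sketch using the sleep data from the question; the names g1, pct and row1 are made up for illustration.

```r
# Base-R walk-through of one group, using the sleep data from the question.
d1 <- datasets::sleep
d1$extraClassified <- cut(d1$extra, breaks = 3,
                          labels = c("low", "medium", "high"))

g1 <- d1[d1$group == 1, ]            # rows for group 1 only (hypothetical name)

tbl <- table(g1$extraClassified)     # counts per level, named by level
pct <- prop.table(tbl)               # same shape, counts divided by their sum

# t() turns each 1-d table into a one-row matrix whose column names are the
# factor levels, so cbind() can place them side by side in a single row.
row1 <- cbind(n = nrow(g1), t(tbl), t(pct))
colnames(row1)
```

The count and proportion columns come out with the same names, which is why foo has to rename one set of them before converting to a data frame.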

score 2 · Accepted Answer

Thanks, joran. I modified your function slightly to make it more general (it no longer refers to the position of the variables).

require(plyr)

foo <- function(x,colfac,colval)
{
  # table with frequencies
  tbl    <- table(x[,colfac])
  # table with percentages
  tblpct <- t(prop.table(tbl))
  colnames(tblpct) <- paste(colnames(t(tbl)), 'Pct', sep = '')

  # put the first part together
  res <- cbind(n = nrow(x), t(tbl), tblpct)
  res <- as.data.frame(res)

  # add summary statistics
  res$mn <- mean(x[,colval])
  res$sd <- sd(x[,colval])
  res
}

ddply(d1,.(group),foo,colfac = "extraClassified",colval = "extra")

It works!

PS: I still don't understand what .(group) stands for, though.
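On the .(group) question, a short sketch may help. As far as I understand plyr, .() just captures the variable name without evaluating it, and ddply accepts that form, a character vector, or a one-sided formula for its grouping argument. The example assumes plyr is installed; the by_* names are made up for illustration.

```r
library(plyr)  # assumed to be installed

d1 <- datasets::sleep

# Three ways to name the grouping variable; each splits d1 by `group`.
by_quoted  <- ddply(d1, .(group), summarise, xmean = mean(extra))
by_string  <- ddply(d1, "group",  summarise, xmean = mean(extra))
by_formula <- ddply(d1, ~ group,  summarise, xmean = mean(extra))

class(.(group))  # a "quoted" object: the name is captured, not evaluated
```

So .(group) in the calls above should be interchangeable with "group", which is the form used in the question.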
