r - Transposing a data frame after aggregating

Question

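I am having some difficulty transposing my data properly. I am trying to get a table of the means and SDs of the columns, with the column names turned into rows. I am able to compute the means and SDs using the following code: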

data(iris)

mydata <- do.call(data.frame, aggregate(. ~ Species, iris, function(x) c(mean = mean(x), sd = sd(x))))

which creates this table:

<table><tbody><tr><th>Species</th><th>Sepal.Length.mean</th><th>Sepal.Length.sd</th><th>Sepal.Width.mean</th><th>Sepal.Width.sd</th><th>Petal.Length.mean</th><th>Petal.Length.sd</th><th>Petal.Width.mean</th><th>Petal.Width.sd</th></tr><tr><td>setosa</td><td>5.006</td><td>0.3524897</td><td>3.428</td><td>0.3790644</td><td>1.462</td><td>0.173664</td><td>0.246</td><td>0.1053856</td></tr><tr><td>versicolor</td><td>5.936</td><td>0.5161711</td><td>2.77</td><td>0.3137983</td><td>4.26</td><td>0.469911</td><td>1.326</td><td>0.1977527</td></tr><tr><td>virginica</td><td>6.588</td><td>0.6358796</td><td>2.974</td><td>0.3224966</td><td>5.552</td><td>0.5518947</td><td>2.026</td><td>0.27</td></tr></tbody></table>

I would like the table to look like the following:

<table><tbody><tr><th> </th><th>Setosa</th><th> </th><th>Versicolor</th><th> </th><th>Virginica</th><th> </th></tr><tr><td> </td><td>Mean</td><td>SD</td><td>Mean</td><td>SD</td><td>Mean</td><td>SD</td></tr><tr><td>Sepal.Length</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr><tr><td>Sepal.Width</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr><tr><td>Petal.Length</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr><tr><td>Petal.Width</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr></tbody></table>

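I realize that getting the second header will most likely require the add_header_above function for kable tables, but before I get there, I am having some difficulty getting the data frame into the structure I want. I have been playing around with the cast and melt functions, but without much luck.

Any suggestions would be greatly appreciated!

~Jack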

score 1 · Accepted Answer

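Here is a solution using the tidyverse and the tables package. First, we use gather() to reshape the data set into narrow (long) format. The narrow format lets us use both Species and flowerAttribute as factor variables in the table, and it removes the need to transpose the data.

Second, we use the tables::tabular() function to produce a table with the species means and standard deviations on the column dimension and the flower attributes on the row dimension.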

data(iris)
library(tables)
library(tidyverse)
tidyIris <- gather(iris,key=flowerAttribute,value=value,
                 Sepal.Length,Sepal.Width,Petal.Length,Petal.Width)
# factors required for tabular()
tidyIris$flowerAttribute <- as.factor(tidyIris$flowerAttribute)
tabular((flowerAttribute) ~ Format(digits=2)*(Species)*(value)*(mean + sd), 
       data=tidyIris )

...and the output:

> tabular((flowerAttribute) ~ Format(digits=2)*(Species)*(value)*(mean + sd), 
+         data=tidyIris )

                 Species                                    
                 setosa       versicolor      virginica     
                 value        value           value         
 flowerAttribute mean    sd   mean       sd   mean      sd  
 Petal.Length    1.46    0.17 4.26       0.47 5.55      0.55
 Petal.Width     0.25    0.11 1.33       0.20 2.03      0.27
 Sepal.Length    5.01    0.35 5.94       0.52 6.59      0.64
 Sepal.Width     3.43    0.38 2.77       0.31 2.97      0.32

For those who have used SAS before, the tables package implements functionality similar to SAS PROC TABULATE.
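As a side note, gather() has been superseded in tidyr (since version 1.0.0) by pivot_longer(). The following is a minimal sketch of the same wide-to-long reshape with the newer function; the name tidyIrisLong is only illustrative and is not part of the original answer.

```r
library(tidyr)

data(iris)

# Reshape every column except Species into name/value pairs.
# tidyIrisLong is a hypothetical name used only for this sketch.
tidyIrisLong <- pivot_longer(iris,
                             cols = -Species,
                             names_to = "flowerAttribute",
                             values_to = "value")

# pivot_longer() returns a tibble; convert to a plain data frame
# and make the attribute a factor, as tabular() expects factors.
tidyIrisLong <- as.data.frame(tidyIrisLong)
tidyIrisLong$flowerAttribute <- as.factor(tidyIrisLong$flowerAttribute)
```

The resulting data frame should be usable in place of tidyIris in the tabular() call above, although the row order differs from what gather() produces.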

Enhancing the output

With a few tweaks to the code, we can exactly replicate the output format requested in the OP.

# key syntax elements
# 1. - renamed flowerAttribute to Attribute using = operator
# 2. - used Heading() to eliminate the printing of "value" and "Species" on columns
tabular((Attribute=flowerAttribute) ~ Format(digits=2)*(Heading()*Species)*Heading()*(value)*(mean + sd), 
        data=tidyIris )

...and the output:

              setosa       versicolor      virginica     
 Attribute    mean    sd   mean       sd   mean      sd  
 Petal.Length 1.46    0.17 4.26       0.47 5.55      0.55
 Petal.Width  0.25    0.11 1.33       0.20 2.03      0.27
 Sepal.Length 5.01    0.35 5.94       0.52 6.59      0.64
 Sepal.Width  3.43    0.38 2.77       0.31 2.97      0.32
 >

Generating LaTeX

Finally, to get typeset-quality output, one can wrap the tabular() call in latex() to write LaTeX code, and use Sweave to compile it into a PDF document.

latex(tabular((Attribute=flowerAttribute) ~ Format(digits=2)*(Heading()*Species)*Heading()*(value)*(mean + sd), 
        data=tidyIris ))

...which generates LaTeX that compiles to a typeset version of the table above.

score 0 · Accepted Answer

I guess this is what you are looking for?

  `colnames<-`(do.call(rbind,by(t(mydata[-1]),rep(names(iris[-5]),each=2),unlist)),rep(c("Mean","Sd"),3))
              Mean        Sd  Mean        Sd  Mean        Sd
Petal.Length 1.462 0.1736640 4.260 0.4699110 5.552 0.5518947
Petal.Width  0.246 0.1053856 1.326 0.1977527 2.026 0.2746501
Sepal.Length 5.006 0.3524897 5.936 0.5161711 6.588 0.6358796
Sepal.Width  3.428 0.3790644 2.770 0.3137983 2.974 0.3224966

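First, since I am only dealing with the numeric columns, I dropped the Species column with iris[-5]. Likewise, since I do not need the first column of mydata, I dropped it with mydata[-1]. Why is each variable name repeated twice? Because there are two functions (mean and sd). Why are the column labels repeated 3 times? Because there are 3 species...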
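The same idea can be written more explicitly in base R without going through mydata. This is only a sketch under the assumption that one Mean/SD column pair per species is wanted; the names measure_cols, per_species and result are illustrative and not from the original answer.

```r
data(iris)

# All measurement columns, i.e. everything except Species
measure_cols <- names(iris)[names(iris) != "Species"]

# For each species, build a 4 x 2 matrix: one row per measurement,
# one column for the mean and one for the standard deviation
per_species <- lapply(split(iris[measure_cols], iris$Species), function(d) {
  cbind(Mean = sapply(d, mean), SD = sapply(d, sd))
})

# Bind the per-species matrices side by side and label the columns
# as "<species>.<statistic>"
result <- do.call(cbind, per_species)
colnames(result) <- paste(rep(names(per_species), each = 2),
                          colnames(result), sep = ".")
result
```

Here the rows stay in the original column order (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width), and the species name is kept in each column label, so the result can be handed to kable and given a grouped header with add_header_above as mentioned in the question.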
