r - Turning rows into columns

Question

Suppose (to simplify) I have a table containing some control vs. treatment data:

Which, Color, Response, Count
Control, Red, 2, 10
Control, Blue, 3, 20
Treatment, Red, 1, 14
Treatment, Blue, 4, 21

For each color, I want a single row containing both the control and the treatment data, i.e.:

Color, Response.Control, Count.Control, Response.Treatment, Count.Treatment
Red, 2, 10, 1, 14
Blue, 3, 20, 4, 21

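I guess one way of doing this is an inner merge on each control/treatment subset (merging on the Color column), but is there a better way? I was thinking the reshape package or the stack function could do it somehow, but I'm not sure.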
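For reference, the merge-based approach mentioned above might look like the following minimal base R sketch. The data frame name df and the suffixes values are assumptions, chosen so the result matches the desired column names.

```r
# Example data from the question (the name `df` is an assumption).
df <- data.frame(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21)
)

# Split into the two groups, dropping the grouping column from each.
control   <- subset(df, Which == "Control",   select = -Which)
treatment <- subset(df, Which == "Treatment", select = -Which)

# Inner merge on Color; the suffixes record which group each column came from.
wide <- merge(control, treatment, by = "Color",
              suffixes = c(".Control", ".Treatment"))
print(wide)
```

This works for two groups, but each additional group needs another merge, which is why a reshaping function is usually more convenient.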

score 20 · Accepted Answer

Use the reshape package.

First, melt your data.frame:

x <- melt(df)

Then cast:

dcast(x, Color ~ Which + variable)

Depending on which version of the reshape package you are using, this is either cast() (reshape) or dcast() (reshape2).

Voilà.
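A self-contained version of the two steps, as a sketch that assumes the reshape2 package is installed and that the data frame is named df. The id.vars argument is spelled out here rather than left for melt() to guess.

```r
library(reshape2)  # assumption: reshape2 is installed

# Example data from the question (the name `df` is an assumption).
df <- data.frame(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21)
)

# Long format: one row per (Which, Color, variable) combination.
x <- melt(df, id.vars = c("Which", "Color"))

# Wide format: one row per Color, one column per Which/variable pair.
wide <- dcast(x, Color ~ Which + variable)
print(wide)
```

With this formula the group comes first in the column names (for example Control_Response). Writing the right-hand side as variable + Which should put the variable first instead.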

score 7 · Accepted Answer

The cast function from the reshape package (not to be confused with the reshape function in base R) can do this and many other things. See here: http://had.co.nz/reshape/

score 6 · Accepted Answer

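Adding to the options (many years later)....

The typical approach in base R would involve the reshape function (which is generally unpopular because of its multitude of arguments, which take time to master). It is a pretty efficient function for smaller datasets, but it does not always scale well.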

reshape(mydf, direction = "wide", idvar = "Color", timevar = "Which")
#   Color Response.Control Count.Control Response.Treatment Count.Treatment
# 1   Red                2            10                  1              14
# 2  Blue                3            20                  4              21

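cast/dcast from "reshape" and "reshape2" have already been covered (and now there is also dcast.data.table from "data.table", which is especially useful when you have large datasets). But also from the Hadleyverse, there is "tidyr", which works nicely with the "dplyr" package: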

library(tidyr)
library(dplyr)
mydf %>%
  gather(var, val, Response:Count) %>%  ## make a long dataframe
  unite(RN, var, Which) %>%             ## combine the var and Which columns
  spread(RN, val)                       ## make the results wide
#   Color Count_Control Count_Treatment Response_Control Response_Treatment
# 1  Blue            20              21                3                  4
# 2   Red            10              14                2                  1
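In tidyr 1.0.0 and later, gather() and spread() are superseded by pivot_longer() and pivot_wider(). A minimal sketch of the same reshape with pivot_wider(), assuming such a version of tidyr is installed and reusing the name mydf from this answer:

```r
library(tidyr)  # assumption: tidyr >= 1.0.0 is installed

# Example data from the question, under the name used in this answer.
mydf <- data.frame(
  Which    = c("Control", "Control", "Treatment", "Treatment"),
  Color    = c("Red", "Blue", "Red", "Blue"),
  Response = c(2, 3, 1, 4),
  Count    = c(10, 20, 14, 21)
)

# Spread both value columns across the levels of Which in one step;
# new column names are built as <value column>_<Which level>.
wide <- pivot_wider(mydf,
                    names_from  = Which,
                    values_from = c(Response, Count))
print(wide)
```

No intermediate long data frame is needed here, because values_from accepts several columns at once.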

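~~Also note that in upcoming versions of "data.table", the dcast.data.table function should be able to handle this without you first having to melt your data.~~

The data.table implementation of dcast allows you to convert multiple columns to wide format without melting first, as follows: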

library(data.table)
dcast(as.data.table(mydf), Color ~ Which, value.var = c("Response", "Count"))
#    Color Response_Control Response_Treatment Count_Control Count_Treatment
# 1:  Blue                3                  4            20              21
# 2:   Red                2                  1            10              14

score 3 · Accepted Answer

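Reshape does indeed work for pivoting a skinny data frame (e.g., from a simple SQL query) into a wide matrix, and it is very flexible, but it is slow. For large amounts of data, very, very slow. Fortunately, if you only want to pivot to a fixed shape, it is fairly easy to write a small C function that does the pivot quickly.

In my case, pivoting a skinny data frame with 3 columns and 672,338 rows took 34 seconds with reshape, 25 seconds with my R code, and 2.3 seconds with C. Ironically, the C implementation was probably easier to write than my (tuned for speed) R implementation.

Here is the core C code for pivoting floating-point numbers. Note that it assumes you have already allocated a correctly sized result matrix in R before calling the C code, which makes the R developers shudder in horror: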

#include <R.h> 
#include <Rinternals.h> 
/* 
 * This mutates the result matrix in place.
 */
SEXP
dtk_pivot_skinny_to_wide(SEXP n_row  ,SEXP vi_1  ,SEXP vi_2  ,SEXP v_3  ,SEXP result)
{
   int ii, max_i;
   unsigned int pos;
   int nr = *INTEGER(n_row);
   int * aa = INTEGER(vi_1);
   int * bb = INTEGER(vi_2);
   double * cc = REAL(v_3);
   double * rr = REAL(result);
   max_i = length(vi_2);
   /*
    * R stores matrices by column.  Do ugly pointer-like arithmetic to
    * map the matrix to a flat vector.  We are translating this R code:
    *    for (ii in 1:length(vi.2))
    *       result[((n.row * (vi.2[ii] -1)) + vi.1[ii])] <- v.3[ii]
    */
   for (ii = 0; ii < max_i; ++ii) {
      pos = ((nr * (bb[ii] -1)) + aa[ii] -1);
      rr[pos] = cc[ii];
      /* printf("ii: %d \t value: %g \t result index:  %d \t new value: %g\n", ii, cc[ii], pos, rr[pos]); */
   }
   return(result);
}
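
The index arithmetic in the loop can be checked from R itself. The sketch below uses made-up example data and hypothetical variable names. It fills a preallocated matrix using the flat column-major index from the C comment and compares the result with R's two-column matrix indexing. It illustrates the mapping only and is not the tuned R implementation mentioned above.

```r
# Made-up skinny data: row index, column index, value.
vi.1 <- c(1L, 2L, 1L, 2L, 3L)
vi.2 <- c(1L, 1L, 2L, 3L, 3L)
v.3  <- c(0.5, 1.5, 2.5, 3.5, 4.5)

n.row <- max(vi.1)
n.col <- max(vi.2)

# Preallocate the result, as the C function expects its caller to do.
result <- matrix(NA_real_, nrow = n.row, ncol = n.col)

# R stores matrices by column, so cell (i, j) sits at flat position
# n.row * (j - 1) + i (1-based).
pos <- n.row * (vi.2 - 1L) + vi.1
result[pos] <- v.3

# The same fill expressed with two-column matrix indexing.
check <- matrix(NA_real_, nrow = n.row, ncol = n.col)
check[cbind(vi.1, vi.2)] <- v.3

print(result)
```

Cells with no matching (row, column) pair in the skinny data stay NA, so the fill value chosen at allocation time determines how missing combinations appear.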
