r - How to create columns from multiple rows?

Question

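I'm still quite new to R and to programming in general, and I've run into a problem with my data.frame that is keeping me from moving forward.

I have a data set like this:

Table 1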

    Individual             Score
    Tim                      45
    Tim                      77
    Tim                      32
    Clare                    92
    Clare                    70
    Clare                    88

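To explain Table 1 above: I have several individuals (Tim and Clare in the example), and I have the scores they got on a test taken on 3 different occasions (2009, 2010, 2011). I'm trying to find a way to turn the above into something like this:

Table 2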

    Individual             Score09             Score10             Score11
    Tim                      45                   77                  32
    Clare                    92                   70                  88

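I used ddply to get Table 1, because originally I had the information for subsets of the test (the Score variable is just the sum of all the subsets).

Please let me know if there is a way to end up with Table 2 instead of Table 1, since I have more than 10,000 observations and the layout of Table 1 won't let me proceed as intended.

Edit:

The original data frame used to produce Table 1 looks like this: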

    Base          Individual     score_math    score_bio     score_chem
    SB1120091       Tim              12            23             10
    SB1120092       Tim              30            25             22
    SB1120101       Tim              17             5             10
    SB1120091       Clare            50            20             22
    SB1120092       Clare            40            10             20
    SB1120101       Clare            47            20             21

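The code is:

> Table1 <- ddply(x, .(Individual), summarise, Score = (score_math * score_bio * score_chem))

Edit 2:

The original data set has no Year variable, but it does have a Base variable that carries information about when the test was taken.

Also, the Score variable is computed as the product of all the subset scores.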

score 4 · Accepted Answer

Data:

df <- structure(list(Individual = structure(c(2L, 2L, 2L, 1L, 1L, 1L), 
                     .Label = c("Clare", "Tim"), class = "factor"), 
                     Score = c(45, 77, 32, 92, 70, 88), 
                     count = c(1L, 2L, 3L, 1L, 2L, 3L)), 
                     .Names = c("Individual", "Score", "count"), 
                     row.names = c(NA, -6L), class = "data.frame")
df$count <- rep(c("09", "10", "11"), 2)

Using reshape from base R's stats package:

> reshape(df, idvar="Individual", timevar="count", direction="wide", sep="")

#   Individual Score09 Score10 Score11
# 1        Tim      45      77      32
# 4      Clare      92      70      88
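
The `count` column above is filled in by hand with `rep()`, which relies on every individual having exactly three rows in the same order. As a rough sketch, assuming the rows are already sorted by occasion within each individual, the occasion label could instead be derived with `ave()`. The names `years` and `occasion` are illustrative helpers only, not something from the question.

```r
# Table 1 from the question, without any occasion column
df <- data.frame(
  Individual = c("Tim", "Tim", "Tim", "Clare", "Clare", "Clare"),
  Score = c(45, 77, 32, 92, 70, 88)
)

# Assumption: rows are ordered by occasion within each individual
years <- c("09", "10", "11")  # hypothetical labels for occasions 1 to 3

# Running row index within each individual (1, 2, 3, 1, 2, 3)
occasion <- ave(seq_along(df$Individual), df$Individual, FUN = seq_along)
df$count <- years[occasion]

# Same reshape call as above, now driven by the derived labels
wide <- reshape(df, idvar = "Individual", timevar = "count",
                direction = "wide", sep = "")
wide
```

If some individuals are missing an occasion, the running index no longer lines up with the year, so in that case the label would have to come from a real date or year field instead.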

score 2 · Accepted Answer

You can use the reshape2 package:

# presuming your data frame is 'xx'
library(reshape2)

# Create a 'Case' Column
xx$Case <- rep(paste0("Score", c("09", "10", "11")), 2)

dcast(xx, Individual ~ Case, value.var="Score")
 Individual Score09 Score10 Score11
      Clare      92      70      88
        Tim      45      77      32

score 2 · Accepted Answer

Now that you have provided the original table, use xtabs() on the original data set. Assuming your data set is named 'x':

xtabs(score_math + score_bio + score_chem ~ Individual + Year, x)
#           Year
# Individual 2009 2010 2011
#      Clare   92   70   88
#      Tim     45   77   32
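
Note that `xtabs()` returns a contingency table, not a data frame. If a data frame shaped like Table 2 is needed, one possible sketch is shown below. It assumes, as the call above does, that `x` has a `Year` column; the column names `Score09` etc. are built here purely to mirror Table 2.

```r
# Example data with an explicit Year column (an assumption; the question's
# original data only has a Base column)
x <- read.table(text = "Year Individual score_math score_bio score_chem
2009 Tim 12 23 10
2010 Tim 30 25 22
2011 Tim 17 5 10
2009 Clare 50 20 22
2010 Clare 40 10 20
2011 Clare 47 20 21", header = TRUE)

tab <- xtabs(score_math + score_bio + score_chem ~ Individual + Year, x)

# Convert the 2-d table to a data frame, keeping its wide layout
wide <- as.data.frame.matrix(tab)

# Rename "2009" -> "Score09" and so on, then move row names into a column
names(wide) <- paste0("Score", substr(names(wide), 3, 4))
wide <- cbind(Individual = rownames(wide), wide)
rownames(wide) <- NULL
wide
```

Calling plain `as.data.frame()` on the table would give the long layout (one row per Individual/Year pair) instead, which is why `as.data.frame.matrix()` is used here.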

score 1 · Accepted Answer

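Your ddply call splits the data by individual, which produces a separate data frame for each individual, and then evaluates the sum expression on each of those data frames separately. Each individual has multiple rows in the data set, so the expression returns one total per row rather than a single value. ddply then combines the pieces back together, which by default gives one result row for each original row. But you want one row per individual; if we transpose the per-individual result, it becomes a matrix with a single row, which yields the desired behavior.

Using the data you provided: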

x <- read.table(text="Year Individual score_math score_bio score_chem
2009 Tim 12 23 10
2010 Tim 30 25 22
2011 Tim 17 5 10
2009 Clare 50 20 22
2010 Clare 40 10 20
2011 Clare 47 20 21", header=TRUE)

Here is a modified ddply call:

> ddply(x, .(Individual), summarise, Score=t((score_math+score_bio+score_chem)))
  Individual Score.1 Score.2 Score.3
1      Clare      92      70      88
2        Tim      45      77      32
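
To make the split, apply, combine steps described above more concrete, here is a rough base R sketch of the same idea. It is not how ddply is implemented internally, only an illustration of the logic; `pieces` and `totals` are hypothetical names.

```r
x <- read.table(text = "Year Individual score_math score_bio score_chem
2009 Tim 12 23 10
2010 Tim 30 25 22
2011 Tim 17 5 10
2009 Clare 50 20 22
2010 Clare 40 10 20
2011 Clare 47 20 21", header = TRUE)

# Split: one data frame per individual
pieces <- split(x, x$Individual)

# Apply: each piece yields a vector with one total per row (length 3)
totals <- sapply(pieces, function(d) d$score_math + d$score_bio + d$score_chem)

# Combine: sapply returns one column per individual, so transpose
# to get one row per individual instead
res <- t(totals)
res
```

Each piece contributes three values, which is why the untransposed result ends up with one row per original row.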

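However, ddply really isn't the right tool for this; you only need to do a very simple calculation on each row and then reshape. My preference would be to add a column for the total score and then use dcast from the reshape2 package. One reason for this preference is that you keep a complete master data set containing all the information you might need later, and use it for all calculations and transformations.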

library(reshape2)
x$Total <- with(x, score_math + score_bio + score_chem)
dcast(x, Individual ~ Year, value.var="Total")
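
If reshape2 is not available, the same "add a total column to the master data, then reshape" idea can be sketched with base R's `reshape()`. This again assumes the `x` with a `Year` column defined above; the resulting column names (`Total2009` etc.) follow from the `sep = ""` argument.

```r
x <- read.table(text = "Year Individual score_math score_bio score_chem
2009 Tim 12 23 10
2010 Tim 30 25 22
2011 Tim 17 5 10
2009 Clare 50 20 22
2010 Clare 40 10 20
2011 Clare 47 20 21", header = TRUE)

# Keep the master data set intact and just add the total
x$Total <- with(x, score_math + score_bio + score_chem)

# Reshape only the columns needed for the wide view; passing all score
# columns would spread every one of them across the years
wide <- reshape(x[c("Individual", "Year", "Total")],
                idvar = "Individual", timevar = "Year",
                direction = "wide", sep = "")
wide
```

The master data frame `x` still holds the per-subject scores afterwards, so other summaries can be derived from it later.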
