r - Data formatting in R using multiple records

Question

I have data in the following format:

site    location    treatment    response
1         1           1            20
1         1           2            30
1         1           3            30
1         2           1            80
1         2           2            30
1         2           3            50
1         3           1            10
1         3           2            15
1         3           3            100
1         4           1            25
1         4           2            20
1         4           3            90

There are multiple sites, with 10 locations per site.

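I wish to create three new variables, treat1, treat2 and treat3, one per treatment, each holding the response value for that treatment within each site/location combination. However, I want them to be filled in on all three treatment records, i.e. a data frame like: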

site    location    treatment    response     treat1    treat2     treat3
1         1           1            20           20        30         30
1         1           2            30           20        30         30
1         1           3            30           20        30         30
1         2           1            80           80        30         50
1         2           2            30           80        30         50
1         2           3            50           80        30         50
1         3           1            10           10        15        100
1         3           2            15           10        15        100
1         3           3            100          10        15        100
1         4           1            25           25        20         90  
1         4           2            20           25        20         90 
1         4           3            90           25        20         90

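To achieve this I have used a rather long-winded solution (6 lines of code, see below), but I was wondering whether anyone could point out a more direct approach: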

For treat1:

df$trt1 <- ifelse(df$treatment==1, df$response, NA)
df2 <- aggregate(df$trt1, list(df$location, df$site), max, na.rm=TRUE)
df3 <- rbind(df2, df2, df2)
df4 <- df3[with(df3, order(Group.2, Group.1)),]
df$x <- ifelse(df4$x==-Inf, NA, df4$x)
names(df)[names(df) == 'x'] <- 'treat1'

I suspect tapply might be useful here, but I'm not sure how to use it in this case.

Thank you.

score 1 · Accepted Answer

One way could be:

merge(DF, 
      do.call(data.frame, aggregate(response ~ site + location, DF, c)), 
      by = c("site", "location"))
#   site location treatment response response.1 response.2 response.3
#1     1        1         1       20         20         30         30
#2     1        1         2       30         20         30         30
#3     1        1         3       30         20         30         30
#4     1        2         1       80         80         30         50
#5     1        2         2       30         80         30         50
#6     1        2         3       50         80         30         50
#7     1        3         1       10         10         15        100
#8     1        3         2       15         10         15        100
#9     1        3         3      100         10         15        100
#10    1        4         1       25         25         20         90
#11    1        4         2       20         25         20         90
#12    1        4         3       90         25         20         90
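
As a rough sketch of why this works and how to get the column names asked for in the question: aggregate() with FUN = c returns a single matrix column, and do.call(data.frame, ...) splits it into response.1, response.2 and response.3. The renaming step and the names agg, wide and res are illustrative additions, not part of the answer above; the sample data are rebuilt inline so the sketch runs on its own. It assumes exactly three treatments per site/location, stored in treatment order.

```r
# Sample data from the question, rebuilt so this sketch is self-contained
DF <- data.frame(
  site      = rep(1L, 12),
  location  = rep(1:4, each = 3),
  treatment = rep(1:3, times = 4),
  response  = c(20L, 30L, 30L, 80L, 30L, 50L, 10L, 15L, 100L, 25L, 20L, 90L)
)

# FUN = c keeps all three responses per group in one matrix column
agg <- aggregate(response ~ site + location, DF, c)

# Split the matrix column into separate columns response.1 .. response.3
wide <- do.call(data.frame, agg)

# Rename to treat1 .. treat3 (assumption: rows are in treatment order per group)
names(wide)[3:5] <- paste0("treat", 1:3)

# Join the wide columns back onto every original record
res <- merge(DF, wide, by = c("site", "location"))
res
```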

where "DF" is:

DF = structure(list(site = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L), location = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L, 
4L, 4L), treatment = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 1L, 
2L, 3L), response = c(20L, 30L, 30L, 80L, 30L, 50L, 10L, 15L, 
100L, 25L, 20L, 90L)), .Names = c("site", "location", "treatment", 
"response"), class = "data.frame", row.names = c(NA, -12L))

score 1 · Accepted Answer

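You can also use standard subsetting and get the treatment matrix as follows (this assumes the rows are sorted by site, location and treatment, with exactly three treatments per site/location combination):

matrix(df$response, ncol = 3, byrow = TRUE)[rep(1:(nrow(df)/3), rep(3, nrow(df)/3)), ]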
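
A possible way to turn that matrix into the requested columns is sketched below. The explicit sort, the column naming and the names n_trt, m and out are assumptions added for illustration; the approach still relies on every site/location having exactly n_trt rows.

```r
# Sample data from the question, rebuilt so this sketch is self-contained
df <- data.frame(
  site      = rep(1L, 12),
  location  = rep(1:4, each = 3),
  treatment = rep(1:3, times = 4),
  response  = c(20L, 30L, 30L, 80L, 30L, 50L, 10L, 15L, 100L, 25L, 20L, 90L)
)

# Sort so each site/location block is contiguous and ordered by treatment
# (a no-op for this sample, but the reshaping below depends on it)
df <- df[order(df$site, df$location, df$treatment), ]

n_trt <- 3  # assumed: exactly three treatments per site/location

# One row per site/location, one column per treatment
m <- matrix(df$response, ncol = n_trt, byrow = TRUE)
colnames(m) <- paste0("treat", seq_len(n_trt))

# Repeat each matrix row n_trt times so it lines up with the original records
out <- cbind(df, m[rep(seq_len(nrow(m)), each = n_trt), ])
out
```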

score 1 · Accepted Answer

1) dplyr

library(dplyr)

DF %>% 
   group_by(site, location) %>% 
   mutate(treat1 = response[1], treat2 = response[2], treat3 = response[3])

giving:

Source: local data frame [12 x 7]
Groups: site, location
   site location treatment response treat1 treat2 treat3
1     1        1         1       20     20     30     30
2     1        1         2       30     20     30     30
3     1        1         3       30     20     30     30
4     1        2         1       80     80     30     50
5     1        2         2       30     80     30     50
6     1        2         3       50     80     30     50
7     1        3         1       10     10     15    100
8     1        3         2       15     10     15    100
9     1        3         3      100     10     15    100
10    1        4         1       25     25     20     90
11    1        4         2       20     25     20     90
12    1        4         3       90     25     20     90
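
The mutate() call above picks values by position, so it depends on the rows of each group being in treatment order. A hedged variant that looks values up by treatment code instead might look like this; the use of match() is a swapped-in technique rather than the answer's own code, and a treatment missing from a group would yield NA.

```r
library(dplyr)

# Sample data from the question, rebuilt so this sketch is self-contained
DF <- data.frame(
  site      = rep(1L, 12),
  location  = rep(1:4, each = 3),
  treatment = rep(1:3, times = 4),
  response  = c(20L, 30L, 30L, 80L, 30L, 50L, 10L, 15L, 100L, 25L, 20L, 90L)
)

res <- DF %>%
  group_by(site, location) %>%
  mutate(
    # match() finds the row holding each treatment code within the group,
    # so the result does not depend on row order
    treat1 = response[match(1, treatment)],
    treat2 = response[match(2, treatment)],
    treat3 = response[match(3, treatment)]
  ) %>%
  ungroup()

res
```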

2) data.table

library(data.table)
DT <- data.table(DF)

treats <- paste0("treat", unique(DF$treatment)) # column names
DT[, (treats) := as.list(response), by = list(site, location)]

giving:

> DT
    site location treatment response treat1 treat2 treat3
 1:    1        1         1       20     20     30     30
 2:    1        1         2       30     20     30     30
 3:    1        1         3       30     20     30     30
 4:    1        2         1       80     80     30     50
 5:    1        2         2       30     80     30     50
 6:    1        2         3       50     80     30     50
 7:    1        3         1       10     10     15    100
 8:    1        3         2       15     10     15    100
 9:    1        3         3      100     10     15    100
10:    1        4         1       25     25     20     90
11:    1        4         2       20     25     20     90
12:    1        4         3       90     25     20     90
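
Note that the := call above adds the columns to DT by reference and assigns them by position within each group. As an alternative sketch, not the answer's own method, one could reshape to wide form with dcast() and join back, which keys on the treatment code rather than on row order. The names wide and res are illustrative.

```r
library(data.table)

# Sample data from the question, rebuilt so this sketch is self-contained
DT <- data.table(
  site      = rep(1L, 12),
  location  = rep(1:4, each = 3),
  treatment = rep(1:3, times = 4),
  response  = c(20L, 30L, 30L, 80L, 30L, 50L, 10L, 15L, 100L, 25L, 20L, 90L)
)

# One row per site/location, one column per treatment code ("1", "2", "3")
wide <- dcast(DT, site + location ~ treatment, value.var = "response")

# Rename the treatment columns to treat1 .. treat3
setnames(wide, c("1", "2", "3"), paste0("treat", 1:3))

# Join the wide columns back onto every original record
res <- merge(DT, wide, by = c("site", "location"))
res
```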

3) ave

treat <- function(i) ave(DF$response, DF$site, DF$location, FUN = function(x) x[i])
cbind(DF, treat1 = treat(1), treat2 = treat(2), treat3 = treat(3))

giving:

   site location treatment response treat1 treat2 treat3
1     1        1         1       20     20     30     30
2     1        1         2       30     20     30     30
3     1        1         3       30     20     30     30
4     1        2         1       80     80     30     50
5     1        2         2       30     80     30     50
6     1        2         3       50     80     30     50
7     1        3         1       10     10     15    100
8     1        3         2       15     10     15    100
9     1        3         3      100     10     15    100
10    1        4         1       25     25     20     90
11    1        4         2       20     25     20     90
12    1        4         3       90     25     20     90
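
Since the question asks about tapply, here is one possible sketch along those lines. It builds a site x location x treatment array and then looks each record up by its dimnames. The helper name treat_k and the variable arr are hypothetical, and the sketch assumes at most one response per site/location/treatment cell.

```r
# Sample data from the question, rebuilt so this sketch is self-contained
DF <- data.frame(
  site      = rep(1L, 12),
  location  = rep(1:4, each = 3),
  treatment = rep(1:3, times = 4),
  response  = c(20L, 30L, 30L, 80L, 30L, 50L, 10L, 15L, 100L, 25L, 20L, 90L)
)

# 3-d array of responses indexed by site, location and treatment;
# x[1] keeps a single value per cell (assumed: one response per cell)
arr <- tapply(DF$response, DF[c("site", "location", "treatment")],
              function(x) x[1])

# Look up treatment k for every record via a character index matrix
# (one column per array dimension, matched against the dimnames)
treat_k <- function(k) {
  idx <- cbind(as.character(DF$site), as.character(DF$location), as.character(k))
  unname(arr[idx])
}

res <- cbind(DF, treat1 = treat_k(1), treat2 = treat_k(2), treat3 = treat_k(3))
res
```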

Added additional solutions.
