I need to reshape a data.frame in R in one step. In short, the change in the values of the objects (x1 to x6) is visible row by row (from 1990 to 1995):

> tab1[1:10, ] # raw data see plot for tab1
   id value year
1  x1     7 1990
2  x1    10 1991
3  x1    11 1992
4  x1     7 1993
5  x1     3 1994
6  x1     1 1995
7  x2     6 1990
8  x2     7 1991
9  x2     9 1992
10 x2     5 1993

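I can do the reshaping step by step. Does anyone know how to do it in one step?

Raw data: Table 1 - the minimum value over all time series is "0".

Step 1: Table 2 - rescale each time series so that its minimum equals "0". All time series then sit on the x-axis.

Step 2: Table 3 - apply the diff() function to each time series.

Step 3: Table 4 - apply the sort() function to each time series.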
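For reference, the three steps above can be sketched one at a time in base R. This is only an illustration on the first two series (x1 and x2) from the question; the names `ex`, `to_long`, `step1`, `step2`, and `step3` are made up for this sketch and do not appear in the original post.

```r
# Two of the six series from the question (x1 and x2), in long format.
ex <- data.frame(id = rep(c("x1", "x2"), each = 6),
                 value = c(7, 10, 11, 7, 3, 1, 6, 7, 9, 5, 2, 3),
                 year = rep(1990:1995, times = 2))

# Hypothetical helper: turn a named list of vectors into a long data.frame
# with a running time index per id.
to_long <- function(lst) {
  n <- unname(lengths(lst))
  data.frame(id = rep(names(lst), times = n),
             value = unlist(lst, use.names = FALSE),
             time = sequence(n))
}

# Step 1: shift each series so that its minimum is 0.
step1 <- ex
step1$value <- ave(ex$value, ex$id, FUN = function(v) v - min(v))

# Step 2: first differences within each series (6 values become 5 per id).
d <- lapply(split(step1$value, step1$id), diff)
step2 <- to_long(d)

# Step 3: sort the differences within each series.
step3 <- to_long(lapply(d, sort))
```

With these inputs, `step3` should reproduce the first ten rows of the desired final table shown in the question.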

I hope the pictures are clear enough to understand each step.

So the final table looks like this:

> tab4[1:10, ]
   id value time
1  x1    -4    1
2  x1    -4    2
3  x1    -2    3
4  x1     1    4
5  x1     3    5
6  x2    -4    1
7  x2    -3    2
8  x2     1    3
9  x2     1    4
10 x2     2    5

# Source data:
tab1 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(7,10,11,7,3,1,6,7,9,5,2,3,11,9,7,9,1,
                             0,1,2,2,4,7,4,2,3,1,6,4,2,3,5,4,3,5,6),
                   year = rep(c(1990:1995), times = 6))

tab2 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 6),
                   value = c(6,9,10,6,2,0,4,5,7,3,0,1,11,9,7,9,1,0,
                             0,1,1,3,6,3,1,2,0,5,3,1,0,2,1,0,2,3),
                   year = rep(c(1990:1995), times = 6))

tab3 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(3,1,-4,-4,-2,1,2,-4,-3,1,-2,-2,2,-8,-1,
                             1,0,2,3,-3,1,-2,5,-2,-2,2,-1,-1,2,1),
                   time = rep(c(1:5), times = 6))

tab4 <- data.frame(id = rep(c("x1","x2","x3","x4","x5","x6"), each = 5),
                   value = c(-4,-4,-2,1,3,-4,-3,1,1,2,-8,-2,-2,-1,2,
                             -3,0,1,2,3,-2,-2,-2,1,5,-1,-1,1,2,2),
                   time = rep(c(1:5), times = 6))

2 Answers

Using data.table, this is straightforward:

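library(data.table) ## 1.9.2
## aggregation: first differences of value within each id
## (note: setDT() converts tab1 to a data.table by reference)
ans <- setDT(tab1)[, list(value = diff(value)), by = id]
## order by id and value, then add a per-id 'time' column
setkey(ans, id, value)[, time := seq_len(.N), by = id]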

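Note that your "Step 1" is unnecessary: your second step computes differences, and subtracting a constant from a series does not change its differences (so it is skipped here).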
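A quick check of that claim, as a minimal sketch using the x1 series from the question: for any constant c, (x[t+1] - c) - (x[t] - c) = x[t+1] - x[t], so shifting a series leaves diff() unchanged. The variable names `x` and `shifted` are illustrative only.

```r
x <- c(7, 10, 11, 7, 3, 1)   # series x1 from the question
shifted <- x - min(x)        # "Step 1": minimum moved to 0

d_raw <- diff(x)             # differences of the raw series
d_shifted <- diff(shifted)   # differences of the shifted series

# Both calls give the same lag-1 differences.
same <- isTRUE(all.equal(d_raw, d_shifted))
print(same)
```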

answered 2014-07-16T19:08:54.020

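It sounds like you want to apply a set of functions to each group of a grouping variable. There are many ways to do this in R (from base R's by and tapply to add-on packages such as plyr, data.table, and dplyr). I have been learning how to use the dplyr package and came up with the following solution.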

library(dplyr)

tab4 <- tab1 %>%
    group_by(id) %>%                       # group by id
    mutate(value = value - min(value),     # shift each group's minimum to 0
           value = value - lag(value)) %>% # lag-1 difference within each group
    filter(!is.na(value)) %>%              # drop the NA produced by lag-1 differencing
    arrange(id, value) %>%                 # order by value within each id
    mutate(time = row_number()) %>%        # time index from 1 to 5 based on the current order
    select(-year)                          # drop the year column to match the OP's final output
answered 2014-07-15T18:44:03.290
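For comparison, the base R route mentioned above could look roughly like the following. It is a sketch rather than part of the original answer, shown on the first two series only; `ex`, `pieces`, and `res` are illustrative names. It uses by() to apply sort(diff(...)) to each id and rbind() to stack the per-group results.

```r
# Two of the six series from the question (x1 and x2).
ex <- data.frame(id = rep(c("x1", "x2"), each = 6),
                 value = c(7, 10, 11, 7, 3, 1, 6, 7, 9, 5, 2, 3),
                 year = rep(1990:1995, times = 2))

# For each id: difference, sort, and attach a running time index.
pieces <- by(ex, ex$id, function(g) {
  d <- sort(diff(g$value))
  data.frame(id = g$id[1], value = d, time = seq_along(d))
})

# Stack the per-id data frames and drop the generated row names.
res <- do.call(rbind, pieces)
rownames(res) <- NULL
```

The min-shift step is left out here because it does not affect the differences.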