
I have panel data: a data frame with three individuals, each observed in 4 periods:

    test.data <- data.frame(
            id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
            t = rep(1:4, 3), var1 = runif(12), var2 = runif(12)
    )

It should look something like this:

        id  t   var1    var2
    1   1   1   0.2851789   0.66365753
    2   1   2   0.6630548   0.07679873
    3   1   3   0.9000371   0.17182666
    4   1   4   0.8782424   0.11931904
    5   2   1   0.2642084   0.70807513
    6   2   2   0.9993678   0.48880088
    7   2   3   0.5662814   0.49188144
    8   2   4   0.7335935   0.74017649
    9   3   1   0.9868327   0.32792638
    10  3   2   0.5388366   0.05465845
    11  3   3   0.8814602   0.45199318
    12  3   4   0.9066551   0.89814063

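Now I want to take the mean of every two consecutive time periods (i.e., merge t=1 and t=2 into one period whose values are the averages of the two) and thereby shorten the time series to 2 periods. The result should look like this: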

        id  t   var1    var2
    1   1   1   0.4495637   0.88822370
    2   1   2   0.2770255   0.68399219
    3   2   1   0.8125967   0.15395440
    4   2   2   0.6232424   0.02663445
    5   3   1   0.8965059   0.79910001
    6   3   2   0.1109559   0.47906885

How can I do this?

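I saw that someone already asked the same question on Stack Overflow, but for MySQL (here: How to combine several time spans), and I wonder whether there is a solution in R. (I can't read MySQL code...)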

Thanks in advance, much appreciated!

EDIT: @dimitris_ps has already given an answer to the problem, and I wonder whether there is a more generic solution. What if the data frame looks like the one below (with years in t) and has 50 variables?

        id  t   var1    var2
    1   1   1991    0.3900957   0.49582924
    2   1   1992    0.1157777   0.50907756
    3   1   1993    0.1358916   0.05172451
    4   1   1994    0.2608382   0.25032905
    5   2   1991    0.8958081   0.97127891
    6   2   1992    0.2265558   0.73085533
    7   2   1993    0.2310969   0.63263599
    8   2   1994    0.4302372   0.48394795
    9   3   1991    0.7823354   0.75783991
    10  3   1992    0.3295121   0.78468692
    11  3   1993    0.2771166   0.59183611
    12  3   1994    0.1905194   0.64325034

1 Answer


This should work for you.

    library(dplyr)

    # t = 1, 2 -> 1 and t = 3, 4 -> 2, then average var1 and var2 within each (id, t)
    test.data %>% mutate(t=ceiling(t/2)) %>% group_by(id, t) %>% 
      summarise(var1=mean(var1), var2=mean(var2)) %>% ungroup()
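If you would rather not depend on dplyr, the same pairing can be sketched in base R with `transform()` and `aggregate()`. This is a swapped-in alternative, not the answer's method, and it assumes the seeded example data from the question with every id observed at t = 1:4.

```r
# Example data as in the question, seeded so the numbers are reproducible
set.seed(123)
test.data <- data.frame(
  id = rep(1:3, each = 4),
  t = rep(1:4, 3), var1 = runif(12), var2 = runif(12)
)

# Map t = 1, 2 -> 1 and t = 3, 4 -> 2
paired <- transform(test.data, t = ceiling(t / 2))

# Average var1 and var2 within each (id, new t) cell
result <- aggregate(cbind(var1, var2) ~ id + t, data = paired, FUN = mean)

# aggregate() lets id vary fastest; reorder to match the layout in the question
result <- result[order(result$id, result$t), ]
print(result)
```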

Also, when you generate random numbers for an example, call set.seed(x) first (where x is any number) so that others get exactly the same data, e.g.:

    set.seed(123)
    test.data <- data.frame(
            id = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3),
            t = rep(1:4, 3), var1 = runif(12), var2 = runif(12)
    )
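A quick way to see what set.seed() does: resetting the seed to the same value replays the same random draws, while drawing again without a reset continues the stream with new numbers. The variable names here are only illustrative.

```r
set.seed(123)
first <- runif(3)

set.seed(123)           # same seed again...
second <- runif(3)      # ...so the same three numbers come out

third <- runif(3)       # no reset: the generator moves on to new numbers

print(identical(first, second))
print(identical(second, third))
```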

Update

A more general solution

    test.data %>% group_by(id) %>% arrange(t) %>%
      # rank t within each id (so years like 1991:1994 work too), then pair ranks 1-2 and 3-4
      mutate(t=ceiling(rank(t)/2)) %>% 
      group_by(id, t) %>% summarise(var1=mean(var1), var2=mean(var2)) %>% ungroup()

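Depending on how many new periods you want, change the 2 in rank(t)/2: it is the number of consecutive periods averaged into each new period. To end up with k periods per id, divide by (number of observations per id) / k. For example, with 4 observations per id, getting 4 groups means dividing by 4/4 = 1, which leaves the data unchanged.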
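For the 50-variable case in the EDIT, the explicit var1=..., var2=... list is what does not scale. A minimal base R sketch, assuming a hypothetical helper `merge_periods()` (not part of any package) and five stand-in value columns instead of 50: `ave()` ranks the periods within each id, and the `.` in the `aggregate()` formula picks up every non-grouping column at once.

```r
# Hypothetical helper: average every `size` consecutive periods within each id,
# for every value column in the data frame.
merge_periods <- function(data, size = 2) {
  # Rank the periods within each id (works for 1:4 or for years like 1991:1994)
  # and map ranks 1..size -> 1, size+1..2*size -> 2, and so on
  data$t <- ave(data$t, data$id, FUN = function(x) ceiling(rank(x) / size))
  # `.` means every column not already on the right-hand side,
  # so 2 or 50 value columns are handled the same way
  aggregate(. ~ id + t, data = data, FUN = mean)
}

# Data shaped like the EDIT: years in t, five stand-in value columns
set.seed(123)
panel <- data.frame(id = rep(1:3, each = 4), t = rep(1991:1994, 3))
for (v in paste0("var", 1:5)) panel[[v]] <- runif(nrow(panel))

halved <- merge_periods(panel, size = 2)  # 2 periods per id
whole  <- merge_periods(panel, size = 4)  # 1 period per id
print(halved)
```

With dplyr 1.0 or later, the analogous change to the pipeline above is replacing the explicit summarise(var1=..., var2=...) with summarise(across(everything(), mean)), which averages all non-grouping columns.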

answered 2015-04-15T12:13:23.890