
This is the structure of my time-series cross-sectional data:

country     year group  change
Afghanistan 1980   1      0 
Afghanistan 1981   1      0 
Afghanistan 1982   1      1 
Afghanistan 1983   1      0 
Afghanistan 1984   1      0 
Afghanistan 1985   1      1 
Afghanistan 1986   1      0 
Afghanistan 1987   1      2 
Afghanistan 1988   1      0 
Bhutan      1980   2      0 
Bhutan      1981   2      0 
Bhutan      1982   2      0 
Bhutan      1983   2      0 
Bhutan      1984   2      1 
Bhutan      1985   2      0 
Bhutan      1986   2      0 
Bhutan      1987   2      0 
Bhutan      1988   2      2 
Chile       1980   3      0

The variable change is "1" if there is a positive change and "2" if there is a negative change.
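To experiment with the answers below, the sample data can be rebuilt as a data frame. This is a minimal sketch: the name `mydata` is borrowed from the second answer, and the values are copied from the table above.

```r
# Sample input rebuilt from the table above (one row per country-year)
mydata <- data.frame(
  country = rep(c("Afghanistan", "Bhutan", "Chile"), times = c(9, 9, 1)),
  year    = c(1980:1988, 1980:1988, 1980),
  group   = rep(1:3, times = c(9, 9, 1)),
  change  = c(0, 0, 1, 0, 0, 1, 0, 2, 0,   # Afghanistan
              0, 0, 0, 0, 1, 0, 0, 0, 2,   # Bhutan
              0)                           # Chile
)
str(mydata)
```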

Problem

I am struggling to create two new variables:

(1) A variable called "trend"

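In plain words, this variable should mean: "for each group (country), trend = 1 from the year in which change = 1, continuing until the year in which change = 2" (that year itself gets trend = 0).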

(2) A variable called "time"

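This variable should count the years before and after the start of the positive trend (change = 1): negative values before it, 1, 2, ... during it, and 0 once the trend has ended.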

That is, the final dataset should look like this:

country     year group  change  trend  time
Afghanistan 1980   1      0      0      -2
Afghanistan 1981   1      0      0      -1
Afghanistan 1982   1      1      1       1
Afghanistan 1983   1      0      1       2
Afghanistan 1984   1      0      1       3
Afghanistan 1985   1      1      1       4
Afghanistan 1986   1      0      1       5
Afghanistan 1987   1      2      0       0
Afghanistan 1988   1      0      0       0
Bhutan      1980   2      0      0      -4
Bhutan      1981   2      0      0      -3
Bhutan      1982   2      0      0      -2
Bhutan      1983   2      0      0      -1
Bhutan      1984   2      1      1       1
Bhutan      1985   2      0      1       2
Bhutan      1986   2      0      1       3
Bhutan      1987   2      0      1       4
Bhutan      1988   2      2      0       0
Chile       1980   3      0      0       0
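The two rules above can be sketched in base R as two small per-group helpers applied with `ave()`. This is a minimal sketch, not taken from either answer: the helper names `trend_flag` and `trend_time` are hypothetical, and it assumes the rows are already sorted by year within each country.

```r
# Hypothetical helpers: both take the `change` vector of ONE group (country),
# assumed to be sorted by year.

# trend: 1 from the first positive change up to, but not including, the first
# negative change after it; 0 elsewhere. Later positive changes are ignored.
trend_flag <- function(change) {
  started <- cumsum(change == 1) > 0               # TRUE from the first 1 on
  ended   <- cumsum(started & change == 2) > 0     # TRUE from the first 2 after that
  as.numeric(started & !ended)
}

# time: -k, ..., -1 before the trend; 1, 2, ... inside it; 0 afterwards.
trend_time <- function(change) {
  trend <- trend_flag(change)
  time  <- cumsum(trend) * trend                   # 1, 2, ... inside the trend only
  start <- which(trend == 1)[1]
  if (!is.na(start) && start > 1) {
    before <- seq_len(start - 1)
    time[before] <- before - start                 # counts down to -1
  }
  time
}

# ave() splits by the grouping vector, applies FUN to each piece and
# returns the results in the original row order.
change <- c(0, 0, 1, 0, 0, 1, 0, 2, 0,   # Afghanistan
            0, 0, 0, 0, 1, 0, 0, 0, 2,   # Bhutan
            0)                           # Chile
group  <- rep(1:3, times = c(9, 9, 1))
trend  <- ave(change, group, FUN = trend_flag)
time   <- ave(change, group, FUN = trend_time)
```

With a data frame called `mydata` (an assumption), the results can be assigned directly, e.g. `mydata$trend <- ave(mydata$change, mydata$group, FUN = trend_flag)`.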

I think split could be used to separate the groups, e.g.

data$trend <- split(data$group, data$group)  # separate by unique values
[...]
data$trend <- unsplit(data$trend, data$group)  # make back into a vector

But: what command goes between these two lines?
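The step between `split()` and `unsplit()` is typically an `lapply()` over the list that `split()` returns, with the per-group calculation inside it. A minimal sketch on toy vectors (hypothetical stand-ins for `data$change` and `data$group`); here it is the `change` column that gets split, because that is what the per-group calculation needs, and the calculation itself (a running count of positive changes) is only an illustration:

```r
# Toy stand-ins for data$change and data$group
change <- c(0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 0)
group  <- rep(1:3, times = c(9, 9, 1))

parts <- split(change, group)                        # list: one vector per group
parts <- lapply(parts, function(x) cumsum(x == 1))   # any per-group function goes here
n_pos <- unsplit(parts, group)                       # one vector, original row order
n_pos
```

Base R's `ave()` wraps this same split/apply/unsplit pattern in a single call.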

This line generates a sequence:

data.time$trend <- lapply(data.time$trend, seq)

But: how do I restrict it to the positive trend, i.e. data$trend == 1?

Any ideas are very welcome! Many thanks.


2 Answers


Something like the following will do. The key, obviously, is writing myFunc properly.

DF
##        country year group change
## 1  Afghanistan 1980     1      0
## 2  Afghanistan 1981     1      0
## 3  Afghanistan 1982     1      1
## 4  Afghanistan 1983     1      0
## 5  Afghanistan 1984     1      0
## 6  Afghanistan 1985     1      1
## 7  Afghanistan 1986     1      0
## 8  Afghanistan 1987     1      2
## 9  Afghanistan 1988     1      0
## 10      Bhutan 1980     2      0
## 11      Bhutan 1981     2      0
## 12      Bhutan 1982     2      0
## 13      Bhutan 1983     2      0
## 14      Bhutan 1984     2      1
## 15      Bhutan 1985     2      0
## 16      Bhutan 1986     2      0
## 17      Bhutan 1987     2      0
## 18      Bhutan 1988     2      2


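myFunc <- function(x) {
    n <- nrow(x)
    trend <- rep(0, n)
    time <- rep(0, n)

    # first year with a positive change; if there is none, there is no trend
    trendStart <- which(x$change == 1)[1]
    if (is.na(trendStart)) return(cbind(x, trend, time))

    # the trend ends the year before the first negative change after the start
    # (or in the last year if no such change exists)
    trendEnd <- which(x$change == 2 & seq_len(n) > trendStart)[1] - 1
    if (is.na(trendEnd)) trendEnd <- n

    trend[trendStart:trendEnd] <- 1

    # -k, ..., -1 before the trend; 1, 2, ... during it; 0 afterwards
    before <- seq_len(trendStart - 1)
    time[before] <- before - trendStart
    time[trendStart:trendEnd] <- seq_len(trendEnd - trendStart + 1)

    return(cbind(x, trend, time))
}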

LL <- split(DF, DF$group)

do.call(rbind, lapply(LL, myFunc))
##          country year group change trend time
## 1.1  Afghanistan 1980     1      0     0   -2
## 1.2  Afghanistan 1981     1      0     0   -1
## 1.3  Afghanistan 1982     1      1     1    1
## 1.4  Afghanistan 1983     1      0     1    2
## 1.5  Afghanistan 1984     1      0     1    3
## 1.6  Afghanistan 1985     1      1     1    4
## 1.7  Afghanistan 1986     1      0     1    5
## 1.8  Afghanistan 1987     1      2     0    0
## 1.9  Afghanistan 1988     1      0     0    0
## 2.10      Bhutan 1980     2      0     0   -4
## 2.11      Bhutan 1981     2      0     0   -3
## 2.12      Bhutan 1982     2      0     0   -2
## 2.13      Bhutan 1983     2      0     0   -1
## 2.14      Bhutan 1984     2      1     1    1
## 2.15      Bhutan 1985     2      0     1    2
## 2.16      Bhutan 1986     2      0     1    3
## 2.17      Bhutan 1987     2      0     1    4
## 2.18      Bhutan 1988     2      2     0    0
answered 2013-04-25T03:48:50.997

Here is an alternative solution using ddply (assuming your data frame is called mydata):

changeTime <- function(x) {          # time function

    start <- match(1, x)
    if (is.na(start)) return(rep(0, length(x)))  # no positive change in this group

    y <- seq_along(x) - start        # pre-constructing time
    y[y >= 0] <- y[y >= 0] + 1       # adding extra 1

    if (!is.na(match(2, x))) {
        y[match(2, x):length(x)] <- 0  # setting 0 from the first 2 on
    }
    return(y)
}

changeTrend <- function(x) {         # trend function

    y <- cummax(x)                   # cumulative maximum: 0 -> 1 -> 2
    y[y >= 2] <- 0                   # remove trailing 2's
    return(y)

}

library(plyr)
ddply(mydata, .(country), mutate,
      trend = changeTrend(change),
      time  = changeTime(change))

PS: I suspect the time of the event itself should be 0, not 1. If that is the case, just delete the line in the first function that adds the extra 1.
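With the PS applied, the time function would look roughly like this sketch (the name `changeTime0` is hypothetical). Note that time 0 then marks both the event year and the years after the trend has ended, so the trend column is needed to tell them apart.

```r
# Hypothetical variant of changeTime() without the "+1" step:
# the year of the first positive change gets time 0.
changeTime0 <- function(x) {
  start <- match(1, x)
  if (is.na(start)) return(rep(0, length(x)))   # no positive change
  y <- seq_along(x) - start                     # negative before, 0 at the event
  end <- match(2, x)
  if (!is.na(end)) y[end:length(x)] <- 0        # 0 from the first negative change on
  y
}

changeTime0(c(0, 0, 1, 0, 0, 1, 0, 2, 0))       # Afghanistan
```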

answered 2013-04-25T07:41:19.843