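I am working in R. I would like to forecast a time series of prevalence using initial values and a set of transition parameters. For data with the following structure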

 cohort <- c(1980,1981,1982)
 A00 <- c(.15, .2,.4)
 B00 <- c(.25, .3, .4) 
 C00 <-c(.6, .5,.2)
 Tab<-c(.6,.5,.4)
 Tac<-c(.2,.25,.35)
 ds <- data.frame(cohort,A00,B00,C00,Tab,Tac)
 print (ds)

  cohort  A00  B00 C00 Tab  Tac
1   1980 0.15 0.25 0.6 0.6 0.20
2   1981 0.20 0.30 0.5 0.5 0.25
3   1982 0.40 0.40 0.2 0.4 0.35

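The initial values in columns A00, B00, and C00 represent the relative size of each group (A, B, C) at time t=00. They add up to 1 across each row (A00+B00+C00=1). The parameters Tab and Tac are used to predict the prevalence at time t+1 using some mathematical model, for example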

A01   = df$A00 -df$Tab +df$Tac.

The function that computes the predicted values at time t+1 is

 forecast<- function( df ) {
  dsResult <- data.frame(
    cohort= df$cohort,
    A01   = df$A00 -df$Tab +df$Tac ,    
    B01   = df$B00 -df$Tab +df$Tac,    
    C01  =  df$C00 -df$Tab +df$Tac    

  )
  dsResult<- merge(df,dsResult,by="cohort")
  return( dsResult)
}
new<-forecast(ds)

and it produces the following result

  cohort  A00  B00 C00 Tab  Tac   A01   B01  C01
1   1980 0.15 0.25 0.6 0.6 0.20 -0.25 -0.15 0.20
2   1981 0.20 0.30 0.5 0.5 0.25 -0.05  0.05 0.25
3   1982 0.40 0.40 0.2 0.4 0.35  0.35  0.35 0.15

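I would greatly appreciate help learning how to write a loop that repeats the forecast for a desired number of years (for example, for t in 1:7). Thanks in advance!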

1 Answer

To start, I'd like to make two suggestions that might make the problem easier to code. First, revise the data schema so that each year is a unique row and each group is a unique column. Second, since the cohorts are treated as mathematically independent of each other, keep them separate for now, at least until the code's kernel is built; a loop that cycles through them can be put around it later. The first block of code uses two matrices: one with observed data, and one that will collect the predicted data.

yearCount <- 7 #Declare the number of time points.
groupCount <- 3 #Declare the number of groups.

#Create fake data that sum to 1 across rows/times.
ob <- matrix(runif(yearCount*groupCount), ncol=groupCount)
ob <- ob / rowSums(ob) #Divide each row by its own total.

#Establish a container to hold the predicted values.
pred <- matrix(NA_real_, ncol=groupCount, nrow=yearCount)

t12<-.5; t13<-.2; t11<-1-t12-t13 #Transition parameters from group 1
t21<-.2; t23<-.4; t22<-1-t21-t23 #Transition parameters from group 2
t31<-.3; t32<-.1; t33<-1-t31-t32 #Transition parameters from group 3

for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, 1]*t11 + ob[i-1, 2]*t21 + ob[i-1, 3]*t31
  pred[i, 2] <- ob[i-1, 1]*t12 + ob[i-1, 2]*t22 + ob[i-1, 3]*t32
  pred[i, 3] <- ob[i-1, 1]*t13 + ob[i-1, 2]*t23 + ob[i-1, 3]*t33
}

#Calculate the squared errors
ss <- (pred[-1, ] - ob[-1, ])^2 #Ignore the first year of data

Inside the loop, you probably notice the familiar structure of matrix multiplication. Each line can be slightly condensed using inner products (i.e., one row of the ob matrix is multiplied elementwise with one "column" of the t parameters, and the products are then summed). Note that I'm using t12 slightly differently than the Tab in your post: here t12 is the probability of transitioning from group 1 to group 2 at a given time point.
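
As a small check of the inner-product idea, the sketch below compares the explicit sum of products with `%*%` for a single row. The vector names `obRow` and `tCol` are made up for this illustration; the numbers are the 1980 cohort's starting proportions from the question and the values of t11, t21 and t31 defined above.

```r
# One "year" of observed proportions (hypothetical name; values from the question's 1980 row).
obRow <- c(0.15, 0.25, 0.60)
# One "column" of transition parameters: t11, t21, t31 from the first code block.
tCol <- c(0.3, 0.2, 0.3)

# Explicit version, as written inside the first loop.
explicit <- obRow[1]*tCol[1] + obRow[2]*tCol[2] + obRow[3]*tCol[3]

# Inner-product version; %*% returns a 1x1 matrix, so convert it to a plain number.
inner <- as.numeric(obRow %*% tCol)

# Both routes should agree up to floating-point tolerance.
stopifnot(isTRUE(all.equal(explicit, inner)))
```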

#Create transition parameters that sum to 1 across rows/groups.
tt <- matrix(runif(groupCount*groupCount), ncol=groupCount)
tt <- tt / rowSums(tt) #Divide each row by its own total.
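
If you would rather keep the specific parameter values than random ones, the scalars from the first block could be packed into a matrix of the same shape. This is only a sketch; the name `ttFixed` is invented here, and the layout (row = origin group, column = destination group) is chosen so that column 1 holds t11, t21, t31, matching the order used in the first loop.

```r
# Transition parameters copied from the first code block.
t12 <- .5; t13 <- .2; t11 <- 1 - t12 - t13
t21 <- .2; t23 <- .4; t22 <- 1 - t21 - t23
t31 <- .3; t32 <- .1; t33 <- 1 - t31 - t32

# Hypothetical name. Fill by row so that ttFixed[from, to] is the transition probability.
ttFixed <- matrix(c(t11, t12, t13,
                    t21, t22, t23,
                    t31, t32, t33), nrow = 3, byrow = TRUE)

# Every origin group must send all of its mass somewhere, so each row sums to 1.
stopifnot(isTRUE(all.equal(rowSums(ttFixed), c(1, 1, 1))))
```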

Pretend the tt matrix was defined earlier, instead of the separate variables of t11,...,t33.

for( i in 2:yearCount ) {
  pred[i, 1] <- ob[i-1, ] %*% tt[, 1] 
  pred[i, 2] <- ob[i-1, ] %*% tt[, 2]
  pred[i, 3] <- ob[i-1, ] %*% tt[, 3]
}

The loop's contents are slightly cleaner than when each element pair was explicitly multiplied and summed. But we don't have to treat each row/column pair individually. All three columns of the ob matrix can be operated on by all three columns of the tt matrix simultaneously:

for( i in 2:yearCount ) {
  pred[i, ] <- ob[i-1, ] %*% tt
}

This should be much quicker than even the previous version, because R's internal memory system recreates the matrix only once per row instead of three times per row. To reduce this to once per matrix, use the apply function, and then transpose the result if that suits your purpose. Finally, notice that the rows of the result represent different years than the rows of pred (i.e., row i-1 here corresponds to row i in pred).

predictionWithExtraYear <- t(apply(ob, 1, FUN=function(row){row %*% tt}))
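
Since applying `row %*% tt` to every row is itself a matrix product, a single `ob %*% tt` should give the same numbers. The sketch below checks that and then shifts the result down one row so it lines up with pred. The fixed seed and the names `viaApply` and `viaProduct` are assumptions made for this example only.

```r
set.seed(1)  # Assumption: a fixed seed, only so the example is reproducible.
yearCount <- 7
groupCount <- 3

# Fake observed data and transition matrix, each with rows summing to 1.
ob <- matrix(runif(yearCount * groupCount), ncol = groupCount)
ob <- ob / rowSums(ob)
tt <- matrix(runif(groupCount * groupCount), ncol = groupCount)
tt <- tt / rowSums(tt)

viaApply   <- t(apply(ob, 1, function(row) row %*% tt))  # Row-by-row version.
viaProduct <- ob %*% tt                                  # One matrix product.
stopifnot(isTRUE(all.equal(viaApply, viaProduct, check.attributes = FALSE)))

# Row i of the product is the prediction for year i+1, so drop the last row
# (it predicts a year beyond the data) and pad the first year with NA.
pred <- rbind(NA_real_, viaProduct[-yearCount, ])
```

The last line reproduces the layout of the loop versions: the first row of pred stays missing because there is no earlier year to predict it from.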

To accommodate cohorts, perhaps you could declare a list with three elements (one each for the 1980, 1981, and 1982 cohorts), where each element is a unique ob matrix, and create a second list that holds a pred matrix for each cohort. Or maybe use three-dimensional arrays (but that may be more taxing when R recreates the memory with the replacement function).
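
A minimal sketch of the list idea follows, under two assumptions that are not part of the original post: all cohorts share one transition matrix (the t11,...,t33 values from above), and, because the question only supplies values at t=00, each year's prediction is built from the previous year's prediction rather than from observed data. The function name `projectCohort` and the list names are hypothetical.

```r
# Starting proportions at t=00, copied from the question's data frame.
init <- list(
  "1980" = c(A = .15, B = .25, C = .60),
  "1981" = c(A = .20, B = .30, C = .50),
  "1982" = c(A = .40, B = .40, C = .20)
)

# Assumed transition matrix shared by all cohorts (row = from, column = to).
tt <- matrix(c(.3, .5, .2,
               .2, .4, .4,
               .3, .1, .6), nrow = 3, byrow = TRUE)

yearCount <- 7  # Number of years to project forward.

# Hypothetical helper: project one cohort forward, one row per time point.
projectCohort <- function(start, tt, yearCount) {
  pred <- matrix(NA_real_, nrow = yearCount + 1, ncol = length(start),
                 dimnames = list(paste0("t", 0:yearCount), names(start)))
  pred[1, ] <- start
  for (i in seq_len(yearCount)) {
    pred[i + 1, ] <- pred[i, ] %*% tt  # Next year from the previous prediction.
  }
  pred
}

# One pred matrix per cohort, stored in a list named by cohort.
predList <- lapply(init, projectCohort, tt = tt, yearCount = yearCount)
```

`predList[["1981"]]` then holds the projected proportions for the 1981 cohort, with rows t0 through t7. Because every row of tt sums to 1, each projected row also sums to 1.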

Answered 2013-03-23T04:58:57.180