r - Extrapolate missing data per group by average percentage change

Question

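I have a data frame with the average income per zip code for 2010-2014. I also need values for 2015-2017, so I'm looking for a way to extrapolate them from each zip code group's average annual change over the years that are available.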

For example:

year  zip   income
2010  1111   5000
2011  1111   5500
2012  1111   6000
2013  1111   6500
2014  1111   7000
2010  2222   5000
2011  2222   6000
2012  2222   7000
2013  2222   8000
2014  2222   9000

should (roughly) become:

year  zip   income
2010  1111   5000
2011  1111   5500
2012  1111   6000
2013  1111   6500
2014  1111   7000
2015  1111   7614
2016  1111   8282
2017  1111   9009
2010  2222   5000
2011  2222   6000
2012  2222   7000
2013  2222   8000
2014  2222   9000
2015  2222   10424
2016  2222   12074
2017  2222   13986

This is based on an average growth rate of 8.78% for zip code 1111 and 15.83% for zip code 2222.
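Written out, the extrapolation described above looks roughly like this. This is a sketch that assumes "average percentage change" means the arithmetic mean of the year-over-year changes, which is the definition that reproduces the 8.78% quoted for zip 1111. Here $y_t$ is the income of one zip code in year $t$ and $h$ is the number of years past 2014:

```latex
\bar r_z = \frac{1}{4}\sum_{t=2011}^{2014} \frac{y_t - y_{t-1}}{y_{t-1}},
\qquad
\hat y_{2014+h} = y_{2014}\,(1+\bar r_z)^{h}, \quad h = 1, 2, 3

% zip 1111:
\bar r_{1111} = \frac{0.1000 + 0.0909 + 0.0833 + 0.0769}{4} \approx 0.0878,
\qquad
\hat y_{2015} = 7000 \times 1.0878 \approx 7614.5
```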

score 1 · Accepted Answer

Here is a very quick and messy data.table idea:

library(data.table)
library(zoo)  # na.locf() comes from zoo, not data.table

#Create data
last_year <- 2014
dt <- data.table(year=rep(2010:last_year,2),
                 zip=c(rep(1111,5),rep(2222,5)),
                 income=c(seq(5000,7000,500),seq(5000,9000,1000)))

#Future data (income unknown, filled in below)
dt_fut <- data.table(year=rep((last_year+1):2017,2),
                     zip=c(rep(1111,3),rep(2222,3)),
                     income=rep(NA_real_,6))

#calculate mean year-over-year percentage change per zip
dt[,avg_growth:=mean(diff(income)/head(income,-1)),by=zip]
#bind old with future data
dt <- rbindlist(list(dt,dt_fut),fill=TRUE);setorder(dt,zip,year)

#carry last value forward to replace NA
dt[,avg_growth:=na.locf(avg_growth),by=zip][,income:=na.locf(income),by=zip]

#calculate
# from last_year+1 (2015) onward, replace income (the carried-forward 2014 value)
# with income * cumulative product of (1 + average growth)
dt[year>=last_year+1,income:=income*cumprod(1+avg_growth),by=zip][]
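For comparison, the same per-zip extrapolation can be sketched in base R without data.table or zoo. This assumes the growth rate is the arithmetic mean of the year-over-year percentage changes (the definition that gives the question's 8.78% for zip 1111) and that every zip has at least two observed years; `extrapolate_by_group()` is a hypothetical helper name, not a function from any package.

```r
# Base-R sketch: extrapolate each zip forward by its mean
# year-over-year percentage change. extrapolate_by_group() is hypothetical.
extrapolate_by_group <- function(df, future_years) {
  out <- lapply(split(df, df$zip), function(g) {
    g <- g[order(g$year), ]
    # arithmetic mean of the percentage changes between consecutive years
    r <- mean(diff(g$income) / head(g$income, -1))
    last <- g$income[nrow(g)]
    h <- future_years - max(g$year)  # years ahead of the last observation
    fut <- data.frame(year = future_years,
                      zip = g$zip[1],
                      income = last * (1 + r)^h)
    rbind(g, fut)
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL
  res
}

df <- data.frame(year = rep(2010:2014, 2),
                 zip = rep(c(1111, 2222), each = 5),
                 income = c(seq(5000, 7000, 500), seq(5000, 9000, 1000)))

res <- extrapolate_by_group(df, 2015:2017)
print(res)
```

Applying `(1 + r)^h` to the last observed income gives the same values as a running `cumprod(1 + avg_growth)`, because the rate is constant within each zip.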
