1

我有多个湖泊的数据集,随着时间的推移水位升高。观察结果不规则地间隔并且有许多大的间隙。此外,一些较旧的观察结果可能质量较低或未知。我创建了一个单独的模型,它可以很好地预测随时间变化的水位,但仍然会以不同的数量错过实际观察结果。

我想创建第三组输入/插值数据,其中解决方案是:

由缺失观测值的模型值通知交叉高度加权的观测值,并由较低加权的观测值通知

到目前为止,我已经使用 fable 包的 TSLM->interpolate 来执行此操作。它工作得相当好,但我看不到将权重引入过程的方法。此外,它在很大程度上依赖于全局系数和截距,当建模值明显错过观察值时,它变得有点不稳定。我在想我需要使用某种加权黄土,它依赖于局部系数并且可以适应加权。

library(dplyr)
library(tsibble)
library(fable)
library(ggplot2)

test_data <-  data.frame(obs_year = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009),
                       site_name = c("Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake1","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2","Lake2"),
                       observed = c(100,200,NA, NA, NA, NA, 220, NA, NA, 125, NA,NA,425, NA, 475, NA, 450, 450, 475, 500),
                       weights = c(1,1,NA, NA, NA, NA, 2, NA, NA, 2, NA,NA,2, NA, 1, NA, 2, 2, 2, 2),
                       modeled = c(110,120,165,150, 200, 225, 240, 250, 150, 130, 450,430,415,400, 425, 450, 460, 460, 470, 490))

test_tsibble <- as_tsibble(test_data, key = site_name, index = obs_year)

tslm_interpolate <- test_tsibble %>%
  group_by(site_name) %>% 
  model(lm  = TSLM(observed~modeled)) %>%
  fabletools::interpolate(test_tsibble)

tslm_interpolate <- left_join(tslm_interpolate, test_data, by = c("site_name", "obs_year")) %>% 
  dplyr::select(obs_year, site_name, observed = observed.y, imputed = observed.x, modeled, weights)

tslm_interpolate %>% 
  ggplot(aes(x=obs_year))+
  geom_line(aes(y = imputed), color = "blue")+
  geom_line(aes(y = modeled), color = "red")+
  geom_point(aes(y = observed), color = "green")+
  facet_wrap(~site_name, scales = "free_y")
4

0 回答 0