r - How do I apply fable/forecast (in R) to this database?

Question

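I am trying to forecast multiple time series using the fable package in R. It seems to be the most efficient approach, but I am very new to R, so I am running into a lot of problems. I just want to ask for advice and ideas. I have already worked out how to do this with the forecast package alone, but it takes a lot of extra steps. My data is an Excel sheet with 5701 columns and 50 rows. The first row of each column holds the product name, and the following 49 values are numbers representing sales from January 2017 to January 2021. First, how do I convert this table into a tibble? I know I need to do that to work with fable, but I am stuck on such a simple step. I then want to output a table with monthly forecasts for the next 3 semesters (April 2021 to September 2022), containing Product|Date|Model ARIMA (values)|Error of ARIMA (value/values)|Model ETS|Error of ETS|Model Naive|Error of Naive, etc. My main goal is to obtain a table with Product|Best forecast for April 2021/September 2021|Best forecast for October 2021/March 2022|Best forecast for April 2022/September 2022|

What I am currently doing is using this code: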

newdata <- read_excel("ALLINCOLUMNS.xlsx")
Fcast <- ts(newdata[,1:5701], start= c(1), end=c(49), frequency=12)
output <- lapply(Fcast, function(x) forecast(auto.arima(x)))
prediction <- as.data.frame(output)
write.table(prediction, file= "C:\\Users\\thega\\OneDrive\\Documentos\\finalprediction.csv",sep=",")

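By default, this gives me output in the format |Product1.Point.Forecast|Product1.Lo.80|Product1.Hi.80|Product1.Lo.95|Product1.Hi.95|Product2.Point.Forecast|...|Product5701.Hi.95|. I do not need the 80 and 95 intervals, and they make the output harder to work with in Excel. How can I get something in the format |Point forecast Product 1|Point forecast Product 2|...|Point forecast Product 5701|, showing only the forecasts? I know I have to use level=NULL in the forecast function, but it did not work the way I tried it. I was going to write some code to delete those columns, but that is less elegant. Finally, is there a way to show the errors of all the methods in columns? I want to add the best method to my table, so I need to check which one has the smallest error.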

score 2 · Accepted Answer

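The {fable} package works best when the data is in a tidy format. In your case, the products should be represented across rows rather than columns. You can read more about what tidy data is here: https://r4ds.had.co.nz/tidy-data.html Once you have done that, you can also read about tidy data for time series here: https://otexts.com/fpp3/tsibbles.html

Without your dataset, I can only guess that your Fcast object (the ts() data) looks something like this: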

Fcast <- cbind(mdeaths,fdeaths)
Fcast
#>          mdeaths fdeaths
#> Jan 1974    2134     901
#> Feb 1974    1863     689
#> Mar 1974    1877     827
#> Apr 1974    1877     677
#> May 1974    1492     522
#> Jun 1974    1249     406
#> Jul 1974    1280     441
#> and so on ...

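That is, each of your products has its own column (and you have 5701 products rather than just the 2 I will use in the example).

If you already have your data in a ts object, you can use as_tsibble(<ts>) to convert it into a tidy time series dataset.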

library(tsibble)
as_tsibble(Fcast, pivot_longer = TRUE)
#> # A tsibble: 144 x 3 [1M]
#> # Key:       key [2]
#>       index key     value
#>       <mth> <chr>   <dbl>
#>  1 1974 Jan fdeaths   901
#>  2 1974 Feb fdeaths   689
#>  3 1974 Mar fdeaths   827
#>  4 1974 Apr fdeaths   677
#>  5 1974 May fdeaths   522
#>  6 1974 Jan mdeaths  2134
#>  7 1974 Feb mdeaths  1863
#>  8 1974 Mar mdeaths  1877
#>  9 1974 Apr mdeaths  1877
#> 10 1974 May mdeaths  1492

Created on 2021-02-25 by the reprex package (v0.3.0)

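Setting pivot_longer = TRUE gathers the columns into a long format. This format works well with the {fable} package. We now have a key column that stores the series name (the product ID for your data), and the values are stored in the value column.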
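
If the data is still a plain data frame (as returned by read_excel()) rather than a ts object, the same long format can be built directly. The following is only a minimal sketch: the wide tibble, its column names product_a and product_b, and the simulated values are hypothetical stand-ins for the real spreadsheet, and the start month is taken from the question (49 monthly values starting in January 2017).

```r
library(tibble)
library(dplyr)
library(tidyr)
library(tsibble)

# Hypothetical stand-in for the Excel sheet: one column per product,
# one row per month (49 months starting in January 2017).
set.seed(1)
wide <- tibble(
  product_a = rpois(49, 100),
  product_b = rpois(49, 20)
)

sales <- wide %>%
  # Add an explicit monthly index, since the sheet itself has no date column
  mutate(month = yearmonth(seq(as.Date("2017-01-01"), by = "1 month",
                               length.out = 49))) %>%
  # Gather the product columns into rows: one row per product and month
  pivot_longer(-month, names_to = "product", values_to = "sales") %>%
  # Declare which column identifies the series and which one is the time index
  as_tsibble(key = product, index = month)

sales
```

With the real data, wide would presumably be the object returned by read_excel(), and the key column would then hold the 5701 product names.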

With the data in an appropriate format, we can now use ARIMA(), which selects the model order automatically, and forecast() to obtain forecasts:

library(fable)
#> Loading required package: fabletools
as_tsibble(Fcast, pivot_longer = TRUE) %>% 
  model(ARIMA(value)) %>% 
  forecast()
#> # A fable: 48 x 5 [1M]
#> # Key:     key, .model [2]
#>    key     .model          index        value .mean
#>    <chr>   <chr>           <mth>       <dist> <dbl>
#>  1 fdeaths ARIMA(value) 1980 Jan N(825, 6184)  825.
#>  2 fdeaths ARIMA(value) 1980 Feb N(820, 6184)  820.
#>  3 fdeaths ARIMA(value) 1980 Mar N(767, 6184)  767.
#>  4 fdeaths ARIMA(value) 1980 Apr N(605, 6184)  605.
#>  5 fdeaths ARIMA(value) 1980 May N(494, 6184)  494.
#>  6 fdeaths ARIMA(value) 1980 Jun N(423, 6184)  423.
#>  7 fdeaths ARIMA(value) 1980 Jul N(414, 6184)  414.
#>  8 fdeaths ARIMA(value) 1980 Aug N(367, 6184)  367.
#>  9 fdeaths ARIMA(value) 1980 Sep N(376, 6184)  376.
#> 10 fdeaths ARIMA(value) 1980 Oct N(442, 6184)  442.
#> # … with 38 more rows

You can also compute forecasts from other models by specifying several models in model().

Fcast <- cbind(mdeaths,fdeaths)
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
library(fable)
#> Loading required package: fabletools
as_tsibble(Fcast, pivot_longer = TRUE) %>% 
  model(arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value)) %>% 
  forecast()
#> # A fable: 144 x 5 [1M]
#> # Key:     key, .model [6]
#>    key     .model    index        value .mean
#>    <chr>   <chr>     <mth>       <dist> <dbl>
#>  1 fdeaths arima  1980 Jan N(825, 6184)  825.
#>  2 fdeaths arima  1980 Feb N(820, 6184)  820.
#>  3 fdeaths arima  1980 Mar N(767, 6184)  767.
#>  4 fdeaths arima  1980 Apr N(605, 6184)  605.
#>  5 fdeaths arima  1980 May N(494, 6184)  494.
#>  6 fdeaths arima  1980 Jun N(423, 6184)  423.
#>  7 fdeaths arima  1980 Jul N(414, 6184)  414.
#>  8 fdeaths arima  1980 Aug N(367, 6184)  367.
#>  9 fdeaths arima  1980 Sep N(376, 6184)  376.
#> 10 fdeaths arima  1980 Oct N(442, 6184)  442.
#> # … with 134 more rows

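The .model column now identifies the model used to produce each forecast; there are 3 models here.

If you would like to look at the point forecasts side by side, you can use tidyr::pivot_wider() to spread the forecast .mean values across several columns.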
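
The calls above rely on the default forecast horizon, which for this monthly example is 24 months per series. To target a specific window, the h argument of forecast() can be set and the result filtered by date. The sketch below uses SNAIVE() only to keep it fast, and its dates refer to the example data, not to the asker's. Assuming the real data ends in January 2021, h = 20 would reach September 2022, and dropping the first two months would leave April 2021 to September 2022.

```r
library(dplyr)
library(tsibble)
library(fable)

fc <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE) %>%
  model(snaive = SNAIVE(value)) %>%
  # The example data ends in Dec 1979, so h = 20 covers Jan 1980 to Aug 1981
  forecast(h = 20)

# Drop the first two forecast months, keeping Mar 1980 to Aug 1981
later <- fc %>%
  filter(index >= yearmonth(as.Date("1980-03-01")))

later
```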

library(tsibble)
library(fable)
library(tidyr)
Fcast <- cbind(mdeaths,fdeaths)
as_tsibble(Fcast, pivot_longer = TRUE) %>% 
  model(arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value)) %>% 
  forecast() %>% 
  as_tibble() %>% 
  pivot_wider(id_cols = c("key", "index"), names_from = ".model", values_from = ".mean")
#> # A tibble: 48 x 5
#>    key        index arima   ets snaive
#>    <chr>      <mth> <dbl> <dbl>  <dbl>
#>  1 fdeaths 1980 Jan  825.  789.    821
#>  2 fdeaths 1980 Feb  820.  812.    785
#>  3 fdeaths 1980 Mar  767.  746.    727
#>  4 fdeaths 1980 Apr  605.  592.    612
#>  5 fdeaths 1980 May  494.  479.    478
#>  6 fdeaths 1980 Jun  423.  413.    429
#>  7 fdeaths 1980 Jul  414.  394.    405
#>  8 fdeaths 1980 Aug  367.  355.    379
#>  9 fdeaths 1980 Sep  376.  365.    393
#> 10 fdeaths 1980 Oct  442.  443.    411
#> # … with 38 more rows
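
The question also asks for a layout with one column of point forecasts per product and no interval columns. The same pivot can be made on the key column instead of .model. This is a rough sketch for a single model, with SNAIVE() chosen arbitrarily for illustration; with several models you would probably filter to one .model value first.

```r
library(dplyr)
library(tidyr)
library(tsibble)
library(fable)

point_fc <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE) %>%
  model(snaive = SNAIVE(value)) %>%
  forecast(h = 12) %>%
  as_tibble() %>%
  # Keep only the date, the series name and the point forecast
  select(index, key, .mean) %>%
  # One column per series (per product, in the asker's data)
  pivot_wider(names_from = key, values_from = .mean)

point_fc
```

Because only .mean is kept, the forecast distributions (and therefore the 80% and 95% intervals) never appear in the output, so there is nothing to delete afterwards.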

You can learn how to evaluate the accuracy of these models/forecasts here: https://otexts.com/fpp3/accuracy.html
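
As a rough illustration of that chapter applied to this example, one option is to hold out the last year of data, forecast it, and compare error measures per series with accuracy(). The 12-month hold-out and the use of RMSE to pick a "best" model are assumptions made for this sketch, not a recommendation from the original answer.

```r
library(dplyr)
library(tsibble)
library(fable)

deaths <- as_tsibble(cbind(mdeaths, fdeaths), pivot_longer = TRUE)

# Training set: everything up to Dec 1978; the final 12 months are held out
train <- deaths %>%
  filter(index <= yearmonth(as.Date("1978-12-01")))

fc <- train %>%
  model(arima = ARIMA(value), ets = ETS(value), snaive = SNAIVE(value)) %>%
  forecast(h = 12)

# Compare the forecasts with the held-out observations:
# one row per series and model, with columns such as RMSE, MAE and MAPE
acc <- fc %>% accuracy(deaths)

# For each series, keep the model with the smallest RMSE
best <- acc %>%
  group_by(key) %>%
  slice_min(RMSE, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(key, .model, RMSE)

best
```

The best table can then be joined back to the forecasts by key and .model to keep only the preferred model for each product.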
