r - Need help creating a simple model that predicts the unemployment rate from Google Trends data

Question

Dear Stack Overflow community,

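I am new to statistical programming with R. My task is to create a simple autoregressive model that can predict, or rather forecast, a country's unemployment rate using only data from Google Trends. To build the model, I was given a .csv file with the unemployment rates from 2011 to 2015 (5 years) and a .csv file with the Google Trends values for the topic "unemployment" over the same period (2011-2015).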

As you can imagine, I have imported both files into RStudio and converted them into monthly time series (60 months). Here is an overview:

[Figure: unemployment rate vs. Google Trends]

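I now need help creating that AR model. Keep in mind that the model should be as simple as possible and does not need to be perfect. Here are my questions: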

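1. Should I use the decomposed time series, even though the decomposed series are not very convincing (the p-values are still high)?
2. What is the easiest way to create an autoregressive model in R from the two time series (unemployment, Google)? The model should then be used to predict the actual unemployment rate from the actual Google Trends values.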

Since I'm not very experienced with R, I'm a bit lost. Any help would be greatly appreciated!

Thank you very much!

Here is the data (samples are included in the code below).
Here is my code so far:

# Import required libraries
library(lubridate)
library(tseries)
library(xts)
library(forecast)
library(readr)

# # # # # # # # # # # Unemployment Rate # # # # # # # # # # #

unemploymentRate <- read_csv("~/Desktop/UnemploymentRates_2011-2015.csv")

# Unemployment sample: structure(list(`Month` = 1:10, Year = c(2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L, 2011L), UnemploymentRate = c(7.9, 7.9, 7.6, 7.3, 7, 6.9, 7, 7, 6.6, 6.5)), .Names = c("Month", "Year", "UnemploymentRate"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

# Create monthly time series for unemployment rates
tsUnemployment <- ts(unemploymentRate$UnemploymentRate, start = c(2011,1), frequency = 12)

# # # # # # # # # # # Google Trends Topic # # # # # # # # # # #


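# Read the weekly Google Trends export; the date column is "Week", as in the sample below
google <- read_csv("~/Desktop/google.csv", col_types = cols(Week = col_date(format = "%Y-%m-%d")))
# Name both columns explicitly so the code below can rely on them
colnames(google)[1:2] <- c("Week", "googleTrend")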

#Google sample: structure(list(Week = structure(c(14976, 14983, 14990, 14997, 15004, 15011, 15018, 15025, 15032, 15039), class = "Date"), Unemployment = c(88L, 89L, 100L, 91L, 88L, 88L, 87L, 91L, 89L, 78L)), .Names = c("Week", "Unemployment"), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

# Extract month and year from the week's start date
google$Month <- month(google$Week)
google$Year <- year(google$Week)

# Aggregate weeks into months using the mean, then sort chronologically
aggGoogle <- aggregate(googleTrend ~ Month + Year, data = google, FUN = mean)
aggGoogle <- aggGoogle[order(aggGoogle$Year, aggGoogle$Month), ]
colnames(aggGoogle)[3] <- "aggGoogleTrends"

# Create monthly time series for the Google Trends
tsGoogle <- ts(aggGoogle$aggGoogleTrends, start = c(2011,1), frequency = 12)

# # # # # # # # # # # Decomposition + Analysis # # # # # # # # # # #

decompose_Unemployment <- decompose(tsUnemployment, "additive")
decompose_Google <- decompose(tsGoogle, "additive")

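# Note: for an additive decomposition, seasonal + trend + random reproduces the original
# series (with NA in the first and last 6 months, where the centred moving-average trend
# is undefined). A seasonally adjusted series would be tsUnemployment - decompose_Unemployment$seasonal.
finalUnemployment <- decompose_Unemployment$seasonal + decompose_Unemployment$trend + decompose_Unemployment$random
finalGoogle <- decompose_Google$seasonal + decompose_Google$trend + decompose_Google$random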
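The weekly-to-monthly aggregation step in the code above can be checked in isolation. This is a minimal sketch in base R (no lubridate), assuming weekly dates that start on Sunday 2011-01-02 like the sample; the values 1, 2, 3, ... are synthetic placeholders, not real Google Trends data. Sorting by Year and Month before calling ts() makes the chronological order explicit instead of relying on the order in which aggregate() returns groups.

```r
# Synthetic weekly series: 261 Sundays from 2011-01-02 to 2015-12-27
weekly <- data.frame(Week = seq(as.Date("2011-01-02"), by = "week", length.out = 261))
weekly$googleTrend <- seq_len(nrow(weekly))  # placeholder values, not real data

# Assign each week to the calendar month of its start date
weekly$Month <- as.integer(format(weekly$Week, "%m"))
weekly$Year  <- as.integer(format(weekly$Week, "%Y"))

# Mean of the weekly values within each month, sorted chronologically
monthly <- aggregate(googleTrend ~ Month + Year, data = weekly, FUN = mean)
monthly <- monthly[order(monthly$Year, monthly$Month), ]

# Monthly time series starting January 2011
tsMonthly <- ts(monthly$googleTrend, start = c(2011, 1), frequency = 12)
print(head(monthly, 3))
```

A week that spans two months is counted entirely in the month of its start date, which is the same simplification the original code makes.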

Now I'm ready to run the statistical tests:

adf.test(tsUnemployment, alternative = "stationary")
Box.test(tsUnemployment, type = "Ljung-Box")
Box.test(finalUnemployment, type = "Ljung-Box")

adf.test(tsGoogle, alternative = "stationary")
Box.test(tsGoogle, type = "Ljung-Box")
Box.test(finalGoogle, type = "Ljung-Box")
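Box.test() uses lag = 1 by default, so the calls above only test first-order autocorrelation. For monthly data it may be more informative to test several lags jointly; the sketch below uses lag = 12 as an illustrative choice (an assumption, not a rule) on a synthetic AR(1) series standing in for tsUnemployment. It also shows the fitdf argument, which reduces the degrees of freedom when the test is applied to the residuals of a fitted ARMA model.

```r
set.seed(123)
# Synthetic monthly AR(1) series (placeholder for tsUnemployment)
x <- ts(as.numeric(arima.sim(model = list(ar = 0.8), n = 60)),
        start = c(2011, 1), frequency = 12)

# Ljung-Box test on the raw series, jointly over the first 12 lags
lbRaw <- Box.test(x, lag = 12, type = "Ljung-Box")

# Fit an AR(1) and test its residuals; fitdf = number of estimated AR/MA coefficients
fit <- arima(x, order = c(1, 0, 0))
lbRes <- Box.test(residuals(fit), lag = 12, type = "Ljung-Box", fitdf = 1)

print(c(raw = lbRaw$p.value, residuals = lbRes$p.value))
```

Roughly, a small p-value on the raw series indicates autocorrelation that an AR model can exploit, while a large p-value on the residuals is consistent with the model having captured it.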

score 0 · Accepted Answer

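(As @eipi10 commented, this is more a question for Cross Validated, Data Science or Mathematics, especially since you don't seem to have any problems with the code or the statistical tests. If the answers you get here don't help, you should consider asking there.)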

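Suggestion for question 1: This question is particularly hard to answer because it depends heavily on your data. According to this page, applying a decomposition model is an appropriate thing to do if you decide to use an AR model. However, that does not mean decomposition is your only option.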
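One way to combine decomposition with an AR model, sketched under assumptions: remove the additive seasonal component estimated by decompose(), fit stats::ar() to the seasonally adjusted series, then add the seasonal figure back onto the forecasts. The series below is synthetic (a 12-month sine pattern plus AR(1) noise) and stands in for tsUnemployment; order.max = 3 is an arbitrary illustrative cap.

```r
set.seed(42)
n <- 60
season <- rep(sin(2 * pi * (1:12) / 12), length.out = n)
noise  <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
x <- ts(6 + season + noise, start = c(2011, 1), frequency = 12)

# Additive decomposition; unlike trend and random, the seasonal component has no NAs
dec  <- decompose(x, type = "additive")
xAdj <- x - dec$seasonal   # seasonally adjusted series

# AR model on the adjusted series; order selected by AIC, up to 3
fitAr  <- ar(xAdj, order.max = 3)
predAr <- predict(fitAr, n.ahead = 6)

# Re-add the seasonal figure. dec$figure[1] is January here because x starts in
# January, and the forecast months are Jan-Jun 2016 because x ends in December 2015.
forecastSeasonal <- predAr$pred + dec$figure[1:6]
print(fitAr$order)
print(forecastSeasonal)
```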

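Suggestion for question 2: The easiest way to implement an autoregressive (AR) model in R is with the stats package. The function stats::ar should work for you if you have a time series dataset. If your data is a data.frame rather than a time series (ts), you can convert it with the function stats::ts.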
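stats::ar() is univariate, so on its own it cannot use the Google Trends series as a predictor. A minimal sketch of one alternative, swapped in here rather than taken from the answer: stats::arima() with an external regressor (xreg), i.e. a model with AR(1) errors for unemployment and the monthly Google value as a regressor, trained on 2011-2014 and used to forecast 2015 from the 2015 Google values. All data are synthetic; googleDf and unemploymentDf are hypothetical stand-ins for the imported data.frames, included to show the stats::ts conversion mentioned above.

```r
set.seed(7)
n <- 60
# Hypothetical data.frames standing in for the imported CSV files (synthetic values)
googleDf <- data.frame(aggGoogleTrends = 50 + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 3))
unemploymentDf <- data.frame(
  UnemploymentRate = 4 + 0.05 * googleDf$aggGoogleTrends +
    as.numeric(arima.sim(model = list(ar = 0.7), n = n, sd = 0.2))
)

# Convert the data.frame columns to monthly time series with stats::ts
tsGoogle <- ts(googleDf$aggGoogleTrends, start = c(2011, 1), frequency = 12)
tsUnemployment <- ts(unemploymentDf$UnemploymentRate, start = c(2011, 1), frequency = 12)

# Training window 2011-2014, hold-out year 2015
yTrain <- window(tsUnemployment, end = c(2014, 12))
xTrain <- as.numeric(window(tsGoogle, end = c(2014, 12)))
xTest  <- as.numeric(window(tsGoogle, start = c(2015, 1)))

# AR(1) errors plus a linear effect of the Google series
fit <- arima(yTrain, order = c(1, 0, 0), xreg = xTrain)

# Forecast 2015 from the 2015 Google values; n.ahead must match length(xTest)
fc <- predict(fit, n.ahead = length(xTest), newxreg = xTest)
actual2015 <- window(tsUnemployment, start = c(2015, 1))
print(coef(fit))
print(cbind(forecast = fc$pred, actual = actual2015))
```

This uses the Google value for the same month as the unemployment rate being predicted, which matches the setup described in the question. Whether the Google series actually adds predictive power would need to be checked on the real data, for example by comparing against the same model fitted without xreg.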
