time-series - How do I use the ARIMA nodes in KNIME?

Question

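I am new to KNIME and am trying to use ARIMA to extrapolate my time series data, but I have not managed to get the ARIMA Predictor to work.

The input data has the following format: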

year,cv_diff
2011,-4799.099999999977
2012,60653.5
2013,64547.5
2014,60420.79999999993

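I want to forecast the values for, e.g., 2015 and 2016.

I am using the String to Date/Time node to convert the year to a date. In the ARIMA Learner I can only select the cv_diff field. This is the first problem: for the option "Column containing univariate time series", should I set the year column or the variable I want to forecast? In my case I have only one choice, the cv_diff variable. After that, I connect the Learner's output to the ARIMA Predictor's input and execute. Execution fails with 'ERROR ARIMA Predictor 2:3 Execute failed: No column with a defined time series was found. Please reconfigure the node.'

Please help me understand which variable I should set for the Learner and the Predictor. Should it be a non-time-series variable? And if so, how does the ARIMA node know which column to use as the time series?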

score 1 · Accepted Answer

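You should set cv_diff as the time series variable and also connect the input data to the Predictor. (And do not set the parameters to large values: with so few data points, the learning will not work.)

Here is an example: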
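This is not the KNIME workflow from the answer, only a rough Python sketch of the warning about parameter sizes. It assumes a simplified least-squares view of fitting, in which each round of differencing (d) and each autoregressive lag (p) costs one usable row, and it ignores moving-average terms. The function names are hypothetical, and KNIME's own estimator may behave differently.

```python
def regression_rows(n_points: int, p: int, d: int) -> int:
    """Rows left to fit an AR(p) model after d rounds of differencing."""
    return max(n_points - d - p, 0)


def is_estimable(n_points: int, p: int, d: int, with_constant: bool = True) -> bool:
    """True if the rows outnumber the coefficients (p lags plus an optional constant)."""
    n_params = p + (1 if with_constant else 0)
    return regression_rows(n_points, p, d) > n_params


if __name__ == "__main__":
    n = 4  # the question has four observations (2011 to 2014)
    for p, d in [(1, 0), (1, 1), (2, 1)]:
        print(
            f"AR order p={p}, differencing d={d}: "
            f"rows={regression_rows(n, p, d)}, estimable={is_estimable(n, p, d)}"
        )
```

Under these assumptions, four observations leave room for little more than a single lag without differencing, which is consistent with the advice to keep the parameters small.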

score 0 · Accepted Answer

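I finally figured it out. The ARIMA Learner option "Column containing univariate time series" is somewhat confusing, especially for people unfamiliar with time series analysis. I should not explicitly provide any time field, because ARIMA treats the variable it forecasts as having been collected at equal time intervals, and the kind of interval does not matter.

I found a good explanation of what "univariate time series" means: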

The term "univariate time series" refers to a time series that consists of single (scalar) observations recorded sequentially over equal time increments. Some examples are monthly CO2 concentrations and southern oscillations to predict el nino effects. Although a univariate time series data set is usually given as a single column of numbers, time is in fact an implicit variable in the time series. If the data are equi-spaced, the time variable, or index, does not need to be explicitly given. The time variable may sometimes be explicitly used for plotting the series. However, it is not used in the time series model itself.
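The point about time being an implicit variable can be sketched in a few lines of Python. This is only an illustration with hypothetical helper names: the model sees the values in order, and the time labels are rebuilt afterwards from a start value and a fixed step (here, years starting at 2011, as in the question).

```python
from typing import List, Tuple


def attach_time_index(values: List[float], start: int, step: int = 1) -> List[Tuple[int, float]]:
    """Pair each observation with its implied time label (equal spacing assumed)."""
    return [(start + i * step, v) for i, v in enumerate(values)]


def forecast_labels(n_observed: int, horizon: int, start: int, step: int = 1) -> List[int]:
    """Time labels for the next `horizon` points after the observed series."""
    return [start + (n_observed + h) * step for h in range(horizon)]


if __name__ == "__main__":
    cv_diff = [-4799.099999999977, 60653.5, 64547.5, 60420.79999999993]
    print(attach_time_index(cv_diff, start=2011))
    print(forecast_labels(len(cv_diff), horizon=2, start=2011))
```

The labels matter only for display; nothing in the sketch passes them to a model.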

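So I should select the cv_diff variable for both the Learner and the Predictor, and not supply a timestamp or any other time-related column.

There is one more thing I don't understand. I have to train on one data series and then supply another series from which I want to forecast. This differs slightly from other machine learning workflows, where you only need to supply new data and there is no notion of a sequence at all.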
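One possible way to see why a forecaster needs a series as input, sketched in Python: the fitted coefficients alone do not determine a forecast, because each forecast step starts from the most recent observations. The sketch below uses a plain AR(1) model fitted by least squares as a stand-in for ARIMA. That is an assumption made for brevity, not a description of what the KNIME nodes do internally, and the function names are hypothetical.

```python
from typing import List, Tuple


def fit_ar1(series: List[float]) -> Tuple[float, float]:
    """Least-squares fit of x[t] = c + phi * x[t-1]; returns (c, phi)."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    prev, nxt = series[:-1], series[1:]
    mean_prev = sum(prev) / len(prev)
    mean_next = sum(nxt) / len(nxt)
    var = sum((a - mean_prev) ** 2 for a in prev)
    if var == 0:
        raise ValueError("series is constant; phi is not identifiable")
    cov = sum((a - mean_prev) * (b - mean_next) for a, b in zip(prev, nxt))
    phi = cov / var
    return mean_next - phi * mean_prev, phi


def forecast_ar1(c: float, phi: float, history: List[float], steps: int) -> List[float]:
    """Roll the model forward from the last value of `history`."""
    out, last = [], history[-1]
    for _ in range(steps):
        last = c + phi * last
        out.append(last)
    return out


if __name__ == "__main__":
    training = [1.0, 2.0, 4.0, 8.0]  # toy series, not the data from the question
    c, phi = fit_ar1(training)
    print(forecast_ar1(c, phi, training, steps=2))  # continues the training series
    print(forecast_ar1(c, phi, [3.0], steps=2))     # same model, different history
```

The same coefficients give different forecasts for different histories, which is why the sequence has to be supplied at prediction time as well.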
