
I have a question about hierarchical grouping of time series in R. I currently have this matrix:

           A      B     C      F     G      H      I
[1,] -33.697  8.610 42.31 17.465 24.84 14.210 10.632
[2,]  -4.698 15.993 20.69  6.222 14.47  3.423 11.047
[3,] -37.458  9.687 47.14 14.659 32.49 12.759 19.726
[4,] -23.851 16.517 40.37 14.392 25.98  9.438 16.538
[5,]   3.329 15.629 12.30  3.449  8.85  2.635  6.215
[6,] -38.071  5.746 43.82 15.932 27.89 14.113 13.772

Just by inspection, I can figure out that:

  • G = H + I
  • C = F + G
  • A = B - C

Is there a way to find these sum relationships (positive and negative) automatically on large time series in R? I have tried using lm() to find the relationships, but that is too time-consuming to do for every series, and there are often collinearity problems as well.

Many thanks!

structure(list(A = c(-33.6970557915047, -4.69841752527282, -37.457728596637, 
-23.8508993089199, 3.32904924079776, -38.0712462896481), B = c(8.60984595282935, 
15.9929901333526, 9.68719404516742, 16.5167794595473, 15.6285679822322, 
5.74573907931335), C = c(42.306901744334, 20.6914076586254, 47.1449226418044, 
40.3676787684672, 12.2995187414344, 43.8169853689615), F = c(17.4649945173878, 
6.22195235290565, 14.6593122615013, 14.3921482057776, 3.44929573708214, 
15.9315551938489), G = c(24.8419072269462, 14.4694553057197, 
32.4856103803031, 25.9755305626895, 8.8502230043523, 27.8854301751126
), H = c(14.2098777298816, 3.42268325854093, 12.7592747195158, 
9.43778987810947, 2.63517117220908, 14.1129822209477), I = c(10.6320294970647, 
11.0467720471788, 19.7263356607873, 16.5377406845801, 6.21505183214322, 
13.7724479541648)), .Names = c("A", "B", "C", "F", "G", "H", 
"I"), row.names = c(NA, -6L), class = "data.frame")
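The three relations found by inspection can be confirmed numerically. This sketch uses the rounded values from the printed matrix rather than the full-precision dput() data, so each residual is zero only up to that rounding (at most about 0.01 here).

```r
# Rounded values as printed in the question; columns A, B, C, F, G, H, I
m <- rbind(
  c(-33.697,  8.610, 42.31, 17.465, 24.84, 14.210, 10.632),
  c( -4.698, 15.993, 20.69,  6.222, 14.47,  3.423, 11.047),
  c(-37.458,  9.687, 47.14, 14.659, 32.49, 12.759, 19.726),
  c(-23.851, 16.517, 40.37, 14.392, 25.98,  9.438, 16.538),
  c(  3.329, 15.629, 12.30,  3.449,  8.85,  2.635,  6.215),
  c(-38.071,  5.746, 43.82, 15.932, 27.89, 14.113, 13.772)
)
colnames(m) <- c("A", "B", "C", "F", "G", "H", "I")

# One column per relation: left-hand side minus right-hand side
res <- cbind(
  "G = H + I" = m[, "G"] - (m[, "H"] + m[, "I"]),
  "C = F + G" = m[, "C"] - (m[, "F"] + m[, "G"]),
  "A = B - C" = m[, "A"] - (m[, "B"] - m[, "C"])
)
print(round(res, 3))
# Largest absolute residual per relation (small, limited by the rounding)
print(apply(abs(res), 2, max))
```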

3 Answers


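This also uses regression, but it:

  • uses lm.fit rather than lm. (There is also fastLm in RcppArmadillo and RcppEigen, which you could try as well.)

  • avoids repeated regressions by using only unique combinations.

  • assumes that only triples need to be examined, which reduces the amount of computation (this appears to be the case in the post).

  • assumes that all coefficients are integers, so the output can be cleaned up by rounding.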
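To see why unique combinations are enough: combn() returns each unordered triple once, with its indices in increasing order, and the loop regresses the first series of each triple on the other two. When all three coefficients of a relation are nonzero, that single regression already has (near-)zero residuals, so also trying the other two members as the response would only repeat the work. A small illustration of the counts for the seven series in the question:

```r
n <- 7                      # number of series: A, B, C, F, G, H, I
combos <- combn(n, 3)       # one column per unordered triple
print(ncol(combos))         # choose(7, 3) = 35 regressions
print(combos[, 1:3])        # first triples: (1,2,3), (1,2,4), (1,2,5)
# Using every series as the response with every pair of the remaining series
# would need n * choose(n - 1, 2) = 105 regressions, finding each relation 3 times
print(n * choose(n - 1, 2))
```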

The code is:

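# Assumes DF holds the question's data, i.e. DF <- structure(list(...)) from the dput() output
eps <- .1                            # tolerance on the residual sum of squares
combos <- combn(ncol(DF), 3)         # each column is one unique triple of series
for(j in 1:ncol(combos)) {
    ix <- combos[, j]
    # regress the first series of the triple on the other two (no intercept)
    fit <- lm.fit(as.matrix(DF[ix[-1]]), DF[[ix[1]]])
    SSE <- sum(resid(fit)^2)
    if (SSE < eps) {
        # coefficients of the relation -y + b1*x1 + b2*x2 = 0, rounded to integers
        ecoef <- round(c(-1, coef(fit)))
        names(ecoef)[1] <- names(DF)[ix[1]]
        print(ecoef)
    }
}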

For the data in the post, this gives:

 A  B  C 
-1  1 -1 
 C  F  G 
-1  1  1 
 G  H  I 
-1  1  1 
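The threshold eps = .1 is an absolute bound on the residual sum of squares, so it depends on the scale of the series. A minimal sketch of the same triple search with a scale-free tolerance (SSE divided by the response's sum of squares), collecting the relations in a list instead of printing them. The function name find_triple_relations and the default rel_eps = 1e-4 are hypothetical choices, and the data are the rounded values printed in the question.

```r
# Rounded values as printed in the question
DF <- data.frame(
  A = c(-33.697, -4.698, -37.458, -23.851, 3.329, -38.071),
  B = c(8.610, 15.993, 9.687, 16.517, 15.629, 5.746),
  C = c(42.31, 20.69, 47.14, 40.37, 12.30, 43.82),
  F = c(17.465, 6.222, 14.659, 14.392, 3.449, 15.932),
  G = c(24.84, 14.47, 32.49, 25.98, 8.85, 27.89),
  H = c(14.210, 3.423, 12.759, 9.438, 2.635, 14.113),
  I = c(10.632, 11.047, 19.726, 16.538, 6.215, 13.772)
)

# Hypothetical helper: same search as the loop above, but with a
# scale-free tolerance and the relations returned in a list
find_triple_relations <- function(DF, rel_eps = 1e-4) {
  combos <- combn(ncol(DF), 3)
  found <- list()
  for (j in seq_len(ncol(combos))) {
    ix <- combos[, j]
    y <- DF[[ix[1]]]
    fit <- lm.fit(as.matrix(DF[ix[-1]]), y)
    # SSE relative to the response's (uncentred) sum of squares
    rel_sse <- sum(fit$residuals^2) / sum(y^2)
    if (rel_sse < rel_eps) {
      ecoef <- round(c(-1, fit$coefficients))  # still assumes integer coefficients
      names(ecoef)[1] <- names(DF)[ix[1]]
      found[[length(found) + 1]] <- ecoef
    }
  }
  found
}

rels <- find_triple_relations(DF)
print(rels)
```

For the question's data the returned list includes the three relations printed above; on real series the relative tolerance may still need tuning.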
answered 2013-08-29T13:25:53.433

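You could try a hierarchical clustering approach. It won't give you the exact relationships and coefficients, but it can give you an idea of which relationships you should test. First, we prepare your data.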

a<-rbind(c(-33.697,8.610,42.31, 17.465, 24.84, 14.210, 10.632), 
  c(-4.698,15.993,20.69,6.222, 14.47,3.423, 11.047),
  c(-37.458,9.687, 47.14, 14.659, 32.49, 12.759, 19.726),
  c(-23.851,16.517,40.37,14.392,25.98,9.438,16.538),
  c(3.329,15.629,12.30,3.449,8.85,2.635,6.215),
  c(-38.071,5.746,43.82,15.932,27.89,14.113,13.772))
colnames(a)<-c("A", "B", "C", "F", "G", "H", "I")

Then we compute the correlations between the variables, turn them into distances, and cluster them.

dd <- as.dist((1 - cor(a))/2)
plot(hclust(dd))

This should give you an idea of the relationships between the different time series. The resulting plot is shown below.

[Image: Cluster Dendrogram]
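If you want group memberships rather than just the plot, the tree can be cut with cutree(). Note also that (1 - r)/2 maps a correlation of 1 to distance 0 and -1 to distance 1, so strongly negatively correlated series, such as A and C in this data (consistent with A = B - C), end up far apart. The sketch below shows a variant distance, 1 - |r|, that treats both signs as close; this variant and the choice k = 3 are assumptions, not part of the answer.

```r
# Same (rounded) data as in the answer
a <- rbind(c(-33.697, 8.610, 42.31, 17.465, 24.84, 14.210, 10.632),
           c(-4.698, 15.993, 20.69, 6.222, 14.47, 3.423, 11.047),
           c(-37.458, 9.687, 47.14, 14.659, 32.49, 12.759, 19.726),
           c(-23.851, 16.517, 40.37, 14.392, 25.98, 9.438, 16.538),
           c(3.329, 15.629, 12.30, 3.449, 8.85, 2.635, 6.215),
           c(-38.071, 5.746, 43.82, 15.932, 27.89, 14.113, 13.772))
colnames(a) <- c("A", "B", "C", "F", "G", "H", "I")

r <- cor(a)
print(r["A", "C"])                  # negative, consistent with A = B - C
d_signed <- as.dist((1 - r) / 2)    # distance used in the answer, in [0, 1]
d_abs    <- as.dist(1 - abs(r))     # hypothetical variant: ignores the sign of r
hc <- hclust(d_abs)                 # complete linkage (the hclust default)
groups <- cutree(hc, k = 3)         # k = 3 groups is an arbitrary choice
print(groups)
```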

answered 2013-08-29T11:18:08.353

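You can find the linear dependencies with MASS::Null. They are equivalent to the ones you found by inspection (they describe the same set of linear relations), but they are not as sparse.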

library(MASS)
# d is the question's data frame (the full-precision dput() output)
Null(t(d)) # One relation per column
#             [,1]        [,2]        [,3]
# [1,]  0.41403998 -0.04178588  0.45582586
# [2,] -0.41403998  0.04178588 -0.45582586
# [3,] -0.02626794 -0.52439443  0.49812649
# [4,]  0.44030792  0.48260856 -0.04230063
# [5,]  0.62687195 -0.01159430 -0.36153375
# [6,] -0.18656403  0.49420285  0.31923312
# [7,] -0.18656403  0.49420285  0.31923312
as.matrix(d) %*% Null(t(d))  # zero
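Because Null() returns an orthonormal basis of the null space, its columns are mixtures of the sparse relations rather than the relations themselves. A minimal sketch that checks the hand-found relations lie in the span of Null()'s columns, i.e. that both describe the same dependencies. It uses the full-precision data from the question: with the rounded printout the matrix is no longer exactly rank-deficient, and Null() may not detect all three relations.

```r
library(MASS)   # provides Null()

# Full-precision data from the question's dput() output
d <- data.frame(
  A = c(-33.6970557915047, -4.69841752527282, -37.457728596637,
        -23.8508993089199, 3.32904924079776, -38.0712462896481),
  B = c(8.60984595282935, 15.9929901333526, 9.68719404516742,
        16.5167794595473, 15.6285679822322, 5.74573907931335),
  C = c(42.306901744334, 20.6914076586254, 47.1449226418044,
        40.3676787684672, 12.2995187414344, 43.8169853689615),
  F = c(17.4649945173878, 6.22195235290565, 14.6593122615013,
        14.3921482057776, 3.44929573708214, 15.9315551938489),
  G = c(24.8419072269462, 14.4694553057197, 32.4856103803031,
        25.9755305626895, 8.8502230043523, 27.8854301751126),
  H = c(14.2098777298816, 3.42268325854093, 12.7592747195158,
        9.43778987810947, 2.63517117220908, 14.1129822209477),
  I = c(10.6320294970647, 11.0467720471788, 19.7263356607873,
        16.5377406845801, 6.21505183214322, 13.7724479541648)
)

# Hand-found relations as coefficient vectors over A, B, C, F, G, H, I:
# A - B + C = 0,  C - F - G = 0,  G - H - I = 0
R <- cbind(c(1, -1, 1,  0,  0,  0,  0),
           c(0,  0, 1, -1, -1,  0,  0),
           c(0,  0, 0,  0,  1, -1, -1))

N <- Null(t(d))                   # orthonormal basis, one relation per column
P <- N %*% t(N)                   # projector onto the span of N's columns
print(max(abs(P %*% R - R)))      # ~0: each hand-found relation lies in span(N)
print(qr(R)$rank == ncol(N))      # same dimension, so the two spans coincide
```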
answered 2013-08-29T11:46:17.987