r - R pairwise PCA function turns X into a non-numeric object

Question

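I'm writing a function that runs PCA on pairs of variables in an xts object until the correlations between all variables are below 0.1. Here is the function I wrote: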


PCA_Selection <- function(X, r=0.1){

  M <- cor(X) # Creating correlation matrix 
  M[M==1] <- 0 # fill the diagonal with 0s so that pairs of the same variable are not considered 
  while(max(abs(M)) > r){
    M <- cor(X)
    PCA_vars <- matrix(,nrow = (nrow(M))^2 ,ncol = 2)
    for(i in 1:ncol(M)){ # Selects variables that will be used for PCA
      for(j in 1:nrow(M)){
        if(M[j,i] > r & M[j,i] < 1){
          PCA_vars[c(i*j),] <- c(row.names(M)[i],colnames(M)[j])
        }}} # works 
    PCA_vars <- na.omit(PCA_vars) # works 
    for (i in 1:nrow(PCA_vars)) {
      PCA_pre <- prcomp(X[,c(names(X) %in% PCA_vars[i,])]) 
      Sum_PCA <- summary(PCA_pre)
      tmp <- data.frame()
      if (Sum_PCA[["importance"]][2,1] > 0.95){ # if the first component captures 95% of variance
        tmp <- data.frame(predict(PCA_pre, X)[,1]) # then only use the first component for predictions 
        names(tmp) <- c(paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_1"))
      }else { # otherwise use both components and do not reduce the dimensions 
        tmp <- predict(PCA_pre,X)
        colnames(tmp) <- c(paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_1"), 
                        paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_2"))
      }
      Xnew <- cbind(X,tmp)
      X <- Xnew
    }

    PCA_vars <- unique(as.vector(PCA_vars)) # Variables to be removed 
    X <- X[, -which(colnames(X) %in% PCA_vars)]

    M <- cor(X)
    M[M==1] <- 0
  }  
    return(Xnew)
}

However, when I run the function, R returns a strange error:

Error in colMeans(x, na.rm = TRUE): 'x' must be numeric

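The data I'm using to test the function is an xts object with no missing observations. In addition, all variables have non-zero variance, and the data contain only continuous numeric variables.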

score 0 · Accepted Answer

The error occurs on line 15 (counting the function header as line 1): PCA_pre <- prcomp(X[,c(names(X) %in% PCA_vars[i,])])

It actually works on the first pass, when i=1, but fails on the second pass, when i=2, for the following reason.

On line 27, you modify X by assigning Xnew to it:

27: X <- Xnew

Xnew is created on line 26:

26: Xnew <- cbind(X,tmp)

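I can't fully follow the logic here. In any case, tmp is assigned either on line 19 (if the first principal component captures more than 0.95 of the total variance) or on line 22 (if it doesn't):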

19: tmp <- data.frame(predict(PCA_pre, X)[,1])
22: tmp <- predict(PCA_pre,X)

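This also confused me, because on line 19 tmp has class "data.frame", whereas on line 22 it has class "matrix". That matters when you create Xnew on line 26 (see above). If tmp is a data frame, then Xnew will be a "matrix", which has no names attribute: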

names(X)
NULL
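A small base-R sketch of the two points above: the two branches give tmp different classes, and names() on a plain matrix is NULL while colnames() still returns the column names. It assumes a plain numeric matrix as a stand-in for the xts object so that no extra packages are needed; the variable names tmp_19 and tmp_22 are hypothetical labels for the two branches.

```r
# Assumption: a plain numeric matrix stands in for the question's xts object
set.seed(1)
X <- matrix(rnorm(40), ncol = 2, dimnames = list(NULL, c("a", "b")))
PCA_pre <- prcomp(X)

# Branch on line 19: wrapping the scores in data.frame() gives a data frame
tmp_19 <- data.frame(predict(PCA_pre, X)[, 1])
# Branch on line 22: predict() on a prcomp fit returns a matrix of scores
tmp_22 <- predict(PCA_pre, X)

class(tmp_19)  # "data.frame"
class(tmp_22)  # "matrix" "array" in R >= 4.0.0

# A plain matrix carries column names but no names attribute
names(X)       # NULL
colnames(X)    # "a" "b"
```

Selecting columns with colnames(X) instead of names(X) on line 15 would sidestep the NULL, since colnames() works on matrices, data frames and xts objects alike.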

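That is why you get the error on line 15 (see above): names(X) returns NULL, so the column selection matches nothing, and prcomp ends up trying to run PCA on an empty set.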
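The wording of the error also hints that the object reaching prcomp() is not numeric at all: prcomp() centres the data through scale(), which calls colMeans(x, na.rm = TRUE), and colMeans() refuses non-numeric arrays. A minimal reproduction, assuming X has ended up as a list-mode matrix (cbind() produces one whenever one of its arguments is a plain list, because "list" is the highest type in cbind()'s type hierarchy); X_bad is a hypothetical name:

```r
# Assumption: X has become a list-mode matrix by the time line 15 runs.
# Binding a numeric matrix with a plain list yields a 2 x 3 matrix of mode "list".
X_bad <- cbind(matrix(c(1, 2, 3, 4), nrow = 2,
                      dimnames = list(NULL, c("a", "b"))),
               list(5, 6))
mode(X_bad)   # "list"
names(X_bad)  # NULL, so a names()-based column selection matches nothing

# prcomp() -> scale() -> colMeans(x, na.rm = TRUE) rejects non-numeric input
msg <- tryCatch(prcomp(X_bad), error = function(e) conditionMessage(e))
msg           # contains "must be numeric"
```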

I think the solution may be to drop the data.frame() call on line 19:

19: tmp <- predict(PCA_pre, X)[,1]

I tested this on a sample xts dataset and it seemed to run forever, but at least there were no errors.
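One variation on this fix, offered as a sketch rather than a tested replacement: with predict(PCA_pre, X)[,1], tmp becomes a bare numeric vector, so names(tmp) <- ... on line 20 names only the first element (the rest are padded with NA) instead of naming the new column. Indexing with drop = FALSE keeps a one-column matrix, so both branches of the if/else return a matrix, and colnames() sets the column name. It assumes a plain numeric matrix in place of the xts object and a hypothetical pair of variables "a" and "b"; it does not address why the while loop keeps running.

```r
# Assumption: plain numeric matrix instead of xts; "a" and "b" are a
# hypothetical correlated pair picked from PCA_vars
set.seed(2)
X <- matrix(rnorm(60), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
pair <- c("a", "b")

# Select the pair with colnames(), which works for matrices, data frames and xts
PCA_pre <- prcomp(X[, colnames(X) %in% pair])

# drop = FALSE keeps a one-column matrix rather than a bare vector
tmp <- predict(PCA_pre, X)[, 1, drop = FALSE]
colnames(tmp) <- paste0("Com_", pair[1], "_", pair[2], "_1")

Xnew <- cbind(X, tmp)
is.numeric(Xnew)  # TRUE
colnames(Xnew)    # "a" "b" "c" "Com_a_b_1"
```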

By the way, line 17 can be omitted, since it doesn't seem to do anything:

17: tmp <- data.frame()
