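I am writing a function that performs PCA on pairs of variables in an xts object until the correlation between every pair of variables is below 0.1. Here is the function I wrote: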
PCA_Selection <- function(X, r=0.1){
  M <- cor(X) # Create the correlation matrix
  M[M==1] <- 0 # Fill the diagonal with 0s so that a variable is not paired with itself
  while(max(abs(M)) > r){
    M <- cor(X)
    PCA_vars <- matrix(,nrow = (nrow(M))^2 ,ncol = 2)
    for(i in 1:ncol(M)){ # Select the variables that will be used for PCA
      for(j in 1:nrow(M)){
        if(M[j,i] > r & M[j,i] < 1){
          PCA_vars[c(i*j),] <- c(row.names(M)[i],colnames(M)[j])
        }}} # works
    PCA_vars <- na.omit(PCA_vars) # works
    for (i in 1:nrow(PCA_vars)) {
      PCA_pre <- prcomp(X[,c(names(X) %in% PCA_vars[i,])])
      Sum_PCA <- summary(PCA_pre)
      tmp <- data.frame()
      if (Sum_PCA[["importance"]][2,1] > 0.95){ # if the first component captures more than 95% of the variance
        tmp <- data.frame(predict(PCA_pre, X)[,1]) # then only use the first component for predictions
        names(tmp) <- c(paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_1"))
      }else { # else use both components and do not reduce the dimensions
        tmp <- predict(PCA_pre,X)
        colnames(tmp) <- c(paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_1"),
                           paste0("Com_",PCA_vars[i,1],"_",PCA_vars[i,2],"_2"))
      }
      Xnew <- cbind(X,tmp)
      X <- Xnew
    }
    PCA_vars <- unique(as.vector(PCA_vars)) # Variables to be removed
    X <- X[, -which(colnames(X) %in% PCA_vars)]
    M <- cor(X)
    M[M==1] <- 0
  }
  return(Xnew)
}
However, when I run the function, R returns a strange error:
Error in colMeans(x, na.rm = TRUE): 'x' must be numeric
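The data I am using to test the function is an xts object with no missing observations. In addition, all variables have non-zero variance, and the data contain only continuous numeric variables.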
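For reference, the properties above can be checked along the following lines. This is only a sketch: `X_demo` is a hypothetical stand-in matrix rather than my actual data, and for an xts object I am assuming the underlying matrix would first be extracted with `coredata()`.

# Hypothetical stand-in for the test data: a plain numeric matrix.
# For an xts object, coredata(X) would give the underlying matrix (assumption).
set.seed(1)
X_demo <- matrix(rnorm(300), ncol = 3,
                 dimnames = list(NULL, c("a", "b", "c")))

# Sanity checks matching the description of the data
checks <- c(
  is_numeric  = is.numeric(X_demo),             # only numeric values
  no_missing  = !anyNA(X_demo),                 # no missing observations
  nonzero_var = all(apply(X_demo, 2, var) > 0)  # every column varies
)
print(checks)

# Inspect the storage type and dimensions of the object
str(X_demo)

The same checks could be repeated on the intermediate objects inside the loop (for example, on `X` right after the `cbind()` call) to see at which step a non-numeric object first reaches `prcomp()`.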