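I want to use k-means to cluster a large data matrix (5 million x 512) into 5000 centers. I'm using R's rx functions with an on-disk xdf file so that the matrix doesn't exhaust my memory.
I wrote this code to convert the txt matrix to xdf and then cluster it: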
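# Convert the text matrix to an on-disk xdf file
rxTextToXdf(inFile = inFile, outFile = outFile)
# Read the column names and build a formula that uses every column
vars <- rxGetInfo(outFile, getVarInfo = TRUE)
myformula <- as.formula(paste("~", paste(names(vars$varInfo), collapse = "+"), sep = ""))
# Run Lloyd's k-means with 5000 clusters, then write the centers out as CSV
clust <- rxKmeans(formula = myformula, data = outFile, numClusters = 5000, algorithm = "lloyd", overwrite = TRUE)
write.table(clust$centers, file = centersFiletxt, sep = ",", row.names = FALSE, col.names = FALSE)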
But it has been running for a week now. Any ideas on how to make it faster?
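For scale, here is a rough count of the distance arithmetic that a single Lloyd iteration implies at these sizes. It assumes every point is compared with every center across all dimensions and ignores disk I/O; the variable names are only for illustration.

```r
# Back-of-the-envelope cost of one Lloyd iteration (assumption: a full
# point-to-center distance computation for every pair, no pruning)
n_points   <- 5e6    # rows in the matrix
n_dims     <- 512    # columns in the matrix
n_clusters <- 5000   # requested centers

# One multiply-add per (point, center, dimension) triple
mults_per_iter <- n_points * n_dims * n_clusters
mults_per_iter   # 1.28e13 multiply-adds per iteration
```

Whatever the number of iterations turns out to be, the total work is this figure multiplied by it, plus one pass over the xdf file per iteration.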