This is the SAS code I want to replicate in R:

proc fastclus data = in.stores_standard
maxclusters = 20
outseed= in.out_seed
maxiter = 1000
converge = 0 
strict=5.0; 
var storesize sales_per_sqft sales_per_visits tothhsinta;
id store_nbr;
run;

My attempt:

library(amap)
set.seed(1)
kmeans_object=Kmeans(stores_standard, 20, iter.max = 1000, nstart = 1, method = c("euclidean"))
p=do.call(rbind, kmeans_object)

What I could not achieve:

1) Running kmeans on only these variables: storesize, sales_per_sqft, sales_per_visits, tothhsinta

2) Using store_nbr as the id

3) The seed functionality in R

Thanks!

1 Answer

1) is easy: subset the data to the columns you want before clustering.

want <- c("storesize", "sales_per_sqft", "sales_per_visits", "tothhsinta")
Kmeans(stores_standard[, want], 20, iter.max = 1000, nstart = 1,
       method = c("euclidean"))
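The column subsetting above can be checked on made-up data. The following is only a sketch under stated assumptions: the data frame is synthetic, the `region_code` column is a hypothetical extra variable, and base R's `kmeans()` stands in for `amap::Kmeans()` so that the example has no package dependency.

```r
# Synthetic stand-in for stores_standard; all values are made up for illustration.
set.seed(1)
stores_standard <- data.frame(
  store_nbr        = 1:200,
  storesize        = rnorm(200),
  sales_per_sqft   = rnorm(200),
  sales_per_visits = rnorm(200),
  tothhsinta       = rnorm(200),
  region_code      = sample(1:5, 200, replace = TRUE)  # hypothetical column to exclude
)

want <- c("storesize", "sales_per_sqft", "sales_per_visits", "tothhsinta")

# Assumption: base R kmeans() is used here instead of amap::Kmeans().
fit <- kmeans(stores_standard[, want], centers = 20, iter.max = 1000, nstart = 1)

# The cluster centers are defined over the four selected variables only.
print(colnames(fit$centers))
```

Because `store_nbr` and `region_code` are never passed to the clustering call, they cannot influence the distances.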

For 2), consider this example:

 ## a 2-dimensional example from ?Kmeans
 x <- rbind(matrix(rnorm(100, sd = 0.3), ncol = 2),
            matrix(rnorm(100, mean = 1, sd = 0.3), ncol = 2))
 colnames(x) <- c("x", "y")
 cl <- Kmeans(x, 2)

Now look at cl:

R> str(cl)
List of 4
 $ cluster : int [1:100] 2 2 2 2 2 2 2 2 2 2 ...
 $ centers : num [1:2, 1:2] 1.0245 -0.017 1.0346 0.0375
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "1" "2"
  .. ..$ : chr [1:2] "x" "y"
 $ withinss: num [1:2] 0.00847 0.22549
 $ size    : int [1:2] 50 50
 - attr(*, "class")= chr "kmeans"

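The cluster component of the list holds the assigned cluster IDs, in the same order as the samples in the input data. To add the cluster component as a column of the input data, do the following: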

R> x <- cbind(x, Cluster = cl$cluster)
R> head(x)
               x            y Cluster
[1,] -0.24251497  0.532012889       2
[2,]  0.10957740  0.225168920       2
[3,] -0.35563544 -0.428798979       2
[4,] -0.41251306  0.529953489       2
[5,] -0.61212001 -0.003443993       2
[6,]  0.04435213  0.086595025       2

For your data, do the following:

stores_standard <- cbind(stores_standard, Cluster = kmeans_object$cluster)
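If the goal is something closer to the `id store_nbr;` statement, one option is a small lookup table that pairs each store number with its cluster. This is a minimal sketch on synthetic data, again assuming base R's `kmeans()` in place of `amap::Kmeans()`; the name `assignments` and the choice of 3 clusters are illustrative only.

```r
# Synthetic stand-in for stores_standard; all values are made up for illustration.
set.seed(1)
stores_standard <- data.frame(
  store_nbr        = 1001:1060,
  storesize        = rnorm(60),
  sales_per_sqft   = rnorm(60),
  sales_per_visits = rnorm(60),
  tothhsinta       = rnorm(60)
)

want <- c("storesize", "sales_per_sqft", "sales_per_visits", "tothhsinta")

# store_nbr is left out of the clustering and used only as a label afterwards.
fit <- kmeans(stores_standard[, want], centers = 3)

# The cluster vector follows the input row order, so it lines up with store_nbr.
assignments <- data.frame(store_nbr = stores_standard$store_nbr,
                          Cluster   = fit$cluster)
print(head(assignments))
```

This relies on the rows not being reordered between the clustering call and the construction of the table.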

As for 3), this does not appear to be possible with either kmeans() in standard R or Kmeans() in amap.

Answered 2012-06-20T08:12:16.773