r - R data.table and kmeans clustering

Question

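I'm not even sure whether this is possible with data.table. I have a dataset like the one shown below. It was a data frame, but I later converted it to a data.table called x.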

id xcord ycord
a  2 3
a  3 4
a  3 3
a  9 10
a  8 9
b  1 3
b  1 2
b  8 19
b  7 21

I want to identify two clusters for each id, and it is proving difficult. I tried the following:

x = x[,list(x1 = kmeans(xcord,centers=2)$centers, y1 = kmeans(ycord,centers=2)$centers),by = id]

But it gives the following error message: All items in j=list(...) should be atomic vectors or lists. If you are trying something like j=list(.SD,newcol=mean(colA)) then use := by group instead (much quicker), or cbind or merge afterwards. Calls: [ -> [.data.table Execution halted

I was expecting a data.table whose entries can be "treated" as a list of centers. Is this even possible?

score 4 · Accepted Answer

The centers element is a matrix (it will contain as many columns as the x passed to kmeans).
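To see this for yourself, here is a minimal sketch using only base R. The vector is just the xcord values for id a from the question, and the seed is an arbitrary choice of mine to make the random starts repeatable.

```r
# xcord values for id "a", copied from the question's data
xcord <- c(2, 3, 3, 9, 8)

set.seed(1)  # arbitrary seed; kmeans picks its starting centers at random
km <- kmeans(xcord, centers = 2)

# Even with a plain vector as input, centers comes back as a matrix:
# one row per cluster, one column per input variable
is.matrix(km$centers)
dim(km$centers)
```

So each call in the question's j hands data.table a 2 x 1 matrix rather than a plain vector.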

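If you want to find clusters that consider xcord and ycord together in a single clustering run, you need to pass a matrix to kmeans. You then have to coerce the result back to a data.table afterwards, which sensibly retains the column names.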

# eg.
fx <- x[,data.table(kmeans(cbind(xcord,ycord),centers=2)$centers),by=id]
fx
#    id    xcord     ycord
# 1:  a 2.666667  3.333333
# 2:  a 8.500000  9.500000
# 3:  b 7.500000 20.000000
# 4:  b 1.000000  2.500000
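Since kmeans starts from random centers, the order of the rows within each id can differ between runs. Below is a self-contained sketch of the same approach that rebuilds the question's data and sorts the result so the row order is stable. The seed, the nstart value, and the use of as.data.table and setorder are my own choices here, not something the code above requires.

```r
library(data.table)

# Data from the question
x <- data.table(
  id    = c("a", "a", "a", "a", "a", "b", "b", "b", "b"),
  xcord = c(2, 3, 3, 9, 8, 1, 1, 8, 7),
  ycord = c(3, 4, 3, 10, 9, 3, 2, 19, 21)
)

set.seed(42)  # arbitrary seed, only there to make the random starts repeatable

# Cluster both coordinates jointly within each id, then turn the
# centers matrix into a data.table so it can be returned from j.
# nstart = 5 (an assumption) reruns kmeans from several random starts.
fx <- x[, as.data.table(kmeans(cbind(xcord, ycord), centers = 2, nstart = 5)$centers),
        by = id]

# Cluster numbering is arbitrary, so sort for a stable row order
setorder(fx, id, xcord)
fx
```

On this data the centers should match the ones shown above, just sorted by xcord within each id.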
