我正在尝试在 PL/R 中使用 kmeans 函数。我创建了下表
CREATE TABLE EMP (NAME1 TEXT, AGE SMALLINT, SALARY NUMERIC );
INSERT INTO EMP VALUES
('Joe', 41, 55000),
('Jill', 27, 25000),
('Jack', 31, 45000),
('Joker', 65, 20000),
('Joy', 22, 31000),
('Jane', 72, 35000),
('Jackson', 42, 65000),
('Jessica', 23, 37000);
我的聚类功能如下
CREATE OR REPLACE FUNCTION CLUS(sal NUMERIC[])
RETURNS INTEGER[] AS
$BODY$
a <- NA
a = kmeans(sal, 3)$cluster
return(a)
$BODY$
LANGUAGE 'plr' ;
当我执行以下查询时
SELECT * , CLUS(ARRAY (SELECT SALARY FROM EMP ORDER BY SALARY)) AS CLUSTER_ID from emp order by salary;
我的输出是
name1 age salary cluster_id
Joker 65 20000 {3,3,2,2,2,2,1,1}
Jill 27 25000 {2,2,2,2,2,3,3,1}
Joy 22 31000 {3,3,2,2,2,2,1,1}
Jane 72 35000 {1,1,1,1,1,2,2,3}
Jessica 23 37000 {1,1,2,2,2,2,3,3}
Jack 31 45000 {2,2,3,3,3,3,1,1}
Joe 41 55000 {1,1,3,3,3,3,2,2}
Jackson 42 65000 {2,2,3,3,3,3,1,1}
我面临的问题是,每次评估一行时,集群都会发生变化,我想要以下输出......
name1 age salary cluster_id
Joker 65 20000 3
Jill 27 25000 3
Joy 22 31000 2
Jane 72 35000 2
Jessica 23 37000 2
Jack 31 45000 2
Joe 41 55000 1
Jackson 42 65000 1
请让我知道是否可以以更好和简单的方式完成