
I am trying to perform a hierarchical cluster analysis in RStudio using the daisy function from the cluster package. This is my dataset:

'data.frame':	341 obs. of  28 variables:
$ Impo_Env : Ord.factor w/ 3 levels "Low"<"Med"<"High": 3 2 3 2 3 2 3 3 2 3 ...
$ ComparativePriority_IAS: Ord.factor w/ 3 levels "Low"<"Med"<"High": 3 1 3 2 3 2 3 2 3 2 ...
$ Strategy_Eradication: Ord.factor w/ 3 levels "No intervention"<..: 3 2 3 2 3 2 3 2 2 3 ...
$ Knowl_BiodivLoss: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 2 2 2 ...
$ Control_Trade: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Engagement_Retail: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_PastProj: Factor w/ 2 levels "0","1": 1 1 1 1 2 1 1 1 2 1 ...
$ Priority_IAS: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_Eradic: Factor w/ 2 levels "0","1": 2 1 2 1 2 2 1 2 2 1 ...
$ Alert_CFS: Factor w/ 2 levels "0","1": 1 2 1 2 1 2 2 1 2 1 ...
$ Alert_Municipality: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Alert_Park: Factor w/ 2 levels "0","1": 2 1 2 1 2 1 1 2 1 1 ...
$ Alert_Police: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
$ Alert_Firemen: Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 2 ...
$ Supp_AuthorityIAS: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Knowl_Env: Factor w/ 2 levels "0","1": 2 1 2 2 2 2 2 2 2 2 ...
$ Info_Tv: Factor w/ 2 levels "0","1": 2 2 1 2 2 2 2 1 2 1 ...
$ Info_Web: Factor w/ 2 levels "0","1": 2 1 2 2 2 1 2 1 2 2 ...
$ Info_Radio: Factor w/ 2 levels "0","1": 1 1 1 1 1 2 1 2 1 1 ...
$ Info_Magazines: Factor w/ 2 levels "0","1": 1 1 2 1 2 1 1 2 1 1 ...
$ Info_School: Factor w/ 2 levels "0","1": 1 1 2 1 1 1 1 1 1 2 ...
$ Blacklist: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
$ Workshop: Factor w/ 2 levels "0","1": 1 1 2 1 2 1 2 2 1 1 ...
$ SuppFin_FutProj: Factor w/ 2 levels "0","1": 2 1 2 1 2 2 2 2 2 2 ...
$ Tourist_dummy: Factor w/ 2 levels "0","1": 1 1 1 2 2 1 1 1 2 1 ...
$ Gender: Factor w/ 2 levels "Female","Male": 1 2 1 2 1 1 2 2 2 1 ...
$ logIASknown: num  2.89 2.94 2.89 2.56 3.14 ...
$ Age: int  20 41 14 10 26 33 19 59 23 16 ...

I would like to use the Euclidean distance with daisy, but when I run

daisy(fuu, metric = c("euclidean"), type=list(ordratio = c(1,2,3), asymm=c(4:24), symm=c(25,26)))

the output is not what I expect. The Gower distance is used instead of the Euclidean distance:

Warning message: In daisy(fuu, metric = c("euclidean"), type = list(ordratio = c(1,: with mixed variables, metric "gower" is used automatically

How can I solve this?


1 Answer


As stated in the "Details" section of the documentation for the daisy function, which is included in the cluster package:

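The handling of nominal, ordinal, and (a)symmetric binary data is achieved by using the general dissimilarity coefficient of Gower (1971). If x contains any columns of these data-types, both arguments metric and stand will be ignored and Gower's coefficient will be used as the metric.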

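In other words, for the Euclidean metric (distance as the square root of the sum of squared differences) to be computed, the input columns must be numeric (mode) variables (i.e. all columns when x is a matrix), so that they are recognized as interval-scaled variables, as opposed to nominal (columns of class factor) or ordinal (columns of class ordered) variables. Specifying variable types in the type argument does not change this fact.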

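On these premises, and assuming it makes sense for all 28 variables even though some of them are qualitative binary ones, you could try converting them with as.numeric and then proceed. This removes the cause of the message: with mixed variables, metric "gower" is used automatically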
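The conversion described above could look roughly like the following. This is only a sketch: the data frame toy and its three columns are made up here as a small stand-in for fuu, and whether level codes are a meaningful numeric encoding for your factors is an assumption you need to check for your own data.

```r
library(cluster)

# Hypothetical stand-in for `fuu`: one ordered factor, one binary factor,
# and one integer column, mirroring the column types in the question.
toy <- data.frame(
  Impo_Env  = factor(c("High", "Med", "High", "Low", "Med", "High"),
                     levels = c("Low", "Med", "High"), ordered = TRUE),
  Alert_CFS = factor(c("0", "1", "0", "1", "1", "0"), levels = c("0", "1")),
  Age       = c(20L, 41L, 14L, 10L, 26L, 33L)
)

# as.numeric() on a factor returns its integer level codes (1, 2, 3, ...),
# so every column becomes a plain numeric vector.
toy_num <- as.data.frame(lapply(toy, as.numeric))

# With only numeric columns, daisy() no longer switches to Gower and
# computes the requested Euclidean distance.
d <- daisy(toy_num, metric = "euclidean")

# The result inherits from class "dist", so it can be passed to hclust().
hc <- hclust(d, method = "average")
```

Note that daisy may still warn that the two-valued columns are treated as interval scaled, which is expected after this conversion. Also, on the raw scale a variable such as Age will dominate the distance, so you may want to look at the stand argument of daisy or rescale the columns yourself before clustering.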

Answered 2016-02-22T09:39:08.747