I have a dataset for which I am computing a distance matrix. The data are shown below; there are 251 observations.
> str(mydata)
'data.frame': 251 obs. of 7 variables:
$ BodyFat: num 12.3 6.1 25.3 10.4 28.7 20.9 19.2 12.4 4.1 11.7 ...
$ Weight : num 154 173 154 185 184 ...
$ Chest : num 93.1 93.6 95.8 101.8 97.3 ...
$ Abdomen: num 85.2 83 87.9 86.4 100 94.4 90.7 88.5 82.5 88.6 ...
$ Hip : num 94.5 98.7 99.2 101.2 101.9 ...
$ Thigh : num 59 58.7 59.6 60.1 63.2 66 58.4 60 62.9 63.1 ...
$ Biceps : num 32 30.5 28.8 32.4 32.2 35.7 31.9 30.5 35.9 35.6 ...
I standardize the data.
means = apply(mydata,2,mean)
sds = apply(mydata,2,sd)
nor = scale(mydata,center=means,scale=sds)
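For reference, here is a minimal sketch of the same standardization using the defaults of `scale()`, which centers each column on its mean and divides by its standard deviation. The data frame `toy` is made-up stand-in data, not the dataset from this question.

```r
# Hypothetical stand-in data; the real `mydata` is not reproduced here.
set.seed(1)
toy <- data.frame(a = rnorm(10, mean = 50, sd = 5),
                  b = rnorm(10, mean = 100, sd = 20))

# Explicit version, as in the question.
means <- apply(toy, 2, mean)
sds   <- apply(toy, 2, sd)
nor_explicit <- scale(toy, center = means, scale = sds)

# Default version: scale() computes the column means and sds itself.
nor_default <- scale(toy)

# Compare the standardized values, ignoring attributes.
same_values <- isTRUE(all.equal(nor_explicit, nor_default,
                                check.attributes = FALSE))
print(same_values)
```

If `same_values` is `TRUE`, the shorter `scale(mydata)` call can stand in for the three lines above.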
When I compute the distance matrix, I see a lot of blank entries, and distances appear to be shown for only 4 observations.
distance =dist(nor)
> str(distance)
'dist' num [1:31375] 1.33 2.09 1.9 3.08 3.99 ...
- attr(*, "Size")= int 251
- attr(*, "Labels")= chr [1:251] "1" "2" "3" "4" ...
- attr(*, "Diag")= logi FALSE
- attr(*, "Upper")= logi FALSE
- attr(*, "method")= chr "euclidean"
- attr(*, "call")= language dist(x = nor)
> distance # output truncated in this post because there are 251 observations
1 2 3 4 5 6 7
2 1.3346445
3 2.0854437 2.5474796
4 1.8993458 1.4908813 2.5840752
5 3.0790252 3.4485667 2.2165366 2.7021809
8 9 10 11 12 13 14
2
3
4
5
15 16 17 18 19 20 21
The listing stays blank like this for the remaining 247 comparisons.
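A quick way to check whether the distances are really missing, or only not displayed, is to inspect the `dist` object directly instead of printing it. The sketch below uses random stand-in data of the same shape (251 rows, 7 columns), so the numbers are illustrative only. How much of a large object gets printed is limited by `getOption("max.print")`, which may be relevant here.

```r
# Hypothetical stand-in for the standardized data: 251 rows, 7 columns.
set.seed(42)
toy_nor <- scale(matrix(rnorm(251 * 7), nrow = 251))

d <- dist(toy_nor)
n <- attr(d, "Size")

# A dist object stores one value per pair of rows: n * (n - 1) / 2.
expected_pairs <- n * (n - 1) / 2   # 31375 when n is 251
has_all_pairs  <- length(d) == expected_pairs

# TRUE here would mean some distances are actually NA.
any_missing <- anyNA(d)

# View a corner of the full square matrix instead of the whole printout.
corner <- as.matrix(d)[1:8, 1:8]
print(round(corner, 2))

# The printing limit currently in effect for this session.
print(getOption("max.print"))
```

If `has_all_pairs` is `TRUE` and `any_missing` is `FALSE`, every pairwise distance was computed and the blanks are only in the printed display.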
Now I reduce the dataset to 20 observations.
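The post does not show how the data were reduced. One way it could be done, sketched here on made-up stand-in data, is to take the first rows with `head()` and then standardize and compute distances again. The names `toy` and `toy_small` are placeholders.

```r
# Hypothetical stand-in data with the same shape as the full dataset.
set.seed(7)
toy <- as.data.frame(matrix(rnorm(251 * 7), nrow = 251))

# Keep only the first 20 rows, then standardize that subset on its own.
toy_small <- head(toy, 20)
nor_small <- scale(toy_small)

d_small <- dist(nor_small)

# 20 observations give 20 * 19 / 2 = 190 pairwise distances.
print(attr(d_small, "Size"))
print(length(d_small))
```

Because the subset is standardized with its own means and standard deviations, the distance between the same pair of rows can differ from the value obtained on the full dataset.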
Here I get a proper distance matrix.
distancetiny=dist(nor)
> str(distancetiny)
'dist' num [1:1176] 1.14 1.8 1.61 2.62 3.39 ...
- attr(*, "Size")= int 49
- attr(*, "Labels")= chr [1:49] "1" "2" "3" "4" ...
- attr(*, "Diag")= logi FALSE
- attr(*, "Upper")= logi FALSE
- attr(*, "method")= chr "euclidean"
- attr(*, "call")= language dist(x = nor)
> distancetiny
1 2 3 4 5 6 7
2 1.1380433
3 1.7990293 2.2088928
4 1.6064118 1.2871522 2.2483586
5 2.6235853 2.9669283 1.9132224 2.3256624
6 3.3898119 3.3730508 3.3718447 2.2615557 2.0094434
7 1.8947704 2.0065514 1.7685604 1.1065940 1.7387938 2.2321156
8 1.1732465 1.0663217 1.6733689 0.8873140 2.1959298 2.7939555 1.1448269
9 2.2721969 2.0545882 3.4263262 1.4058375 3.1811955 2.4011074 2.3078714
10 2.3753110 2.2424464 3.0289947 1.2808398 2.3230202 1.4242653 1.8571654
11 1.5620472 1.1878554 2.5750350 0.5718248 2.7714795 2.6314286 1.5132365
12 3.5088571 3.2484020 4.1164488 2.2723772 3.1377318 1.4795230 2.8274818
13 2.1448841 2.2679705 1.8726670 1.3494988 1.2176727 1.5544030 1.0725518
14 3.6679035 3.7459402 3.6869023 2.6677308 2.1318420 0.7347359 2.5729973
15 2.9908457 3.3312661 3.1289870 2.4340473 1.8027070 1.3626019 2.3795360
16 1.6117570 2.0283356 1.2011116 1.5961064 1.3196981 2.4456436 1.2569683
17 3.2991393 3.5991747 3.0438049 2.6066933 1.4742664 1.0945621 2.2214101
18 3.9409008 4.0726826 4.0113908 2.9250144 2.5228901 0.9087254 2.8158563
19 2.7468511 2.9495031 3.2439229 1.8312508 2.4122436 1.3932604 1.9640170
20 3.7515064 3.7021743 3.9404231 2.5813440 2.5390519 0.8352961 2.6530503
21 2.3102053 2.3878491 2.0836800 1.4328028 1.2991221 1.5287862 1.1769205
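With 21 observations, there are no blank entries in the output.
Why does this happen? Does dist() stop working once the number of observations exceeds some threshold?
I can't figure it out. Please help.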