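For several weeks I have been using the following script to generate a scatter plot of roughly 10,000 (non-zero, positive) data points. Only a few (<20) data points are left out, with a warning about the transformation.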
library(ggplot2)
library(scales)  # provides trans_breaks(), trans_format(), math_format()

# Scatter plot on log10 axes with power-of-ten tick labels
visual <- ggplot(data = dots, aes(GRNHLin, REDHLin)) +
  geom_point(colour = rgb(0.17, 0.44, 0.71), size = 0.500, alpha = 0.250) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1, 1e4)) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1, 1e3))
visual
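Starting this week, I want to do some model-based clustering. The script I wrote (see below) uses the same data set (10,000 non-zero, positive data points), but more than 9,000 data points are dropped, with the following warnings: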
Warning messages:
1: In self$trans$transform(x) : NaNs produced
2: Transformation introduced infinite values in continuous x-axis
3: In self$trans$transform(x) : NaNs produced
4: Transformation introduced infinite values in continuous y-axis
5: Removed 9692 rows containing missing values (geom_point).
This is the second script:
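library(mclust)      # Mclust()
library(factoextra)  # fviz_cluster()
library(scales)      # trans_breaks(), trans_format(), math_format()

# Fit a Gaussian mixture with 8 components and the VVV covariance model
dots.Mclust <- Mclust(dots, modelNames = "VVV", G = 8)
visual <- fviz_cluster(dots.Mclust,
                       ellipse = FALSE,
                       shape = 20,
                       geom = c("point")) +
  scale_x_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1, 1e3)) +
  scale_y_log10(breaks = trans_breaks("log10", function(x) 10^x),
                labels = trans_format("log10", math_format(10^.x)), limits = c(1, 1e4))
visual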
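To narrow down where the removed rows in the warnings above come from, a minimal base-R sketch like the following can count how many points are non-positive (log10 turns these into NaN or -Inf) and how many positive points fall outside the axis limits. The helper name `count_problem_points` and the toy values are made up for illustration; with the real data one would pass the coordinates that are actually plotted and the same `limits` as in the scale calls.

```r
# Hypothetical helper (not part of ggplot2 or factoextra): counts points that
# a log10 axis with the given limits cannot display as-is.
count_problem_points <- function(x, y, xlim, ylim) {
  # log10() of a negative number is NaN, log10(0) is -Inf
  nonpos <- x <= 0 | y <= 0
  # positive values that lie outside the requested axis limits
  outside <- !nonpos &
    (x < xlim[1] | x > xlim[2] | y < ylim[1] | y > ylim[2])
  c(nonpositive = sum(nonpos), outside_limits = sum(outside))
}

# Toy coordinates, invented for this example only
toy <- data.frame(
  x = c(81.5, -0.3, 0, 1182.1, 0.5),
  y = c(176.4, 12.0, 5.0, 180.7, 20.0)
)

res <- count_problem_points(toy$x, toy$y, xlim = c(1, 1e4), ylim = c(1, 1e3))
print(res)  # 2 non-positive rows, 1 positive row outside the limits
```

If the non-positive count is large even though the raw data are all positive, the coordinates being plotted are probably not the raw values.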
Edit
Some additional information:
The data set contains only values greater than 0. head(dots.Mclust) gives the following:
$data
GRNHLin RED2HLin
[1,] 81.50364 176.379654
[2,] 57.94751 116.310577
[3,] 42.89310 119.758621
[4,] 41.82213 275.607971
[5,] 437.14648 141.309647
[6,] 15.20952 177.128616
[7,] 18.88731 257.249207
[8,] 768.64935 172.374069
[9,] 24.66220 118.283150
[10,] 17.12160 68.955154
[11,] 73.00019 71.517052
[12,] 1182.08911 180.694122
[13,] 320.09827 224.808563
[14,] 268.42401 235.375259
[15,] 149.05655 205.708282
[16,] 98.43160 152.093704
[17,] 25.10120 177.061386
[18,] 293.87103 239.007050
[19,] 118.42249 295.722168
[20,] 724.16718 243.950455
[21,] 255.26083 128.209717
[22,] 105.15983 247.946701
[23,] 86.25691 220.004745
[24,] 122.01743 32.232780
[25,] 50.42104 9.923141
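With the scaling of the x- and y-axes removed, the plot looks as follows. Clearly something is wrong with the data points. There are no negative values in the data set, yet (many) points lie below 0. In addition, the x- and y-axes do not cover the value in row [12,]. This may be the root cause of the problem. But how do these wrong values come about?
What is the underlying problem here?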
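One thing that may be worth checking (an assumption, not something confirmed above) is whether the plotting function standardizes the data before drawing it. As far as its documentation goes, fviz_cluster() has a `stand` argument that defaults to TRUE. The base-R sketch below only shows what standardization with scale() does to the first six rows of the head() output above. It does not call fviz_cluster() itself.

```r
# First six rows copied from the head() output above
raw <- cbind(
  GRNHLin  = c(81.50364, 57.94751, 42.89310, 41.82213, 437.14648, 15.20952),
  RED2HLin = c(176.379654, 116.310577, 119.758621,
               275.607971, 141.309647, 177.128616)
)

# scale() centers each column at 0 and divides it by its standard deviation
standardized <- scale(raw)

range(raw)           # all values are strictly positive
range(standardized)  # now spans negative and positive values

# Number of values a log10 axis cannot display
sum(raw <= 0)
sum(standardized <= 0)
```

If the real plot shows the same pattern, passing `stand = FALSE` to fviz_cluster() would be one thing to try. Whether that resolves the issue is untested here.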