pvclust is great for cluster analysis in R. However, when running it as part of a batch operation, it is annoying to get different results for the same data. Obviously, there are many "correct" clusterings of the same data, and it seems that pvclust uses some randomness to determine the clusters of a specific run. But is there any way to get deterministic results?

I want to be able to present a minimal, repeatable analysis package: the data plus an R script, and a separate written document that contains my interpretations of the clustering. It is then possible for others to add to the analysis, e.g. by changing the aesthetic appearance of plots. Now, the interpretations will always be out of sync with what someone else gets when they run the script containing pvclust.

2 Answers

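This applies not only to cluster analysis: whenever randomness is involved, you can fix the seed of the random number generator so that you always get the same results.

Try: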

set.seed(seed=123)
# your code here

seed can be any integer, or anything that can be coerced to an integer. That is all there is to it.
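To illustrate why fixing the seed makes a bootstrap-style analysis repeatable, here is a minimal sketch in base R. It does not call pvclust; boot_support() is a hypothetical helper that only mimics the idea of resampling rows and re-clustering columns, and the mtcars data set is used purely as an example.

```r
# Sketch only: a base-R stand-in for a bootstrap step, not pvclust itself.
# boot_support() is a hypothetical helper introduced for illustration.
boot_support <- function(x, nboot = 50) {
  # Reference split of the columns into two clusters, using all rows.
  ref <- cutree(hclust(dist(t(x))), k = 2)
  hits <- 0
  for (i in seq_len(nboot)) {
    idx <- sample(nrow(x), replace = TRUE)  # the random resampling step
    cl <- cutree(hclust(dist(t(x[idx, , drop = FALSE]))), k = 2)
    if (all(cl == ref)) hits <- hits + 1
  }
  hits / nboot  # share of resamples that reproduce the reference split
}

x <- scale(as.matrix(mtcars))  # built-in example data

set.seed(123)                  # fix the random number generator
first <- boot_support(x)
set.seed(123)                  # reset to the same state before the second run
second <- boot_support(x)

identical(first, second)       # both runs drew the same resamples
```

In a real script, the same pattern would be to call set.seed() once, directly before the pvclust() call. If pvclust is run in parallel, the package documentation describes an iseed argument for that case; it is worth checking there, since set.seed() alone may not control the worker processes.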

Answered 2014-01-02T05:53:08.117

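I have only used k-means. There, I had to set the number of "runs" or iterations to a value higher than the default in order to get the same clusters in consecutive runs.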
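The answer does not name the software it used. In R's kmeans(), the closest settings are probably nstart (number of random starts) and iter.max (maximum iterations), so the sketch below rests on that assumption. The iris data, the seed, and the parameter values are arbitrary choices for illustration.

```r
# Sketch, assuming "runs" maps to nstart and "iterations" to iter.max.
x <- iris[, 1:4]  # built-in example data, chosen only for illustration

set.seed(42)      # arbitrary seed
fit_a <- kmeans(x, centers = 3, nstart = 25, iter.max = 50)
set.seed(42)      # same seed again before the second fit
fit_b <- kmeans(x, centers = 3, nstart = 25, iter.max = 50)

identical(fit_a$cluster, fit_b$cluster)  # compare the two cluster assignments
```

More random starts make it more likely that separate runs settle on the same solution, but only a fixed seed guarantees identical output from run to run.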

Answered 2014-01-02T06:08:18.113