I'm just getting started with R and have run into a problem.

I want to run hclust on a table with a lot of data.

The data is a matrix:

> nrow(hell)
[1] 202397
> ncol(hell)
[1] 39840

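The matrix contains integers (mostly 0, but of course there are higher values too).

I managed to read my 15 GB text file with read.table() on a Linux machine with 48 cores and 280 GB of RAM, but when I try to compute the distance matrix it fails with the following error: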

> d <- dist(as.matrix(hell))
Error in unlist(x,recursive, use.names):
resulting vector exceeds vector length limit in 'AnswerType'

I've googled but couldn't find an answer (or figure out how to deal with this). Is there any chance of doing what I want? :(

1 Answer

> 202397^2 > .Machine$integer.max
[1] TRUE

R uses integers to index its vectors and matrices, a matrix being a sort of folded vector, so some tasks are simply too large for R. Even if you halve that product to account for the fact that a distance matrix only needs to hold the lower triangle, which is n(n-1)/2 values, it still requires a longer vector than R can construct.

> 202397 * (202397 - 1) / 2 > .Machine$integer.max
[1] TRUE
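
To put numbers on this, here is a rough sketch of the arithmetic. A dist object stores one value per pair of rows, i.e. n(n-1)/2 doubles. The helper name dist_entries is made up for illustration and is not part of R; the 8 bytes per value assumes the result is stored as doubles, which is what dist() returns.

```r
# Hypothetical helper (not part of R): number of values a "dist" object
# holds for n rows, i.e. the strict lower triangle of the n x n matrix.
# n is kept as a double so the product does not overflow R's integer type.
dist_entries <- function(n) n * (n - 1) / 2

# Sanity check on a small matrix: 5 rows give 5 * 4 / 2 = 10 distances.
small <- matrix(1:20, nrow = 5)
stopifnot(length(dist(small)) == dist_entries(5))

n <- 202397                        # nrow(hell) from the question
entries <- dist_entries(n)         # roughly 2.05e10 values

entries > .Machine$integer.max     # TRUE: more than 2^31 - 1 elements
entries * 8 / 1e9                  # roughly 164 (GB), at 8 bytes per double
```

So besides the index limit, the result alone would take up well over half of the 280 GB of RAM mentioned in the question, before any working copies are made.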
Answered 2013-05-28T16:45:01.077