r - reshape2 dcast error on a large data set

Source: https://stackoverflow.com/questions/12231020 (2012-09-01T20:09:25.337)

2613 views

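I have a data set of about 1,800,000 rows with the columns search_query [factor], movie_name [factor], and clicks [int]. When I use the dcast function from the reshape2 package to build a matrix of search queries by movie names, with clicks as the values, I get this error: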

    train.matrix <- dcast(train, query ~ movie, value.var = "clicks")

    Aggregation function missing: defaulting to length
    Error in .Call("split_indices", index, group, as.integer(n)) : 
       negative length vectors are not allowed
    In addition: Warning message:
    In split_indices(seq_along(.value), .group, .n) :
      NAs introduced by coercion

If I subset the data to 100,000 rows, dcast from the reshape2 package runs just fine.

    train.matrix <- dcast(train[1:100000,], query ~ movie, value.var = "clicks")

There are 69,598 distinct movie values; the click values are all positive, with no NAs. I am running R version 2.15.1.
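One plausible reading of the error, not confirmed anywhere in the question, is that the wide result has more cells than a 32-bit R integer can count. The sketch below does that arithmetic. Only the 69,598 movie count comes from the question; `n_queries` is a made-up value for illustration, since the number of distinct queries is not given.

```r
# n_movies is taken from the question; n_queries is an ASSUMED value
# chosen only to illustrate the arithmetic.
n_movies  <- 69598
n_queries <- 50000

# Numeric literals are doubles in R, so this product cannot overflow.
n_cells <- n_queries * n_movies

# TRUE when the query-by-movie grid has more cells than the largest
# value an R integer can hold (2147483647).
exceeds_int <- n_cells > .Machine$integer.max

# Largest number of distinct queries that still fits, given 69,598 movies.
max_queries <- floor(.Machine$integer.max / n_movies)

# as.integer() cannot represent a value this large and returns NA
# (normally with a coercion warning, suppressed here).
coerced <- suppressWarnings(as.integer(n_cells))
```

Under these assumptions, `exceeds_int` is `TRUE` and `coerced` is `NA`, which would be consistent with the "NAs introduced by coercion" warning above. Running `nlevels(train$query) * nlevels(train$movie)` on the real data would show whether this applies.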

What could be the problem? Is the data set too large? If so, how can I get the same result with this data set?

Thanks a lot in advance!
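If the dense query-by-movie table really is too large, one possible workaround is to build a sparse matrix directly from the long-format rows instead of calling dcast. This is a minimal sketch using `sparseMatrix` from the Matrix package, which is a swapped-in technique and not something the question uses. The toy `train` data frame is hypothetical and reuses the column names from the dcast call (`query`, `movie`, `clicks`).

```r
library(Matrix)

# Hypothetical toy stand-in for `train`; the real data has about
# 1,800,000 rows.
train <- data.frame(
  query  = factor(c("a", "a", "b", "c", "a")),
  movie  = factor(c("m1", "m2", "m1", "m3", "m1")),
  clicks = c(3L, 1L, 2L, 5L, 4L)
)

# Factor codes serve as row/column indices. Only the (query, movie)
# pairs that occur are stored; duplicated pairs are summed.
click.matrix <- sparseMatrix(
  i = as.integer(train$query),
  j = as.integer(train$movie),
  x = as.numeric(train$clicks),
  dims = c(nlevels(train$query), nlevels(train$movie)),
  dimnames = list(levels(train$query), levels(train$movie))
)
```

Note the difference in aggregation: `sparseMatrix` sums clicks for repeated (query, movie) pairs, whereas the dcast call above fell back to `length`, that is, a count of rows per pair. Passing `x = 1` instead of the clicks would give counts.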
