I have a simple matrix, for example:

test <- matrix(c("u1","p1","u1","p2","u2","p2","u2",
                 "p3","u3","p1","u4","p2","u5","p1",
                 "u5","p3","u6","p3","u7","p4","u7",
                 "p3","u8","p1","u9","p4"),
               ncol=2,byrow=TRUE) 
colnames(test) <- c("user","product")
test1<-as.data.frame(test)

test1:

   user   product
1  u1      p1
2  u1      p2
3  u2      p2 
4  u2      p3
5  u3      p1
6  u4      p2
7  u5      p1
8  u5      p3
9  u6      p3
10 u7      p4
11 u7      p3
12 u8      p1
13 u9      p4

I want to count how many users bought each pair of products together, e.g. p1 & p2, p2 & p3, ...

table(test1$product, test1$product) gives me this:

     p1   p2  p3  p4
 p1   4   0   0   0
 p2   0   3   0   0
 p3   0   0   4   0
 p4   0   0   0   2

How can I get the correct result:

     p1   p2  p3  p4
 p1   4   1   1   0
 p2   1   3   1   0
 p3   1   1   4   1
 p4   0   0   1   2

3 Answers

Judging by your desired output, you are looking for the crossprod function:

crossprod(table(test1))
#        product
# product p1 p2 p3 p4
#      p1  4  1  1  0
#      p2  1  3  1  0
#      p3  1  1  4  1
#      p4  0  0  1  2

This is the same as crossprod(table(test1$user, test1$product)) (reflecting Dennis's comment).
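
To see why this works, it may help to look at the intermediate step. table(test1) is a user-by-product count matrix, and crossprod(m) computes t(m) %*% m, so entry (i, j) sums, over all users, the product of that user's counts for products i and j. The following is a minimal sketch, assuming each user/product combination appears at most once (as in the example data); the names m and co are only for illustration.

```r
# Rebuild the example data so the sketch is self-contained
test1 <- data.frame(
  user    = c("u1","u1","u2","u2","u3","u4","u5","u5","u6","u7","u7","u8","u9"),
  product = c("p1","p2","p2","p3","p1","p2","p1","p3","p3","p4","p3","p1","p4")
)

# User-by-product incidence matrix: 1 where the user bought the product
m <- table(test1)

# crossprod(m) is a shortcut for t(m) %*% m
co <- crossprod(m)

# Both computations give the same numbers
all(co == t(m) %*% m)

# Off-diagonal entries count users who bought both products
co["p1", "p2"]  # 1 (only u1 bought both p1 and p2)

# Diagonal entries count users who bought the product at all
co["p1", "p1"]  # 4 (u1, u3, u5, u8)
```

If a user can appear more than once with the same product, the counts in m would exceed 1 and inflate the result, so the table would need to be reduced to 0/1 first.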

answered 2013-11-14T12:32:52.730
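A similar question that was linked to this post asked for an efficient solution, but it has since been deleted and not restored. We decided to post the solution here.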

Here is an RcppEigen implementation of the cross product:

library(RcppEigen)
library(inline)
prodFun <- '
        typedef Eigen::Map<Eigen::MatrixXi> MapMti;
        const MapMti B(as<MapMti>(BB));
        const MapMti C(as<MapMti>(CC));
        return List::create(B.adjoint() * C);
        '

funCPr <- cxxfunction(signature(BB= "matrix", CC = "matrix"),
                     prodFun, plugin = "RcppEigen") 
tbl <- table(test1)
funCPr(tbl, tbl)[[1]]
#     [,1] [,2] [,3] [,4]
#[1,]    4    1    1    0
#[2,]    1    3    1    0
#[3,]    1    1    4    1
#[4,]    0    0    1    2

Benchmarks

set.seed(24)
test2 <- data.frame(user = sample(1:5000, 1e6, replace=TRUE),
    product = sample(paste0("p", 1:50), 1e6, replace = TRUE),
    stringsAsFactors=FALSE)
tbl1 <- table(test2)

library(microbenchmark)
library(qdap)  # provides adjmat(), used in the adjMat entry below
microbenchmark(cPP = funCPr(tbl1, tbl1)[[1]], 
              CrossP = crossprod(tbl1),
              adjMat = adjmat(tbl1)$adjacency,
              unit = "relative", times = 10L)
#Unit: relative
#   expr      min       lq     mean   median       uq       max neval cld
#    cPP 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000    10  a 
# CrossP 2.079867 2.070509 2.234376 2.074388 2.290516  2.676798    10  a 
# adjMat 6.223034 6.500791 9.619088 7.197824 7.771270 31.394812    10   b

Note: this could be taken further by also doing the table step in Rcpp.

answered 2017-04-26T13:00:18.347
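Ananda's solution is excellent (lighter weight, no external package required), but I'm throwing in another one. I believe this is called an adjacency matrix (if I'm wrong, someone smarter should feel free to edit this):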

library(qdap)
adjmat(table(test1))$adjacency

##        product
## product p1 p2 p3 p4
##      p1  4  1  1  0
##      p2  1  3  1  0
##      p3  1  1  4  1
##      p4  0  0  1  2
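
In this matrix the diagonal holds the number of buyers per product and the off-diagonal entries hold the pair counts. If a list of pairs is easier to work with than the matrix, the upper triangle can be unrolled into a data frame. This is only a sketch in base R: it builds the matrix with crossprod rather than adjmat to avoid the qdap dependency, and the names idx and pairs are made up for illustration.

```r
# Rebuild the example data so the sketch is self-contained
test1 <- data.frame(
  user    = c("u1","u1","u2","u2","u3","u4","u5","u5","u6","u7","u7","u8","u9"),
  product = c("p1","p2","p2","p3","p1","p2","p1","p3","p3","p4","p3","p1","p4")
)
co <- crossprod(table(test1))

# Row/column positions of the upper triangle (diagonal excluded),
# so each unordered pair appears exactly once
idx <- which(upper.tri(co), arr.ind = TRUE)

pairs <- data.frame(
  product_a = rownames(co)[idx[, "row"]],
  product_b = colnames(co)[idx[, "col"]],
  users     = co[idx]   # matrix indexing by (row, col) pairs
)

# Keep only the pairs that at least one user bought together
pairs[pairs$users > 0, ]
```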
answered 2013-11-14T15:49:50.380