r - How to convert a list of user ratings into a matrix in R

Question

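I am working on a collaborative filtering problem and am having trouble reshaping my raw data into a user-rating matrix. I am given a rating database with the columns "movie", "user", and "rating". From this database I want to obtain a matrix of size #users x #movies, where each row holds one user's ratings.

Here is a minimal working example: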

# given this:
ratingDB <- data.frame(rbind(c(1,1,1),c(1,2,NA),c(1,3,0), c(2,1,1), c(2,2,1), c(2,3,0), 
                         c(3,1,NA), c(3,2,NA), c(3,3,1)))
names(ratingDB) <- c('user', 'movie', 'liked')

#how do I get this?
userRating <- matrix(data = rbind(c(1,NA,0), c(1,1,0), c(NA,NA,1)), nrow=3)

I could solve the problem with two for loops, but that of course does not scale well. Can anyone help me with a vectorized solution?

score 3 · Accepted Answer

This can be done without any loops, using the function matrix:

# sort the 'liked' values by user, then movie (not necessary for the example data)
vec <- with(ratingDB, liked[order(user, movie)])

# create a matrix
matrix(vec, nrow = length(unique(ratingDB$user)), byrow = TRUE)

     [,1] [,2] [,3]
[1,]    1   NA    0
[2,]    1    1    0
[3,]   NA   NA    1

This converts the vector stored in ratingDB$liked into a matrix. The argument byrow = TRUE fills the matrix row by row (the default is column by column).
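The reshaping above relies on every (user, movie) pair appearing exactly once in the data frame. A minimal sketch of the same idea with that assumption checked explicitly and with row and column labels attached; the names `users`, `movies`, and `ord` are introduced here for illustration and are not part of the original answer:

```r
# Example data from the question, rebuilt column by column
ratingDB <- data.frame(
  user  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  movie = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  liked = c(1, NA, 0, 1, 1, 0, NA, NA, 1)
)

users  <- sort(unique(ratingDB$user))
movies <- sort(unique(ratingDB$movie))

# The reshape is only valid if every user/movie pair occurs exactly once
stopifnot(nrow(ratingDB) == length(users) * length(movies),
          !any(duplicated(ratingDB[c("user", "movie")])))

# Order by user, then movie, so that filling by row lines up with the labels
ord <- with(ratingDB, order(user, movie))

userRating <- matrix(ratingDB$liked[ord],
                     nrow = length(users), byrow = TRUE,
                     dimnames = list(as.character(users), as.character(movies)))
```

With the labels in place, a single rating can be looked up by ID, for example `userRating["2", "3"]`.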

Update: What if the NA cases are not in the data frame? (see @steffen's comment)

First, remove the rows containing NA:

subDB <- ratingDB[complete.cases(ratingDB), ]

  user movie liked
1    1     1     1
3    1     3     0
4    2     1     1
5    2     2     1
6    2     3     0
9    3     3     1

The complete data frame can be reconstructed. The function expand.grid is used to generate all combinations of user and movie:

full <- setNames(with(subDB, expand.grid(sort(unique(user)), sort(unique(movie)))),
                 c("user", "movie"))

  user movie
1     1    1
2     2    1
3     3    1
4     1    2
5     2    2
6     3    2
7     1    3
8     2    3
9     3    3
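A small sketch of how expand.grid orders its rows, which may help when reading the output above: the first argument varies fastest. The variable `grid` and the tiny ID ranges are illustrative assumptions, not taken from the answer:

```r
# All combinations of three users and two movies
grid <- expand.grid(user = 1:3, movie = 1:2)

# The first argument (user) cycles fastest, the second (movie) slowest
print(grid$user)   # 1 2 3 1 2 3
print(grid$movie)  # 1 1 1 2 2 2

# Reordering by user, then movie, gives the row-major order
# that matrix(..., byrow = TRUE) expects
grid_sorted <- grid[order(grid$user, grid$movie), ]
```

This is why the row order of the grid itself should not be fed straight into matrix(..., byrow = TRUE) without first sorting by user and movie.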

Now the information in the sub data frame subDB and the full combination data frame full can be combined with the merge function:

ratingDB_2 <- merge(full, subDB, all = TRUE)

  user movie liked
1    1     1     1
2    1     2    NA
3    1     3     0
4    2     1     1
5    2     2     1
6    2     3     0
7    3     1    NA
8    3     2    NA
9    3     3     1

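The result is identical to the original data frame, and merge returns it sorted by user and then movie. Hence the same procedure can be applied to turn it into a matrix of liked values: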

matrix(ratingDB_2$liked, nrow = length(unique(ratingDB_2$user)), byrow = TRUE)

     [,1] [,2] [,3]
[1,]    1   NA    0
[2,]    1    1    0
[3,]   NA   NA    1
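As an alternative to rebuilding the full data frame, the matrix can also be filled directly by indexing it with a two-column matrix of (row, column) positions. This is a sketch of a different technique, not the method from the answer above; `users`, `movies`, and `idx` are names chosen here for illustration, and it assumes each (user, movie) pair occurs at most once:

```r
# Ratings with the NA rows already removed, as in subDB above
subDB <- data.frame(
  user  = c(1, 1, 2, 2, 2, 3),
  movie = c(1, 3, 1, 2, 3, 3),
  liked = c(1, 0, 1, 1, 0, 1)
)

users  <- sort(unique(subDB$user))
movies <- sort(unique(subDB$movie))

# Start from a matrix of NAs so that missing pairs stay NA
userRating <- matrix(NA_real_,
                     nrow = length(users), ncol = length(movies),
                     dimnames = list(as.character(users), as.character(movies)))

# Translate IDs into row/column positions and assign all ratings at once
idx <- cbind(match(subDB$user, users), match(subDB$movie, movies))
userRating[idx] <- subDB$liked
```

Because the assignment is positional, the input does not need to be sorted, and no intermediate grid of all combinations is created.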
