data-mining - Datasets for recommender systems

Question

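I want to build my own simple recommender system for books. There is one problem, though: it is impossible (or at least very difficult) for one person to put together a training dataset for the algorithm.

So, are there any free datasets or quizzes with information about which books people rated and how much they liked them?

The second question is about book parameters. For some item-based predictions you really have to use scores for a book's attributes (e.g. language, average word length, average number of words per paragraph; I have counted about 30 such parameters) and their weights (e.g. the book's language is weighted 1, average word length 0.314). So, is there any ready-made information of this kind?

In fact, if I get an answer to the first question I can work out a solution to the second, but I am sure the information I need already exists.

Also, I am reading the Recommender Systems Handbook, which is comprehensive (with references) but hard to read. Can you suggest some additional books?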

score 9 · Accepted Answer

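Could you check Books.txt.gz at https://snap.stanford.edu/data/web-Amazon.html? It contains book ratings from Amazon, along with product titles, prices, review summaries, and so on.

The Book-Crossing dataset might also be useful: http://grouplens.org/datasets/book-crossing/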

I think your second question is a feature selection problem; the weights will differ for each dataset.
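To make the point about weights concrete, here is a minimal sketch of a weighted distance between two books described by content features. Everything in it is an assumption for illustration: the feature names, the toy books, and the weights (which just echo the 1 and 0.314 from the question). In practice the weights would be tuned or learned per dataset.

```python
import math

# Hypothetical weights, echoing the question's example values.
WEIGHTS = {"language": 1.0, "avg_word_length": 0.314}

def feature_gap(a, b):
    """0/1 mismatch for categorical (string) features, absolute gap for numeric ones."""
    if isinstance(a, str):
        return 0.0 if a == b else 1.0
    return abs(a - b)

def weighted_distance(book_a, book_b, weights=WEIGHTS):
    """Weighted Euclidean distance over the features named in `weights`."""
    total = sum(w * feature_gap(book_a[k], book_b[k]) ** 2 for k, w in weights.items())
    return math.sqrt(total)

def most_similar(title, books, weights=WEIGHTS):
    """Return the title of the other book closest to `title`."""
    others = (t for t in books if t != title)
    return min(others, key=lambda t: weighted_distance(books[title], books[t], weights))

# Toy catalogue, made up for this example.
books = {
    "A": {"language": "en", "avg_word_length": 4.5},
    "B": {"language": "en", "avg_word_length": 5.5},
    "C": {"language": "de", "avg_word_length": 4.5},
}

print(most_similar("A", books))
print(most_similar("A", books, {"language": 0.1, "avg_word_length": 1.0}))
```

The two calls pick different neighbours for the same book, which is why a fixed, ready-made set of weights is unlikely to transfer from one dataset to another.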

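This course on Coursera gives a brief introduction to recommender systems, and it also has a readings section. Unfortunately, the quizzes are no longer available.

Course: https://www.coursera.org/course/recsys

Readings: http://recsys.cs.umn.edu/readings.html

Edit: another book dataset.

Goodbooks-10k: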

http://fastml.com/goodbooks-10k-a-new-dataset-for-book-recommendations/

score 0 · Accepted Answer

This dataset is about movies rather than books, but you might find the Netflix Prize dataset useful as a way of testing recommendation algorithms. The underlying issues are the same with both datasets: needing out-of-band features, having to combine features with different weights, etc.
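As a rough idea of what "testing" can look like on any ratings dataset, here is a minimal sketch that scores a baseline predictor by RMSE on held-out ratings. The `(user, item, rating)` tuples and the item-mean baseline are assumptions made up for this example; they are not the Netflix Prize file format or any particular published method.

```python
import math
from collections import defaultdict

def item_mean_predictor(train):
    """Baseline: predict an item's mean training rating, else the global mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for _user, item, rating in train:
        sums[item] += rating
        counts[item] += 1
    global_mean = sum(r for _, _, r in train) / len(train)

    def predict(user, item):
        # The user is ignored by this baseline; a real recommender would use it.
        if item in counts:
            return sums[item] / counts[item]
        return global_mean

    return predict

def rmse(predict, test):
    """Root-mean-square error of `predict` over held-out (user, item, rating) rows."""
    squared = [(predict(u, i) - r) ** 2 for u, i, r in test]
    return math.sqrt(sum(squared) / len(squared))

# Toy ratings, invented for illustration.
train = [("u1", "m1", 5), ("u2", "m1", 3), ("u1", "m2", 2), ("u3", "m2", 4)]
test = [("u3", "m1", 4), ("u2", "m2", 3), ("u2", "m3", 5)]

predict = item_mean_predictor(train)
print(rmse(predict, test))
```

Any algorithm you write can be dropped in as `predict` and compared against this baseline on the same held-out rows.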

As for extra books to read, I recommend "Programming Collective Intelligence." I found it to be clearly written and very helpful. It also includes code for all of the example algorithms.
