
Okay, so this isn't a specific question; I just need some guidelines on how to go about a final project for my class in Python 2.7. The program basically has to import information from 2 text files. The first is a list of books, one per line, with the author and title separated by a comma, e.g.:
Douglas Adams,The Hitchhiker's Guide To The Galaxy
Richard Adams,Watership Down
Mitch Albom,The Five People You Meet in Heaven

The second is a file of users, with each user's name on one line and their ratings of the 55 total books on the next (the text box isn't big enough to show all the ratings on one line), e.g.:
Ben
5 0 0 0 0 0 0 1 0 1 -3 5 0 0 0 5 5 0 0 0 0 5 0 0 0 0 0 0 0 0 1 3 0 1 0 -5 0 0 5 5 0 5 5 5 0 5 5 0 0 0 5 5 5 5 -5
Moose
5 5 0 0 0 0 3 0 0 1 0 5 3 0 5 0 3 3 5 0 0 0 0 0 5 0 0 0 0 0 3 5 0 0 0 0 0 5 -3 0 0 0 5 0 0 0 0 0 0 5 5 0 3 0 0

Anyway, the actual project has you use an algorithm provided by the instructor to compare users' interests in books. You input a user, and the program compares that user against each of the other 87 users by multiplying their ratings of each book and adding the products together. For example, Ben and Moose both rate the first book 5, so that book contributes 25 to Moose's similarity score. You do this for every book, multiplying the input user's ratings by each of the other users' ratings. Whoever has the closest similarity score is output, and you recommend 5 books to the input user: basically, books that the most similar user read and rated highly but the input user hasn't read.

Sorry about the huge wall of text. I understand what I have to do, but I just cannot figure out a simple way to do it. I'm not looking for someone to do the project for me or anything like that; it's just that this project is worth a large portion of our grade and I don't even know where to start. If anybody could point me in the right direction as to which data structures would make this easiest, I would be extremely grateful. Again, sorry for such a long post, but I'm desperate.


1 Answer


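The algorithm you describe sounds a lot like the vector space model. Think of each user's ratings as a 55-dimensional vector (a line in 55-dimensional space); you compare how similar two users' lines are by measuring how close their angles are to one another.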
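To make the vector idea concrete, here is a minimal sketch. The `dot` function is the multiply-and-add score from the question; `cosine` is the angle-based variant from the vector space model, which is my addition and not necessarily what your instructor wants. The function names and the shortened rating vectors are made up for illustration.

```python
import math

def dot(a, b):
    """Sum of pairwise products: the similarity score described in the question."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine of the angle between two rating vectors (0.0 if either is all zeros)."""
    norm = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / norm if norm else 0.0

# Shortened, made-up rating vectors (the real ones have 55 entries).
ben = [5, 0, 0, 1, -3]
moose = [5, 5, 0, 0, 0]

print(dot(ben, moose))  # 25: only the first book contributes
```

The plain dot product favours users who rate many books strongly, while the cosine normalises that away; which one is appropriate depends on the assignment.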

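Either way, your application has two basic parts:

  1. Parsing the files to build data structures
  2. Implementing your algorithm using the structures you created

Notice that what they have in common is that you need to decide on data structures, so the data structures you use are the core of the application.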

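The simplest thing that could possibly work is two lists. One list holds the book data: books = [('author', 'book'), ...]. The other holds the score data: scores = [('user', [1,2,3,4,...]), ...]. You then make sure that scores[n][1][m] is the score that corresponds to books[m], and that the functions implementing the algorithm accept these structures.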
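A rough sketch of the parsing step that builds those two lists might look like this. It assumes each book line has the author before the first comma, and that each user's ratings sit on a single line in the real file; `parse_books` and `parse_scores` are names I made up. Both functions take any iterable of lines, so you could pass an open file object.

```python
def parse_books(lines):
    """Turn 'Author,Title' lines into a list of (author, title) tuples."""
    books = []
    for line in lines:
        line = line.strip()
        if line:
            # Split on the first comma only, so titles may contain commas.
            author, title = line.split(',', 1)
            books.append((author, title))
    return books

def parse_scores(lines):
    """Turn alternating name / ratings lines into [(user, [int, ...]), ...]."""
    rows = [line.strip() for line in lines if line.strip()]
    scores = []
    for i in range(0, len(rows) - 1, 2):
        scores.append((rows[i], [int(n) for n in rows[i + 1].split()]))
    return scores

# Tiny in-memory stand-ins for the two files.
books = parse_books(["Douglas Adams,The Hitchhiker's Guide To The Galaxy\n",
                     "Richard Adams,Watership Down\n"])
scores = parse_scores(["Ben\n", "5 0\n", "Moose\n", "5 5\n"])

print(scores[1][1][0], books[0])  # Moose's rating of the first book, and that book
```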

I would make sure you can get this approach working first. Then you can look at higher levels of abstraction.
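As a starting point, the algorithm on top of the two plain lists could be sketched like this. It rests on two assumptions of mine that you should check against the assignment: a rating of 0 means "hasn't read it", and a rating of 3 or more counts as "rated high". The function names and the tiny data set are invented for the example.

```python
def most_similar(name, scores):
    """Return the (user, ratings) pair with the highest dot product against `name`."""
    target = dict(scores)[name]
    others = [(u, r) for u, r in scores if u != name]
    return max(others, key=lambda ur: sum(x * y for x, y in zip(target, ur[1])))

def recommend(name, books, scores, count=5, liked=3):
    """Books the closest user rated >= `liked` that `name` rated 0 (assumed unread)."""
    target = dict(scores)[name]
    _, theirs = most_similar(name, scores)
    picks = [(theirs[m], m) for m in range(len(books))
             if target[m] == 0 and theirs[m] >= liked]
    picks.sort(key=lambda p: -p[0])  # highest rating first; ties keep book order
    return [books[m] for _, m in picks[:count]]

# Made-up data: three books, three users.
books = [('A1', 'B1'), ('A2', 'B2'), ('A3', 'B3')]
scores = [('Ben', [5, 0, 0]), ('Moose', [5, 5, 3]), ('Zed', [-5, 5, 0])]

print(most_similar('Ben', scores)[0])    # Moose
print(recommend('Ben', books, scores))   # [('A2', 'B2'), ('A3', 'B3')]
```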

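You can use classes and objects to bundle data together with its operations and provide a higher level of abstraction. For example, you could store your book records as named tuples and have another object that holds a set of book records and has methods for book lookups (e.g. findByIndex, findByAuthor, etc.), with something similar for scores. You could give the score set a reference to the book set so that it can look up book records from a score index. You could create a scoring class that accepts the score set and performs operations on it, such as returning a list of the highest-rated book records for a given user in the score set. And so on.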
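One possible shape for that, as a sketch only: the class names `BookSet` and `ScoreSet` and the method `topBooks` are hypothetical, and for brevity the "highest-rated books" operation lives on the score set itself instead of a separate scoring class.

```python
from collections import namedtuple

Book = namedtuple('Book', ['author', 'title'])

class BookSet(object):
    """Holds Book records and offers lookups."""
    def __init__(self, books):
        self.books = list(books)

    def findByIndex(self, index):
        return self.books[index]

    def findByAuthor(self, author):
        return [b for b in self.books if b.author == author]

class ScoreSet(object):
    """Maps user names to rating lists; uses a BookSet to resolve indexes to books."""
    def __init__(self, ratings, bookSet):
        self.ratings = dict(ratings)
        self.bookSet = bookSet

    def topBooks(self, user, count=5):
        """Return the user's `count` highest-rated Book records."""
        row = self.ratings[user]
        order = sorted(range(len(row)), key=lambda m: -row[m])
        return [self.bookSet.findByIndex(m) for m in order[:count]]

library = BookSet([Book('Douglas Adams', "The Hitchhiker's Guide To The Galaxy"),
                   Book('Richard Adams', 'Watership Down')])
ratings = ScoreSet([('Ben', [1, 5])], library)

print(ratings.topBooks('Ben', 1)[0].title)  # Watership Down
```

The point of the wrapper classes is that the algorithm code no longer needs to know about index positions such as scores[n][1][m]; it asks the objects instead.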

Answered 2012-04-12T03:52:02.187