mysql - Mahout 0.7 cannot get recommendations for large data using MySQLJDBCDataModel

Question

I am building an item-based CF recommendation engine with Mahout. I created a MahoutHelper class with the following constructor:

    public MahoutHelper(String serverName, String user, String password,
            String DatabaseName, String tableName) {
        // Pooled MySQL data source with statement and metadata caching enabled
        source = new MysqlConnectionPoolDataSource();
        source.setServerName(serverName);
        source.setUser(user);
        source.setPassword(password);
        source.setDatabaseName(DatabaseName);
        source.setCachePreparedStatements(true);
        source.setCachePrepStmts(true);
        source.setCacheResultSetMetadata(true);
        source.setAlwaysSendSetIsolation(true);
        source.setElideSetAutoCommits(true);

        // Data model that reads the table directly (no timestamp column)
        DBmodel = new MySQLJDBCDataModel(source, tableName, "userId", "itemId",
                "value", null);

        similarity = new TanimotoCoefficientSimilarity(DBmodel);
    }

The recommendation method is:

    public List<RecommendedItem> recommendation() throws TasteException {
        Recommender recommender = new GenericItemBasedRecommender(DBmodel, similarity);
        List<RecommendedItem> recommendations = recommender.recommend(userId, maxNum);
        System.out.println("query completed");
        return recommendations;
    }

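It uses the data source to build the data model. The problem is that when the MySQL table holds only a small amount of data (fewer than 100 records), the program works fine for me, but once the size exceeds 1,000,000, the program gets stuck at the recommendation step and never proceeds. I don't know why this happens. By the way, I built a FileDataModel from a .dat file containing the same data, and the analysis finished in just 2~3 seconds. I'm confused.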

score 2 · Accepted Answer

Using the database directly will only work for tiny data sets, like maybe a hundred thousand data points. Beyond that, the overhead is too high for such a data-intensive application to ever run quickly; a single recommendation takes thousands of SQL queries or more.

Instead, you must load the data into memory and re-load it when it changes. You can still pull from the database; look at ReloadFromJDBCDataModel as a wrapper.
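
To make the load-and-reload idea concrete, here is a minimal plain-Java sketch of the pattern such a wrapper follows: read everything once in bulk, answer lookups from an in-memory snapshot, and swap the snapshot on an explicit refresh. Every type in it (`PreferenceSource`, `CountingSource`, `ReloadingPreferences`, `ReloadSketch`) is hypothetical and made up for illustration; none of them is a Mahout class. In Mahout itself the equivalent wiring would presumably be to wrap the existing model, e.g. `new ReloadFromJDBCDataModel(DBmodel)`, and pass that wrapper to the similarity and the recommender instead of `DBmodel`.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Demo driver for the sketch; all types in this file are hypothetical.
final class ReloadSketch {
    public static void main(String[] args) {
        CountingSource db = new CountingSource();
        db.rows.put(1L, new HashSet<>(Collections.singleton(10L)));

        ReloadingPreferences prefs = new ReloadingPreferences(db);
        for (int i = 0; i < 1000; i++) {
            prefs.itemsOf(1L); // served from memory, no read against db
        }
        System.out.println("bulk reads after 1000 lookups: " + db.bulkReads);

        db.rows.get(1L).add(11L); // the "database" changes
        System.out.println("before refresh: " + prefs.itemsOf(1L));
        prefs.refresh();          // reload picks up the change
        System.out.println("after refresh: " + prefs.itemsOf(1L));
    }
}

// Stand-in for a slow, database-backed preference source (not a Mahout interface).
interface PreferenceSource {
    // One bulk read of every userId -> itemIds row.
    Map<Long, Set<Long>> loadAll();
}

// In-memory fake "database" that counts how often it is read in bulk.
final class CountingSource implements PreferenceSource {
    final Map<Long, Set<Long>> rows = new HashMap<>();
    int bulkReads = 0;

    @Override
    public Map<Long, Set<Long>> loadAll() {
        bulkReads++;
        // Deep copy so later changes to rows do not leak into a snapshot.
        Map<Long, Set<Long>> copy = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : rows.entrySet()) {
            copy.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        return copy;
    }
}

// Holds an in-memory snapshot and replaces it only when refresh() is called.
final class ReloadingPreferences {
    private final PreferenceSource source;
    private volatile Map<Long, Set<Long>> snapshot;

    ReloadingPreferences(PreferenceSource source) {
        this.source = source;
        refresh(); // initial load
    }

    void refresh() {
        snapshot = source.loadAll();
    }

    Set<Long> itemsOf(long userId) {
        Set<Long> items = snapshot.get(userId);
        return items == null ? Collections.<Long>emptySet() : items;
    }
}
```

In this sketch the thousand lookups cost a single bulk read, and new rows become visible only after `refresh()`. The trade-off is the same one the answer describes: the whole data set has to fit in memory, and the data is only as fresh as the last reload.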
