Following the donut classification example in Mahout, I built a model with this command:

./mahout trainlogistic --input donut.csv --output ./model --target color --categories 2 --predictors x y a b c --types numeric --features 20 --passes 100 --rate 10

I evaluated the model like this:

./mahout runlogistic --input donut.csv --model model --auc --confusion

The output is:

AUC = 0.97
confusion: [[27.0, 13.0], [0.0, 0.0]]
entropy: [[-0.4, -0.3], [-1.2, -0.7]]

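The first command produced a model file on the local disk. How do I use this model to classify new data? Is there a command for this, or do I need to write Java code that loads the model and runs the classification?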

1 Answer

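Use the same runlogistic command and point --input at the new data. Adding --scores prints the model output for each row:

./mahout runlogistic --input new_data.csv --model model --auc --confusion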

For example, I took ten records from donut.csv, saved them as donut2.csv, and ran the model on that file as follows.

[double@double mahout-distribution-0.7]$ bin/mahout runlogistic --input donut2.csv  --model donut.model --auc --scores --confusion

The output is:

"target","model-output","log-likelihood"
0,0.496,-0.685284
0,0.490,-0.674055
0,0.491,-0.675162
1,0.495,-0.703361
1,0.493,-0.706289
0,0.495,-0.683275
0,0.496,-0.685282
0,0.492,-0.677191
1,0.494,-0.704222
1,0.492,-0.708679
AUC = 0.50
confusion: [[6.0, 4.0], [0.0, 0.0]]
entropy: [[-0.7, -0.4], [-0.7, -0.4]]
13/06/04 15:22:50 INFO driver.MahoutDriver: Program took 1402 ms (Minutes: 0.023366666666666668)
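
The --scores output gives a model output per row, not a class label. A minimal Python sketch of turning those scores into labels is shown here. It is not part of Mahout: the function names are hypothetical, and both the 0.5 threshold and the [predicted][actual] matrix layout are assumptions, chosen because they reproduce the confusion matrix and AUC printed above.

```python
import csv
import io

# Rows copied from the runlogistic --scores output above.
SCORES_CSV = '''\
"target","model-output","log-likelihood"
0,0.496,-0.685284
0,0.490,-0.674055
0,0.491,-0.675162
1,0.495,-0.703361
1,0.493,-0.706289
0,0.495,-0.683275
0,0.496,-0.685282
0,0.492,-0.677191
1,0.494,-0.704222
1,0.492,-0.708679
'''


def read_scores(text):
    """Parse the --scores CSV into (target, score) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(int(row["target"]), float(row["model-output"])) for row in reader]


def predict(score, threshold=0.5):
    """Assumed decision rule: class 1 when the score exceeds the threshold."""
    return 1 if score > threshold else 0


def confusion(pairs, threshold=0.5):
    """2x2 count matrix indexed [predicted][actual] (assumed layout)."""
    matrix = [[0, 0], [0, 0]]
    for target, score in pairs:
        matrix[predict(score, threshold)][target] += 1
    return matrix


def auc(pairs):
    """Pairwise AUC: share of (positive, negative) pairs where the positive
    row scores higher; ties count as half."""
    pos = [s for t, s in pairs if t == 1]
    neg = [s for t, s in pairs if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


pairs = read_scores(SCORES_CSV)
print([predict(s) for _, s in pairs])  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(confusion(pairs))                # [[6, 4], [0, 0]]
print(auc(pairs))                      # 0.5
```

Every score in this run is below 0.5, so all ten records are assigned class 0. That is why the second row of the confusion matrix is empty, and an AUC of 0.50 means the scores rank the rows no better than chance.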
Answered 2013-06-04T07:29:25.903