
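This is my second post about using Weka (the first one was posted here). I successfully fed Weka training and sample test data using TextDirectoryLoader, and that works well. Now I want to move this to production, so the data to classify is retrieved from a MySQL table. This is how I do it: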

    TextDirectoryLoader loader = new TextDirectoryLoader();
    loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/training-data"));
    Instances dataRaw = loader.getDataSet();

    StringToWordVector filter = new StringToWordVector();
    filter.setInputFormat(dataRaw);
    Instances dataTraining = Filter.useFilter(dataRaw, filter);

    // Create test data instances [this works, but the sample data now needs to come from the db instead, see below]
    //loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data"));
    //dataRaw = loader.getDataSet();
    //Instances dataTest = Filter.useFilter(dataRaw, filter);

    InstanceQuery query = new InstanceQuery();
    query.setUsername("myusername");
    query.setPassword("mypassword");
    String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1";
    query.setQuery(sql);
    Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter);

    // Classify
    J48 model = new J48();
    model.buildClassifier(dataTraining);

    for (int i = 0; i < dataTest.numInstances(); i++) {
        dataTest.instance(i).setClassMissing();
        double cls = model.classifyInstance(dataTest.instance(i));
        dataTest.instance(i).setClassValue(cls);
        System.out.println(cls + " -> " + dataTest.instance(i).classAttribute().value((int) cls));
    }

Unfortunately, this does not work; Weka stops unexpectedly at this line:

Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter);

So I guess my question is how to change this part

// Create test data instances [this works, but the sample data now needs to come from the db instead, see below]
//loader.setDirectory(new File("c:/Users/Yehia A.Salam/Desktop/dd/test-data"));
//dataRaw = loader.getDataSet();
//Instances dataTest = Filter.useFilter(dataRaw, filter);

to use SQL-based data instead

InstanceQuery query = new InstanceQuery();
query.setUsername("myusername");
query.setPassword("mypassword");
String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1";
query.setQuery(sql);
Instances dataTest = Filter.useFilter(query.retrieveInstances(), filter);

Note that there is no problem with the database connection; I actually get the correct number of instances.

Any help is appreciated; I am very close.


2 Answers


Your code uses the TextDirectoryLoader class, which is based on "ARFF files from Text Collections". According to its help text:

"Loads all text files in a directory and 
 uses the subdirectory names as class labels. 
 The content of the text files will be stored in a String attribute, 
 the filename can be stored as well."

See the following code:

 double[] newInst = new double[2];
 newInst[0] = (double)data.attribute(0).addStringValue(files[i]);
 ....
 newInst[1] = (double)data.attribute(1).addStringValue(txtStr.toString());
 data.add(new Instance(1.0, newInst));

As you can see, this code needs 2 attribute values for each instance it adds to the dataset, but your SQL query provides only one attribute.

 String sql = "SELECT d.desc FROM deals d WHERE d.j48 = 1";

This may be why a "(java.lang.ArrayIndexOutOfBoundsException)" occurs at the newInst[1] part of the code: Weka cannot find the second attribute.
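One possible way to give the database rows the same two-attribute layout is to render them as an ARFF document with a string attribute plus a nominal class attribute whose value is left missing (`?`). The sketch below uses only the Java standard library. The class name `ArffTestData`, the helper names, and the attribute names `text` and `@@class@@` are assumptions for illustration; the names and the order of the class labels would have to match the header of the training data, which should be checked rather than taken from this sketch.

```java
import java.util.List;

// Sketch only: renders text rows as an ARFF document with one string
// attribute and one nominal class attribute. Attribute names are assumptions.
final class ArffTestData {

    // Wrap a value in single quotes, backslash-escaping characters that
    // would otherwise end the quoted value or break the line.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '\'': sb.append("\\'"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default: sb.append(c);
            }
        }
        return sb.append('\'').toString();
    }

    // Build the ARFF text: header with two attributes, then one data row
    // per text, with the class value left missing ("?").
    static String toArff(String relation, List<String> texts, List<String> classLabels) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(quote(relation)).append('\n');
        sb.append("@attribute text string\n");
        sb.append("@attribute @@class@@ {");
        for (int i = 0; i < classLabels.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quote(classLabels.get(i)));
        }
        sb.append("}\n@data\n");
        for (String text : texts) {
            sb.append(quote(text)).append(",?\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical rows and labels standing in for the SQL result.
        System.out.print(toArff("deals",
                List.of("cheap flights to Rome", "it's a spa day"),
                List.of("travel", "beauty")));
    }
}
```

The resulting string could then be parsed with `new Instances(new StringReader(arff))`, given a class index with `setClassIndex(data.numAttributes() - 1)`, and passed through the already initialised `StringToWordVector` filter. Whether this resolves the failure in the question has not been verified here.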

Answered 2013-03-27T07:27:03.457

I am a beginner myself, but in case it is useful: did you know there is a DatabaseLoader class and a DatabaseConverter interface?

Answered 2013-07-18T12:11:00.943