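I'm trying to use the information gain evaluator that WEKA provides to rank a set of attributes, but it doesn't work when the attribute that determines how an instance is classified does not always have a value. It's hard to explain, so here is an example: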

Here is my relation. It consists of three numeric attributes and one class attribute. If attr3 takes the value 5000, the class is set to failing.

@relation TEST

@attribute attr1 numeric
@attribute attr2 numeric
@attribute attr3 numeric
@attribute class {failing,correct}

@data
8,  0.519674, 5000, failing
?,  6.78149,  ?,    correct
?,  7.384081, 5000, failing
21, ?,        ?,    correct
5,  1.016151, 5000, failing

After running the Information Gain Attribute Evaluator, the output is:

=== Attribute Selection on all input data ===

Search Method:
    Attribute ranking.

Attribute Evaluator (supervised, Class (nominal): 4 class):
    Information Gain Ranking Filter

Ranked attributes:
 0.249  1 attr1
 0      3 attr3
 0      2 attr2

Selected attributes: 1,3,2 : 3

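Now, attr3 should be ranked highest, because its value determines whether an instance is classified as failing or correct. But somehow that is not the case.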
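To make that expectation concrete, here is a minimal hand calculation that does not use WEKA at all. It assumes attr3 is treated as a nominal attribute and that a missing value counts as its own category; the class `InfoGainSketch` and its methods are hypothetical names for illustration only, not WEKA's implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class InfoGainSketch {
    // Shannon entropy (base 2) of an array of class counts.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // Information gain of a nominal attribute with respect to the class.
    // Assumption: a null entry (missing value) is treated as a separate category.
    static double infoGain(String[] values, int[] classes, int numClasses) {
        int[] classCounts = new int[numClasses];
        Map<String, int[]> perValue = new HashMap<>();
        for (int i = 0; i < values.length; i++) {
            classCounts[classes[i]]++;
            String key = values[i] == null ? "<missing>" : values[i];
            perValue.computeIfAbsent(key, k -> new int[numClasses])[classes[i]]++;
        }
        // Weighted entropy of the class after splitting on the attribute.
        double remainder = 0.0;
        for (int[] counts : perValue.values()) {
            int n = 0;
            for (int c : counts) n += c;
            remainder += (double) n / values.length * entropy(counts);
        }
        return entropy(classCounts) - remainder;
    }

    public static void main(String[] args) {
        // attr3 column and class column from the relation above (0 = failing, 1 = correct).
        String[] attr3 = {"5000", null, "5000", null, "5000"};
        int[] cls = {0, 1, 0, 1, 0};
        System.out.println(String.format(Locale.ROOT, "%.3f", infoGain(attr3, cls, 2))); // prints 0.971
    }
}
```

Under those assumptions attr3 separates the two classes perfectly, so its gain equals the full class entropy of about 0.971, which is what I expected to see at the top of the ranking.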

So my question is: how can I tell WEKA to take missing values into account when calculating the information gain?

One possibility is to replace the missing values with a constant, like this:

@relation TEST

@attribute attr1 numeric
@attribute attr2 numeric
@attribute attr3 numeric
@attribute class {failing,correct}

@data
37, 9.295889,  5000, failing
48, ?,         0,    correct
35, 14.722155, 5000, failing
?,  11.417347, 0,    correct
?,  4.539502,  5000, failing

Then the ranking works:

=== Attribute Selection on all input data ===

Search Method:
    Attribute ranking.

Attribute Evaluator (supervised, Class (nominal): 4 class):
    Information Gain Ranking Filter

Ranked attributes:
 0.971  3 attr3
 0.249  1 attr1
 0      2 attr2

Selected attributes: 3,1,2 : 3

But this is not really what I want, because I can't predict the values of attr3.
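For reference, the substitution itself is trivial. The sketch below is plain Java without WEKA and assumes a missing value is represented as `Double.NaN`; `MissingValueSketch` and `replaceMissing` are hypothetical names used only for illustration. The difficulty is not the replacement but choosing a constant that is guaranteed never to collide with a real value of attr3.

```java
import java.util.Arrays;

class MissingValueSketch {
    // Returns a copy of the column in which every missing entry (NaN) is
    // replaced by the given constant. The input array is left untouched.
    static double[] replaceMissing(double[] column, double constant) {
        double[] out = Arrays.copyOf(column, column.length);
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) {
                out[i] = constant;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // attr3 column with missing values on the "correct" instances.
        double[] attr3 = {5000, Double.NaN, 5000, Double.NaN, 5000};
        System.out.println(Arrays.toString(replaceMissing(attr3, 0)));
        // prints [5000.0, 0.0, 5000.0, 0.0, 5000.0]
    }
}
```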

Here is my code:

    public static void test() {
        FastVector attributes = new FastVector();
        Random rand = new Random();

        Attribute attr1 = new Attribute("attr1");
        Attribute attr2 = new Attribute("attr2");
        Attribute attr3 = new Attribute("attr3");

        attributes.addElement(attr1);
        attributes.addElement(attr2);
        attributes.addElement(attr3);

        FastVector classValues = new FastVector(2);
        classValues.addElement("failing");
        classValues.addElement("correct");
        Attribute classAttribute = new Attribute("class", classValues);
        attributes.addElement(classAttribute);

        Instances instances = new Instances("TEST", attributes, 5);

        for (int i = 0; i < 5; i++) {
            Instance instance = new Instance(4);
            instance.setDataset(instances);

            if (i % (rand.nextInt(4) + 1) == 0)
                instance.setValue(attr1, rand.nextInt(50));

            if (i % (rand.nextInt(4) + 1) == 0)
                instance.setValue(attr2, rand.nextFloat() * 15);

            if (i % 2 == 0) {
                instance.setValue(attr3, 5000);
                instance.setValue(classAttribute, "failing");
            } else {
                //instance.setValue(attr3, 0);
                instance.setValue(classAttribute, "correct");
            }

            instances.add(instance);
        }

        instances.setClass(classAttribute);
        instances.compactify();
        System.out.println(instances);

        try {
            System.out.println(AttributeSelection.SelectAttributes(new InfoGainAttributeEval(), new String[]{"-s", "weka.attributeSelection.Ranker"}, instances));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

Thanks!
