
Let's say we have the following dataset:

X1: {4,7,0,1}
X2: {4,3,2,1}
X3: {6,6,6,6}

I'd like to remove any instance that has an attribute with a value greater than 5; in this example, X1 and X3 should be removed. I have more than 500 attributes, and I've tried:

SubsetByExpression -E "(ATT1 < 6) or ... or (ATT500 < 6)"

which did filter most of the instances, but there are still some instances that have values greater than 5 (I'm not really sure why it removed some and retained others).

Is there another more appropriate filter to use or any other way to achieve this task from within WEKA?

Update:

Here's a concrete example. The ARFF file's content:

@relation Test

@attribute word_1 NUMERIC
@attribute word_2 NUMERIC
@attribute word_3 NUMERIC
@attribute word_4 NUMERIC

@data
4,7,0,1
4,3,2,1
6,6,6,6
0,5,1,4

I'd like to remove all instances that have an attribute with a value of 6 or more, so the 1st and 3rd rows should be removed. If I use this filter:

SubsetByExpression -E "(ATT1 < 6) or (ATT2 < 6) or (ATT3 < 6) or (ATT4 < 6)"

Only one instance is removed, the 3rd; the 1st instance is still there.

I'm using WEKA version 3.6.2.


1 Answer


If you change the expression to:

SubsetByExpression -E "(ATT1 < 6) and (ATT2 < 6) and (ATT3 < 6) and (ATT4 < 6)"

you will get the desired result.

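I believe your current expression says that an instance should be kept as long as at least one attribute value is less than six. The new expression says that all attribute values must be less than six.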
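
To see the difference without running WEKA, here is a minimal Python sketch that only mimics the boolean logic of the two expressions on the four rows from the question; it does not call WEKA. The helper names (keep_with_or, keep_with_and, build_expression) are made up for illustration. Since the question mentions more than 500 attributes, build_expression also shows one way the long "and" expression could be generated instead of typed by hand, assuming the attributes are referenced as ATT1..ATTn as in the question.

```python
# Rows copied from the @data section of the ARFF example in the question.
rows = [
    [4, 7, 0, 1],
    [4, 3, 2, 1],
    [6, 6, 6, 6],
    [0, 5, 1, 4],
]


def keep_with_or(row, limit=6):
    """Mimics "(ATT1 < 6) or (ATT2 < 6) or ...": keep if ANY value is below the limit."""
    return any(value < limit for value in row)


def keep_with_and(row, limit=6):
    """Mimics "(ATT1 < 6) and (ATT2 < 6) and ...": keep only if ALL values are below the limit."""
    return all(value < limit for value in row)


def build_expression(num_attributes, limit=6, operator="and"):
    """Builds the text for the -E option, one clause per attribute ATT1..ATTn (hypothetical helper)."""
    clauses = [f"(ATT{i} < {limit})" for i in range(1, num_attributes + 1)]
    return f" {operator} ".join(clauses)


kept_or = [row for row in rows if keep_with_or(row)]
kept_and = [row for row in rows if keep_with_and(row)]

print(kept_or)   # [[4, 7, 0, 1], [4, 3, 2, 1], [0, 5, 1, 4]]
print(kept_and)  # [[4, 3, 2, 1], [0, 5, 1, 4]]
print(build_expression(4))  # (ATT1 < 6) and (ATT2 < 6) and (ATT3 < 6) and (ATT4 < 6)
```

The "or" version drops only the 3rd row, which matches what the question reports, while the "and" version drops the 1st and 3rd rows. Calling build_expression(500) would produce the full expression to paste after -E.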

Answered 2013-10-02T14:42:05.037