Let's say we have the following dataset:
X1: {4,7,0,1}
X2: {4,3,2,1}
X3: {6,6,6,6}
I'd like to remove any instance that has an attribute with a value > 5; in this example, X1 and X3 should be removed. I have more than 500 attributes, and I've tried to use:
SubsetByExpression -E "(ATT1 < 6) or ... or (ATT500 < 6)"
which did filter out most of the instances, but some instances with values greater than 5 still remain (I'm not sure why it removed some and kept others).
Is there a more appropriate filter, or any other way to do this from within WEKA?
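Typing out 500 clauses by hand is error-prone, so the expression string can be generated instead. Below is a minimal Python sketch that runs outside WEKA and builds the string for ATT1..ATTn. The helper name `build_expression` is hypothetical, and `joiner` is simply the word placed between clauses.

```python
def build_expression(n_attributes, threshold, joiner):
    """Return a SubsetByExpression-style string such as
    "(ATT1 < 6) or (ATT2 < 6)" covering attributes ATT1..ATTn.
    build_expression is a hypothetical helper, not part of WEKA."""
    clauses = [f"(ATT{i} < {threshold})" for i in range(1, n_attributes + 1)]
    return f" {joiner} ".join(clauses)


if __name__ == "__main__":
    expr = build_expression(500, 6, "or")
    # The full string would be passed as: SubsetByExpression -E "<expression>"
    print(expr[:60] + " ...")
```

Generating the string this way also rules out a typo in one of the 500 hand-written clauses as the cause of the unexpected results.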
Update:
Here's a concrete example. The ARFF file's content:
@relation Test
@attribute word_1 NUMERIC
@attribute word_2 NUMERIC
@attribute word_3 NUMERIC
@attribute word_4 NUMERIC
@data
4,7,0,1
4,3,2,1
6,6,6,6
0,5,1,4
I'd like to remove all instances that have an attribute with a value of 6 or more, so the 1st and 3rd rows should be removed. If I use this filter:
SubsetByExpression -E "(ATT1 < 6) or (ATT2 < 6) or (ATT3 < 6) or (ATT4 < 6)"
Only one instance is removed (the 3rd); the 1st instance is still there.
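The result can be reproduced outside WEKA. The sketch below assumes that SubsetByExpression keeps exactly the instances for which the expression evaluates to true and removes the rest. It is plain Python that mimics that logic on the four rows above; it is not WEKA's API, and the function names are illustrative. With `or`, a row survives as soon as any single attribute is below 6. That explains why only the 3rd row (all 6s) is dropped. "Remove a row if any attribute is 6 or more" is equivalent to "keep a row only if every attribute is below 6", which corresponds to joining the clauses with `and` instead.

```python
# Rows from the ARFF @data section above (word_1..word_4).
ROWS = [
    [4, 7, 0, 1],
    [4, 3, 2, 1],
    [6, 6, 6, 6],
    [0, 5, 1, 4],
]
THRESHOLD = 6


def keep_with_or(row):
    # Mirrors "(ATT1 < 6) or (ATT2 < 6) or ...": true if ANY value is below 6.
    return any(value < THRESHOLD for value in row)


def keep_with_and(row):
    # Mirrors "(ATT1 < 6) and (ATT2 < 6) and ...": true only if EVERY value is below 6.
    return all(value < THRESHOLD for value in row)


def kept_row_numbers(predicate):
    # 1-based row numbers of the instances the predicate would retain.
    return [i for i, row in enumerate(ROWS, start=1) if predicate(row)]


if __name__ == "__main__":
    print("kept with 'or': ", kept_row_numbers(keep_with_or))   # -> [1, 2, 4]
    print("kept with 'and':", kept_row_numbers(keep_with_and))  # -> [2, 4]
```

Under that assumption, the filter would be written as "(ATT1 < 6) and (ATT2 < 6) and (ATT3 < 6) and (ATT4 < 6)". This has not been checked against WEKA 3.6.2 itself.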
I'm using WEKA version 3.6.2.