weka - How to prepare a test set in Weka

Question

I have been using an SVM classifier with the following data:

@relation whatever

@attribute mfe numeric
@attribute GB numeric
@attribute GTB numeric
@attribute Seeds numeric
@attribute ABP numeric
@attribute AU_Seed numeric
@attribute GC_Seed numeric
@attribute GU_Seed numeric
@attribute UP numeric
@attribute AU numeric
@attribute GC numeric
@attribute GU numeric
@attribute A-U_L numeric
@attribute G-C_L numeric
@attribute G-U_L numeric
@attribute (G+C) numeric
@attribute MFEi1 numeric
@attribute MFEi2 numeric
@attribute MFEi3 numeric
@attribute MFEi4 numeric
@attribute dG numeric
@attribute dP numeric
@attribute dQ numeric
@attribute dD numeric
@attribute Outcome {Yes,No}


@data
-24.3,1,18,2,9,4,3,0.5,8,10,7,1,0.454545455,0.318181818,0.045454545,7,-0.157792208,-0.050206612,-1.104545455,-1.35,-1.104545455,0,0,0,Yes
-24.8,2,15,2,7.5,2,3,1,7,5,8,2,0.208333333,0.333333333,0.083333333,8,-0.129166667,-0.043055556,-0.516666667,-1.653333333,-1.033333333,0,0,0,No
-24.4,1,16,3,5.333333333,1.666666667,2.666666667,1,4,5,8,3,0.217391304,0.347826087,0.130434783,8,-0.132608696,-0.046124764,-1.060869565,-1.525,-1.060869565,0,0,0,Yes
-24.2,1,18,2,9,2,2.5,1,10,5,11,2,0.227272727,0.5,0.090909091,11,-0.1,-0.05,-1.1,-1.344444444,-1.1,0,0,0,Yes
-24.5,3,17,2,8.5,2,3,1,5,6,9,2,0.272727273,0.409090909,0.090909091,9,-0.123737374,-0.050619835,-0.371212121,-1.441176471,-1.113636364,-0.12244898,0,0,Yes

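This is my training set. It defines whether each instance belongs to the Yes class or the No class. My problem is that my test data comes from an unknown source, so I do not know which class each instance belongs to. When I leave out the Outcome attribute, Weka reports "error: Data mismatch". How should I prepare the test set so that the SVM can separate my instances into the Yes and No classes?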

score 8 · Accepted Answer

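Steps to prepare the test set:

1. Create the training set in CSV format.
2. Create the test set in CSV format as well, with the same number of attributes and the same attribute types.
3. Copy the test set, paste it at the end of the training set, and save the result as a new CSV file.
4. Import the CSV file saved in step 3 using Weka >> Explorer >> Preprocess.
5. In the filter options choose filters >> unsupervised >> instance >> RemoveRange.
6. Click the summary text that shows RemoveRange -R first-last.
7. Specify the range to remove: for example, if the training data has 100 rows, enter first-100 and apply the filter.
8. Save the result as an ARFF file, which can be used as the test set.
9. Then use this file as the test set. If you still get errors, reply to this post.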
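Steps 1 to 3 can also be scripted instead of done by copy and paste. The following is a minimal sketch, assuming both CSV files start with the same header row; the function name `append_test_to_train` and the sample values are hypothetical. It returns the combined CSV text together with the number of training rows, which is the `N` to enter as `first-N` in the RemoveRange filter.

```python
import csv
import io


def append_test_to_train(train_csv: str, test_csv: str) -> tuple[str, int]:
    """Concatenate two CSV texts that share a header row.

    Returns the combined CSV text and the number of training rows.
    """
    train_rows = list(csv.reader(io.StringIO(train_csv)))
    test_rows = list(csv.reader(io.StringIO(test_csv)))
    train_header, train_body = train_rows[0], train_rows[1:]
    test_header, test_body = test_rows[0], test_rows[1:]

    # Both files must declare the same columns in the same order.
    if train_header != test_header:
        raise ValueError("train and test CSV headers differ")
    for row in test_body:
        if len(row) != len(train_header):
            raise ValueError(
                f"test row has {len(row)} fields, expected {len(train_header)}"
            )

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(train_header)
    writer.writerows(train_body + test_body)
    return out.getvalue(), len(train_body)


if __name__ == "__main__":
    # Tiny illustrative data, not the full attribute set from the question.
    train = "mfe,GB,Outcome\n-24.3,1,Yes\n-24.8,2,No\n"
    test = "mfe,GB,Outcome\n-24.4,1,?\n"
    combined, n_train = append_test_to_train(train, test)
    print(combined)
    print(f"RemoveRange range: first-{n_train}")
```

Loading the combined file once lets Weka infer a single header for both parts, which is the point of the procedure: the saved ARFF test set ends up with the same attribute declarations as the training data.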

score 2 · Accepted Answer

If you want to avoid trouble, prepare the test set with exactly the same attribute names, data types, and data ranges as the training set, and of course with the attribute values. The class attribute must be present, but its values should be question marks (?). For example, to turn the given training set into a test set, the following changes can be made:

    @relation whatever-TEST

    @attribute mfe numeric
    @attribute GB numeric
    @attribute GTB numeric
    @attribute Seeds numeric
    @attribute ABP numeric
    @attribute AU_Seed numeric
    @attribute GC_Seed numeric
    @attribute GU_Seed numeric
    @attribute UP numeric
    @attribute AU numeric
    @attribute GC numeric
    @attribute GU numeric
    @attribute A-U_L numeric
    @attribute G-C_L numeric
    @attribute G-U_L numeric
    @attribute (G+C) numeric
    @attribute MFEi1 numeric
    @attribute MFEi2 numeric
    @attribute MFEi3 numeric
    @attribute MFEi4 numeric
    @attribute dG numeric
    @attribute dP numeric
    @attribute dQ numeric
    @attribute dD numeric
    @attribute Outcome {Yes,No}


    @data
    -24.3,1,18,2,9,4,3,0.5,8,10,7,1,0.454545455,0.318181818,0.045454545,7,-0.157792208,-0.050206612,-1.104545455,-1.35,-1.104545455,0,0,0,?
    -24.8,2,15,2,7.5,2,3,1,7,5,8,2,0.208333333,0.333333333,0.083333333,8,-0.129166667,-0.043055556,-0.516666667,-1.653333333,-1.033333333,0,0,0,?
    -24.4,1,16,3,5.333333333,1.666666667,2.666666667,1,4,5,8,3,0.217391304,0.347826087,0.130434783,8,-0.132608696,-0.046124764,-1.060869565,-1.525,-1.060869565,0,0,0,?
    -24.2,1,18,2,9,2,2.5,1,10,5,11,2,0.227272727,0.5,0.090909091,11,-0.1,-0.05,-1.1,-1.344444444,-1.1,0,0,0,?
    -24.5,3,17,2,8.5,2,3,1,5,6,9,2,0.272727273,0.409090909,0.090909091,9,-0.123737374,-0.050619835,-0.371212121,-1.441176471,-1.113636364,-0.12244898,0,0,?
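For larger files the same change can be made with a short script instead of by hand. This is a rough sketch under stated assumptions: the class is the last attribute, the file is in dense (not sparse) ARFF format, no value contains a comma, and the relation name is unquoted. The function name `blank_class_values` and the `-TEST` suffix argument are hypothetical helpers, not part of Weka.

```python
def blank_class_values(arff_text: str, suffix: str = "-TEST") -> str:
    """Return ARFF text in which the last field of every data row is '?'."""
    out = []
    in_data = False
    for line in arff_text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if not in_data:
            if lower.startswith("@relation"):
                # Rename the relation so the two files are easy to tell apart.
                line = stripped + suffix
            elif lower == "@data":
                in_data = True
            out.append(line)
        elif stripped and not stripped.startswith("%"):
            # Assumption: the class is the last comma-separated field.
            features, _, _ = stripped.rpartition(",")
            out.append(features + ",?")
        else:
            # Blank lines and % comments in the data section pass through.
            out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Shortened illustrative header, not the full one from the question.
    train = (
        "@relation whatever\n"
        "\n"
        "@attribute mfe numeric\n"
        "@attribute Outcome {Yes,No}\n"
        "\n"
        "@data\n"
        "-24.3,Yes\n"
        "-24.8,No\n"
    )
    print(blank_class_values(train))
```

The header is copied through unchanged apart from the relation name, so the attribute declarations, including `Outcome {Yes,No}`, stay identical to the training file.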

score 0 · Accepted Answer

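Do we need to replace the values of the last attribute in the test data with question marks? I am confused. I tested my data with both methods:

Removing the values of the last attribute and putting ? in their place.
Using the test data as is (without redefining the class attribute).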

score 0 · Accepted Answer

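Whether you are evaluating a trained model on a dataset or using a trained model to make predictions, the dataset must have exactly the same structure as the training data (attribute names, attribute types, order of nominal labels). This includes the class attribute.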
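A quick way to catch a structural mismatch before loading the files is to compare their `@attribute` declarations. The sketch below is a conservative text comparison, not Weka's own compatibility check: it treats `{Yes,No}` and `{Yes, No}` as different, for instance, and it does not handle attribute declarations spread over several lines. The function names `arff_attributes` and `same_structure` are hypothetical.

```python
def arff_attributes(arff_text: str) -> list[str]:
    """Collect the @attribute lines that precede @data, whitespace-normalized."""
    attrs = []
    for line in arff_text.splitlines():
        stripped = line.strip()
        lower = stripped.lower()
        if lower == "@data":
            break
        if lower.startswith("@attribute"):
            # Collapse runs of spaces and tabs so layout differences are ignored.
            attrs.append(" ".join(stripped.split()))
    return attrs


def same_structure(train_arff: str, test_arff: str) -> bool:
    """True when both headers declare identical attributes in identical order."""
    return arff_attributes(train_arff) == arff_attributes(test_arff)


if __name__ == "__main__":
    header = "@relation r\n@attribute mfe numeric\n@attribute Outcome {Yes,No}\n@data\n"
    flipped = "@relation r\n@attribute mfe numeric\n@attribute Outcome {No,Yes}\n@data\n"
    print(same_structure(header, header))
    print(same_structure(header, flipped))
```

Comparing the nominal label lists as text also catches a reordered label set such as `{No,Yes}`, which matches the requirement that the order of nominal labels be the same.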

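If you want to test your model, you need the true class values to compare the predictions against. Otherwise no evaluation statistics can be generated.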

If you want to make predictions, the class values should all be missing.

To remove the class values, you can do it manually or use the missing-values-imputation Weka package: apply the weka.filters.unsupervised.attribute.MissingValuesInjection filter with the ClassOnly injection scheme.
