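I'm test-driving turicreate to solve a classification problem. Each data point is a 10-tuple plus a label (q, w, e, r, t, y, u, i, o, p, label), where 'q..p' is a sequence of characters (currently of 2 kinds), + and -, like this: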

q,w,e,r,t,y,u,i,o,p,label
-,+,+,e,e,e,e,e,e,e,type2
+,+,e,e,e,e,e,e,e,e,type1
-,+,e,e,e,e,e,e,e,e,type2

'e' is just a padding character, so that every vector has a fixed length of 10.
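For illustration, the padding could be done as below. This is only a minimal sketch: `pad_sequence` is a hypothetical helper name, not part of turicreate, and it assumes the raw input is a list of '+'/'-' symbols no longer than 10.

```python
def pad_sequence(symbols, length=10, fill="e"):
    """Pad a list of '+'/'-' symbols with the filler so it has exactly `length` entries."""
    if len(symbols) > length:
        raise ValueError(f"sequence has {len(symbols)} symbols, maximum is {length}")
    return list(symbols) + [fill] * (length - len(symbols))


# The first sample row above, before the label is appended.
row = pad_sequence(["-", "+", "+"])
print(",".join(row))  # -,+,+,e,e,e,e,e,e,e
```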

Note: the data is clearly skewed towards one label (90%), and the data set is small, < 100 points.
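A quick way to measure the skew is to count the label column. The sketch below uses only the standard library; `label_shares` is a hypothetical helper, and the three inline rows are just the sample rows shown above, standing in for the real file.

```python
import csv
import io
from collections import Counter

# Stand-in for data.csv: only the three sample rows, not the full data set.
sample = """q,w,e,r,t,y,u,i,o,p,label
-,+,+,e,e,e,e,e,e,e,type2
+,+,e,e,e,e,e,e,e,e,type1
-,+,e,e,e,e,e,e,e,e,type2
"""


def label_shares(csv_text):
    """Return {label: fraction of rows} for CSV text with a 'label' column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["label"] for row in reader)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


print(label_shares(sample))
```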

I use Apple's vanilla script to prepare and process the data (from here):

import turicreate as tc

# Load the data
data = tc.SFrame('data.csv')

# Note, for sake of investigating why predictions do not work on Swift, the model is deliberately over-fitted, with split 1.0
train_data, test_data = data.random_split(1.0)
print(train_data)

# Automatically picks the right model based on your data.
model = tc.classifier.create(train_data, target='label', features=['q', 'w', 'e', 'r', 't', 'y', 'u', 'i', 'o', 'p'])

# Generate predictions (class/probabilities etc.), contained in an SFrame.
predictions = model.classify(train_data)

# Evaluate the model, with the results stored in a dictionary
results = model.evaluate(train_data)

print("***********")
print(results['accuracy'])
print("***********")
model.export_coreml("MyModel.mlmodel")

Note: the model is overfitted on the whole data set (for now). Convergence seems fine:

PROGRESS: Model selection based on validation accuracy:
PROGRESS: ---------------------------------------------
PROGRESS: BoostedTreesClassifier          : 1.0
PROGRESS: RandomForestClassifier          : 0.9032258064516129
PROGRESS: DecisionTreeClassifier          : 0.9032258064516129
PROGRESS: SVMClassifier                   : 1.0
PROGRESS: LogisticClassifier              : 1.0
PROGRESS: ---------------------------------------------
PROGRESS: Selecting BoostedTreesClassifier based on validation set performance.

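Classification in Python works as expected (albeit overfitted). However, when I use the mlmodel in my own code, it always returns the same label no matter what the input is, here "type2". The rule here is: type1 = only "+" and "e"; type2 = all other combinations.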
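Written out as code, the labelling rule amounts to something like the following. This is a sketch of the rule as stated, not of what the trained model does; `expected_label` is a hypothetical name.

```python
def expected_label(symbols):
    """Return 'type1' if the row contains only '+' and 'e', otherwise 'type2'."""
    return "type1" if all(s in ("+", "e") for s in symbols) else "type2"


# The three sample rows from the top of the post, label in the last column.
rows = [
    "-,+,+,e,e,e,e,e,e,e,type2",
    "+,+,e,e,e,e,e,e,e,e,type1",
    "-,+,e,e,e,e,e,e,e,e,type2",
]
for line in rows:
    *symbols, label = line.split(",")
    assert expected_label(symbols) == label
```

Any row containing at least one "-" is type2; a row made only of padding counts as type1.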

I tried using text_classifier instead, but the results were far less accurate...

I have no idea what I'm doing wrong...

In case anyone wants to check with a small data set, here is the raw data.

q,w,e,r,t,y,u,i,o,p,label
-,+,+,e,e,e,e,e,e,e,type2
-,+,e,e,e,e,e,e,e,e,type2
+,+,-,+,e,e,e,e,e,e,type2
-,-,+,-,e,e,e,e,e,e,type2
+,e,e,e,e,e,e,e,e,e,type1
-,-,+,+,e,e,e,e,e,e,type2
+,-,+,-,e,e,e,e,e,e,type2
-,+,-,-,e,e,e,e,e,e,type2
+,-,-,+,e,e,e,e,e,e,type2
+,+,e,e,e,e,e,e,e,e,type1
+,+,-,-,e,e,e,e,e,e,type2
-,+,-,e,e,e,e,e,e,e,type2
-,-,-,-,e,e,e,e,e,e,type2
-,-,e,e,e,e,e,e,e,e,type2
-,-,-,e,e,e,e,e,e,e,type2
+,+,+,+,e,e,e,e,e,e,type1
+,-,+,+,e,e,e,e,e,e,type2
+,+,+,e,e,e,e,e,e,e,type1
+,-,-,-,e,e,e,e,e,e,type2
+,-,-,e,e,e,e,e,e,e,type2
+,+,+,-,e,e,e,e,e,e,type2
+,-,e,e,e,e,e,e,e,e,type2
+,-,+,e,e,e,e,e,e,e,type2
-,-,+,e,e,e,e,e,e,e,type2
+,+,-,e,e,e,e,e,e,e,type2
e,e,e,e,e,e,e,e,e,e,type1
-,+,+,-,e,e,e,e,e,e,type2
-,-,-,+,e,e,e,e,e,e,type2
-,e,e,e,e,e,e,e,e,e,type2
-,+,+,+,e,e,e,e,e,e,type2
-,+,-,+,e,e,e,e,e,e,type2

And the Swift code:

// Helper
extension MyModelInput {
    public convenience init(v: [String]) {
        self.init(q: v[0], w: v[1], e: v[2], r: v[3], t: v[4], y: v[5], u: v[6], i: v[7], o: v[8], p: v[9])
    }
}

let classifier = MyModel()
let data = ["-,+,+,e,e,e,e,e,e,e,e", "-,+,e,e,e,e,e,e,e,e,e", "+,+,-,+,e,e,e,e,e,e,e", "-,-,+,-,e,e,e,e,e,e,e", "+,e,e,e,e,e,e,e,e,e,e"]
data.forEach { (tt) in
    let gg = MyModelInput(v: tt.components(separatedBy: ","))
    if let prediction = try? classifier.prediction(input: gg) {
        print(prediction.labelProbability)
    }
}

The Python code saves a MyModel.mlmodel file, which you can add to any Xcode project and use with the code above.

Note: the Python part works fine. For example:

+---+---+---+---+---+---+---+---+---+---+-------+
| q | w | e | r | t | y | u | i | o | p | label |
+---+---+---+---+---+---+---+---+---+---+-------+
| + | + | + | + | e | e | e | e | e | e | type1 |
+---+---+---+---+---+---+---+---+---+---+-------+

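is labelled as expected. But with the Swift code, the label comes out as type2. This is driving me crazy (and yes, I checked that the mlmodel gets replaced by the new one whenever I create a new version, including in Xcode).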
