python - Are my classifier predictions unreliable because my GMM classifiers are not trained correctly?

Question

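I am training two GMM classifiers on MFCC values, one classifier per label. I concatenate all the MFCC values of a class and fit them to that class's classifier. At prediction time, for each classifier I sum the probabilities it outputs for its label.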

def createGMMClassifiers():
    label_samples = {}
    for label, sample in training.iteritems():
        labelstack = np.empty((50,13))
        for feature in sample:
            #debugger.set_trace()
            labelstack = np.concatenate((labelstack,feature))
        label_samples[label]=labelstack
    for label in label_samples:
        #debugger.set_trace()
        classifiers[label] = mixture.GMM(n_components = n_classes)
        classifiers[label].fit(label_samples[label])
    for sample in testing['happy']:
        classify(sample)
def classify(testMFCC):
    probability = {'happy':0,'sad':0}
    for name, classifier in classifiers.iteritems():
        prediction = classifier.predict_proba(testMFCC)
        for probforlabel in prediction:
            probability[name]+=probforlabel[0]
    print 'happy ',probability['happy'],'sad ',probability['sad']

    if(probability['happy']>probability['sad']):
        print 'happy'
    else:
        print 'sad'

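But my results do not seem consistent. I find it hard to believe that this is caused by the RandomSeed=None setting, because the predictions are usually the same label for all of the test data, yet each run often gives the exact opposite result (see Output 1 and Output 2).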

So my question is: am I doing something wrong when training my classifiers?

Output 1:

happy  123.559202732 sad  122.409167294
happy

happy  120.000879032 sad  119.883786657
happy

happy  124.000069307 sad  123.999928962
happy

happy  118.874574047 sad  118.920941127
sad

happy  117.441353421 sad  122.71924156
sad

happy  122.210579428 sad  121.997571901
happy

happy  120.981752603 sad  120.325940128
happy

happy  126.013713257 sad  125.885047394
happy

happy  122.776016525 sad  122.12320875
happy

happy  115.064172476 sad  114.999513909
happy

Output 2:

happy  123.559202732 sad  122.409167294
happy

happy  120.000879032 sad  119.883786657
happy

happy  124.000069307 sad  123.999928962
happy

happy  118.874574047 sad  118.920941127
sad

happy  117.441353421 sad  122.71924156
sad

happy  122.210579428 sad  121.997571901
happy

happy  120.981752603 sad  120.325940128
happy

happy  126.013713257 sad  125.885047394
happy

happy  122.776016525 sad  122.12320875
happy

happy  115.064172476 sad  114.999513909
happy

I asked a related question earlier and got a correct answer. The link is below.

Different results on every run with GMM classifiers

Edit: added the main function, which collects the data and splits it into training and test sets.

def main():
    happyDir = dir+'happy/'
    sadDir = dir+'sad/'
    training["sad"]=[]
    training["happy"]=[]
    testing["happy"]=[]
    #TestSet
    for wavFile in os.listdir(happyDir)[::-1][:10]:
        #print wavFile
        fullPath = happyDir+wavFile
        testing["happy"].append(sf.getFeatures(fullPath))
    #TrainSet
    for wavFile in os.listdir(happyDir)[::-1][10:]:
        #print wavFile
        fullPath = happyDir+wavFile
        training["happy"].append(sf.getFeatures(fullPath))
    for wavFile in os.listdir(sadDir)[::-1][10:]:
        fullPath = sadDir+wavFile
        training["sad"].append(sf.getFeatures(fullPath))
    #Ensure the number of files in set
    print "Test(Happy): ", len(testing['happy'])
    print "Train(Happy): ", len(training['happy'])
    createGMMClassifiers()

Edit 2: changed the code according to the answer. The results are still similarly inconsistent.

score 0 · Accepted Answer

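Your code does not make much sense: you recreate the classifier for every new training sample.

The correct scheme for the training code looks like this: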

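import numpy as np
from sklearn import mixture

label_samples = {}
classifiers = {}

# First, collect all samples for each label into a list of feature arrays
for label, sample in samples:
    label_samples.setdefault(label, []).append(sample)

# Stack each label's list into a single (n_frames, n_features) array
for label in label_samples:
    label_samples[label] = np.concatenate(label_samples[label])

# Then train one classifier on each label's data
for label in label_samples:
    classifiers[label] = mixture.GMM(n_components = n_classes)
    classifiers[label].fit(label_samples[label])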

Your decoding code is fine.
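The scheme above can be sketched end to end on synthetic data. This is only an illustration under stated assumptions: the random arrays stand in for real MFCC frames, the helper names (`stack_features`, `train_classifiers`, `classify`) are made up for this example, and it uses `GaussianMixture`, which replaced `mixture.GMM` in later scikit-learn releases. Scoring by summed per-frame log-likelihood (`score_samples`) is one common way to compare per-label models and is an assumption here, not the question's own decoding code.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def stack_features(samples):
    """Group (label, features) pairs and stack each label's frames into one array."""
    per_label = {}
    for label, features in samples:
        per_label.setdefault(label, []).append(features)
    # Stacking the collected list means no placeholder rows (e.g. from np.empty)
    # ever end up in the training data.
    return {label: np.vstack(chunks) for label, chunks in per_label.items()}


def train_classifiers(label_samples, n_components=2, seed=0):
    """Fit one GaussianMixture per label; a fixed seed makes runs repeatable."""
    classifiers = {}
    for label, data in label_samples.items():
        gmm = GaussianMixture(n_components=n_components, random_state=seed)
        classifiers[label] = gmm.fit(data)
    return classifiers


def classify(classifiers, features):
    """Return the label whose model gives the highest total log-likelihood."""
    scores = {
        label: gmm.score_samples(features).sum()
        for label, gmm in classifiers.items()
    }
    return max(scores, key=scores.get), scores


# Synthetic stand-in for MFCC data: 5 clips per label, 50 frames x 13 coefficients
rng = np.random.RandomState(0)
samples = [("happy", rng.normal(2.0, 1.0, size=(50, 13))) for _ in range(5)]
samples += [("sad", rng.normal(-2.0, 1.0, size=(50, 13))) for _ in range(5)]

label_samples = stack_features(samples)
classifiers = train_classifiers(label_samples)

test_clip = rng.normal(2.0, 1.0, size=(40, 13))  # drawn like the "happy" clips
predicted, scores = classify(classifiers, test_clip)
print(predicted, scores)
```

Fixing `random_state` removes run-to-run variation from the random initialisation, which makes it easier to tell whether remaining inconsistencies come from the data or the code.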

score 0 · Accepted Answer

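For classification tasks, it is important to tune the parameters you give to the classifier. Many classification algorithms are sensitive to these choices, which means that simply changing some of the model's parameters can produce hugely different results. It is also important to try different algorithms rather than using a single algorithm for every classification task.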

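For this problem, you can try different classification algorithms to test whether your data is good, and try different parameters and values for each classifier. Then you can determine where the problem lies.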
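As a rough sketch of that comparison, the snippet below cross-validates two scikit-learn classifiers on the same features. The data is synthetic, and the choice of one 13-dimensional vector per clip (for example, MFCCs averaged over frames), the two models, and the `candidates` and `results` names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.RandomState(0)

# Synthetic stand-in: one 13-dimensional feature vector per clip
X = np.vstack([
    rng.normal(1.0, 1.0, size=(40, 13)),
    rng.normal(-1.0, 1.0, size=(40, 13)),
])
y = np.array(["happy"] * 40 + ["sad"] * 40)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "k_nearest_neighbors": KNeighborsClassifier(n_neighbors=5),
}

results = {}
for name, model in candidates.items():
    # Mean accuracy over 5 stratified cross-validation folds
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(name, round(results[name], 3))
```

If every model scores near chance on the real features, the features or labels are the more likely culprit; if only one model does, its parameters or usage deserve a closer look.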

Another approach is to use grid search to explore and tune the best parameters for a particular classifier. See: http://scikit-learn.org/stable/modules/grid_search.html
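A minimal sketch of such a search for a single label's mixture model might look like the following. It assumes `GridSearchCV` applied to `GaussianMixture` (the successor of `mixture.GMM`), whose default `score` is the mean per-sample log-likelihood on held-out data. The synthetic `frames` array and the grid values are illustrative only.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)

# Synthetic stand-in for one label's stacked MFCC frames: two clusters in 13 dims
frames = np.vstack([
    rng.normal(-3.0, 1.0, size=(200, 13)),
    rng.normal(3.0, 1.0, size=(200, 13)),
])
rng.shuffle(frames)  # shuffle rows so every CV fold contains both clusters

# Hypothetical grid; the useful ranges depend on the real data
param_grid = {
    "n_components": [1, 2, 3, 4],
    "covariance_type": ["diag", "full"],
}

# No labels are passed: each candidate is scored by held-out log-likelihood
search = GridSearchCV(GaussianMixture(random_state=0), param_grid, cv=3)
search.fit(frames)

best_params = search.best_params_
print(best_params)
```

The search would be repeated for each label's data, since the two classes need not share the same best settings.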
