我正在使用 MFCC 值训练两个 GMM 分类器,每个分类器用于一个标签。我将一个类的所有 MFCC 值连接起来并放入一个分类器中。对于每个分类器,我将其标签概率的概率相加。
def createGMMClassifiers():
label_samples = {}
for label, sample in training.iteritems():
labelstack = np.empty((50,13))
for feature in sample:
#debugger.set_trace()
labelstack = np.concatenate((labelstack,feature))
label_samples[label]=labelstack
for label in label_samples:
#debugger.set_trace()
classifiers[label] = mixture.GMM(n_components = n_classes)
classifiers[label].fit(label_samples[label])
for sample in testing['happy']:
classify(sample)
def classify(testMFCC):
probability = {'happy':0,'sad':0}
for name, classifier in classifiers.iteritems():
prediction = classifier.predict_proba(testMFCC)
for probforlabel in prediction:
probability[name]+=probforlabel[0]
print 'happy ',probability['happy'],'sad ',probability['sad']
if(probability['happy']>probability['sad']):
print 'happy'
else:
print 'sad'
但是我的结果似乎并不一致,我很难相信这是因为 RandomSeed=None 状态,因为所有预测通常对于所有测试数据都是相同的标签,但每次运行它通常会给出完全相反的结果(见输出 1 和输出 2)。
所以我的问题是,在训练我的分类器时我做错了什么吗?
输出 1:
happy 123.559202732 sad 122.409167294
happy
happy 120.000879032 sad 119.883786657
happy
happy 124.000069307 sad 123.999928962
happy
happy 118.874574047 sad 118.920941127
sad
happy 117.441353421 sad 122.71924156
sad
happy 122.210579428 sad 121.997571901
happy
happy 120.981752603 sad 120.325940128
happy
happy 126.013713257 sad 125.885047394
happy
happy 122.776016525 sad 122.12320875
happy
happy 115.064172476 sad 114.999513909
happy
输出 2:
happy 123.559202732 sad 122.409167294
happy
happy 120.000879032 sad 119.883786657
happy
happy 124.000069307 sad 123.999928962
happy
happy 118.874574047 sad 118.920941127
sad
happy 117.441353421 sad 122.71924156
sad
happy 122.210579428 sad 121.997571901
happy
happy 120.981752603 sad 120.325940128
happy
happy 126.013713257 sad 125.885047394
happy
happy 122.776016525 sad 122.12320875
happy
happy 115.064172476 sad 114.999513909
happy
早些时候我问了一个相关的问题并得到了正确的答案。我在下面提供链接。
编辑:添加了收集数据并分为训练和测试的主要功能
def main():
happyDir = dir+'happy/'
sadDir = dir+'sad/'
training["sad"]=[]
training["happy"]=[]
testing["happy"]=[]
#TestSet
for wavFile in os.listdir(happyDir)[::-1][:10]:
#print wavFile
fullPath = happyDir+wavFile
testing["happy"].append(sf.getFeatures(fullPath))
#TrainSet
for wavFile in os.listdir(happyDir)[::-1][10:]:
#print wavFile
fullPath = happyDir+wavFile
training["happy"].append(sf.getFeatures(fullPath))
for wavFile in os.listdir(sadDir)[::-1][10:]:
fullPath = sadDir+wavFile
training["sad"].append(sf.getFeatures(fullPath))
#Ensure the number of files in set
print "Test(Happy): ", len(testing['happy'])
print "Train(Happy): ", len(training['happy'])
createGMMClassifiers()
编辑2:根据答案更改了代码。仍然有类似的不一致结果。