
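I want to perform a simple LDA on my small data set (65x8). I have 65 instances (samples), 8 features (attributes), and 4 classes. Is there any MATLAB code for LDA? As far as I know, the MATLAB toolbox does not have an LDA function, so I would need to write my own code. Any help?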

I found this code online:

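load /Data;
% Columns 1-8 hold the features; column 9 holds the class label.
All_data = Data(:,1:8);
All_data_label = Data(:,9);
% Hold out roughly 20% of the rows at random as the test set.
testing_ind = find(rand(size(Data,1),1) > 0.8)';
training_ind = setdiff(1:size(Data,1), testing_ind);

% Train LDA on the training rows (features only) and classify the test rows.
[ldaClass,err,P,logp,coeff] = classify(All_data(testing_ind,:),...
    All_data(training_ind,:),All_data_label(training_ind),'linear');
% Rows: true class of each test point; columns: class assigned by the classifier.
[ldaResubCM,grpOrder] = confusionmat(All_data_label(testing_ind),ldaClass)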

Then I got this result:

ldaClass =

 3
 2
 3
 2
 1
 4
 3
 3
 1
 2
 1
 1
 2

err =

0.2963

P =

0.0001    0.0469    0.7302    0.2229
0.1178    0.5224    0.3178    0.0419
0.0004    0.2856    0.4916    0.2224
0.0591    0.6887    0.1524    0.0998
0.8327    0.1637    0.0030    0.0007
0.0002    0.1173    0.3897    0.4928
0.0000    0.0061    0.7683    0.2255
0.0000    0.0241    0.5783    0.3976
0.9571    0.0426    0.0003    0.0000
0.2719    0.5569    0.1630    0.0082
0.9999    0.0001    0.0000    0.0000
0.9736    0.0261    0.0003    0.0000
0.0842    0.6404    0.2634    0.0120

coeff =

4x4 struct array with fields:
    type
    name1
    name2
    const
    linear

ldaResubCM =

 4     0     0     0
 0     3     1     0
 0     1     1     0
 0     0     2     1

grpOrder =

 1
 2
 3
 4

So I have 65 instances, 8 attributes, and 4 classes (1, 2, 3, 4), but I don't know how to interpret these results. Any help?


1 Answer


The interpretation of the results derives directly from the documentation of classify.

classify trains a classifier on the training data and labels (the second and third arguments) and applies that classifier to the test data (the first argument).

ldaClass gives the classes chosen for the test data points, based on the classifier that has been trained using the training data points and labels.

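err is the training error rate (also called the apparent or resubstitution error): the fraction of training data points that the classifier misclassifies even though it was trained on those same points, weighted by the prior probabilities of the classes. Because the same data is used for training and evaluation, it tends to underestimate the error to be expected on independent test data.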

P gives the posterior probabilities: for each test data point (row), it gives the probability that the point belongs to each class (column). The probabilities in each row sum to 1. The hard classification in ldaClass follows from these posteriors: each test data point is assigned the class with the highest probability, so [~, ind] = max(P, [], 2) yields ind equal to ldaClass (this works directly here because the class labels 1 to 4 coincide with the column indices).
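As a quick check of the relationship between P and ldaClass, the sketch below copies the values printed in the question and takes the row-wise maximum. It uses only base MATLAB; the variable names ind and rowSums are introduced here for illustration.

```matlab
% Posterior probabilities copied from the question
% (rows: test points, columns: classes 1..4).
P = [0.0001 0.0469 0.7302 0.2229
     0.1178 0.5224 0.3178 0.0419
     0.0004 0.2856 0.4916 0.2224
     0.0591 0.6887 0.1524 0.0998
     0.8327 0.1637 0.0030 0.0007
     0.0002 0.1173 0.3897 0.4928
     0.0000 0.0061 0.7683 0.2255
     0.0000 0.0241 0.5783 0.3976
     0.9571 0.0426 0.0003 0.0000
     0.2719 0.5569 0.1630 0.0082
     0.9999 0.0001 0.0000 0.0000
     0.9736 0.0261 0.0003 0.0000
     0.0842 0.6404 0.2634 0.0120];

% Class assignments reported by classify in the question.
ldaClass = [3 2 3 2 1 4 3 3 1 2 1 1 2]';

% Column index of the largest posterior in each row.
[~, ind] = max(P, [], 2);

% The column index equals the class label only because grpOrder is 1..4.
disp(isequal(ind, ldaClass))

% Each row sums to 1, up to the four-digit rounding of the printout.
rowSums = sum(P, 2);
disp(max(abs(rowSums - 1)))
```

If the class labels were not 1 to 4, the column index would have to be mapped back to a label through the group order before comparing.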

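coeff contains the details of the trained classifier. It is a 4x4 struct array in which the element coeff(i,j) describes the boundary between class i and class j; for the 'linear' type, its const and linear fields hold the constant term and the weight vector of that boundary. To use it directly, you need to understand how the discriminant functions are defined.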
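To make coeff less abstract, here is a minimal sketch on synthetic two-class data (not the data set from the question). It assumes the Statistics Toolbox is available and relies on the convention, as I understand it from the classify documentation, that a row x is assigned to class i rather than class j when coeff(i,j).const + x*coeff(i,j).linear is positive. The names training, group, sample, score, and fromCoeff are made up for this example.

```matlab
% Synthetic, well-separated 2-D data; purely illustrative.
rng(0);                                      % fixed seed for repeatability
training = [randn(30,2); randn(30,2) + 3];   % class 1 near (0,0), class 2 near (3,3)
group    = [ones(30,1); 2*ones(30,1)];
sample   = [0 0; 3 3; 1 2];                  % hypothetical points to classify

% Train LDA and keep the boundary coefficients.
[cls, ~, ~, ~, coeff] = classify(sample, training, group, 'linear');

% Boundary between class 1 and class 2.
K = coeff(1,2).const;     % scalar constant term
L = coeff(1,2).linear;    % 2x1 weight vector

% Positive score favors class 1, negative score favors class 2
% (assumed sign convention, see the lead-in).
score = K + sample * L;
fromCoeff = 2 - (score > 0);

% Compare classify's labels with the labels derived from coeff.
disp([cls, fromCoeff])
```

With more than two classes, the assigned class is the one that wins every pairwise comparison, which is the same class that maximizes the posterior in P.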

confusionmat compares the classes that the classifier assigned to the test data with the known true classes and tabulates the result as a confusion matrix. Each row corresponds to the true class of a test data point, each column to the class assigned by the classifier. Numbers on the diagonal are correct classifications; in your result, the test error is 1 - sum(diag(ldaResubCM)) / sum(ldaResubCM(:)) = 1 - 9/13, or about 0.308. In particular, the confusion matrix shows that of the 4 test data points that belong to class two, 3 were classified correctly and 1 incorrectly (as belonging to class three).
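The arithmetic can be reproduced from the confusion matrix printed in the question. This sketch uses only base MATLAB; nCorrect, nTotal, testError, and recall are names chosen here, and the per-class recall is an extra quantity that the matrix makes easy to read off.

```matlab
% Confusion matrix copied from the question
% (rows: true class, columns: assigned class).
ldaResubCM = [4 0 0 0
              0 3 1 0
              0 1 1 0
              0 0 2 1];

nCorrect  = sum(diag(ldaResubCM));    % correctly classified test points: 9
nTotal    = sum(ldaResubCM(:));       % all test points: 13
testError = 1 - nCorrect / nTotal;    % 4/13, about 0.308

% Fraction of each true class that was classified correctly.
recall = diag(ldaResubCM) ./ sum(ldaResubCM, 2);   % [1; 0.75; 0.5; 1/3]

fprintf('test error = %.3f\n', testError);
disp(recall')
```

With only 13 test points, each misclassification shifts the test error by about 0.077, so this estimate is quite coarse.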

grpOrder gives the class labels in the order used for the rows and columns of the confusion matrix; in your case the labels are 1 to 4, so indices and labels are identical.

Answered 2013-12-11T16:36:07.563