I am working with the Orange package and have written the following code based on the tutorials available:
import orange, orngTest, orngStat, orngTree, Orange

# Set up the two learners to compare.
bayes = orange.BayesLearner()
tree = orngTree.TreeLearner(mForPruning=2)
bayes.name = "bayes"
tree.name = "tree"

# Load the iris data set (three classes, 50 instances each).
data = Orange.data.Table("iris.tab")

# Evaluate both learners with 10-fold cross-validation.
learners = [bayes, tree]
results = orngTest.crossValidation(learners, data, folds=10)

# Report classification accuracy, information score, Brier score and AUC.
print "Learner CA IS Brier AUC"
for i in range(len(learners)):
    print "%-8s %5.3f %5.3f %5.3f %5.3f" % \
        (learners[i].name,
         orngStat.CA(results)[i],
         orngStat.IS(results)[i],
         orngStat.BrierScore(results)[i],
         orngStat.AUC(results)[i])
Running the script produces the following output:
Learner CA IS Brier AUC
bayes    0.920 1.402 0.098 0.993
tree     0.940 1.447 0.120 0.967
Based on the following description of the information score:
Let the correct class of an instance be C. Recall that P(C) is the prior probability of class C and P'(C) is the posterior probability returned by the classifier. We consider two cases:
(a) P'(C) > P(C). Here the probability of class C has changed in the right direction, therefore we will call such an answer useful. It should be awarded a positive score.
(b) P'(C) < P(C). Here the probability of class C has changed in the wrong direction, therefore we will call such an answer misleading. It should be assigned a negative score.
and the following example:
Suppose the classifier in 1950 answered:
P'(Bush) = 0.45
P'(Dukakis) = 0.55
P'(all others) = 0
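To check my reading of this, I put together a small sketch of the per-instance score using what I believe is the Kononenko-Bratko definition; the exact formula below is my assumption and I have not verified that it is what orngStat.IS actually computes:

import math

def information_score(prior, posterior):
    # Per-instance score for the correct class C, assuming the
    # Kononenko-Bratko definition (my assumption; not verified against
    # what orngStat.IS actually implements):
    #   useful     (P'(C) >= P(C)):   -log2 P(C) + log2 P'(C)
    #   misleading (P'(C) <  P(C)):  -(-log2(1 - P(C)) + log2(1 - P'(C)))
    if posterior >= prior:
        return -math.log(prior, 2) + math.log(posterior, 2)
    else:
        return -(-math.log(1 - prior, 2) + math.log(1 - posterior, 2))

# iris.tab has three classes with 50 instances each, so P(C) = 1/3.
# A correct prediction made with full confidence would then score:
print information_score(1.0 / 3.0, 1.0)   # about 1.585

If that definition is right, the score for a useful answer is bounded by -log2 P(C), which for three equally likely classes is about 1.585 bits rather than 1.0, but I may be misreading the definition.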
Is a value greater than 1.0 invalid for the information score? Or have I used an inappropriate classifier for this kind of data set? There are three distinct classes in the iris.tab data set.