I am working with the Orange package and have written the following code based on the tutorials available:

import orange, orngTest, orngStat, orngTree, Orange

# Two learners to compare: naive Bayes and a pruned classification tree.
bayes = orange.BayesLearner()
tree = orngTree.TreeLearner(mForPruning=2)
bayes.name = "bayes"
tree.name = "tree"

data = Orange.data.Table("iris.tab")
learners = [bayes, tree]

# 10-fold cross-validation of both learners on the same data.
results = orngTest.crossValidation(learners, data, folds=10)

# Each scoring function returns a list with one value per learner.
print "Learner    CA    IS    Brier    AUC"
for i in range(len(learners)):
    print "%-8s    %5.3f    %5.3f    %5.3f    %5.3f" % (
        learners[i].name,
        orngStat.CA(results)[i],
        orngStat.IS(results)[i],
        orngStat.BrierScore(results)[i],
        orngStat.AUC(results)[i])

This prints the following:

Running script:
Learner      CA        IS     Brier    AUC  
bayes       0.920    1.402    0.098    0.993  
tree        0.940    1.447    0.120    0.967  

My understanding of the information score (IS) is based on the following description:

Let the correct class of an instance be C. Recall that P(C) is the prior probability of class C and P'(C) is the posterior probability returned by the classifier. We consider two cases:

(a) P'(C) > P(C): Here the probability of class C has changed in the right direction, therefore we will call such an answer useful. It should be awarded a positive score.

(b) P'(C) < P(C): Here the probability of class C has changed in the wrong direction, therefore we will call such an answer misleading. It should be assigned a negative score.

The description continues with this example:

Suppose the classifier in 1950 answered:   
P'(Bush) = 0.45   
P'(Dukakis) = 0.55   
P'(all others) = 0  

Is a value greater than 1.0 invalid for the information score? Or have I used an inappropriate classifier for this type of data set? The iris.tab data set contains three distinct classes.
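To make the two cases quoted above concrete, here is a minimal sketch of a per-instance score. It assumes the usual Kononenko and Bratko form of the measure (log base 2, so the unit is bits); the function name `information_score` is hypothetical and not part of Orange, and whether `orngStat.IS` averages exactly this quantity is an assumption on my part.

```python
import math


def information_score(prior, posterior):
    """Score in bits for one instance, given P(C) and P'(C) of its correct class.

    Hypothetical helper, not an Orange function; assumes 0 < prior < 1.
    """
    if posterior >= prior:
        # Case (a): useful answer, non-negative score.
        return -math.log(prior, 2) + math.log(posterior, 2)
    # Case (b): misleading answer, negative score.
    return -(-math.log(1 - prior, 2) + math.log(1 - posterior, 2))


# Assumption: three equally frequent classes, as in the iris data.
prior = 1.0 / 3

best = information_score(prior, 1.0)        # classifier is certain and right
useful = information_score(prior, 0.9)      # confident and right
misleading = information_score(prior, 0.1)  # pushed the wrong way

print("best = %.3f, useful = %.3f, misleading = %.3f"
      % (best, useful, misleading))
```

Under these assumptions the largest possible score for a class with prior 1/3 is log2(3), about 1.585 bits, so the reported values of 1.402 and 1.447 would fall inside the attainable range rather than above it.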
