I'm currently trying to implement the calculation of a ROC curve in Ruby. I tried to translate the pseudocode from http://people.inf.elte.hu/kiss/13dwhdm/roc.pdf (see page 6, section 5, Algorithm 1 "Efficient Method for generating ROC points") into Ruby code.
I worked out a simple example, but I always get values over 1.0 for recall. I think I misunderstood something or made a programming mistake. Here is what I have so far:
```ruby
# results from a classifier
# index 0: the user's vote
# index 1: the estimate from the system
results = [[5.0,4.8],[4.6,4.2],[4.3,2.2],[3.1,4.9],[1.3,2.6],[3.9,4.3],[1.9,2.4],[2.6,2.3]]
# an item with a score of 2.5 or more is a positive one
threshold = 2.5
# sort by index 1 (the estimate), highest first
l_sorted = results.sort { |a,b| b[1] <=> a[1] }
# count the real positives and negatives
positives, negatives = 0, 0
l_sorted.each do |item|
  if item[0] >= threshold
    positives += 1
  else
    negatives += 1
  end
end
fp, tp = 0, 0
# the array that holds the points
r = []
f_prev = -Float::INFINITY
# iterate over all items
l_sorted.each do |item|
  # if the score of the previous iteration is different,
  # add another point to r
  if item[1] != f_prev
    r.push [fp/negatives.to_f, tp/positives.to_f]
    f_prev = item[1]
  end
  # if the current item is a real positive
  # (the user really likes the item, and the estimator was also correct)
  # add a true positive, otherwise add a false positive
  if item[0] >= threshold && item[1] >= threshold
    tp += 1
  else
    fp += 1
  end
end
# push the last point (1,1) to the array
r.push [fp/negatives.to_f, tp/positives.to_f]
r.each do |point|
  puts "(#{point[0].round(3)},#{point[1].round(3)})"
end
```
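For comparison, here is a minimal sketch of Algorithm 1 as I read the pseudocode, under one assumption: an item counts as a real positive based only on the user vote (`vote >= threshold`), while the system estimate is used purely as the ranking score and never decides TP vs. FP. The helper name `roc_points` is hypothetical; it is not from the paper.

```ruby
# Hypothetical helper: builds ROC points following the structure of
# Algorithm 1 (sort by score, emit a point whenever the score changes).
# Assumption: the true class comes from the user vote alone; the
# system estimate is only used to rank the items.
def roc_points(results, threshold)
  positives = results.count { |vote, _score| vote >= threshold }
  negatives = results.size - positives

  # Sort by the classifier score, highest first.
  sorted = results.sort_by { |_vote, score| -score }

  points = []
  tp = 0
  fp = 0
  f_prev = -Float::INFINITY
  sorted.each do |vote, score|
    # Emit a point only when the score differs from the previous one,
    # so items with tied scores collapse into a single step.
    if score != f_prev
      points << [fp.fdiv(negatives), tp.fdiv(positives)]
      f_prev = score
    end
    # The true label decides TP vs. FP; the estimate plays no role here.
    if vote >= threshold
      tp += 1
    else
      fp += 1
    end
  end
  # Final point; after all items it is always (1, 1).
  points << [fp.fdiv(negatives), tp.fdiv(positives)]
  points
end

results = [[5.0,4.8],[4.6,4.2],[4.3,2.2],[3.1,4.9],[1.3,2.6],[3.9,4.3],[1.9,2.4],[2.6,2.3]]
roc_points(results, 2.5).each do |fpr, tpr|
  puts "(#{fpr.round(3)},#{tpr.round(3)})"
end
```

With this labelling, `fp` can never exceed `negatives` and `tp` can never exceed `positives`, so both coordinates stay within [0, 1].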
Based on the `results` array of arrays, the code tries to calculate the points. I'm not sure what `f_prev` is for. Does `f_prev` store the score of the classifier, or only whether the prediction was true or false?
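One way to probe the `f_prev` question is a tiny hypothetical data set with a tied score, assuming `f_prev` holds the previous item's classifier score (not a true/false flag). Labels here are made up: 1 means a real positive, 0 a real negative.

```ruby
# Hypothetical tie example: [true label, classifier score].
# Two items share the score 0.7.
items = [[1, 0.9], [1, 0.7], [0, 0.7], [0, 0.2]]
positives = items.count { |label, _score| label == 1 }
negatives = items.size - positives

tp = 0
fp = 0
f_prev = -Float::INFINITY # previous score, not a true/false flag
points = []
items.sort_by { |_label, score| -score }.each do |label, score|
  # A new score value means a new threshold: record the point reached so far.
  if score != f_prev
    points << [fp.fdiv(negatives), tp.fdiv(positives)]
    f_prev = score
  end
  if label == 1
    tp += 1
  else
    fp += 1
  end
end
points << [fp.fdiv(negatives), tp.fdiv(positives)]
p points
```

Because both items scored 0.7 are consumed before the next point is recorded, the tie yields one diagonal segment from (0, 0.5) to (0.5, 1.0) instead of an intermediate point that would depend on how the sort ordered the tied items.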
It would be awesome if someone could have a quick look at my code and help me find my mistake. Thanks!