I have the following result set, but plotting it as-is doesn't give a meaningful plot, because each recall value has multiple precision values. If I plot the results directly:
plot4 = ggplot(aes(x='recall', y='precision'), data=df_unique) + geom_line() + geom_point() + ggtitle("MoLT Evaluation test")
ggsave(plot4, "myplot.png")
the result looks much messier than the usual precision-recall curves for this type of metric. Here is a sample of the data:
precision recall
1 0.000000 0.000000
17859 0.133333 0.009050
13159 0.066667 0.012195
9232 0.133333 0.012500
6131 0.066667 0.013333
7900 0.066667 0.014085
11671 0.066667 0.014925
5297 0.466667 0.015284
535 0.066667 0.015625
11223 0.133333 0.018018
5409 0.066667 0.019608
10840 0.266667 0.019802
13241 0.066667 0.020408
15957 0.200000 0.020833
21584 0.200000 0.021583
11746 0.333333 0.021834
11272 0.066667 0.022222
10904 0.066667 0.023256
13015 0.466667 0.023891
1010 0.533333 0.025641
2461 0.066667 0.027027
15294 0.200000 0.027523
11566 0.600000 0.028846
5103 0.066667 0.029412
7547 0.333333 0.030864
10059 0.333333 0.032258
20019 0.266667 0.033058
637 0.066667 0.033333
16226 0.200000 0.033708
9071 0.200000 0.034884
I want to average the precision for each unique recall value and build a new data frame from those averages. For example, these are the rows for recall 0.1:
(Pdb) x[(x.recall == 0.1)]
precision recall
230 0.066667 0.1
119 0.133333 0.1
714 0.200000 0.1
284 0.266667 0.1
15705 0.333333 0.1
8057 0.466667 0.1
4871 0.533333 0.1
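The filter above isolates a single recall value, and taking the mean of the `precision` column of that subset gives its average. A minimal sketch, assuming a frame named `x` as in the Pdb session and rebuilding only the seven rows printed above for illustration:

```python
import pandas as pd

# The seven rows printed by the Pdb session above, keyed by their original index.
x = pd.DataFrame(
    {
        "precision": [0.066667, 0.133333, 0.200000, 0.266667,
                      0.333333, 0.466667, 0.533333],
        "recall": [0.1] * 7,
    },
    index=[230, 119, 714, 284, 15705, 8057, 4871],
)

# Boolean mask selecting the rows for one recall value.
mask = x.recall == 0.1

# How many precision values share this recall value, and their mean.
n_rows = int(mask.sum())
mean_at_01 = x.loc[mask, "precision"].mean()
print(n_rows, mean_at_01)
```

This handles one recall value at a time; the remaining question is how to repeat it for every value.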
I want the new data frame to look like this:
precision recall
1 0.000 0.0
2 0.104 0.1
3 0.234 0.2
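One common way to get this shape is `DataFrame.groupby`, which averages precision for every unique recall value in a single pass. A minimal sketch, using a handful of rows copied from the listings above (the frame name `df` is illustrative); the averages will of course differ from the desired output, which is computed over the full data:

```python
import pandas as pd

# A few rows copied from the listings above: one at recall 0.0,
# seven at recall 0.1, and two single rows from the longer table.
df = pd.DataFrame(
    {
        "precision": [0.000000,
                      0.066667, 0.133333, 0.200000, 0.266667,
                      0.333333, 0.466667, 0.533333,
                      0.133333, 0.066667],
        "recall": [0.0] + [0.1] * 7 + [0.009050, 0.012195],
    }
)

# One row per unique recall value holding the mean precision of that group.
# as_index=False keeps 'recall' as an ordinary column; groupby sorts the keys.
avg = df.groupby("recall", as_index=False)["precision"].mean()

# Number the rows from 1 and put the columns in the order of the desired output.
avg.index = range(1, len(avg) + 1)
avg = avg[["precision", "recall"]]
print(avg)
```

Because the result is sorted by recall, it can be passed straight to the same `ggplot(...) + geom_line()` call in place of the raw data.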
How can I apply a filter like the following to every recall value, for example with `apply`:
x[(x.recall == 0.1)]
Any other technique that builds a data frame with the average precision for each unique recall value would also work.
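If you want to keep the exact filter `x[(x.recall == 0.1)]`, you can run it once per unique recall value and collect the results into a new frame. A minimal sketch under two assumptions: the frame is called `x`, as in the Pdb session, and recall values are rounded to 6 decimals before comparing, in case floating-point noise makes values that print alike compare unequal (6 decimals matches the printed data but is otherwise an arbitrary choice):

```python
import pandas as pd

# Illustrative data: a few rows copied from the listings above.
x = pd.DataFrame(
    {
        "precision": [0.000000, 0.066667, 0.133333, 0.200000, 0.533333, 0.133333],
        "recall": [0.0, 0.1, 0.1, 0.1, 0.1, 0.009050],
    }
)

# Rounded comparison key, so values that differ only in the last bits
# fall into the same bucket (the original recall column is left untouched).
key = x.recall.round(6)

# Apply the same boolean-mask filter once per unique recall value.
rows = [
    {"precision": x.loc[key == r, "precision"].mean(), "recall": r}
    for r in sorted(key.unique())
]
avg = pd.DataFrame(rows, columns=["precision", "recall"])
print(avg)
```

This scans the whole frame once per unique recall value, so on a large result set `x.groupby(key)["precision"].mean()` does the same job in a single pass.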