I have the following result set, but plotting it as is doesn't give a meaningful plot, because each recall value has multiple precision values. I plot the results like this:

plot4 = ggplot(aes(x='recall', y='precision'), data=df_unique) + geom_line() + geom_point() + ggtitle("MoLT Evaluation test")
ggsave(plot4, "myplot.png")

This gives a noisy plot compared to the smooth curves you normally get for this type of metric.

       precision    recall
1       0.000000  0.000000
17859   0.133333  0.009050
13159   0.066667  0.012195
9232    0.133333  0.012500
6131    0.066667  0.013333
7900    0.066667  0.014085
11671   0.066667  0.014925
5297    0.466667  0.015284
535     0.066667  0.015625
11223   0.133333  0.018018
5409    0.066667  0.019608
10840   0.266667  0.019802
13241   0.066667  0.020408
15957   0.200000  0.020833
21584   0.200000  0.021583
11746   0.333333  0.021834
11272   0.066667  0.022222
10904   0.066667  0.023256
13015   0.466667  0.023891
1010    0.533333  0.025641
2461    0.066667  0.027027
15294   0.200000  0.027523
11566   0.600000  0.028846
5103    0.066667  0.029412
7547    0.333333  0.030864
10059   0.333333  0.032258
20019   0.266667  0.033058
637     0.066667  0.033333
16226   0.200000  0.033708
9071    0.200000  0.034884

I want to take the average precision for each recall value and build a new data frame. For example, a single recall value has these rows:

(Pdb) x[(x.recall == 0.1)]
       precision  recall
230     0.066667     0.1
119     0.133333     0.1
714     0.200000     0.1
284     0.266667     0.1
15705   0.333333     0.1
8057    0.466667     0.1
4871    0.533333     0.1

I want to build my new data frame as follows:

       precision  recall
   1     0.000     0.0
   2     0.104     0.1
   3     0.234     0.2

How can I apply a selection like this to every recall value?

x[(x.recall == 0.1)]

Any other technique for building a data frame with the average precision for each unique recall value would also work.

1 Answer

Split this problem into two parts:

  • create a 'bin' column to group on
  • group data by your new column, then calculate the mean of each group

Your code might look like this:

import pandas as pd
import numpy as np

# ... load your data ...

# Create bins by rounding 'recall' to a fixed number of decimal places
# (2 gives bins 0.01 wide; use 1 for bins such as 0.0, 0.1, 0.2)
df_unique["recall_bins"] = np.round(df_unique["recall"], 2)

# Group the data by the bins just created
groups = df_unique.groupby("recall_bins")

# Calculate the mean precision of each group (result is indexed by 'recall_bins')
precision_means = groups.aggregate({"precision": "mean"})

You can read more about the split-apply-combine approach here.
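
To get a flat data frame with 'precision' and 'recall' columns, as in the question, the two steps can be wrapped in a small helper. This is only a sketch: the function name average_precision_by_recall and the sample rows (copied from the question's output) are illustrative assumptions, and it rounds to 1 decimal place so that the bins come out as 0.0, 0.1, 0.2 and so on.

```python
import pandas as pd

# Sample rows copied from the question's output (illustrative subset only)
df_unique = pd.DataFrame({
    "precision": [0.000000, 0.133333, 0.066667,
                  0.066667, 0.133333, 0.200000, 0.266667,
                  0.333333, 0.466667, 0.533333],
    "recall":    [0.000000, 0.009050, 0.012195,
                  0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
})


def average_precision_by_recall(df, decimals=1):
    """Hypothetical helper: bin 'recall' by rounding, then average 'precision' per bin."""
    # Replace 'recall' with its rounded value so nearby recalls share one bin
    binned = df.assign(recall=df["recall"].round(decimals))
    # as_index=False keeps 'recall' as a regular column instead of the index
    means = binned.groupby("recall", as_index=False)["precision"].mean()
    # Match the column order used in the question
    return means[["precision", "recall"]]


averaged = average_precision_by_recall(df_unique)
print(averaged)
```

The returned frame has one row per recall bin, so it can be passed to the same ggplot call in place of df_unique. A smaller decimals value gives a smoother curve at the cost of resolution.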

Answered 2015-04-21T20:17:16.420