I have built a framework for evaluating algorithms. It has methods that compute metrics such as RMSE@K, NDCG@K and MAE@K from the data I pass into them.

import numpy as np
import matplotlib.pyplot as plt

ndcg = []
rmse = []
mae = []
for i in range(11):  # k = 0..10
    results = generate_metrics(data_file, i)
    ndcg.append(np.mean(results['ndcg']))
    rmse.append(np.mean(results['rmse']))
    mae.append(np.mean(results['mae']))
plt.plot(ndcg)
plt.plot(rmse)
plt.plot(mae)
plt.show()

I want to use ggplot within Python to plot all of this in one graph: the x axis is the @k value (0-10) and the y axis is the corresponding value from each list.

How can I convert the lists above into a data frame like this:

   at_k      ndcg      rmse       mae
1     1 0.4880583 0.3438043 0.3400933
2     2 0.4880583 0.3438043 0.3400933
3     3 0.4880583 0.3438043 0.3400933
4     4 0.4880583 0.3438043 0.3400933
5     5 0.4880583 0.3438043 0.3400933
6     6 0.4880583 0.3438043 0.3400933
7     7 0.4880583 0.3438043 0.3400933
8     8 0.4880583 0.3438043 0.3400933
9     9 0.4880583 0.3438043 0.3400933
10   10 0.4880583 0.3438043 0.3400933

and then plot it using ggplot?

1 Answer

After generating some random data in the same form as your dataset:

import numpy as np

ndcg, rmse, mae = [], [], []
for i in range(11):
    rand = np.random.sample(3)  # three random floats in [0, 1)
    ndcg.append(rand[0])
    rmse.append(rand[1])
    mae.append(rand[2])

I can create a Pandas DataFrame from it:

import pandas as pd

at_k = range(1, 12)
df = pd.DataFrame({"at_k": at_k, "ndcg": ndcg, "rmse": rmse, "mae": mae})
print(df)

This outputs:

    at_k       mae      ndcg      rmse
0      1  0.153102  0.546553  0.794357
1      2  0.882718  0.342260  0.762997
2      3  0.153298  0.695626  0.581455
3      4  0.073772  0.491996  0.384631
4      5  0.014066  0.369490  0.606842
5      6  0.892553  0.818312  0.396829
6      7  0.143114  0.739370  0.812050
7      8  0.847054  0.323221  0.932366
8      9  0.122838  0.613340  0.393237
9     10  0.645705  0.486312  0.138259
10    11  0.339063  0.223995  0.115242
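
Side note: the columns above come out in alphabetical order, which as far as I know is how older pandas versions ordered dict keys. If you want the order from the question instead, passing `columns=` should pin it down. A minimal sketch, using made-up random data and a seed chosen only so the run is repeatable (both are my assumptions, not part of the original data):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 11 random values per metric.
rng = np.random.default_rng(0)
ndcg, rmse, mae = (rng.random(11).tolist() for _ in range(3))

df = pd.DataFrame(
    {"at_k": list(range(1, 12)), "ndcg": ndcg, "rmse": rmse, "mae": mae},
    columns=["at_k", "ndcg", "rmse", "mae"],  # explicit column order
)
print(df.head())
```

The values will differ from the table above, since the data is random.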

Yay! But we can't plot this with yhat's ggplot just yet. Following this example, we first need to reshape the data into long format:

df2 = pd.melt(df[['at_k', 'mae', 'ndcg', 'rmse']], id_vars=['at_k'])
print(df2)

Now we have something like this (truncated):

    at_k variable     value
0      1      mae  0.153102
1      2      mae  0.882718
2      3      mae  0.153298
3      4      mae  0.073772
...
30     9     rmse  0.393237
31    10     rmse  0.138259
32    11     rmse  0.115242
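
To make the reshaping step easier to see, here is a small sketch of `pd.melt` on a two-row frame. The numbers are made up, and the names `wide`, `long`, `metric` and `score` are just illustrative choices; `var_name` and `value_name` only rename the default `variable` and `value` columns:

```python
import pandas as pd

# Tiny hypothetical wide frame: one row per k, one column per metric.
wide = pd.DataFrame({
    "at_k": [1, 2],
    "ndcg": [0.5, 0.6],
    "rmse": [0.3, 0.2],
    "mae": [0.1, 0.4],
})

# Long format: one row per (k, metric) pair.
long = pd.melt(wide, id_vars=["at_k"], var_name="metric", value_name="score")
print(long)
```

Each metric column turns into a block of rows, so 2 rows by 3 metrics gives 6 rows, and one column can then drive the colour in the plot.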

Now it's time to plot:

from ggplot import *  # yhat's ggplot package

ggplot(aes(x='at_k', y='value', colour='variable'), data=df2) +\
    geom_point()
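
If the ggplot package is not available in your environment, a similar picture can be drawn from the same long-format frame with plain matplotlib. This is a swapped-in alternative, not the ggplot approach used above. The small `df2` here is made-up data with the same three columns, and the file name `metrics.png` is arbitrary:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical long-format data shaped like df2 (values are made up).
df2 = pd.DataFrame({
    "at_k": [1, 2, 3] * 3,
    "variable": ["mae"] * 3 + ["ndcg"] * 3 + ["rmse"] * 3,
    "value": [0.15, 0.88, 0.15, 0.55, 0.34, 0.70, 0.79, 0.76, 0.58],
})

fig, ax = plt.subplots()
# One set of points per metric, each with its own colour and legend entry.
for name, group in df2.groupby("variable"):
    ax.plot(group["at_k"], group["value"], marker="o", linestyle="", label=name)
ax.set_xlabel("at_k")
ax.set_ylabel("value")
ax.legend(title="variable")
fig.savefig("metrics.png")
```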

Answered 2015-02-09T21:08:33.630