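I am trying to analyse a set of datasets, but I can't find an effective way to present the results. I thought groupby() might solve it, but I want to display all the tables at once and I don't know how to express that. My other idea is to show each column side by side for comparison: first, second, then third. This is mainly what I want to achieve, for example: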

                  Mean  Std   Max  Min
First_Result_Set
Second_Result_Set
Third_Result_Set

Here is my other solution (probably not a good one):

                               Mean  Std   Max  Min
First_Result_Set_first_column
Second_Result_Set_first_column
Third_Result_Set_first_column

Any suggestions or solutions would be helpful. Code:

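import pandas as pd

def analyse_data(self, np_array, raw=45, column=3):
    # Reshape the flat array into `raw` rows by `column` columns.
    df = pd.DataFrame(np_array.reshape(raw, column),
                      columns=("Time", "Random Score", "AI Score"))
    # Per-column summary statistics: count, mean, std, min, quartiles, max.
    data_result = df.describe()
    print(data_result)
    return data_result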

analyse_cache_ab_classes_depth_5 = file.analyse_data(cache_ab_classes_depth_5)

analyse_cache_ab_classes_depth_4 = file.analyse_data(cache_ab_classes_depth_4)

analyse_cache_ab_classes_depth_3 = file.analyse_data(cache_ab_classes_depth_3)

Output:

            Time  Random Score   AI Score
count  45.000000     45.000000  45.000000
mean    1.054444      2.355556  12.488889
std     0.423377      2.496867   7.225656
min     0.400000      0.000000   0.000000
25%     0.850000      0.000000   6.000000
50%     0.960000      2.000000  14.000000
75%     1.180000      4.000000  16.000000
max     2.620000      8.000000  28.000000

            Time  Random Score   AI Score
count  45.000000     45.000000  45.000000
mean    2.021333      5.644444  35.288889
std     0.889095      4.270169  12.764692
min     0.780000      0.000000  12.000000
25%     1.310000      2.000000  28.000000
50%     1.780000      4.000000  34.000000
75%     2.590000      8.000000  42.000000
max     4.220000     18.000000  76.000000

            Time  Random Score   AI Score
count  45.000000     45.000000  45.000000
mean    0.207333      1.822222  15.333333
std     0.077295      2.124413   6.993503
min     0.110000      0.000000   4.000000
25%     0.150000      0.000000  10.000000
50%     0.180000      2.000000  16.000000
75%     0.250000      2.000000  20.000000
max     0.380000     10.000000  30.000000
1 Answer

Consider collecting your DataFrames into a Panel:

In [149]: p = pd.Panel({'d1':d1, 'd2':d2, 'd3':d3})

In [150]: p.axes
Out[150]:
[Index(['d1', 'd2', 'd3'], dtype='object'),
 Index(['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max'], dtype='object'),
 Index(['Time', 'Random Score', 'AI Score'], dtype='object')]

In [151]: p.loc['d1']
Out[151]:
            Time  Random Score   AI Score
count  45.000000     45.000000  45.000000
mean    1.054444      2.355556  12.488889
std     0.423377      2.496867   7.225656
min     0.400000      0.000000   0.000000
25%     0.850000      0.000000   6.000000
50%     0.960000      2.000000  14.000000
75%     1.180000      4.000000  16.000000
max     2.620000      8.000000  28.000000

In [152]: p.loc[:, 'mean']
Out[152]:
                     d1         d2         d3
Time           1.054444   2.021333   0.207333
Random Score   2.355556   5.644444   1.822222
AI Score      12.488889  35.288889  15.333333

In [153]: p.loc[:, :, 'AI Score']
Out[153]:
              d1         d2         d3
count  45.000000  45.000000  45.000000
mean   12.488889  35.288889  15.333333
std     7.225656  12.764692   6.993503
min     0.000000  12.000000   4.000000
25%     6.000000  28.000000  10.000000
50%    14.000000  34.000000  16.000000
75%    16.000000  42.000000  20.000000
max    28.000000  76.000000  30.000000
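
Note that pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel calls above only run on older versions. A minimal sketch of the same three selections on current pandas, using pd.concat with keys, might look like the following. It assumes d1, d2 and d3 are the three describe() results; the random arrays here are hypothetical stand-ins for the real result sets.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: three 45x3 arrays instead of the real result sets.
rng = np.random.default_rng(0)
cols = ["Time", "Random Score", "AI Score"]
d1, d2, d3 = (
    pd.DataFrame(rng.random((45, 3)), columns=cols).describe() for _ in range(3)
)

# Put the three describe() tables side by side; the dict keys become
# the outer level of a column MultiIndex.
wide = pd.concat({"d1": d1, "d2": d2, "d3": d3}, axis=1)

# Counterpart of p.loc['d1']: the full summary of one dataset.
print(wide["d1"])

# Counterpart of p.loc[:, 'mean']: one statistic, datasets as columns.
means = wide.loc["mean"].unstack(level=0)
print(means)

# Counterpart of p.loc[:, :, 'AI Score']: one column across all datasets.
ai_score = wide.xs("AI Score", axis=1, level=1)
print(ai_score)
```

Keeping the dataset name as a column level means ordinary label-based indexing covers all three views, with no extra container type needed.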

Alternatively, you can build a MultiIndex DataFrame, similar to the following:

In [154]: p.to_frame()
Out[154]:
                           d1         d2         d3
major minor
count Time          45.000000  45.000000  45.000000
      Random Score  45.000000  45.000000  45.000000
      AI Score      45.000000  45.000000  45.000000
mean  Time           1.054444   2.021333   0.207333
      Random Score   2.355556   5.644444   1.822222
      AI Score      12.488889  35.288889  15.333333
std   Time           0.423377   0.889095   0.077295
      Random Score   2.496867   4.270169   2.124413
      AI Score       7.225656  12.764692   6.993503
min   Time           0.400000   0.780000   0.110000
...                       ...        ...        ...
25%   AI Score       6.000000  28.000000  10.000000
50%   Time           0.960000   1.780000   0.180000
      Random Score   2.000000   4.000000   2.000000
      AI Score      14.000000  34.000000  16.000000
75%   Time           1.180000   2.590000   0.250000
      Random Score   4.000000   8.000000   2.000000
      AI Score      16.000000  42.000000  20.000000
max   Time           2.620000   4.220000   0.380000
      Random Score   8.000000  18.000000  10.000000
      AI Score      28.000000  76.000000  30.000000

[24 rows x 3 columns]

Or:

                    count       mean        std    min    25%    50%    75%    max
major minor
d1    Time           45.0   1.054444   0.423377   0.40   0.85   0.96   1.18   2.62
      Random Score   45.0   2.355556   2.496867   0.00   0.00   2.00   4.00   8.00
      AI Score       45.0  12.488889   7.225656   0.00   6.00  14.00  16.00  28.00
d2    Time           45.0   2.021333   0.889095   0.78   1.31   1.78   2.59   4.22
      Random Score   45.0   5.644444   4.270169   0.00   2.00   4.00   8.00  18.00
      AI Score       45.0  35.288889  12.764692  12.00  28.00  34.00  42.00  76.00
d3    Time           45.0   0.207333   0.077295   0.11   0.15   0.18   0.25   0.38
      Random Score   45.0   1.822222   2.124413   0.00   0.00   2.00   2.00  10.00
      AI Score       45.0  15.333333   6.993503   4.00  10.00  16.00  20.00  30.00
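
A layout like the last table can also be sketched without Panel by transposing each describe() result and concatenating with keys. This is only an illustration under assumptions: the level names "dataset" and "column" are made up here, and the random arrays are hypothetical stand-ins for the real result sets. The final selection gives the Mean/Std/Max/Min comparison for one column that the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: three 45x3 arrays instead of the real result sets.
rng = np.random.default_rng(1)
cols = ["Time", "Random Score", "AI Score"]
d1, d2, d3 = (
    pd.DataFrame(rng.random((45, 3)), columns=cols).describe() for _ in range(3)
)

# Transpose so the statistics become columns, then stack the datasets
# vertically; the dict keys form the outer level of the row MultiIndex.
long = pd.concat(
    {"d1": d1.T, "d2": d2.T, "d3": d3.T},
    names=["dataset", "column"],  # assumed level names, chosen for this sketch
)
print(long)

# One row per dataset for a single column, limited to the four statistics
# the question lists.
summary = long.xs("AI Score", level="column")[["mean", "std", "max", "min"]]
print(summary)
```

Swapping "AI Score" for "Time" or "Random Score" yields the same comparison table for the other columns.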
answered 2018-03-31T22:27:23.440