python - How to reorder MultiIndex DataFrame columns at a specific level

Question

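I have a multi-indexed DataFrame with names attached to the column levels. I'd like to be able to easily shuffle the columns around so that they match an order specified by the user. Since this happens further down the pipeline, I can't use this recommended solution and order them properly at creation time.

I have a data table that looks (sort of) like this: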

Experiment           BASE           IWWGCW         IWWGDW
Lead Time                24     48      24     48      24     48
2010-11-27 12:00:00   0.997  0.991   0.998  0.990   0.998  0.990
2010-11-28 12:00:00   0.998  0.987   0.997  0.990   0.997  0.990
2010-11-29 12:00:00   0.997  0.992   0.997  0.992   0.997  0.992
2010-11-30 12:00:00   0.997  0.987   0.997  0.987   0.997  0.987
2010-12-01 12:00:00   0.996  0.986   0.996  0.986   0.996  0.986

I'd like to take in a list like ['IWWGCW', 'IWWGDW', 'BASE'] and reorder the table to:

Experiment           IWWGCW         IWWGDW         BASE           
Lead Time                24     48      24     48      24     48  
2010-11-27 12:00:00   0.998  0.990   0.998  0.990   0.997  0.991  
2010-11-28 12:00:00   0.997  0.990   0.997  0.990   0.998  0.987  
2010-11-29 12:00:00   0.997  0.992   0.997  0.992   0.997  0.992  
2010-11-30 12:00:00   0.997  0.987   0.997  0.987   0.997  0.987  
2010-12-01 12:00:00   0.996  0.986   0.996  0.986   0.996  0.986

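The catch is that I don't always know which level 'Experiment' will be at. I tried the following (where df is the multi-indexed frame shown above):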

df2 = df.reindex_axis(['IWWGCW', 'IWWGDW', 'BASE'], axis=1, level='Experiment')

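but that doesn't seem to work: it completes successfully, but the column order of the returned DataFrame is unchanged.

My workaround is a function like this: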

def reorder_columns(frame, column_name, new_order):
    """Shuffle the specified columns of the frame to match new_order."""

    index_level  = frame.columns.names.index(column_name)
    new_position = lambda t: new_order.index(t[index_level])
    new_index    = sorted(frame.columns, key=new_position)
    new_frame    = frame.reindex_axis(new_index, axis=1)
    return new_frame

It is called the way I'd expect, as reorder_columns(df, 'Experiment', ['IWWGCW', 'IWWGDW', 'BASE']), but it feels like I'm doing extra work. Is there an easier way to do this?

score 30 · Accepted Answer

There is a very simple way: just create a new DataFrame based on the original one, passing the MultiIndex columns in the correct order:

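import pandas as pd

# Column tuples listed in the desired final order.
multi_tuples = [('IWWGCW', 24), ('IWWGCW', 48), ('IWWGDW', 24), ('IWWGDW', 48),
                ('BASE', 24), ('BASE', 48)]

multi_cols = pd.MultiIndex.from_tuples(multi_tuples, names=['Experiment', 'Lead Time'])

# df_ori is the original DataFrame; passing columns= reindexes it to the new order.
df_ordered_multi_cols = pd.DataFrame(df_ori, columns=multi_cols)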
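
When the second-level labels are not known in advance, the tuple list could be derived from the existing columns instead of being typed by hand. The following is only a sketch and not part of the original answer: the helper name `ordered_columns` and the one-row sample frame are assumptions made for illustration.

```python
import pandas as pd


def ordered_columns(frame, level_name, new_order):
    """Return a MultiIndex with frame's columns regrouped to follow new_order.

    Hypothetical helper for illustration. Columns whose label at
    level_name is not listed in new_order are left out.
    """
    pos = frame.columns.names.index(level_name)
    tuples = [col for label in new_order
              for col in frame.columns if col[pos] == label]
    return pd.MultiIndex.from_tuples(tuples, names=list(frame.columns.names))


# Small stand-in for the frame in the question (values from its first row).
cols = pd.MultiIndex.from_product(
    [["BASE", "IWWGCW", "IWWGDW"], [24, 48]],
    names=["Experiment", "Lead Time"],
)
df_ori = pd.DataFrame([[0.997, 0.991, 0.998, 0.990, 0.998, 0.990]], columns=cols)

multi_cols = ordered_columns(df_ori, "Experiment", ["IWWGCW", "IWWGDW", "BASE"])
df_ordered_multi_cols = pd.DataFrame(df_ori, columns=multi_cols)
```

The order of the labels inside each experiment is whatever order they already had in the original columns.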

score 12 · Accepted Answer

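This is the simplest one that worked for me:

Create a list with the labels in the order you want, for the level you chose;
reindex your columns with that list to build a MultiIndex object, keeping in mind that this call returns a tuple;
use the MultiIndex object to reorder your DataFrame.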

cols = ['IWWGCW', 'IWWGDW', 'BASE']
new_cols = df.columns.reindex(cols, level=0)
df.reindex(columns=new_cols[0])  # new_cols is a (new_index, indexer) tuple; [0] is the MultiIndex

In one line:

df.reindex(columns=df.columns.reindex(['IWWGCW', 'IWWGDW', 'BASE'], level=0)[0])

Voilà.
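
For reference, here is a self-contained sketch of the three steps above on a small stand-in frame. The two-row sample data is an assumption for illustration (values taken from the first rows of the table in the question). It also unpacks the return value of `columns.reindex`, which is a pair made of the new index and an indexer array.

```python
import pandas as pd

# Stand-in for the frame in the question.
columns = pd.MultiIndex.from_product(
    [["BASE", "IWWGCW", "IWWGDW"], [24, 48]],
    names=["Experiment", "Lead Time"],
)
df = pd.DataFrame(
    [[0.997, 0.991, 0.998, 0.990, 0.998, 0.990],
     [0.998, 0.987, 0.997, 0.990, 0.997, 0.990]],
    index=pd.to_datetime(["2010-11-27 12:00:00", "2010-11-28 12:00:00"]),
    columns=columns,
)

# Step 1: desired order for the chosen level.
cols = ["IWWGCW", "IWWGDW", "BASE"]

# Step 2: Index.reindex returns (new_index, indexer); only the index is needed.
new_cols, _indexer = df.columns.reindex(cols, level=0)

# Step 3: reorder the DataFrame with the new MultiIndex.
result = df.reindex(columns=new_cols)
```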

score 8 · Accepted Answer

I'm not aware of anything offhand. I created an enhancement ticket about it:

http://github.com/pydata/pandas/issues/1864

score 3 · Accepted Answer

The solution from my comment above, using pandas 1.3.2:

df.reindex(columns=['IWWGCW', 'IWWGDW', 'BASE'], level='Experiment')
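
A minimal sketch of this call wrapped in a function with the same signature as the asker's helper, assuming a pandas version in which the call reorders the columns as this answer reports. The name `reorder_level` and the sample data are hypothetical and only for illustration. Because the level is passed by name, its position in the column index does not need to be known.

```python
import pandas as pd


def reorder_level(frame, level_name, new_order):
    """Reorder frame's columns so the labels at level_name follow new_order.

    Hypothetical wrapper for illustration; it only forwards to DataFrame.reindex.
    """
    return frame.reindex(columns=new_order, level=level_name)


# Stand-in for the frame in the question.
columns = pd.MultiIndex.from_product(
    [["BASE", "IWWGCW", "IWWGDW"], [24, 48]],
    names=["Experiment", "Lead Time"],
)
df = pd.DataFrame(
    [[0.997, 0.991, 0.998, 0.990, 0.998, 0.990]],
    index=pd.to_datetime(["2010-11-27 12:00:00"]),
    columns=columns,
)

reordered = reorder_level(df, "Experiment", ["IWWGCW", "IWWGDW", "BASE"])
experiment_order = list(reordered.columns.get_level_values("Experiment").unique())
```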

score 2 · Accepted Answer

andrew_reece's comment should be the accepted answer. Just use reindex().

Copied and pasted from the GitHub issue:

>>> df
                     vals
first second third       
mid   3rd    992     1.96
             562    12.06
      1st    73     -6.46
             818   -15.75
             658     5.90
btm   2nd    915     9.75
             474    -1.47
             905    -6.03
      1st    717     8.01
             909   -21.12
      3rd    616    11.91
             675     1.06
             579    -4.01
top   1st    241     1.79
             363     1.71
      3rd    677    13.38
             238   -16.77
             407    17.19
      2nd    728   -21.55
             36      8.09
>>> df.reindex(['top', 'mid', 'btm'], level='first')
                     vals
first second third       
top   1st    241     1.79
             363     1.71
      3rd    677    13.38
             238   -16.77
             407    17.19
      2nd    728   -21.55
             36      8.09
mid   3rd    992     1.96
             562    12.06
      1st    73     -6.46
             818   -15.75
             658     5.90
btm   2nd    915     9.75
             474    -1.47
             905    -6.03
      1st    717     8.01
             909   -21.12
      3rd    616    11.91
             675     1.06
             579    -4.01
>>> df.reindex(['1st', '2nd', '3rd'], level='second')
                     vals
first second third       
mid   1st    73     -6.46
             818   -15.75
             658     5.90
      3rd    992     1.96
             562    12.06
btm   1st    717     8.01
             909   -21.12
      2nd    915     9.75
             474    -1.47
             905    -6.03
      3rd    616    11.91
             675     1.06
             579    -4.01
top   1st    241     1.79
             363     1.71
      2nd    728   -21.55
             36      8.09
      3rd    677    13.38
             238   -16.77
             407    17.19
>>> df.reindex(['top', 'btm'], level='first').reindex(['1st', '2nd'], level='second')
                     vals
first second third       
top   1st    241     1.79
             363     1.71
      2nd    728   -21.55
             36      8.09
btm   1st    717     8.01
             909   -21.12
      2nd    915     9.75
             474    -1.47
             905    -6.03
