python - Exporting a Pandas DataFrame with a MultiIndex

Question

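I just discovered pandas and I am impressed by its capabilities. I am having a hard time understanding how to work with a DataFrame that has a MultiIndex.

I have two questions:

(1) Exporting the DataFrame

Here is my problem, using this dataset: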

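import pandas as pd
from io import StringIO  # Python 3; the old top-level StringIO module is gone

# The CSV text is flush-left so that no field picks up leading whitespace.
d1 = StringIO(
"""Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
)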

df = pd.read_csv(d1)

# Frequency tables
tab1 = pd.crosstab(df.Gender, df.Region)
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])
tab3 = pd.crosstab([df.Gender, df.Employed], [df.Region, df.Degree])

# Now we export the datasets 
tab1.to_excel('H:/test_tab1.xlsx')  # OK 
tab2.to_excel('H:/test_tab2.xlsx') # fails 
tab3.to_excel('H:/test_tab3.xlsx') # fails

One workaround I can think of is to change the columns (the R way):

def NewColums(DFwithMultiIndex):
    # Join each column tuple, e.g. ('east', 'ba'), into one label: 'east-ba'
    NewCol = []
    for item in DFwithMultiIndex.columns:
        NewCol.append('-'.join(item))
    return NewCol

# New Columns 
tab2.columns = NewColums(tab2)
tab3.columns = NewColums(tab3)

# New export  
tab2.to_excel('H:/test_tab2.xlsx')  # OK
tab3.to_excel('H:/test_tab3.xlsx')  # OK

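My question is: is there a more efficient way to do this in pandas that I missed in the documentation?

(2) Selecting columns

This new structure no longer allows selecting columns on a given variable (which is the advantage of hierarchical indexing in the first place). How can I select the columns that contain a given string (e.g. '-ba')?

PS: I have seen this related question, but I did not understand the reply that was proposed.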

score 2 · Accepted Answer

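This looks like a bug in to_excel. For the time being, I recommend using to_csv as a workaround (it does not seem to show this problem).

I have added this as an issue on GitHub.

To answer the second question, if you really do need to use to_excel...

You can use filter to select only those columns that contain '-ba':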
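As a rough sketch of the to_csv workaround, the snippet below writes tab2 to CSV and reads it back. It rebuilds the question's data, and it writes to an in-memory buffer rather than a file path purely for illustration; the names csv_text, buf, csv_out and restored are made up for this example. Reading back with header=[0, 1] assumes the two column levels were written as the first two rows.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question (flush-left to avoid stray whitespace).
csv_text = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(StringIO(csv_text))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Write the table with its two-level columns to CSV.
# An in-memory buffer is used here; a file path works the same way.
buf = StringIO()
tab2.to_csv(buf)
csv_out = buf.getvalue()

# Read it back, telling pandas that the first two rows hold the column
# levels and the first column holds the row index.
buf.seek(0)
restored = pd.read_csv(buf, header=[0, 1], index_col=0)
print(restored)
```

The resulting CSV keeps one header row per column level, so it can also be opened in a spreadsheet application.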

In [21]: list(filter(lambda x: '-ba' in x, tab2.columns))
Out[21]: ['east-ba', 'north-ba', 'south-ba']

In [22]: tab2[list(filter(lambda x: '-ba' in x, tab2.columns))]
Out[22]: 
        east-ba  north-ba  south-ba
Gender                             
     f        1         0         1
     m        1         1         0
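For comparison, here are two other ways to make the same selection, sketched on the question's data. Option A assumes the columns are left as a MultiIndex and selects on the 'Degree' level with xs. Option B assumes the columns were flattened with '-' as in the question and uses DataFrame.filter. The names ba_by_level, flat and ba_by_name are illustrative only.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question.
csv_text = """Gender,Employed,Region,Degree
m,yes,east,ba
m,yes,north,ba
f,yes,south,ba
f,no,east,ba
f,no,east,bsc
m,no,north,bsc
m,yes,south,ma
f,yes,west,phd
m,no,west,phd
m,yes,west,phd
"""
df = pd.read_csv(StringIO(csv_text))
tab2 = pd.crosstab(df.Gender, [df.Region, df.Degree])

# Option A: keep the hierarchical columns and take a cross-section on the
# 'Degree' level. The selected level is dropped from the result, leaving
# only the Region labels as column names.
ba_by_level = tab2.xs("ba", axis=1, level="Degree")
print(ba_by_level)

# Option B: flatten the column tuples into 'region-degree' strings, then
# keep the columns whose label contains '-ba'.
flat = tab2.copy()
flat.columns = ["-".join(col) for col in flat.columns]
ba_by_name = flat.filter(like="-ba", axis=1)
print(ba_by_name)
```

Option A keeps the benefit of hierarchical indexing that the question asks about. Option B is only needed when the flattened labels are required for the export.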
