
I have been trying to organize some data from a set of MySQL tables into a Pandas DataFrame with a MultiIndex. The tables look roughly like this:

create table team (
  teamID           integer        NOT NULL,
  teamName         varchar(64)    NOT NULL,
  primary key      (teamID));

create table coach (
  coachID          integer        NOT NULL,
  teamID           integer        NOT NULL,
  coachName        varchar(64)    NOT NULL,
  primary key      (coachID));

create table player (
  playerID         integer        NOT NULL,
  teamID           integer        NOT NULL,
  playerName       varchar(64)    NOT NULL,
  primary key      (playerID));

Each team can have one or more coaches and one or more players.

Here are the selects and the merge:

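import mysql.connector
import pandas as pd

connection = mysql.connector.connect(user='root', passwd='temp', database='mydb')
# pandas.io.sql.read_frame was removed from pandas; pd.read_sql replaces it.
# (Recent pandas versions warn when given a raw DBAPI connection other than
# sqlite3; passing a SQLAlchemy engine instead avoids the warning.)
team =    pd.read_sql('select * from team;',   connection)
coach =   pd.read_sql('select * from coach;',  connection)
player =  pd.read_sql('select * from player;', connection)
connection.close()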

df = pd.merge(
        pd.merge(team, coach,     on='teamID'),
        player,                   on='teamID')
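The repetition starts with this merge: coach and player are each joined to the team only through teamID, so the second merge pairs every coach with every player of the same team (a many-to-many join). A minimal sketch that replaces the MySQL reads with in-memory DataFrames (an assumption for reproducibility; the values are copied from the merged output below) makes the row count explicit:

```python
import pandas as pd

# In-memory stand-ins for the three MySQL tables (values from the merged output).
team = pd.DataFrame({"teamID": [1, 2, 3], "teamName": ["Red", "Green", "Blue"]})
coach = pd.DataFrame({
    "coachID": range(1, 7),
    "teamID": [1, 1, 2, 2, 3, 3],
    "coachName": ["Rachel Evans", "Gladys Nenn", "Reina Stevens",
                  "Jill Hunt", "Lynn Peters", "Stephanie Lenter"],
})
player = pd.DataFrame({
    "playerID": range(1, 13),
    "teamID": [1] * 4 + [2] * 4 + [3] * 4,
    "playerName": ["Carol Lee", "Abigail O'Neil", "Becky Hood", "Bridget Sawyer",
                   "Amy Reid", "Angie Costa", "Annie Reese", "Barbara Lo",
                   "Alicia Green", "Beth Spire", "Candace Pierce", "Carmen Jones"],
})

df = pd.merge(pd.merge(team, coach, on="teamID"), player, on="teamID")

# Rows per team = (number of coaches) * (number of players) = 2 * 4 = 8.
per_team = coach.groupby("teamID").size() * player.groupby("teamID").size()
print(per_team.tolist())  # [8, 8, 8]
print(len(df))            # 24
```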

The DataFrame now looks like this:

In [2]: df
Out[2]: 
    teamID teamName  coachID         coachName  playerID      playerName
0        1      Red        1      Rachel Evans         1       Carol Lee
1        1      Red        1      Rachel Evans         2  Abigail O'Neil
2        1      Red        1      Rachel Evans         3      Becky Hood
3        1      Red        1      Rachel Evans         4  Bridget Sawyer
4        1      Red        2       Gladys Nenn         1       Carol Lee
5        1      Red        2       Gladys Nenn         2  Abigail O'Neil
6        1      Red        2       Gladys Nenn         3      Becky Hood
7        1      Red        2       Gladys Nenn         4  Bridget Sawyer
8        2    Green        3     Reina Stevens         5        Amy Reid
9        2    Green        3     Reina Stevens         6     Angie Costa
10       2    Green        3     Reina Stevens         7     Annie Reese
11       2    Green        3     Reina Stevens         8      Barbara Lo
12       2    Green        4         Jill Hunt         5        Amy Reid
13       2    Green        4         Jill Hunt         6     Angie Costa
14       2    Green        4         Jill Hunt         7     Annie Reese
15       2    Green        4         Jill Hunt         8      Barbara Lo
16       3     Blue        5       Lynn Peters         9    Alicia Green
17       3     Blue        5       Lynn Peters        10      Beth Spire
18       3     Blue        5       Lynn Peters        11  Candace Pierce
19       3     Blue        5       Lynn Peters        12    Carmen Jones
20       3     Blue        6  Stephanie Lenter         9    Alicia Green
21       3     Blue        6  Stephanie Lenter        10      Beth Spire
22       3     Blue        6  Stephanie Lenter        11  Candace Pierce
23       3     Blue        6  Stephanie Lenter        12    Carmen Jones

Now I want to create a MultiIndex to shape this data so that it looks like this:

In [2]: df
Out[2]: 
teamID teamName  coachID  coachName  playerID      playerName
    1  Red        1      Rachel Evans    1       Carol Lee
                  2      Gladys Nenn     2       Abigail O'Neil
                                         3       Becky Hood
                                         4       Bridget Sawyer
    2  Green      3      Reina Stevens   5       Amy Reid
                  4      Jill Hunt       6       Angie Costa
                                         7       Annie Reese
                                         8       Barbara Lo

I have been able to do this in plain Python, but I would like to take advantage of Pandas' powerful and concise indexing features.
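One possible pandas-only way to get that side-by-side layout, sketched here under the assumption that coaches and players only need to be listed next to each other (no coach is tied to a particular player): skip the coach-player cross join, number coaches and players within each team with `groupby().cumcount()`, and outer-merge the two lists on teamID plus that position. The column name `slot` and the variable `roster` are hypothetical, and the tables are again in-memory stand-ins for the MySQL reads.

```python
import pandas as pd

team = pd.DataFrame({"teamID": [1, 2, 3], "teamName": ["Red", "Green", "Blue"]})
coach = pd.DataFrame({
    "coachID": range(1, 7),
    "teamID": [1, 1, 2, 2, 3, 3],
    "coachName": ["Rachel Evans", "Gladys Nenn", "Reina Stevens",
                  "Jill Hunt", "Lynn Peters", "Stephanie Lenter"],
})
player = pd.DataFrame({
    "playerID": range(1, 13),
    "teamID": [1] * 4 + [2] * 4 + [3] * 4,
    "playerName": ["Carol Lee", "Abigail O'Neil", "Becky Hood", "Bridget Sawyer",
                   "Amy Reid", "Angie Costa", "Annie Reese", "Barbara Lo",
                   "Alicia Green", "Beth Spire", "Candace Pierce", "Carmen Jones"],
})

# Number each coach and each player within its own team: 0, 1, 2, ...
coach = coach.assign(slot=coach.groupby("teamID").cumcount())
player = player.assign(slot=player.groupby("teamID").cumcount())

# Outer-merge on (teamID, slot) so the two lists line up row by row instead
# of forming a cross product; the shorter list is padded with NaN.
roster = (
    pd.merge(coach, player, on=["teamID", "slot"], how="outer")
    .merge(team, on="teamID")
    .sort_values(["teamID", "slot"])
    .set_index(["teamID", "teamName", "slot"])
    [["coachID", "coachName", "playerID", "playerName"]]
)
print(roster)
```

Each team now occupies four rows (one per player), with the coach columns filled only in the first two; `roster.fillna("")` would blank out the NaN padding for display.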

Adding the following

df.set_index(['teamID', 'teamName', 'coachID', 'coachName', 'playerID'], inplace=True)

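makes the first four columns hierarchical, but the last two (playerID and playerName) are still repeated for every coach: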

                                                       playerName
teamID teamName coachID coachName        playerID                
1      Red      1       Rachel Evans     1              Carol Lee
                                         2         Abigail O'Neil
                                         3             Becky Hood
                                         4         Bridget Sawyer
                2       Gladys Nenn      1              Carol Lee
                                         2         Abigail O'Neil
                                         3             Becky Hood
                                         4         Bridget Sawyer
2      Green    3       Reina Stevens    5               Amy Reid
                                         6            Angie Costa
                                         7            Annie Reese
                                         8             Barbara Lo
                4       Jill Hunt        5               Amy Reid
                                         6            Angie Costa
                                         7            Annie Reese
                                         8             Barbara Lo
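A short sketch of why this happens, using the same in-memory stand-ins for the tables (an assumption) and hypothetical names `indexed`, `distinct` and `summary`: `set_index` only relabels the rows, and the sparse MultiIndex display merely hides repeated outer labels, so all 24 coach-player pairs produced by the merge are still there. If one row per team is acceptable instead, grouping and collecting the distinct names is one alternative:

```python
import pandas as pd

team = pd.DataFrame({"teamID": [1, 2, 3], "teamName": ["Red", "Green", "Blue"]})
coach = pd.DataFrame({
    "coachID": range(1, 7),
    "teamID": [1, 1, 2, 2, 3, 3],
    "coachName": ["Rachel Evans", "Gladys Nenn", "Reina Stevens",
                  "Jill Hunt", "Lynn Peters", "Stephanie Lenter"],
})
player = pd.DataFrame({
    "playerID": range(1, 13),
    "teamID": [1] * 4 + [2] * 4 + [3] * 4,
    "playerName": ["Carol Lee", "Abigail O'Neil", "Becky Hood", "Bridget Sawyer",
                   "Amy Reid", "Angie Costa", "Annie Reese", "Barbara Lo",
                   "Alicia Green", "Beth Spire", "Candace Pierce", "Carmen Jones"],
})
df = pd.merge(pd.merge(team, coach, on="teamID"), player, on="teamID")

indexed = df.set_index(["teamID", "teamName", "coachID", "coachName", "playerID"])

# set_index keeps every row: each playerID appears once per coach of its team.
print(len(indexed))  # 24
print(indexed.index.get_level_values("playerID").value_counts().max())  # 2


def distinct(values):
    """Return the distinct values in first-seen order."""
    return list(dict.fromkeys(values))


# One row per team, with the distinct coach and player names collected into lists.
summary = df.groupby(["teamID", "teamName"]).agg(
    coaches=("coachName", distinct),
    players=("playerName", distinct),
)
print(summary)
```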