I've been trying to wrangle some data from a set of MySQL tables into a Pandas DataFrame with a MultiIndex. The tables look roughly like this:
create table team (
teamID integer NOT NULL,
teamName varchar(64) NOT NULL,
primary key (teamID));
create table coach (
coachID integer NOT NULL,
teamID integer NOT NULL,
coachName varchar(64) NOT NULL,
primary key (coachID));
create table player (
playerID integer NOT NULL,
teamID integer NOT NULL,
playerName varchar(64) NOT NULL,
primary key (playerID));
Each team can have one or more coaches and one or more players.
Here is the select and merge:
import mysql.connector
import pandas as pd

connection = mysql.connector.connect(user='root', passwd='temp', database='mydb')
# pd.read_sql replaces pandas.io.sql.read_frame from older pandas releases
team = pd.read_sql('select * from team;', connection)
coach = pd.read_sql('select * from coach;', connection)
player = pd.read_sql('select * from player;', connection)
connection.close()

# Inner joins on teamID: each coach row is paired with every player row of the same team
df = pd.merge(
    pd.merge(team, coach, on='teamID'),
    player, on='teamID')
The DataFrame now looks like this:
In [2]: df
Out[2]:
teamID teamName coachID coachName playerID playerName
0 1 Red 1 Rachel Evans 1 Carol Lee
1 1 Red 1 Rachel Evans 2 Abigail O'Neil
2 1 Red 1 Rachel Evans 3 Becky Hood
3 1 Red 1 Rachel Evans 4 Bridget Sawyer
4 1 Red 2 Gladys Nenn 1 Carol Lee
5 1 Red 2 Gladys Nenn 2 Abigail O'Neil
6 1 Red 2 Gladys Nenn 3 Becky Hood
7 1 Red 2 Gladys Nenn 4 Bridget Sawyer
8 2 Green 3 Reina Stevens 5 Amy Reid
9 2 Green 3 Reina Stevens 6 Angie Costa
10 2 Green 3 Reina Stevens 7 Annie Reese
11 2 Green 3 Reina Stevens 8 Barbara Lo
12 2 Green 4 Jill Hunt 5 Amy Reid
13 2 Green 4 Jill Hunt 6 Angie Costa
14 2 Green 4 Jill Hunt 7 Annie Reese
15 2 Green 4 Jill Hunt 8 Barbara Lo
16 3 Blue 5 Lynn Peters 9 Alicia Green
17 3 Blue 5 Lynn Peters 10 Beth Spire
18 3 Blue 5 Lynn Peters 11 Candace Pierce
19 3 Blue 5 Lynn Peters 12 Carmen Jones
20 3 Blue 6 Stephanie Lenter 9 Alicia Green
21 3 Blue 6 Stephanie Lenter 10 Beth Spire
22 3 Blue 6 Stephanie Lenter 11 Candace Pierce
23 3 Blue 6 Stephanie Lenter 12 Carmen Jones
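For anyone who wants to reproduce this without a MySQL server, here is a minimal sketch that builds the three frames by hand from the rows shown above and runs the same merge. The in-memory frames are an assumption standing in for the real tables. It also shows why each team ends up with 8 rows: 2 coaches times 4 players.

```python
import pandas as pd

# Hand-built stand-ins for the MySQL tables (values copied from the output above).
team = pd.DataFrame({"teamID": [1, 2, 3], "teamName": ["Red", "Green", "Blue"]})
coach = pd.DataFrame({
    "coachID": [1, 2, 3, 4, 5, 6],
    "teamID": [1, 1, 2, 2, 3, 3],
    "coachName": ["Rachel Evans", "Gladys Nenn", "Reina Stevens",
                  "Jill Hunt", "Lynn Peters", "Stephanie Lenter"],
})
player = pd.DataFrame({
    "playerID": list(range(1, 13)),
    "teamID": [1] * 4 + [2] * 4 + [3] * 4,
    "playerName": ["Carol Lee", "Abigail O'Neil", "Becky Hood", "Bridget Sawyer",
                   "Amy Reid", "Angie Costa", "Annie Reese", "Barbara Lo",
                   "Alicia Green", "Beth Spire", "Candace Pierce", "Carmen Jones"],
})

# Same two inner joins as in the question.
df = pd.merge(pd.merge(team, coach, on="teamID"), player, on="teamID")

# Rows per team = (coaches on the team) * (players on the team).
rows_per_team = df.groupby("teamID").size()
print(df.shape)
print(rows_per_team)
```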
Now I want to create a MultiIndex to shape this data so that it looks like this:
In [2]: df
Out[2]:
teamID teamName coachID coachName playerID playerName
1 Red 1 Rachel Evans 1 Carol Lee
2 Gladys Nenn 2 Abigail O'Neil
3 Becky Hood
4 Bridget Sawyer
2 Green 3 Reina Stevens 5 Amy Reid
4 Jill Hunt 6 Angie Costa
7 Annie Reese
8 Barbara Lo
I've been able to do this in straight Python, but I'd like to take advantage of Pandas' powerful and concise indexing features.
Adding the following
df.set_index(['teamID', 'teamName', 'coachID', 'coachName', 'playerID'], inplace=True)
makes the first four columns hierarchical. But the last two columns still repeat:
playerName
teamID teamName coachID coachName playerID
1 Red 1 Rachel Evans 1 Carol Lee
2 Abigail O'Neil
3 Becky Hood
4 Bridget Sawyer
2 Gladys Nenn 1 Carol Lee
2 Abigail O'Neil
3 Becky Hood
4 Bridget Sawyer
2 Green 3 Reina Stevens 5 Amy Reid
6 Angie Costa
7 Annie Reese
8 Barbara Lo
4 Jill Hunt 5 Amy Reid
6 Angie Costa
7 Annie Reese
8 Barbara Lo
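One direction that might get closer to the target layout, offered only as a sketch and not a confirmed solution: number the coaches and the players within each team, then outer-merge on that position so the two lists sit side by side instead of being cross-joined. The `slot` column is a hypothetical helper name, and the hand-built frames (teams 1 and 2 only) stand in for the real tables.

```python
import pandas as pd

# Hand-built stand-ins for the MySQL tables (teams 1 and 2 only).
team = pd.DataFrame({"teamID": [1, 2], "teamName": ["Red", "Green"]})
coach = pd.DataFrame({
    "coachID": [1, 2, 3, 4],
    "teamID": [1, 1, 2, 2],
    "coachName": ["Rachel Evans", "Gladys Nenn", "Reina Stevens", "Jill Hunt"],
})
player = pd.DataFrame({
    "playerID": list(range(1, 9)),
    "teamID": [1] * 4 + [2] * 4,
    "playerName": ["Carol Lee", "Abigail O'Neil", "Becky Hood", "Bridget Sawyer",
                   "Amy Reid", "Angie Costa", "Annie Reese", "Barbara Lo"],
})

# "slot" (hypothetical name) is the position of each row within its team: 0, 1, 2, ...
coach["slot"] = coach.groupby("teamID").cumcount()
player["slot"] = player.groupby("teamID").cumcount()

# Outer merge on (teamID, slot): coach i is lined up with player i of the same team.
# Slots with no coach get NaN, so coachID becomes a float column.
paired = pd.merge(coach, player, on=["teamID", "slot"], how="outer")
paired = paired.merge(team, on="teamID").sort_values(["teamID", "slot"])

# Move the team keys and the slot into a MultiIndex.
paired = paired.set_index(["teamID", "teamName", "slot"])[
    ["coachID", "coachName", "playerID", "playerName"]
]
print(paired)
```

This gives one row per slot within a team, with the coach columns empty once a team runs out of coaches. Whether that pairing by position is meaningful is a separate question, since a coach is not actually tied to a particular player in the schema.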