python - Reshape: regrouping rows and columns

Question

I have a dataset like this:

Block   Vector
blk_-1  0.0 2 3, 0.5 3 8, 0.7 33 5
blk_-2  1.0 4 1, 2.0 2 4
blk_-3  0.0 0 0, 6.0 0 7
blk_-4  8.0 3 0, 7.0 5 8
blk_-5  9.0 0 5, 5.0 0 2, 5.2 3 2, 5.9 5 3

import pandas as pd

dat = {'Block': ['blk_-1', 'blk_-2', 'blk_-3', 'blk_-4', 'blk_-5'],
       'Vector': ['0.0 2 3, 0.5 3 8, 0.7 33 5',
                  '1.0 4 1, 2.0 2 4',
                  '0.0 0 0, 6.0 0 7',
                  '8.0 3 0, 7.0 5 8',
                  '9.0 0 5, 5.0 0 2, 5.2 3 2, 5.9 5 3']}
df = pd.DataFrame(dat)

I want to get:

Block   Vector
blk_-1  0.0 2 3
blk_-1  0.5 3 8
blk_-1  0.7 33 5
blk_-2  1.0 4 1
blk_-2  2.0 2 4
blk_-3  0.0 0 0
blk_-3  6.0 0 7
blk_-4  8.0 3 0
blk_-4  7.0 5 8
blk_-5  9.0 0 5
blk_-5  5.0 0 2
blk_-5  5.2 3 2
blk_-5  5.9 5 3

What I tried:

df['Vector'] = df['Vector'].apply(lambda x : list(map(str, x.split(','))))

df.Vector.apply(pd.Series) \
    .merge(df, left_index = True, right_index = True) \
    .drop(["Vector"], axis = 1)

This gives:

          0         1          2         3   Block
0   0.0 2 3   0.5 3 8   0.7 33 5       NaN  blk_-1
1   1.0 4 1   2.0 2 4        NaN       NaN  blk_-2
2   0.0 0 0   6.0 0 7        NaN       NaN  blk_-3
3   8.0 3 0   7.0 5 8        NaN       NaN  blk_-4
4   9.0 0 5   5.0 0 2    5.2 3 2   5.9 5 3  blk_-5

I'm really stuck at this point. Looking forward to your thoughts and suggestions :)

score 3 · Accepted Answer

You can use split, explode, and join.

df[['Block']].join(df.Vector.str.split(',').explode())

    Block   Vector
0   blk_-1  0.0 2 3
0   blk_-1  0.5 3 8
0   blk_-1  0.7 33 5
1   blk_-2  1.0 4 1
1   blk_-2  2.0 2 4
2   blk_-3  0.0 0 0
2   blk_-3  6.0 0 7
3   blk_-4  8.0 3 0
3   blk_-4  7.0 5 8
4   blk_-5  9.0 0 5
4   blk_-5  5.0 0 2
4   blk_-5  5.2 3 2
4   blk_-5  5.9 5 3
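
To make the one-liner above easier to follow, here is a minimal sketch that runs the same three steps one at a time. The shortened two-row frame and the intermediate names (lists, exploded, result) are only for illustration and are not part of the original answer. It assumes pandas 0.25 or later, since that is where explode was introduced.

```python
import pandas as pd

# Shortened version of the question's data (illustrative only).
df = pd.DataFrame({
    'Block': ['blk_-1', 'blk_-2'],
    'Vector': ['0.0 2 3, 0.5 3 8, 0.7 33 5', '1.0 4 1, 2.0 2 4'],
})

# Step 1: split each string on the comma, giving one list per row.
lists = df['Vector'].str.split(',')

# Step 2: explode puts every list element on its own row and repeats
# the original index label for each element.
exploded = lists.explode()

# Step 3: join on the index, so each piece picks up its Block value.
result = df[['Block']].join(exploded)
print(result)
```

The repeated index labels produced by explode are what let join match each piece back to its Block, which is also why the result keeps the index 0, 0, 0, 1, 1 rather than a fresh 0..n range.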

score 2 · Accepted Answer

Solution for pandas 0.25+: split the column with Series.str.split and assign it back with DataFrame.assign, then use DataFrame.explode, and finally DataFrame.reset_index with drop=True to get a default index:

df = pd.DataFrame(dat)

df = df.assign(Vector=df['Vector'].str.split(',')).explode('Vector').reset_index(drop=True)
print (df)
     Block     Vector
0   blk_-1    0.0 2 3
1   blk_-1    0.5 3 8
2   blk_-1   0.7 33 5
3   blk_-2    1.0 4 1
4   blk_-2    2.0 2 4
5   blk_-3    0.0 0 0
6   blk_-3    6.0 0 7
7   blk_-4    8.0 3 0
8   blk_-4    7.0 5 8
9   blk_-5    9.0 0 5
10  blk_-5    5.0 0 2
11  blk_-5    5.2 3 2
12  blk_-5    5.9 5 3
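
One detail worth checking: because the strings are split on ',' alone, every piece after the first keeps the space that followed the comma. If that matters for later processing, a possible variant (not part of the original answer, shown on a shortened frame, with out as an illustrative name) is to strip the pieces after exploding:

```python
import pandas as pd

# Shortened version of the question's data (illustrative only).
df = pd.DataFrame({
    'Block': ['blk_-1', 'blk_-2'],
    'Vector': ['0.0 2 3, 0.5 3 8, 0.7 33 5', '1.0 4 1, 2.0 2 4'],
})

out = (
    df.assign(Vector=df['Vector'].str.split(','))  # strings -> lists
      .explode('Vector')                           # one row per list element
      .reset_index(drop=True)                      # fresh 0..n-1 index
)

# Remove the leading space left behind by splitting on ',' only.
out['Vector'] = out['Vector'].str.strip()
print(out)
```

Splitting on ', ' (comma plus space) would also work for this data, but strip is more forgiving if the spacing after the commas is inconsistent.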

Solution for older pandas versions: use pop + split + stack + reset_index + rename to build a new Series, then join it to the original:

df = (df.join(df.pop('Vector')
                .str.split(',',expand=True)
                .stack()
                .reset_index(level=1, drop=True)
                .rename('Vector')).reset_index(drop=True))
print (df)
     Block     Vector
0   blk_-1    0.0 2 3
1   blk_-1    0.5 3 8
2   blk_-1   0.7 33 5
3   blk_-2    1.0 4 1
4   blk_-2    2.0 2 4
5   blk_-3    0.0 0 0
6   blk_-3    6.0 0 7
7   blk_-4    8.0 3 0
8   blk_-4    7.0 5 8
9   blk_-5    9.0 0 5
10  blk_-5    5.0 0 2
11  blk_-5    5.2 3 2
12  blk_-5    5.9 5 3
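
For readers who want to see what each link in that chain produces, here is a rough step-by-step sketch on a shortened frame. The names wide and long are illustrative, and it selects df[['Block']] instead of using pop so the original frame is left unchanged. The explicit dropna() is an addition of this sketch: stack in older pandas drops the missing padding by default, and the extra call is only a safeguard in case the installed version keeps it.

```python
import pandas as pd

# Shortened version of the question's data (illustrative only).
df = pd.DataFrame({
    'Block': ['blk_-1', 'blk_-2'],
    'Vector': ['0.0 2 3, 0.5 3 8, 0.7 33 5', '1.0 4 1, 2.0 2 4'],
})

# expand=True returns one column per piece; shorter rows are padded
# with missing values.
wide = df['Vector'].str.split(',', expand=True)

# stack turns the columns into a second index level (row, piece number).
# dropna() is a safeguard added here, see the note above.
long = wide.stack().dropna()

# Drop the piece-number level so only the original row label remains,
# and name the Series so join creates a 'Vector' column.
long = long.reset_index(level=1, drop=True).rename('Vector')

result = df[['Block']].join(long).reset_index(drop=True)
print(result)
```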

score 1 · Accepted Answer

For versions below 0.25:

final = df.merge(df['Vector'].str.split(',', expand=True).stack().reset_index(0, name='Vector'),
                 left_index=True, right_on='level_0', suffixes=('_x', '')).drop(['level_0', 'Vector_x'], axis=1)
print(final)

    Block     Vector
0  blk_-1    0.0 2 3
1  blk_-1    0.5 3 8
2  blk_-1   0.7 33 5
0  blk_-2    1.0 4 1
1  blk_-2    2.0 2 4
0  blk_-3    0.0 0 0
1  blk_-3    6.0 0 7
0  blk_-4    8.0 3 0
1  blk_-4    7.0 5 8
0  blk_-5    9.0 0 5
1  blk_-5    5.0 0 2
2  blk_-5    5.2 3 2
3  blk_-5    5.9 5 3
