I have a DataFrame like this:

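import pandas as pd

df = pd.DataFrame({'name': ['toto', 'tata', 'tati'], 'choices': 0})
df['choices'] = df['choices'].astype(object)
# .at stores a list in a single cell; chained indexing (df['choices'][0] = ...) is unreliable
df.at[0, 'choices'] = [1, 2, 3]
df.at[1, 'choices'] = [5, 4, 3, 1]
df.at[2, 'choices'] = [6, 3, 2, 1, 5, 4]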

print(df)

             choices  name
0           [1, 2, 3]  toto
1        [5, 4, 3, 1]  tata
2  [6, 3, 2, 1, 5, 4]  tati

I would like to build a DataFrame from df like this:

             choice  rank  name
0                 1     0  toto
1                 2     1  toto
2                 3     2  toto
3                 5     0  tata
4                 4     1  tata
5                 3     2  tata
6                 1     3  tata
7                 6     0  tati
8                 3     1  tati
9                 2     2  tati
10                1     3  tati
11                5     4  tati
12                4     5  tati

I want to populate the new rows with each value of the list and its index within the list.

Here is what I did:

import numpy as np

size = df['choices'].map(len).sum()
df2 = pd.DataFrame(index=range(size), columns=df.columns)
del df2['choices']
df2['choice'] = np.nan
df2['rank'] = np.nan

k = 0
for i in df.index:
    choices = df.at[i, 'choices']
    for rank, choice in enumerate(choices):
        # .at sets one cell directly and avoids chained assignment
        df2.at[k, 'name'] = df.at[i, 'name']
        df2.at[k, 'choice'] = choice
        df2.at[k, 'rank'] = rank
        k += 1

But I would prefer a vectorized solution. Is that possible with Python/pandas?

1 Answer

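In [4]: s = df.choices.apply(pd.Series).stack()  # one entry per list element, indexed by (row, position)

In [5]: s.name = 'choices' # needs a name to join

In [6]: del df['choices']

In [7]: df1 = df.join(s.reset_index(level=1))  # level 1 (the position in the list) becomes a column

In [8]: df1.columns = ['name', 'rank', 'choice']

In [9]: df1.sort_values(['name', 'rank']).reset_index(drop=True)  # DataFrame.sort was removed; use sort_values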
Out[9]: 
    name  rank  choice
0   tata     0       5
1   tata     1       4
2   tata     2       3
3   tata     3       1
4   tati     0       6
5   tati     1       3
6   tati     2       2
7   tati     3       1
8   tati     4       5
9   tati     5       4
10  toto     0       1
11  toto     1       2
12  toto     2       3

This is related to this solution of mine, but in your case you use the index (the rank) instead of dropping it.
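
On recent pandas versions the same reshaping can probably be written more directly. The following is a minimal sketch, not part of the original answer, assuming pandas 0.25 or later (where `DataFrame.explode` is available); the variable names `long` and `result` are chosen only for illustration. It keeps the rows in their original order rather than sorting by name.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['toto', 'tata', 'tati'],
    'choices': [[1, 2, 3], [5, 4, 3, 1], [6, 3, 2, 1, 5, 4]],
})

# One row per list element; the original row label is repeated for each element.
long = df.explode('choices').rename(columns={'choices': 'choice'})

# Position of each element within its original list, counted per original row label.
# to_numpy() assigns by position, so the duplicated index labels do not matter.
long['rank'] = long.groupby(level=0).cumcount().to_numpy()

# explode leaves an object-dtype column, so cast the values back to integers.
long['choice'] = long['choice'].astype(int)

result = long.reset_index(drop=True)[['choice', 'rank', 'name']]
print(result)
```

Here `groupby(level=0).cumcount()` plays the role of the second index level produced by `stack()` in the steps above: it numbers the elements that came from the same original row.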

answered 2013-09-30T20:11:32.470