What is the idiomatic way to store this kind of data structure in pandas?

import numpy as np
import pandas as pd

### Option 1
df = pd.DataFrame(data = [
    {'kws' : np.array([0,0,0]), 'x' : i, 'y' : i} for i in range(10)
])

# df.x and df.y work as expected
# the list and array casting is required because df.kws is
# an array of arrays
np.array(list(df.kws))

# this causes problems when trying to assign as well, though:
# for any other data type, this would set all kws in df to the rhs [1,2,3],
# but since the rhs is a list, pandas tries an element-wise assignment and
# raises an error saying that the length of df and the length of the rhs do not match
df.kws = [1,2,3]

### Option 2
df = pd.DataFrame(data = [
    {'kw_0' : 0, 'kw_1' : 0, 'kw_2' : 0, 'x' : i, 'y' : i} for i in range(10)
])

# retrieving 2d array:
df[sorted([c for c in df if c.startswith('kw_')])].values

# batch set:
kws = [1,2,3]
for i, kw in enumerate(kws):
    df['kw_' + str(i)] = kw  # column names are strings, so convert the position
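For Option 1, the failed broadcast can be sidestepped by handing pandas one array per row, so it never tries to match the 3-element list against the 10-row index. A minimal sketch, assuming the object-column layout above (`new_kws` and `kws_2d` are illustrative names); note that `np.stack` always allocates a new array, so it does not avoid the copy the question asks about.

```python
import numpy as np
import pandas as pd

# Option 1 layout: one object column holding a length-3 array per row
df = pd.DataFrame([{'kws': np.array([0, 0, 0]), 'x': i, 'y': i} for i in range(10)])

# Give every row its own copy of the new vector; an object Series aligned on
# df.index avoids the length check that a bare 3-element list triggers
new_kws = np.array([1, 2, 3])
df['kws'] = pd.Series([new_kws.copy() for _ in range(len(df))],
                      index=df.index, dtype=object)

# Stack the per-row arrays into a (10, 3) matrix (this allocates a new array)
kws_2d = np.stack(df['kws'].to_numpy())
print(kws_2d.shape)
```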

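Neither of these solutions feels right to me. For one thing, neither lets me retrieve the 2D matrix without copying all of the data. Is there a better way to handle this kind of mixed-dimensional data, or is this just something pandas isn't up to yet?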
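Option 2's retrieval and batch set can be written without a hand-built column list, using `DataFrame.filter` with a regular expression and `DataFrame.assign`. This is a sketch under the Option 2 column naming (`kw_cols` and `kws_2d` are illustrative names); `to_numpy()` on a column subset still produces a new array, so it does not settle the copying question either.

```python
import pandas as pd

# Option 2 layout: one scalar column per keyword position
df = pd.DataFrame([{'kw_0': 0, 'kw_1': 0, 'kw_2': 0, 'x': i, 'y': i} for i in range(10)])

# Select the keyword columns by pattern; matches keep the frame's column order
kw_cols = df.filter(regex=r'^kw_\d+$').columns

# Retrieve the 2D block: 10 rows x 3 keyword columns
kws_2d = df[kw_cols].to_numpy()

# Batch set: one scalar per keyword column, broadcast down all rows
df = df.assign(**dict(zip(kw_cols, [1, 2, 3])))
```

Keeping the frame's own column order also avoids a subtle issue with `sorted()`: as strings, `'kw_10'` sorts before `'kw_2'`.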


1 Answer


Just use a MultiIndex on the columns (see the docs):

In [31]: df = pd.DataFrame([ {'kw_0' : 0, 'kw_1' : 0, 'kw_2' : 0, 'x' : i, 'y': i} for i in range(10) ])

In [32]: df
Out[32]: 
   kw_0  kw_1  kw_2  x  y
0     0     0     0  0  0
1     0     0     0  1  1
2     0     0     0  2  2
3     0     0     0  3  3
4     0     0     0  4  4
5     0     0     0  5  5
6     0     0     0  6  6
7     0     0     0  7  7
8     0     0     0  8  8
9     0     0     0  9  9

In [33]: df.columns = pd.MultiIndex.from_tuples([('kw',0),('kw',1),('kw',2),('value','x'),('value','y')])

In [34]: df
Out[34]: 
   kw        value   
    0  1  2      x  y
0   0  0  0      0  0
1   0  0  0      1  1
2   0  0  0      2  2
3   0  0  0      3  3
4   0  0  0      4  4
5   0  0  0      5  5
6   0  0  0      6  6
7   0  0  0      7  7
8   0  0  0      8  8
9   0  0  0      9  9
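Instead of relabeling the columns after the fact, the two-level layout can also be assembled directly from two frames that share a row index: passing a dict to `pd.concat` with `axis=1` uses the dict keys as the outer column level. A minimal sketch; the frame names `kw` and `value` are illustrative.

```python
import numpy as np
import pandas as pd

n = 10
# Two blocks that share the same default row index 0..9
kw = pd.DataFrame(np.zeros((n, 3), dtype=int))        # columns 0, 1, 2
value = pd.DataFrame({'x': range(n), 'y': range(n)})  # columns 'x', 'y'

# Dict keys become the outer column level: ('kw', 0) ... ('value', 'y')
df = pd.concat({'kw': kw, 'value': value}, axis=1)
print(df.columns.tolist())
```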

Selecting is easy:

In [35]: df['kw']
Out[35]: 
   0  1  2
0  0  0  0
1  0  0  0
2  0  0  0
3  0  0  0
4  0  0  0
5  0  0  0
6  0  0  0
7  0  0  0
8  0  0  0
9  0  0  0
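Once the keyword columns sit under one top-level label, the 2D matrix is a single selection away. The sketch below rebuilds the frame from the session above and shows two equivalent selections, `df['kw']` and `DataFrame.xs` on the outer column level (`kws_2d` and `same` are illustrative names). Whether the resulting array shares memory with the frame depends on pandas' internal block layout, so treat it as possibly a copy.

```python
import pandas as pd

df = pd.DataFrame([{'kw_0': 0, 'kw_1': 0, 'kw_2': 0, 'x': i, 'y': i} for i in range(10)])
df.columns = pd.MultiIndex.from_tuples(
    [('kw', 0), ('kw', 1), ('kw', 2), ('value', 'x'), ('value', 'y')])

# Top-level selection returns a sub-DataFrame; .to_numpy() gives the 2D matrix
kws_2d = df['kw'].to_numpy()

# Equivalent cross-section on the outer column level (level 0)
same = df.xs('kw', axis=1, level=0).to_numpy()
```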

Setting works too:

In [36]: df.loc[1,'kw'] = [4,5,6]

In [37]: df
Out[37]: 
   kw        value   
    0  1  2      x  y
0   0  0  0      0  0
1   4  5  6      1  1
2   0  0  0      2  2
3   0  0  0      3  3
4   0  0  0      4  4
5   0  0  0      5  5
6   0  0  0      6  6
7   0  0  0      7  7
8   0  0  0      8  8
9   0  0  0      9  9
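The same `.loc` mechanism can update the whole keyword block at once. A sketch, assuming every row should get the same vector `[1, 2, 3]` (a hypothetical value): the vector is tiled to one row per frame row so its shape matches the `(10, 3)` sub-block exactly.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([{'kw_0': 0, 'kw_1': 0, 'kw_2': 0, 'x': i, 'y': i} for i in range(10)])
df.columns = pd.MultiIndex.from_tuples(
    [('kw', 0), ('kw', 1), ('kw', 2), ('value', 'x'), ('value', 'y')])

# Repeat the row vector down all rows, then write it into the 'kw' sub-block
df.loc[:, 'kw'] = np.tile([1, 2, 3], (len(df), 1))
```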

Alternatively, you could keep two DataFrames with the same index and join/merge them as needed.
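A minimal sketch of that two-frame alternative, with illustrative names (`values`, `kws`, `combined`): the keyword block lives in its own all-integer frame, so `kws.to_numpy()` is the 2D matrix, and the frames are joined on their shared index only when a combined view is needed. `add_prefix` is used here only so the integer keyword labels do not collide with other column names.

```python
import numpy as np
import pandas as pd

n = 10
index = pd.RangeIndex(n)

# Scalar columns in one frame, the fixed-width keyword block in another
values = pd.DataFrame({'x': range(n), 'y': range(n)}, index=index)
kws = pd.DataFrame(np.zeros((n, 3), dtype=int), index=index)

# Update one row's keywords, mirroring df.loc[1, 'kw'] = [4, 5, 6] above
kws.iloc[1] = [4, 5, 6]
kws_2d = kws.to_numpy()

# Join on the shared index only when a combined view is needed
combined = values.join(kws.add_prefix('kw_'))
```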

answered 2013-11-06T17:34:56.077