- The method in the OP works, but it isn't efficient. It may have seemed to run forever because the dataset is long.
- Use `.groupby` on the `'method'` column, and create a `dict` of `DataFrames` with the unique `'method'` values as the keys, using a `dict-comprehension`.
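  - `.groupby` returns a `groupby` object that contains information about the groups, where `g` is the unique value in `'method'` for each group, and `d` is the `DataFrame` for that group.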
- The `value` of each `key` in `df_dict` will be a `DataFrame`, which can be accessed in the standard way, `df_dict['key']`.
- The original question wanted a `list` of `DataFrames`, which can be done with a `list-comprehension`:
  - `df_list = [d for _, d in df.groupby('method')]`
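A minimal, self-contained sketch of both containers. It assumes a small hand-built `DataFrame` (a few rows copied from the `planets` output shown in this answer) instead of `sns.load_dataset('planets')`, which may need network access; the column subset is chosen only for brevity.

```python
import pandas as pd

# Small stand-in for the seaborn 'planets' dataset (rows taken from the outputs in this answer)
df = pd.DataFrame({
    'method': ['Radial Velocity', 'Astrometry', 'Imaging', 'Astrometry', 'Imaging'],
    'orbital_period': [269.300, 246.36, None, 1016.00, None],
    'year': [2006, 2013, 2005, 2010, 2007],
})

# dict of DataFrames: each unique 'method' value becomes a key
df_dict = {g: d for g, d in df.groupby('method')}

# list of DataFrames: the group key is discarded with _
df_list = [d for _, d in df.groupby('method')]

print(list(df_dict))              # ['Astrometry', 'Imaging', 'Radial Velocity']
print([len(d) for d in df_list])  # [2, 2, 1]
```

`.groupby` sorts the keys by default, so both containers follow the same sorted order, and each group keeps its original row labels from `df`.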
import pandas as pd
import seaborn as sns # for test dataset
# load data for example
df = sns.load_dataset('planets')
# display(df.head())
            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009
# Using a dict-comprehension, the unique 'method' value will be the key
df_dict = {g: d for g, d in df.groupby('method')}
print(df_dict.keys())
[out]:
dict_keys(['Astrometry', 'Eclipse Timing Variations', 'Imaging', 'Microlensing', 'Orbital Brightness Modulation', 'Pulsar Timing', 'Pulsation Timing Variations', 'Radial Velocity', 'Transit', 'Transit Timing Variations'])
# or a specific name for the key, using enumerate (e.g. df0, df1, etc.)
df_dict = {f'df{i}': d for i, (g, d) in enumerate(df.groupby('method'))}
print(df_dict.keys())
[out]:
dict_keys(['df0', 'df1', 'df2', 'df3', 'df4', 'df5', 'df6', 'df7', 'df8', 'df9'])
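Because `enumerate` starts at 0 and `.groupby` iterates the keys in sorted order, `'df0'` holds the first `method` alphabetically. A small sketch, using a hypothetical stand-in `df` rather than the real dataset, that keeps a lookup from each generated name back to its group key so the numbered names stay traceable:

```python
import pandas as pd

# Hypothetical stand-in data; only the 'method' values matter here
df = pd.DataFrame({
    'method': ['Imaging', 'Astrometry', 'Eclipse Timing Variations', 'Astrometry'],
    'year': [2005, 2013, 2009, 2010],
})

# Same naming scheme as above: df0, df1, ... in sorted key order
df_dict = {f'df{i}': d for i, (g, d) in enumerate(df.groupby('method'))}

# Lookup from each generated name to the 'method' value it holds
name_to_method = {f'df{i}': g for i, (g, _) in enumerate(df.groupby('method'))}

print(name_to_method)
# {'df0': 'Astrometry', 'df1': 'Eclipse Timing Variations', 'df2': 'Imaging'}
```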
- `df_dict['df0'].head(3)` or `df_dict['Astrometry'].head(3)`
  - There are only 2 rows in this group.
         method  number  orbital_period  mass  distance  year
113  Astrometry       1          246.36   NaN     20.77  2013
537  Astrometry       1         1016.00   NaN     14.98  2010
- `df_dict['df1'].head(3)` or `df_dict['Eclipse Timing Variations'].head(3)`
                       method  number  orbital_period  mass  distance  year
32  Eclipse Timing Variations       1         10220.0  6.05       NaN  2009
37  Eclipse Timing Variations       2          5767.0   NaN    130.72  2008
38  Eclipse Timing Variations       2          3321.0   NaN    130.72  2008
- `df_dict['df2'].head(3)` or `df_dict['Imaging'].head(3)`
     method  number  orbital_period  mass  distance  year
29  Imaging       1             NaN   NaN     45.52  2005
30  Imaging       1             NaN   NaN    165.00  2007
31  Imaging       1             NaN   NaN    140.00  2004
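The number of rows in each group (for example, only 2 for `Astrometry`) can be checked without building the `dict`, using `GroupBy.size()`. This sketch builds a stand-in frame from just the rows printed above, so its counts describe that subset only, not the full `planets` dataset:

```python
import pandas as pd

# Only the rows shown in the head(3) outputs above (a subset of 'planets')
df = pd.DataFrame({
    'method': ['Astrometry'] * 2 + ['Eclipse Timing Variations'] * 3 + ['Imaging'] * 3,
    'year': [2013, 2010, 2009, 2008, 2008, 2005, 2007, 2004],
})

# Series indexed by group key, holding the number of rows in each group
sizes = df.groupby('method').size()

# Convert to plain Python ints for a readable dict
counts = {k: int(v) for k, v in sizes.items()}
print(counts)
# {'Astrometry': 2, 'Eclipse Timing Variations': 3, 'Imaging': 3}
```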
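Alternatively
- This is a manual method to create separate `DataFrames` using pandas: Boolean Indexing.
- This is similar to the accepted answer, but `.loc` is not required.
- This is an acceptable method for creating a couple of extra `DataFrames`.
- The Pythonic way to create multiple objects is to place them in a container (e.g. `dict`, `list`, `generator`, etc.), as shown above.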
df1 = df[df.method == 'Astrometry']
df2 = df[df.method == 'Eclipse Timing Variations']
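A quick sketch, on a small hypothetical stand-in `df`, showing that the boolean mask gives the same result with or without `.loc`, and the same rows (with the same index labels) as the corresponding `groupby` group retrieved with `get_group`:

```python
import pandas as pd

# Hypothetical stand-in data
df = pd.DataFrame({
    'method': ['Radial Velocity', 'Astrometry', 'Eclipse Timing Variations', 'Astrometry'],
    'year': [2006, 2013, 2009, 2010],
})

# Boolean indexing: the mask keeps only rows whose 'method' matches
df1 = df[df.method == 'Astrometry']
df2 = df[df.method == 'Eclipse Timing Variations']

# .loc with the same mask selects identical rows
same_as_loc = df1.equals(df.loc[df.method == 'Astrometry'])

# Identical to the 'Astrometry' group from groupby
same_as_group = df1.equals(df.groupby('method').get_group('Astrometry'))

print(same_as_loc, same_as_group)  # True True
```

Each extra `DataFrame` here needs its own line, which is why a container built from `.groupby` scales better when there are many groups.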