I have a DataFrame in which every column is numeric:
```python
import pandas as pd
import numpy as np
np.random.seed(1001)
df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])
```
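I want to create common quantiles that cover all the values of both `A` and `B`. Both columns have some missing values. After creating the common quantiles, I want to encode the values in the DataFrame with labels showing which quantile each value falls into. I can do this column by column, but how can I do it on the whole DataFrame at once?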
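For comparison, the column-by-column route can still produce common quantiles if the edges are computed once from the pooled values of `A` and `B`. The sketch below is one possible approach under stated assumptions: it assumes quartiles, uses hypothetical label names `Q1` to `Q4`, places NaN at arbitrary positions for illustration, computes the shared edges with `np.nanquantile` (which skips NaN), and then applies `pd.cut` with those same edges to every column.

```python
import numpy as np
import pandas as pd

np.random.seed(1001)
df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])
# Arbitrary missing values, only for illustration
df.loc[[0, 2], 'A'] = np.nan
df.loc[3, 'B'] = np.nan

# Pool A and B into one flat array and compute shared quartile edges, skipping NaN
edges = np.nanquantile(df.to_numpy().ravel(), [0, 0.25, 0.5, 0.75, 1])

# Cut every column with the same edges; include_lowest=True keeps the minimum in Q1,
# and NaN cells stay NaN
labeled = df.apply(
    lambda col: pd.cut(col, bins=edges, labels=['Q1', 'Q2', 'Q3', 'Q4'], include_lowest=True)
)
print(labeled)
```

Because every column is cut with the same `edges`, a given label means the same value range in `A` and in `B`.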
I think you can first `stack` the DataFrame, then apply `qcut`, and finally `unstack` the result:
```python
import pandas as pd
import numpy as np

np.random.seed(1001)
df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])
# introduce some missing values (.ix was removed in pandas 1.0, so use .loc)
df.loc[0, 'A'] = np.nan
df.loc[2, 'A'] = np.nan
df.loc[3, 'B'] = np.nan
print(df)
```
```
          A         B
0       NaN -0.896065
1 -0.306299 -1.339934
2       NaN -0.641727
3  1.307946       NaN
4  0.829115 -0.023299
5 -0.208564 -0.916620
6 -1.074743 -0.086143
7  1.175839 -1.635092
8  1.228194  1.076386
9  0.394773 -0.387701
```
```python
# qcut takes quantile levels, which must lie in [0, 1];
# 5 levels -> 4 equal-frequency bins (quartiles) computed from A and B together
bins = np.linspace(0, 1, 5)
print(pd.qcut(df.stack(), bins).unstack())
```

```
                  A                 B
0               NaN  (-1.636, -0.896]
1  (-0.896, -0.209]  (-1.636, -0.896]
2               NaN  (-0.896, -0.209]
3    (0.829, 1.308]               NaN
4   (-0.209, 0.829]   (-0.209, 0.829]
5  (-0.896, -0.209]  (-1.636, -0.896]
6  (-1.636, -0.896]   (-0.209, 0.829]
7    (0.829, 1.308]  (-1.636, -0.896]
8    (0.829, 1.308]    (0.829, 1.308]
9   (-0.209, 0.829]  (-0.896, -0.209]
```
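Because the question asks for labels rather than interval strings, the same stack/qcut/unstack idea can also pass the `labels` argument of `pd.qcut`. This is a sketch under assumptions: it uses quartiles (`q=4`, equivalent to the quantile levels `np.linspace(0, 1, 5)`) and hypothetical label names `Q1` to `Q4`; `labels=False` returns integer bin codes instead.

```python
import numpy as np
import pandas as pd

np.random.seed(1001)
df = pd.DataFrame(np.random.randn(10, 2), columns=['A', 'B'])
df.loc[[0, 2], 'A'] = np.nan
df.loc[3, 'B'] = np.nan

# stack() turns the frame into one long Series, so qcut sees A and B pooled together.
# In pandas < 3, stack() drops the NaN cells; unstack() puts them back as NaN.
quartile_labels = ['Q1', 'Q2', 'Q3', 'Q4']
coded = pd.qcut(df.stack(), q=4, labels=quartile_labels).unstack()

# labels=False returns integer bin codes (0..3) instead of names
codes = pd.qcut(df.stack(), q=4, labels=False).unstack()
print(coded)
print(codes)
```

The intervals are right-closed by default, so a value that sits exactly on an edge goes into the lower bin.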