I have two Pandas dataframes that I want to merge, for example:
import pandas as pd
dfA = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10],'colA':[3,4,2,4,3,4,5,4,5,6],'colB':[7,6,5,6,5,7,8,7,6,7],'colC':[False,True,True,False,False,True,False,True,True,True]})
dfB = pd.DataFrame({'id':[2,5,7,8],'colD':[1,9,7,3]})
print("Before\n====")
print('dfA dtypes\n------')
print(dfA.dtypes)
print('\ndfA\n---')
print(dfA)
print('\ndfB\n---')
print(dfB)
dfA = pd.merge(left=dfA,right=dfB,how='left',on='id')
print("\nAfter\n=====")
print(dfA)
This produces the following output:
Before
====
dfA dtypes
------
colA    int64
colB    int64
colC     bool
id      int64
dtype: object

dfA
---
   colA  colB   colC  id
0     3     7  False   1
1     4     6   True   2
2     2     5   True   3
3     4     6  False   4
4     3     5  False   5
5     4     7   True   6
6     5     8  False   7
7     4     7   True   8
8     5     6   True   9
9     6     7   True  10

dfB
---
   colD  id
0     1   2
1     9   5
2     7   7
3     3   8

After
=====
   colA  colB   colC  id  colD
0     3     7  False   1   NaN
1     4     6   True   2   1.0
2     2     5   True   3   NaN
3     4     6  False   4   NaN
4     3     5  False   5   9.0
5     4     7   True   6   NaN
6     5     8  False   7   7.0
7     4     7   True   8   3.0
8     5     6   True   9   NaN
9     6     7   True  10   NaN
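A side note on the "After" output: colD is printed as 1.0, 9.0 and so on because the rows of dfA with no matching id in dfB receive NaN, and an integer column that has to hold NaN is upcast to float64. A minimal check on the same data (written against a recent pandas; the variable name `merged` is just for illustration):

```python
import pandas as pd

dfA = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                    'colA': [3, 4, 2, 4, 3, 4, 5, 4, 5, 6],
                    'colB': [7, 6, 5, 6, 5, 7, 8, 7, 6, 7],
                    'colC': [False, True, True, False, False, True, False, True, True, True]})
dfB = pd.DataFrame({'id': [2, 5, 7, 8], 'colD': [1, 9, 7, 3]})

# Left merge keeps every row of dfA, in dfA's order
merged = pd.merge(left=dfA, right=dfB, how='left', on='id')

# colD is an integer column in dfB ...
print(dfB['colD'].dtype)
# ... but ids 1, 3, 4, 6, 9 and 10 have no match in dfB and get NaN,
# so the merged colD column is upcast to float64
print(merged['colD'].dtype)
print(merged['colD'].isna().sum())
```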
This is exactly what I expected and wanted. However, if I first convert one of the columns to a categorical before merging, like this:
dfA = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10],'colA':[3,4,2,4,3,4,5,4,5,6],'colB':[7,6,5,6,5,7,8,7,6,7],'colC':[False,True,True,False,False,True,False,True,True,True]})
dfA['colC'] = dfA['colC'].astype('category',categories=[True,False],ordered=True)
dfB = pd.DataFrame({'id':[2,5,7,8],'colD':[1,9,7,3]})
dfA = pd.merge(left=dfA,right=dfB,how='left',on='id')
...the merge fails with the error:
/Users/.../env3/lib/python3.4/site-packages/pandas/core/internals.py in __init__(self, values, placement, ndim, fastpath)
104 ndim = values.ndim
105 elif values.ndim != ndim:
--> 106 raise ValueError('Wrong number of dimensions')
107 self.ndim = ndim
108
ValueError: Wrong number of dimensions
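But when I check the dimensions of each dataframe (using df.ndim, which is an attribute rather than a method), both have 2 dimensions. Why does this seemingly harmless change make pd.merge() fail, or have I misunderstood what the category type is for?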
I'm using Python 3.4.1 and Pandas 0.20.1.
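For comparison, here is a sketch of the same steps as they would be written for a more recent pandas release. This is an assumption-laden illustration, not tested against 0.20.1: later pandas versions replaced the `astype('category', categories=..., ordered=...)` keyword form with passing a `pd.CategoricalDtype` object. The sketch also shows a workaround that avoids the question altogether, namely merging on the plain bool column first and converting colC to a categorical afterwards. Whether either variant avoids the "Wrong number of dimensions" error on 0.20.1 is not established here. The helper `make_frames` and the names `bool_cat`, `merged_before` and `merged_after` are hypothetical.

```python
import pandas as pd


def make_frames():
    """Hypothetical helper: rebuild the two example frames from the question."""
    dfA = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                        'colA': [3, 4, 2, 4, 3, 4, 5, 4, 5, 6],
                        'colB': [7, 6, 5, 6, 5, 7, 8, 7, 6, 7],
                        'colC': [False, True, True, False, False, True, False, True, True, True]})
    dfB = pd.DataFrame({'id': [2, 5, 7, 8], 'colD': [1, 9, 7, 3]})
    return dfA, dfB


# Ordered categorical with True placed before False, as in the question
bool_cat = pd.CategoricalDtype(categories=[True, False], ordered=True)

# Variant 1: convert to categorical *before* merging, using CategoricalDtype
dfA, dfB = make_frames()
dfA['colC'] = dfA['colC'].astype(bool_cat)
merged_before = pd.merge(left=dfA, right=dfB, how='left', on='id')

# Variant 2: merge on the plain bool column, then convert *after* merging
dfA, dfB = make_frames()
merged_after = pd.merge(left=dfA, right=dfB, how='left', on='id')
merged_after['colC'] = merged_after['colC'].astype(bool_cat)

print(merged_before.dtypes)
print(merged_before['colC'].cat.categories.tolist())
```

Converting after the merge has the practical advantage that the merge itself only ever sees ordinary int64 and bool columns, so it does not depend on how a given pandas version handles categorical columns during a join.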