I have a DataFrame named "combi" (shape: (1771077, 38)). When I try to run the following code:
dum = ['Date_ID', 'Distribution_Type', 'Fixed_CostFactor', 'Product_Description', 'Order_ID', 'Quantity', 'Amount', 'SalesMgr_Name', 'Sales_Type', 'Product_Description', 'Product_Category', 'Product_Group', 'Brand', 'Unit_of_Measure', 'Pack_Type', 'Cost_Price', 'Sales_Price', 'PlantName', 'City_Name', 'Population', 'Customer_Name', 'Customer_Since', 'Industry', 'Customer_Group', 'Month_Name', 'Quarter_No']
dummies = pd.get_dummies(combi[dum])
it gives this error:
ValueError                                Traceback (most recent call last)
<ipython-input-19-1227d2ff4df6> in <module>()
4 'Industry', 'Customer_Group','Month_Name', 'Quarter_No']
5
----> 6 dummies = pd.get_dummies(combi[dum])
7 dummies.columns
/usr/lib64/python2.7/site-packages/pandas/core/reshape/reshape.pyc in get_dummies(data, prefix, prefix_sep, dummy_na, columns, sparse, drop_first)
1206 dummy = _get_dummies_1d(data[col], prefix=pre, prefix_sep=sep,
1207 dummy_na=dummy_na, sparse=sparse,
-> 1208 drop_first=drop_first)
1209 with_dummies.append(dummy)
1210 result = concat(with_dummies, axis=1)
/usr/lib64/python2.7/site-packages/pandas/core/reshape/reshape.pyc in _get_dummies_1d(data, prefix, prefix_sep, dummy_na, sparse, drop_first)
1218 sparse=False, drop_first=False):
1219 # Series avoids inconsistent NaN handling
-> 1220 codes, levels = _factorize_from_iterable(Series(data))
1221
1222 def get_empty_Frame(data, sparse):
/usr/lib64/python2.7/site-packages/pandas/core/series.pyc in __init__(self, data, index, dtype, name, copy, fastpath)
246 else:
247 data = _sanitize_array(data, index, dtype, copy,
--> 248 raise_cast_failure=True)
249
250 data = SingleBlockManager(data, index, fastpath=True)
/usr/lib64/python2.7/site-packages/pandas/core/series.pyc in _sanitize_array(data, index, dtype, copy, raise_cast_failure)
3027 raise Exception('Data must be 1-dimensional')
3028 else:
-> 3029 subarr = _asarray_tuplesafe(data, dtype=dtype)
3030
3031 # This is to prevent mixed-type Series getting all casted to
/usr/lib64/python2.7/site-packages/pandas/core/common.pyc in _asarray_tuplesafe(values, dtype)
378 except ValueError:
379 # we have a list-of-list
--> 380 result[:] = [tuple(x) for x in values]
381
382 return result
ValueError: cannot copy sequence with size 2 to array axis with dimension 1771077
But when I run this instead, it gives no error:
dummies = pd.get_dummies(combi)
Can someone tell me what is going wrong and how I can fix it? I want to use only a subset of the original columns.
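
One thing that may be worth checking: 'Product_Description' appears twice in `dum`. Selecting columns with a list that repeats a label returns a frame with two columns of the same name, so looking up that label gives back a two-column DataFrame rather than a single Series, which would be consistent with the "sequence with size 2" in the message. The sketch below is only a toy reproduction under that assumption; the small `combi` frame and its values are made up for illustration and are not the real data.

```python
import pandas as pd

# Hypothetical stand-in for the real `combi` frame (values are invented).
combi = pd.DataFrame({
    "Product_Description": ["A", "B", "A"],
    "Brand": ["x", "y", "x"],
    "Quantity": [1, 2, 3],
})

# A column list that repeats one label, as `dum` does.
dum = ["Product_Description", "Brand", "Product_Description"]

# Selecting with a repeated label produces two same-named columns,
# so indexing by that label returns a DataFrame, not a Series.
subset = combi[dum]
is_frame = isinstance(subset["Product_Description"], pd.DataFrame)

# Drop repeated labels (order of first appearance is kept), then encode.
unique_cols = list(pd.Index(dum).drop_duplicates())
dummies = pd.get_dummies(combi[unique_cols])
```

If that assumption holds for the real data, removing the repeated entry from `dum` (or deduplicating the list as above) should let `pd.get_dummies(combi[dum])` run on just the chosen subset.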