With pyarrow 0.6.0 (or lower), the following snippet crashes the Python interpreter:
data = pd.DataFrame({'a': [1, True]})
pa.Table.from_pandas(data)
"Python interpreter has stopped working" (on Windows)
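After some investigation, this was fixed in pyarrow 0.7.0 according to this Jira issue, and more precisely by this commit. With the same snippet as in the question, the interpreter no longer crashes and we get the following error instead: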
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "table.pxi", line 755, in pyarrow.lib.Table.from_pandas
  File "C:\Temp\tt\Tools\Anaconda3.4.3.1\envs\GMF_test3\lib\site-packages\pyarrow\pandas_compat.py", line 227, in dataframe_to_arrays
    col, type=type, timestamps_to_ms=timestamps_to_ms
  File "array.pxi", line 225, in pyarrow.lib.Array.from_pandas
  File "error.pxi", line 77, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Error converting from Python objects to Int64: Got Python object of type bool but can only handle these types: integer
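One way to work around this, when you have control over the data, is to convert the column with mixed dtypes when the exception occurs, like this (and possibly log the exception, since this is not a common error):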
import logging

import pandas as pd
import pyarrow as pa

logger = logging.getLogger(__name__)

# Column 'a' mixes int and bool, so pandas stores it with dtype object.
data = pd.DataFrame({'a': [1, True], 'b': [1, 2]})

def convert_type_if_needed(type_to_select, df, col_name):
    # Collect the Python type of every value in the column.
    types = []
    for i in df[col_name]:
        types.append(type(i))
    # Cast the whole column only if the target type is actually present.
    if type_to_select in types:
        return df.astype({col_name: type_to_select})
    else:
        raise TypeError(str(type_to_select) + " is not in the dataframe, conversion impossible")

try:
    table = pa.Table.from_pandas(data)
except pa.lib.ArrowInvalid as e:
    # Log the conversion error, cast the offending column, then retry.
    logger.warning(e)
    data = convert_type_if_needed(int, data, 'a')
    table = pa.Table.from_pandas(data)
print(table)
Which finally produces:
pyarrow.Table
Error converting from Python objects to Int64: Got Python object of type bool but can only handle these types: integer
a: int32
b: int64
__index_level_0__: int64
metadata
--------
{b'pandas': b'{"columns": [{"name": "a", "numpy_type": "int32", "pandas_type":'
b' "int32", "metadata": null}, {"name": "b", "numpy_type": "int64"'
b', "pandas_type": "int64", "metadata": null}, {"name": "__index_l'
b'evel_0__", "numpy_type": "int64", "pandas_type": "int64", "metad'
b'ata": null}], "index_columns": ["__index_level_0__"], "pandas_ve'
b'rsion": "0.20.3"}'}
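The workaround above hard-codes the column name 'a'. If you do not know in advance which column is affected, a small helper can report the columns whose values have more than one Python type. This is only a sketch: `find_mixed_type_columns` is a hypothetical name, not part of pandas or pyarrow, and it only assumes an object with an `items()` method that yields (name, values) pairs. A plain dict is used here; a pandas DataFrame should work the same way, since `DataFrame.items()` yields (column name, Series) pairs.

```python
def find_mixed_type_columns(columns):
    """Return {column name: sorted type names} for columns with mixed types.

    Hypothetical helper: `columns` is any mapping-like object whose
    items() yields (name, iterable of values) pairs.
    """
    mixed = {}
    for name, values in columns.items():
        # Collect the distinct Python type names found in this column.
        type_names = sorted({type(v).__name__ for v in values})
        if len(type_names) > 1:
            mixed[name] = type_names
    return mixed


# A plain dict stands in for the DataFrame from the example above.
sample = {'a': [1, True], 'b': [1, 2]}
print(find_mixed_type_columns(sample))  # {'a': ['bool', 'int']}
```

The reported column names could then be passed to convert_type_if_needed one at a time before retrying the conversion.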