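I wrote a simple script that reads a .csv with pandas' read_csv, relying entirely on pandas type inference. I get this error message:
arrow_table = pa.Table.from_pandas(df)"): Error converting to Python objects to String/UTF8
I couldn't find anything useful online to solve this. How do I use the 'type' parameter in pyarrow.from_pandas(type= ...)?
Thank you.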
$ python pqwrite2.py
pyarrow version = 0.7.1
from_size = 298877474 bytes
sys:1: DtypeWarning: Columns (23,28) have mixed types. Specify dtype option on import or set low_memory=False.
id int64
...
pid object
mnemonic object
supplier_key float64
generic object
trade_name object
description object
strength object
form object
ndc object
note object
pack_size float64
pack_size_text object
pack_type object
route_description object
...
status object
hidden_flag object
updated float64
created_at object
updated_at object
medid object
dtype: object
write_to_parquet(df, parquet_output/h_billing_codes.SNAPPY.parquet, SNAPPY) ...
ERROR:root:2017-12-13 02:22:48 EXCEPTION IN (pqwrite2.py, LINE 23 "arrow_table = pa.Table.from_pandas(df)"): Error converting to Python objects to String/UTF8: Got Python object of type float but can only handle these types: str, bytes
2017-12-13 02:22:48 EXCEPTION IN (pqwrite2.py, LINE 23 "arrow_table = pa.Table.from_pandas(df)"): Error converting to Python objects to String/UTF8: Got Python object of type float but can only handle these types: str, bytes
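One possible reading of the log above: the DtypeWarning says two columns have mixed types, so an object column probably holds both str and float values, and the String/UTF8 conversion rejects the floats. Below is a minimal sketch of one workaround under that assumption, which is to declare the affected columns as str when reading the CSV instead of relying on inference. The sample data and the helper names load_with_explicit_dtypes and to_arrow_table are hypothetical, and "ndc" is only a guess at which column is affected. Recent pyarrow releases also accept a schema argument on pa.Table.from_pandas and a type argument on pa.Array.from_pandas; whether 0.7.1 behaves the same way has not been checked here.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV: "ndc" mixes
# numeric-looking values, text values, and an empty cell.
SAMPLE_CSV = "id,ndc,pack_size\n1,12345,10.0\n2,ABC-1,\n3,,5.0\n"


def load_with_explicit_dtypes(source, string_columns):
    """Read a CSV, forcing the named columns to str instead of inferring them."""
    dtypes = {name: str for name in string_columns}
    # low_memory=False parses the file in one pass, which avoids the
    # chunk-by-chunk inference that triggers the DtypeWarning.
    return pd.read_csv(source, dtype=dtypes, low_memory=False)


def to_arrow_table(df):
    """Convert the frame to an Arrow table (pyarrow is imported only here)."""
    import pyarrow as pa

    return pa.Table.from_pandas(df)


df = load_with_explicit_dtypes(io.StringIO(SAMPLE_CSV), ["ndc"])
print(df.dtypes)
```

With the real file, the call would take the CSV path and the names of the columns flagged in the warning (positions 23 and 28), and to_arrow_table(df) would then replace the direct pa.Table.from_pandas(df) call. Empty cells in the forced columns still come back as missing values rather than strings.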