
I need to convert a dictionary whose values are dictionaries to Parquet. My data looks like this:

{"KEY":{"2018-12-06":250.0,"2018-12-07":234.0}}

I am converting it to a pandas DataFrame and then to a pyarrow Table before writing:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

data = {"KEY":{"2018-12-06":250.0,"2018-12-07":234.0}}
df = pd.DataFrame.from_dict(data, orient='index')
table = pa.Table.from_pandas(df, preserve_index=False)
pq.write_table(table, 'file.parquet', flavor='spark')

The data I end up with has only the dates and values, but not the dictionary key:

{"2018-12-06":250.0,"2018-12-07":234.0}

What I need is to keep the key as well:

{"KEY": {"2018-12-06":250.0,"2018-12-07":234.0}}

2 Answers


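With orient='index', the dictionary key becomes the DataFrame index, and preserve_index=False drops it. If you want to keep the index, you have to say so explicitly by setting preserve_index=True: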

table = pa.Table.from_pandas(df, preserve_index=True)

pq.write_table(table, 'file.parquet', flavor='spark')
pq.read_table('file.parquet').to_pandas()  # Index is preserved.

     2018-12-06  2018-12-07
KEY       250.0       234.0
Answered 2018-12-05T21:14:44.480

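I am seeing a related but separate issue: the frequency (freq) of a DatetimeIndex is not preserved in a round trip from pandas to a pyarrow Table and back.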

For example:

    >>> import pandas as pd
    >>> import pyarrow as pa
    >>> from collections import OrderedDict
    >>>
    >>>
    >>> pd.__version__
    '1.1.5'
    >>>
    >>> pa.__version__
    '4.0.1'
    >>>
    >>> dates = pd.date_range(start='2016-04-01', periods=4, name='DATE')
    >>> dict_data = OrderedDict()
    >>> dict_data['A'] = list('AABB')
    >>> dict_data['B'] = list('abab')
    >>> dict_data['C'] = list('wxyz')
    >>> dict_data['D'] = range(0, 4)
    >>> df = pd.DataFrame.from_dict(dict_data)
    >>> df = df.set_index(dates)
    >>>
    >>> df.index
    DatetimeIndex(['2016-04-01', '2016-04-02', '2016-04-03', '2016-04-04'], dtype='datetime64[ns]', name='DATE', freq='D')
    >>>
    >>> table = pa.Table.from_pandas(df, preserve_index=True)
    >>> df2 = table.to_pandas()
    >>> df2.index
    DatetimeIndex(['2016-04-01', '2016-04-02', '2016-04-03', '2016-04-04'], dtype='datetime64[ns]', name='DATE', freq=None)
Answered 2021-06-24T23:11:00.317