python - Pandas Period data type prints as numbers?

Question

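I have a Pandas DataFrame that contains dates, which I converted to a pandas TimeSeries.

From there, I want to add a column to the DF that is identical to the date column, except in Period format with the frequency set to monthly.

The problem is that, inside the DataFrame, the period column prints as numbers (2009-1 prints as 468, 2009-2 prints as 469, and so on).

This is not a problem when I create a separate PeriodIndex object outside the DF.

What am I doing wrong?

The code I used to convert the unformatted time column to DateTime: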

subset['Created On'] = pd.to_datetime(subset['Created On'])

The code that creates the column of periods:

subset['Month'] = pd.PeriodIndex(subset['Created On'],freq='M')

The code that creates a separate PeriodIndex object and correctly displays the dates in month format:

months = pd.PeriodIndex(subset['Created On'],freq='M')

Edit:

As requested in the comments, the output of subset[:1].to_dict():

#[Out]# {'Created On': {12822544: <Timestamp: 2009-01-01 00:00:00>}, 'City': {12822544: 'BROOKLYN'}, 'Borough': {12822544: 'Unspecified'}, 'Location': {12822544: '(40.65662129596871, -73.95806621423951)'}, 'Closed Date': {12822544: '01/07/2009 12:00 AM'}}

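Note that I lost my session after my original post and had to reload the data into the DF. At this point, I have only converted the 'Created On' column to timestamps using the pd.to_datetime method. Since then, I tried using: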

subset['Created On'].resample('M')

which results in the error:

TypeError: Only valid with DatetimeIndex or PeriodIndex

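Maybe part of the problem is that I'm not using the date column as the DF index? If so, that won't work for me, because the column contains a lot of non-unique values, and I'm already using a unique ID field that makes a better index.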
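On the resample error: resample works on the index, so it raises this TypeError when the Series is indexed by something other than datetimes or periods, such as the unique ID here. One way to aggregate by month while keeping that index is to group on a month key derived from the date column. The following is a minimal sketch, assuming a pandas version that provides the .dt accessor; the sample rows and the per_month name are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; the integer index mimics the question's unique ID.
subset = pd.DataFrame(
    {
        "Created On": pd.to_datetime(["2009-01-01", "2009-01-20", "2009-02-03"]),
        "City": ["BROOKLYN", "BROOKLYN", "BRONX"],
    },
    index=[12822544, 12822545, 12822546],
)

# Group by calendar month without touching the index: convert the date
# column to monthly periods and use the result as the grouping key.
per_month = subset.groupby(subset["Created On"].dt.to_period("M")).size()
print(per_month)
```

The date column does not need to be unique for this to work, since it is only used as a grouping key and never becomes the index of the original frame.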

score 2 · Accepted Answer

This is a bug. As a temporary workaround, you can do the following:

subset['Month'] = pd.PeriodIndex(subset['Created On'],freq='M').asobject

http://github.com/pydata/pandas/issues/2281
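For readers on more recent pandas releases, where the .asobject attribute is no longer available, the same idea can be sketched as follows. This is an illustration under assumptions, not the fix from the linked issue: the sample data and the Month_obj column name are hypothetical, and the .dt.to_period call assumes a pandas version with the .dt accessor.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `subset` frame.
subset = pd.DataFrame(
    {"Created On": ["2009-01-01", "2009-02-15", "2009-02-20"]},
    index=[12822544, 12822545, 12822546],
)
subset["Created On"] = pd.to_datetime(subset["Created On"])

# Same idea as the workaround above, written with astype(object)
# in place of the .asobject attribute.
subset["Month_obj"] = pd.PeriodIndex(subset["Created On"], freq="M").astype(object)

# Alternative: convert the datetime column to monthly periods directly.
subset["Month"] = subset["Created On"].dt.to_period("M")

print(subset)
```

In both columns the values display as months (for example 2009-01) instead of the underlying integer ordinals.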
