
I have a df with certain features stored as object dtype which I want to convert to datetime. When I attempt to convert them using pd.to_datetime, some of these features raise an "Out of bounds timestamp" error. To address this, I add the errors='coerce' argument, then try to drop all the NAs that result. For example:

pd.to_datetime(df[date_features], infer_datetime_format = True, errors = 'coerce')
df[date_features].dropna(inplace= True)

Yet this doesn't seem to convert the features to datetime ("maturity_date" is one of the date_features I am trying to convert):

df['maturity_date'].describe()

count        3355323
unique         11954
top       2015-12-01
freq           29607
Name: maturity_date, dtype: object

Furthermore, if I again try to convert maturity_date using pd.to_datetime without errors='coerce', I get the "Out of bounds timestamp" error.

I hope I have described this problem thoroughly.

Any thoughts?

1 Answer

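pd.to_datetime is not an in-place operation. Your code performs the conversion and then discards the result. The right thing to do is to assign the result back, like this: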

df['date_features'] = pd.to_datetime(df.date_features, errors='coerce')

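Furthermore, don't call dropna on a column that belongs to a dataframe, as this will not modify the dataframe (even with inplace=True). Instead, call dropna on the dataframe itself and pass the subset argument: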

df.dropna(subset=['date_features'], inplace=True)
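
Putting the two steps together, here is a minimal sketch. The sample frame, the "amount" column, and the placeholder values are made up for illustration; only maturity_date comes from the question. How an out-of-range date such as 9999-12-31 is handled may depend on your pandas version, so the unparseable string is included as a value that is coerced to NaT in any case.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "maturity_date": ["2017-04-01", "2016-01-15", "9999-12-31", "not a date"],
    "amount": [100, 200, 300, 400],
})

# to_datetime returns a new Series, so assign it back to the column.
# Values that cannot be represented are coerced to NaT instead of raising.
df["maturity_date"] = pd.to_datetime(df["maturity_date"], errors="coerce")

# Drop the rows where the conversion failed, operating on the DataFrame
# itself rather than on a single column.
df = df.dropna(subset=["maturity_date"])

print(df.dtypes)
print(df)
```

Reassigning with df = df.dropna(...) is used here instead of inplace=True; both should leave df without the NaT rows.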

Now, as observed, maturity_date looks like this:

results["maturity_date"].head()

0   2017-04-01
1   2017-04-01
2   2017-04-01
3   2016-01-15
4   2016-01-15
Name: maturity_date, dtype: datetime64[ns]

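As you can see, the dtype is datetime64, which means the operation worked. If you call describe(), it performs some standard aggregations and returns the result as a new Series. That Series is displayed like any other, including a dtype that applies to the description itself, not to the column it describes.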
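
To see the difference, you can compare the two dtypes directly. This is a small sketch on made-up values; the exact statistics that describe() reports for a datetime column vary between pandas versions.

```python
import pandas as pd

# Hypothetical values standing in for the real column.
s = pd.Series(
    pd.to_datetime(["2017-04-01", "2017-04-01", "2016-01-15"]),
    name="maturity_date",
)

summary = s.describe()

print(s.dtype)        # dtype of the column itself
print(summary.dtype)  # dtype of the summary Series returned by describe()
```

So, to check whether a conversion worked, it is more reliable to inspect df['maturity_date'].dtype (or df.dtypes) than the dtype line printed under describe().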

Answered 2017-12-18T17:36:21.327