
I have a DataFrame with a column named date. How can we convert/parse the 'date' column into DateTime objects?

I used sql.read_frame(). A sample value from the date column is 2013-04-04.

What I want to do is select all rows in the DataFrame whose date column falls within a given period, e.g. after 2013-04-01 and before 2013-04-04.

My attempt below gives the error 'Series' object has no attribute 'read'.

Attempt

import dateutil

df['date'] = dateutil.parser.parse(df['date'])

Error

AttributeError                            Traceback (most recent call last)
<ipython-input-636-9b19aa5f989c> in <module>()
     15 
     16 # Parse 'Date' Column to Datetime
---> 17 df['date'] = dateutil.parser.parse(df['date'])
     18 
     19 # SELECT RECENT SALES

C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(timestr, parserinfo, **kwargs)
    695         return parser(parserinfo).parse(timestr, **kwargs)
    696     else:
--> 697         return DEFAULTPARSER.parse(timestr, **kwargs)
    698 
    699 

C:\Python27\lib\site-packages\dateutil\parser.pyc in parse(self, timestr, default, ignoretz, tzinfos, **kwargs)
    299             default = datetime.datetime.now().replace(hour=0, minute=0,
    300                                                       second=0, microsecond=0)
--> 301         res = self._parse(timestr, **kwargs)
    302         if res is None:
    303             raise ValueError, "unknown string format"

C:\Python27\lib\site-packages\dateutil\parser.pyc in _parse(self, timestr, dayfirst, yearfirst, fuzzy)
    347             yearfirst = info.yearfirst
    348         res = self._result()
--> 349         l = _timelex.split(timestr)
    350         try:
    351 

C:\Python27\lib\site-packages\dateutil\parser.pyc in split(cls, s)
    141 
    142     def split(cls, s):
--> 143         return list(cls(s))
    144     split = classmethod(split)
    145 

C:\Python27\lib\site-packages\dateutil\parser.pyc in next(self)
    135 
    136     def next(self):
--> 137         token = self.get_token()
    138         if token is None:
    139             raise StopIteration

C:\Python27\lib\site-packages\dateutil\parser.pyc in get_token(self)
     66                 nextchar = self.charstack.pop(0)
     67             else:
---> 68                 nextchar = self.instream.read(1)
     69                 while nextchar == '\x00':
     70                     nextchar = self.instream.read(1)

AttributeError: 'Series' object has no attribute 'read'

df['date'].apply(dateutil.parser.parse) gives me the error AttributeError: 'datetime.date' object has no attribute 'read'

df['date'].truncate(after='2013/04/01') gives the error TypeError: can't compare datetime.datetime to long

df['date'].dtype returns dtype('O'). Is it already a datetime object?


5 Answers


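Pandas is aware of datetime objects, but when you use some of the import functions the column is read in as strings. So you need to make sure the column is converted to a datetime type rather than left as strings. Then you can run your query.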

import datetime
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df_masked = df[(df['date'] > datetime.date(2012,4,1)) & (df['date'] < datetime.date(2012,4,4))]
answered 2014-06-27T07:53:45.180
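A minimal self-contained sketch of the same idea, using the question's 2013 dates. It assumes the column arrives as Python datetime.date objects (the dtype('O') case from the question) and, as an assumption on my part, compares against pd.Timestamp instead of datetime.date, since a later answer reports that newer pandas versions reject the datetime.date comparison. The sample frame and its amount column are made up for illustration.

```python
import datetime

import pandas as pd

# Hypothetical sample data: an object-dtype column of datetime.date values,
# similar to what the question describes coming out of sql.read_frame()
df = pd.DataFrame({
    "date": [datetime.date(2013, 3, 30), datetime.date(2013, 4, 2),
             datetime.date(2013, 4, 3), datetime.date(2013, 4, 4)],
    "amount": [10, 20, 30, 40],
})

# Convert the object column into a real datetime64 column
df["date"] = pd.to_datetime(df["date"])

# Compare against pandas Timestamps: strictly after 2013-04-01, strictly before 2013-04-04
start = pd.Timestamp("2013-04-01")
end = pd.Timestamp("2013-04-04")
df_masked = df[(df["date"] > start) & (df["date"] < end)]
print(df_masked)
```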

You probably need apply, so something like:

df['date'] = df['date'].apply(dateutil.parser.parse)

Without an example of the column I can't guarantee this will work, but something in that direction should help you to carry on.

answered 2013-05-07T07:56:53.660
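A sketch of the apply approach, assuming the column holds ISO-formatted strings (the sample values are hypothetical). It does not help when the column already contains datetime.date objects. dateutil.parser.parse expects a string, and the traceback in the question shows older dateutil versions treating any non-string as a stream and calling .read() on it, which matches the 'datetime.date' object has no attribute 'read' error. The vectorized pd.to_datetime is shown alongside for comparison.

```python
import dateutil.parser
import pandas as pd

# Hypothetical column of ISO date strings
s = pd.Series(["2013-03-30", "2013-04-02", "2013-04-04"])

# Element-wise parsing: dateutil.parser.parse is called once per string
parsed_apply = s.apply(dateutil.parser.parse)

# Vectorized alternative that always yields a datetime64 Series
parsed_vec = pd.to_datetime(s)

# Both hold the same moments in time (a Timestamp compares equal to the matching datetime)
print(list(parsed_apply) == list(parsed_vec))
```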

pandas already reads that as a datetime object! So what you want is to select rows between two dates and you can do that by masking:

df_masked = df[(df.date > '2012-04-01') & (df.date < '2012-04-04')]

Because you said that you were getting an error from the string for some reason, try this:

df_masked = df[(df.date > datetime.date(2012,4,1)) & (df.date < datetime.date(2012,4,4))]
answered 2013-05-07T13:23:25.633
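A sketch of the string-mask approach, assuming the column is first converted with pd.to_datetime (the question's dtype('O') means it is not datetime64 yet). Once it is, pandas interprets bounds like '2013-04-01' as timestamps in the comparison. Series.between is shown as an alternative. Note that it includes both endpoints by default, unlike the strict > / < mask. The sample dates are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose 'date' column starts out as strings (object dtype)
df = pd.DataFrame({"date": ["2013-03-30", "2013-04-01", "2013-04-02", "2013-04-04"]})
df["date"] = pd.to_datetime(df["date"])  # now datetime64, so string bounds are parsed as dates

# Strict bounds: after 2013-04-01 and before 2013-04-04
strict = df[(df.date > "2013-04-01") & (df.date < "2013-04-04")]

# Series.between includes both endpoints by default
inclusive = df[df.date.between("2013-04-01", "2013-04-04")]

print(strict)     # keeps only the 2013-04-02 row
print(inclusive)  # keeps the 2013-04-01, 2013-04-02 and 2013-04-04 rows
```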

Don't confuse datetime.date with Pandas pd.Timestamp

A "Pandas datetime series" contains pd.Timestamp elements, not datetime.date elements. The recommended Pandas solution:

import pandas as pd

s = pd.to_datetime(s)    # convert series to Pandas datetime64 dtype
mask = s > '2018-03-10'  # calculate Boolean mask against Pandas-compatible object

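Problems with the top answers:

  • @RyanSaxe's accepted answer: the first attempt doesn't work; the second is inefficient.
  • As of Pandas v0.23.0, @Keith's highly upvoted answer doesn't work; it raises a TypeError.

Any good Pandas solution must ensure that:

  1. The series is a Pandas datetime series, not object dtype.
  2. The datetime series is compared against a compatible object (e.g. pd.Timestamp, or a correctly formatted string).

Here is a demo with benchmarks, showing that the one-off conversion cost is recouped after a single operation: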
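The two requirements can be folded into a small helper. select_between is a hypothetical name, not a pandas API. It converts the column only when it is not already datetime64, and it compares against pd.Timestamp bounds. Both steps are my own sketch of the checklist, and the sample frame is made up.

```python
import datetime

import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def select_between(df, col, start, end):
    """Hypothetical helper: return rows where start < df[col] < end (strict bounds)."""
    dates = df[col]
    # Requirement 1: work on a datetime64 series, not object dtype
    if not is_datetime64_any_dtype(dates):
        dates = pd.to_datetime(dates)
    # Requirement 2: compare against pandas-compatible Timestamps
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    return df[(dates > start) & (dates < end)]


df = pd.DataFrame({"date": [datetime.date(2013, 4, 1), datetime.date(2013, 4, 2),
                            datetime.date(2013, 4, 3), datetime.date(2013, 4, 5)]})
print(select_between(df, "date", "2013-04-01", datetime.date(2013, 4, 4)))
```

The helper leaves the caller's DataFrame untouched, so the returned rows still hold the original datetime.date values. Assign the converted column back first if you want datetime64 in the result.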

from datetime import date

import pandas as pd

L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s = pd.Series(L*10**5)

a = s > date(2018, 3, 10)             # accepted solution #2, inefficient
b = pd.to_datetime(s) > '2018-03-10'  # more efficient, including datetime conversion

assert a.equals(b)                    # check solutions give same result

%timeit s > date(2018, 3, 10)                  # 40.5 ms
%timeit pd.to_datetime(s) > '2018-03-10'       # 33.7 ms

s = pd.to_datetime(s)

%timeit s > '2018-03-10'                       # 2.85 ms
answered 2018-10-19T22:43:51.493
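%timeit is an IPython magic. Outside IPython, the same three measurements can be sketched with the standard timeit module. This is a rough harness under that assumption. The averages depend on machine and pandas version, so no figures are quoted here.

```python
import timeit
from datetime import date

import pandas as pd

L = [date(2018, 1, 10), date(2018, 5, 20), date(2018, 10, 30), date(2018, 11, 11)]
s = pd.Series(L * 10**5)   # object dtype: Python datetime.date elements
s_dt = pd.to_datetime(s)   # datetime64 dtype, converted once up front

n = 5  # loops per measurement; the per-loop average is reported
t_object = timeit.timeit(lambda: s > date(2018, 3, 10), number=n) / n
t_convert = timeit.timeit(lambda: pd.to_datetime(s) > "2018-03-10", number=n) / n
t_datetime = timeit.timeit(lambda: s_dt > "2018-03-10", number=n) / n

for label, t in [("object vs date", t_object),
                 ("convert + compare", t_convert),
                 ("datetime64 vs string", t_datetime)]:
    print(f"{label:>22}: {t * 1e3:.2f} ms per loop")
```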

You should iterate over the items and parse them independently, then construct a new list.

df['date'] = [dateutil.parser.parse(x) for x in df['date']]
answered 2013-06-20T11:47:40.080
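A sketch of the list-comprehension approach on a hypothetical frame of strings. Assigning the resulting list of datetime objects back to the column lets pandas infer a datetime64 dtype, so the comparisons from the other answers work afterwards. Like apply, this parses strings only. A column that already holds datetime.date objects would need pd.to_datetime instead.

```python
import dateutil.parser
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df = pd.DataFrame({"date": ["2013-03-30", "2013-04-02", "2013-04-04"]})

# Parse each string individually, then assign the new list back to the column
df["date"] = [dateutil.parser.parse(x) for x in df["date"]]

# pandas infers a datetime64 dtype from a list of datetime objects
print(df["date"].dtype)
print(is_datetime64_any_dtype(df["date"]))
```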