While profiling my code, I was quite surprised to find pd.to_datetime
to be a large performance drag (62 s spent out of 91 s in my use case), so perhaps I am not using the function as I should.
As a simple example, I need to convert timestamp = 623289600000000000L
to a date/timestamp format.
import datetime
import time
import pandas as pd
timestamp = 623289600000000000L
timeit pd.to_datetime(timestamp, unit = 'ns')
10000 loops, best of 3: 46.9 us per loop
timeit time.ctime(timestamp/10**9)
1000000 loops, best of 3: 904 ns per loop
timeit time.localtime(timestamp/10**9)
1000000 loops, best of 3: 1.13 us per loop
timeit datetime.datetime.fromtimestamp(timestamp/10**9)
1000000 loops, best of 3: 1.51 us per loop
timeit datetime.datetime.utcfromtimestamp(timestamp/10**9)
1000000 loops, best of 3: 1.29 us per loop
I am aware that these functions each return a different object; still, pd.to_datetime
is by far the slowest. Is that expected?
I now use datetime.datetime.utcfromtimestamp
in my code and it works fine. However, I would rather keep using Pandas, especially since Pandas also handles pre-1970 dates fine (see below). Could you provide some guidance?
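For reference, here is roughly what my current workaround looks like (a minimal sketch; the helper name ns_to_datetime is just for illustration):

import datetime

def ns_to_datetime(ns):
    # Convert an integer epoch timestamp in nanoseconds to a naive UTC datetime.
    # Caveat: this raises for pre-1970 (negative) timestamps, see below.
    return datetime.datetime.utcfromtimestamp(ns / 10**9)

ns_to_datetime(623289600000000000L)
# datetime.datetime(1989, 10, 2, 0, 0)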
pd.to_datetime
has one advantage: it supports negative input / pre-1970-01-01 dates, which is also quite important for my use case.
timestamp = -445645400000000000L
pd.to_datetime(timestamp, unit = 'ns')
Timestamp('1955-11-18 01:36:40', tz=None)
datetime.datetime.utcfromtimestamp(timestamp/10**9)
Traceback (most recent call last):
File "<ipython-input-9-99b040d30a3e>", line 1, in <module>
datetime.datetime.utcfromtimestamp(timestamp/10**9)
ValueError: timestamp out of range for platform localtime()/gmtime() function
I use Python 2.7.5 and Pandas 0.12.0 on Windows 7.