10

Given this base date:

base_date = "10/29 06:58 AM"

I want to find the tuple in a list whose date is closest to base_date, but it must not be an earlier date.

list_date = [('10/30 02:18 PM', '-103', '-107'), ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]

So the output here should be ('10/30 02:17 PM', '+100', '-110') (it can't be the third tuple, because the date there is earlier than the base date).

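My question is: is there any module for this kind of date comparison? I tried converting all the data to AM format first and then comparing, but my code became ugly with lots of slicing.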

Edit:

Big list to test:

[('10/30 02:18 PM', '+13 -103', '-13 -107'), ('10/30 02:17 PM', '+13 +100', '-13 -110'), ('10/30 02:15 PM', '+13 -101', '-13 -109'), ('10/30 02:14 PM', '+13 -103', '-13 -107'), ('10/30 01:59 PM', '+13 -105', '-13 -105'), ('10/30 01:46 PM', '+13 -106', '-13 -104'), ('10/30 01:37 PM', '+13 -105', '-13 -105'), ('10/30 01:24 PM', '+13 -107', '-13 -103'), ('10/30 01:23 PM', '+13 -106', '-13 -104'), ('10/30 01:05 PM', '+13 -103', '-13 -107'), ('10/30 01:02 PM', '+13 -104', '-13 -106'), ('10/30 12:55 PM', '+13 -103', '-13 -107'), ('10/30 12:51 PM', '+13.5 -110', '-13.5 +100'), ('10/30 12:44 PM', '+13.5 -108', '-13.5 -102'), ('10/30 12:38 PM', '+13.5 -107', '-13.5 -103'), ('10/30 12:35 PM', '+13 -102', '-13 -108'), ('10/30 12:34 PM', '+13 -103', '-13 -107'), ('10/30 12:06 PM', '+13.5 -110', '-13.5 +100'), ('10/30 11:57 AM', '+13.5 -108', '-13.5 -102'), ('10/30 11:36 AM', '+13.5 -107', '-13.5 -103'), ('10/30 09:01 AM', '+13.5 -110', '-13.5 +100'), ('10/30 08:59 AM', '+13.5 -108', '-13.5 -102'), ('10/30 08:13 AM', '+13.5 -105', '-13.5 -105'), ('10/30 06:11 AM', '+13.5 +100', '-13.5 -110'), ('10/30 06:09 AM', '+13.5 -105', '-13.5 -105'), ('10/30 06:04 AM', '+13.5 -110', '-13.5 +100'), ('10/30 05:32 AM', '+13.5 -105', '-13.5 -105'), ('10/30 04:48 AM', '+13.5 -107', '-13.5 -103'), ('10/30 12:51 AM', '+13.5 -110', '-13.5 +100'), ('10/29 01:31 PM', '+13.5 -105', '-13.5 -105'), ('10/29 01:31 PM', '+13 +103', '-13 -113'), ('10/29 01:28 PM', '+13 -102', '-13 -108'), ('10/29 07:59 AM', '+13 -105', '-13 -105'), ('10/29 07:20 AM', '+13 -103', '-13 -107'), ('10/29 07:14 AM', '+13 -105', '-13 -105'), ('10/29 04:47 AM', '+13 +100', '-13 -110'), ('10/29 04:14 AM', '+13 -105', '-13 -105'), ('10/28 08:17 PM', '+12.5 +100', '-12.5 -110'), ('10/28 12:52 PM', '+12.5 -105', '-12.5 -105')]

Big list 2 to test:

[('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5     +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 12:44 AM', 
'+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]
4

8 Answers

13

This can be done with the datetime module, which can parse date strings into datetime objects that support comparison and arithmetic:

from datetime import datetime

# function for parsing strings using specific format
get_datetime = lambda s: datetime.strptime(s, "%m/%d %I:%M %p")

base = get_datetime(base_date)
later = filter(lambda d: get_datetime(d[0]) > base, list_date)
closest_date = min(later, key = lambda d: get_datetime(d[0]))
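In Python 3, `filter` returns a lazy iterator and `min` raises `ValueError` on an empty sequence, so a variant that parses each string only once and handles the "nothing is later" case may be useful. The following is a minimal sketch using the question's data; the names `FMT`, `parse` and `closest_later` are made up for illustration and are not part of the answer above.

```python
from datetime import datetime

FMT = "%m/%d %I:%M %p"  # same format string as in the answer above


def parse(s):
    """Parse a string such as '10/29 06:58 AM' (the year defaults to 1900)."""
    return datetime.strptime(s, FMT)


def closest_later(base_date, rows):
    """Return the row with the earliest date strictly after base_date,
    or None if no row is later."""
    base = parse(base_date)
    # Pair each row with its parsed date so strptime runs once per row.
    parsed = ((parse(row[0]), row) for row in rows)
    later = [(dt, row) for dt, row in parsed if dt > base]
    if not later:
        return None
    return min(later, key=lambda pair: pair[0])[1]


list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]

print(closest_later("10/29 06:58 AM", list_date))
# ('10/30 02:17 PM', '+100', '-110')
```

Returning `None` is only one possible convention; raising an exception would work just as well if a missing match should be treated as an error.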
answered 2013-06-22T09:57:12.080
12
>>> from datetime import timedelta, datetime
>>> base_date = "10/29 06:58 AM"
>>> b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
>>> def func(x):
...     d = datetime.strptime(x[0], "%m/%d %I:%M %p")
...     delta = d - b_d if d > b_d else timedelta.max
...     return delta
...
>>> min(list_date, key = func)
('10/30 02:17 PM', '+100', '-110')

datetime.strptime converts the date string into a datetime object, so b_d now looks like this:

>>> b_d
datetime.datetime(1900, 10, 29, 6, 58)
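Because the strings carry no year, `strptime` fills in 1900, as the output above shows. That is harmless while all dates fall within one year, but dates that straddle New Year would compare in the wrong order. Here is a hedged sketch of one workaround, assuming the caller knows the year from elsewhere; the helper `parse_with_year` and the year values are hypothetical.

```python
from datetime import datetime

FMT = "%m/%d %I:%M %p"


def parse_with_year(s, year):
    """Parse 'MM/DD HH:MM AM/PM' and attach an explicitly supplied year."""
    return datetime.strptime(s, FMT).replace(year=year)


# Without a year, 01/01 sorts before 12/31 because both land in 1900.
naive_dec = datetime.strptime("12/31 11:00 PM", FMT)
naive_jan = datetime.strptime("01/01 01:00 AM", FMT)
print(naive_jan > naive_dec)   # False

# With the (assumed) years attached, the order is as intended.
dec = parse_with_year("12/31 11:00 PM", 2013)
jan = parse_with_year("01/01 01:00 AM", 2014)
print(jan > dec)               # True
print(jan - dec)               # 2:00:00
```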

Now we can write a function that can be passed to the key parameter of min:

delta =  d - b_d if d > b_d else timedelta.max

If d > b_d, i.e. if the date of the item passed by min is later than base_date, their difference is assigned to delta; otherwise timedelta.max is assigned to it.

>>> timedelta.max
datetime.timedelta(999999999, 86399, 999999)
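One consequence of using `timedelta.max` as a sentinel: if no date in the list is later than the base date, every key equals `timedelta.max` and `min` simply returns the first item, which is an earlier date. A small sketch of a guard that reuses the same key function follows; `closest_or_none` is a hypothetical name, and returning `None` is just one possible convention.

```python
from datetime import datetime, timedelta

FMT = "%m/%d %I:%M %p"


def closest_or_none(base_date, rows):
    """Like min(rows, key=func), but return None when no row is later."""
    b_d = datetime.strptime(base_date, FMT)

    def func(x):
        d = datetime.strptime(x[0], FMT)
        return d - b_d if d > b_d else timedelta.max

    if not rows:
        return None
    best = min(rows, key=func)
    # If even the best key is the sentinel, nothing was later than base_date.
    return None if func(best) == timedelta.max else best


earlier_only = [('10/29 02:15 AM', '-101', '-109'),
                ('10/28 08:17 PM', '+12.5 +100', '-12.5 -110')]
print(closest_or_none("10/29 06:58 AM", earlier_only))  # None
```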

Update:

>>> from datetime import timedelta, datetime
>>> base_date = '10/29 06:59 AM'
>>> b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
>>> def func(x):
...         d =  datetime.strptime(x[0], "%m/%d %I:%M %p")
...         delta =  d - b_d if d > b_d else timedelta.max
...         return delta
... 
>>> lis2 = [('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5     +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 
12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]
>>> min(lis2, key = func)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')

Timing comparison:

Script:

from datetime import datetime, timedelta
import sys
import time
list_date = [('10/30 04:30 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:24 PM', '+1.5 -110', '-1.5     +100'), ('10/30 04:21 PM', '+1.5 -111', '-1.5 +101'), ('10/30 04:15 PM', '+1.5 -112', '-1.5 +102'), ('10/30 04:14 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:57 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:40 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:31 PM', '+1.5 -111', '-1.5 +101'), ('10/30 03:30 PM', '+1.5 -109', '-1.5 -101'), ('10/30 03:25 PM', '+1.5 -107', '-1.5 -103'), ('10/30 03:24 PM', '+1.5 -110', '-1.5 +100'), ('10/30 03:23 PM', '+1.5 -108', '-1.5 -102'), ('10/30 03:22 PM', '+1.5 -106', '-1.5 -104'), ('10/30 02:14 PM', '+1.5 -104', '-1.5 -106'), ('10/30 01:41 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:37 PM', '+1.5 -107', '-1.5 -103'), ('10/30 01:36 PM', '+1.5 -105', '-1.5 -105'), ('10/30 01:06 PM', '+1.5 -103', '-1.5 -107'), ('10/30 12:56 PM', '+2 -111', '-2 +101'), ('10/30 12:53 PM', '+2 -110', '-2 +100'), ('10/30 12:50 PM', '+2 -113', '-2 +103'), ('10/30 12:49 PM', '+2 -112', '-2 +102'), ('10/30 12:46 PM', '+2 -113', '-2 +103'), ('10/30 12:45 PM', '+2 -110', '-2 +100'), ('10/30 12:43 PM', '+2 -108', '-2 -102'), ('10/30 12:38 PM', '+2.5 -116', '-2.5 +106'), ('10/30 12:38 PM', '+2.5 -113', '-2.5 +103'), ('10/30 12:37 PM', '+2.5 -110', '-2.5 +100'), ('10/30 10:30 AM', '+2.5 -105', '-2.5 -105'), ('10/30 10:07 AM', '+3 -113', '-3 +103'), ('10/30 09:55 AM', '+3 -112', '-3 +102'), ('10/30 09:51 AM', '+3 -110', '-3 +100'), ('10/30 09:32 AM', '+3 -109', '-3 -101'), ('10/30 06:04 AM', '+3 -110', '-3 +100'), ('10/30 03:16 AM', '+3 -107', '-3 -103'), ('10/30 03:14 AM', '+3.5 -116', '-3.5 +106'), ('10/30 01:03 AM', '+3.5 -115', '-3.5 +105'), ('10/30 12:17 AM', '+3.5 -110', '-3.5 +100'), ('10/29 08:52 PM', '+3.5 -108', '-3.5 -102'), ('10/29 01:31 PM', '+3.5 -105', '-3.5 -105'), ('10/29 06:48 AM', '+3.5 -110', '-3.5 +100'), ('10/29 06:47 AM', '+3.5 -109', '-3.5 -101'), ('10/29 05:39 AM', '+3.5 -113', '-3.5 +103'), ('10/29 03:34 AM', '+3.5 -108', '-3.5 -102'), ('10/29 
12:44 AM', '+3.5 -110', '-3.5 +100'), ('10/29 12:41 AM', '+3.5 -107', '-3.5 -103'), ('10/29 12:40 AM', '+3.5 -105', '-3.5 -105'), ('10/28 12:52 PM', '+4 -105', '-4 -105')]

base_date = "10/29 06:58 AM"

def func1(list_date):
    #http://stackoverflow.com/a/17249420/846892
    get_datetime = lambda s: datetime.strptime(s, "%m/%d %I:%M %p")
    base = get_datetime(base_date)
    later = filter(lambda d: get_datetime(d[0]) > base, list_date)
    return min(later, key = lambda d: get_datetime(d[0]))

def func2(list_date):
    #http://stackoverflow.com/a/17249470/846892
    b_d = datetime.strptime(base_date, "%m/%d %I:%M %p")
    def func(x):
       d =  datetime.strptime(x[0], "%m/%d %I:%M %p")
       delta =  d - b_d if d > b_d else timedelta.max
       return delta
    return min(list_date, key = func)

def func3(list_date):
    #http://stackoverflow.com/a/17249529/846892
    fmt = '%m/%d %I:%M %p'
    d = datetime.strptime(base_date, fmt)
    def foo(x):
        return (datetime.strptime(x[0],fmt)-d).total_seconds() > 0
    return sorted(list_date, key=foo)[-1]

def func4(list_date):
    #http://stackoverflow.com/a/17249441/846892
    fmt = '%m/%d %I:%M %p'
    base_d = datetime.strptime(base_date, fmt)
    candidates = ((datetime.strptime(d, fmt), d, x, y) for d, x, y in list_date)
    candidates = min((dt, d, x, y) for dt, d, x, y in candidates if dt > base_d)
    return  candidates[1:]

Results:

>>> from so import *

#check output first
>>> func1(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func2(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func3(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')
>>> func4(list_date)
('10/29 01:31 PM', '+3.5 -105', '-3.5 -105')

>>> %timeit func1(list_date)
100 loops, best of 3: 3.07 ms per loop
>>> %timeit func2(list_date)
100 loops, best of 3: 1.59 ms per loop      #winner
>>> %timeit func3(list_date)
100 loops, best of 3: 1.91 ms per loop
>>> %timeit func4(list_date)
1000 loops, best of 3: 2.02 ms per loop

#increase the input size
>>> list_date = list_date *10**3
>>> len(list_date)
48000
>>> %timeit func1(list_date)
1 loops, best of 3: 3.6 s per loop
>>> %timeit func2(list_date)            #winner
1 loops, best of 3: 1.99 s per loop      
>>> %timeit func3(list_date)
1 loops, best of 3: 2.09 s per loop
>>> %timeit func4(list_date)
1 loops, best of 3: 2.02 s per loop


#increase the input size again

>>> list_date = list_date *10
>>> len(list_date)
480000
>>> %timeit func1(list_date)
1 loops, best of 3: 36.4 s per loop
>>> %timeit func2(list_date)                  #winner
1 loops, best of 3: 20.2 s per loop           
>>> %timeit func3(list_date)
1 loops, best of 3: 22.8 s per loop
>>> %timeit func4(list_date)
1 loops, best of 3: 22.7 s per loop
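A plausible reason for func1 being the slowest is that it calls `strptime` twice per element (once in `filter`, once in the `min` key), while the other functions parse each string once. This is an interpretation rather than something measured above. The sketch below uses the standard `timeit` module to compare the two styles on a small made-up input; `parse_twice` and `parse_once` are hypothetical names, and the timings will vary by machine, so none are quoted.

```python
import timeit
from datetime import datetime

FMT = "%m/%d %I:%M %p"
base_date = "10/29 06:58 AM"
rows = [('10/30 02:18 PM', '-103', '-107'),
        ('10/30 02:17 PM', '+100', '-110'),
        ('10/29 02:15 AM', '-101', '-109')] * 200


def parse_twice(rows):
    # Mirrors func1: one parse while filtering, another inside the min key.
    base = datetime.strptime(base_date, FMT)
    later = [r for r in rows if datetime.strptime(r[0], FMT) > base]
    return min(later, key=lambda r: datetime.strptime(r[0], FMT))


def parse_once(rows):
    # Parse each string once and carry the result along with the row.
    base = datetime.strptime(base_date, FMT)
    parsed = ((datetime.strptime(r[0], FMT), r) for r in rows)
    return min((p for p in parsed if p[0] > base), key=lambda p: p[0])[1]


# Both variants should agree before their speed is compared.
assert parse_twice(rows) == parse_once(rows)
for fn in (parse_twice, parse_once):
    print(fn.__name__, timeit.timeit(lambda: fn(rows), number=5))
```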
answered 2013-06-22T10:03:46.997
2

Decorate, filter, find the closest date, undecorate:

>>> base_date = "10/29 06:58 AM"
>>> list_date = [
...     ('10/30 02:18 PM', '-103', '-107'),
...     ('10/30 02:17 PM', '+100', '-110'),
...     ('10/29 02:15 AM', '-101', '-109')
... ]
>>> import datetime
>>> fmt = '%m/%d %I:%M %p'  # %I (12-hour clock) is needed for %p to affect the hour
>>> base_d = datetime.datetime.strptime(base_date, fmt)
>>> candidates = ((datetime.datetime.strptime(d, fmt), d, x, y) for d, x, y in list_date)
>>> candidates = min((dt, d, x, y) for dt, d, x, y in candidates if dt > base_d)
>>> print(candidates[1:])
('10/30 02:17 PM', '+100', '-110')
answered 2013-06-22T09:59:24.180
2

You might consider putting the list of dates into a Pandas index and then using the 'truncate' or 'get_loc' function.

import pandas as pd

##Initial inputs
list_date = [('10/30 02:18 PM', '-103', '-107'),('10/29 02:15 AM', '-101', '-109') , ('10/30 02:17 PM', '+100', '-110'), \
             ]  # reordered to show the method is input order insensitive
base_date = "10/29 06:58 AM"


##Make a data frame with data
df=pd.DataFrame(list_date)
df.columns=['date','val1','val2']
dateIndex=pd.to_datetime(df['date'], format='%m/%d %I:%M %p')
df=df.set_index(dateIndex) 
df=df.sort_index(ascending=False) #latest comes on top

##Find the result
base_dateObj=pd.to_datetime(base_date, format='%m/%d %I:%M %p')
result=df.truncate(after=base_dateObj).iloc[-1]  #take the bottom value, or the 1st after the base date
(result['date'],result['val1'], result['val2']) # result is ('10/30 02:17 PM', '+100', '-110')

Reference: this link

answered 2016-10-22T05:24:06.580
1

Linear search?

import time

base_date = "10/29 06:58 AM"

def str_to_my_time(my_str):
    # strptime fills in the year 1900 because the strings carry none;
    # mktime may reject such early dates on some platforms
    return time.mktime(time.strptime(my_str, "%m/%d %I:%M %p"))

base_dt = str_to_my_time(base_date)

list_date = [('10/30 02:18 PM', '-103', '-107'),
             ('10/30 02:17 PM', '+100', '-110'),
             ('10/29 02:15 AM', '-101', '-109')]

best_delta = float("inf")  # sys.maxint no longer exists in Python 3
best_match = None

for t in list_date:
    the_dt = str_to_my_time(t[0])
    delta_sec = the_dt - base_dt
    # keep the smallest non-negative difference seen so far
    if (delta_sec >= 0) and (delta_sec < best_delta):
        best_delta = delta_sec
        best_match = t

print(best_match, best_delta)

Output:

('10/30 02:17 PM', '+100', '-110') 112740.0
answered 2013-06-22T09:58:00.700
1
import time

#The Function
def to_sec(date_string):
    # seconds since the epoch; the year defaults to 1900 because none is given
    return time.mktime(time.strptime(date_string, '%m/%d %I:%M %p'))


#The Test
base_date = "10/29 06:58 AM"
base_date_sec = to_sec(base_date)
result = None
difference = float("inf")  # sys.maxint no longer exists in Python 3
list_date = [
        ('10/30 02:18 PM', '-103', '-107'),
        ('10/30 02:17 PM', '+100', '-110'),
        ('10/29 02:15 AM', '-101', '-109') ]
for date_str in list_date:
    diff_sec = to_sec(date_str[0])-base_date_sec
    if diff_sec >= 0 and diff_sec < difference:
        result = date_str
        difference = diff_sec
print(result)
answered 2013-06-22T10:01:14.820
1
import datetime

fmt = '%m/%d %I:%M %p'  # %I (12-hour clock) is needed for %p to affect the hour
d = datetime.datetime.strptime(base_date, fmt)
def foo(x):
   return (datetime.datetime.strptime(x[0],fmt)-d).total_seconds() > 0
sorted(list_date, key=foo)[-1]
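Note that the key here is a boolean, so `sorted` only partitions the list into "not later" and "later" items and, because Python's sort is stable, keeps the original order within each group. Taking `[-1]` therefore yields the closest later date only when the input is already ordered newest-first, as it is in the question. The sketch below illustrates this on a deliberately reordered copy of the question's list (the variable names `newest_first` and `reordered` are made up for illustration).

```python
from datetime import datetime

fmt = '%m/%d %I:%M %p'
base_date = "10/29 06:58 AM"
d = datetime.strptime(base_date, fmt)


def foo(x):
    # True for dates strictly later than the base date
    return (datetime.strptime(x[0], fmt) - d).total_seconds() > 0


newest_first = [('10/30 02:18 PM', '-103', '-107'),
                ('10/30 02:17 PM', '+100', '-110'),
                ('10/29 02:15 AM', '-101', '-109')]
# Same rows, with the first two swapped.
reordered = [newest_first[1], newest_first[0], newest_first[2]]

print(sorted(newest_first, key=foo)[-1])  # ('10/30 02:17 PM', '+100', '-110')
print(sorted(reordered, key=foo)[-1])     # ('10/30 02:18 PM', '-103', '-107')
```

If the input order cannot be relied on, a key based on the actual time difference, as in the other answers, avoids the dependency.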
answered 2013-06-22T10:11:00.290
0

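I was looking into this problem and found several answers, most of which check every element. My dates are sorted (I assume most people's are), so if yours are too, use numpy: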

import numpy as np
# dates is a numpy array of np.datetime64 objects
dates = np.array([date1, date2, date3, ...], dtype=np.datetime64)
timestamp = np.datetime64('Your date')
np.searchsorted(dates, timestamp)

searchsorted uses binary search, which exploits the fact that the dates are sorted, so it is very efficient. If you use pandas, this is possible:

dates = df.index # df is a DatetimeIndex-ed dataframe
timestamp = pd.to_datetime('your date here', format='its format')
np.searchsorted(dates, timestamp)

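The function returns the index of the closest date that is not earlier than the searched one (if the searched date is contained in dates, its own index is returned; pass side='right' to the function if you don't want that). If the searched date is later than every entry, the returned index equals len(dates), so check for that before indexing. To get the date, do this: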

dates[np.searchsorted(dates, timestamp)]
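The same binary-search idea is available in the standard library through `bisect`, in case numpy is not at hand. This is a minimal sketch, assuming the dates are sorted in ascending order and using made-up sample values; `first_after` is a hypothetical helper. `bisect_right` skips an exact match, analogous to `side='right'`.

```python
from bisect import bisect_right
from datetime import datetime

FMT = "%m/%d %I:%M %p"
# Hypothetical sample, sorted ascending (earliest first).
dates = [datetime.strptime(s, FMT) for s in
         ("10/29 02:15 AM", "10/30 02:17 PM", "10/30 02:18 PM")]


def first_after(dates, target):
    """Return the first date strictly later than target, or None."""
    i = bisect_right(dates, target)
    # An index equal to len(dates) means target is at or past the last entry.
    return dates[i] if i < len(dates) else None


base = datetime.strptime("10/29 06:58 AM", FMT)
print(first_after(dates, base))  # 1900-10-30 14:17:00
```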
answered 2018-11-20T14:46:01.893