python - Insert missing dates while keeping the date order in a Python list

Question

I have a list of lists containing [yyyy, value] items, with each sublist sorted by increasing year. Here is a sample:

A = [
    [[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2013, 17]], 
    [[2008, 6], [2009, 3], [2011, 1], [2013, 6]], [[2013, 9]], 
    [[2008, 4], [2011, 1], [2013, 4]], 
    [[2010, 3], [2011, 3], [2013, 1]], 
    [[2008, 2], [2011, 4], [2013, 1]], 
    [[2009, 1], [2010, 1], [2011, 3], [2013, 3]], 
    [[2010, 1], [2011, 1], [2013, 5]], 
    [[2011, 1], [2013, 4]], 
    [[2009, 1], [2013, 4]], 
    [[2008, 1], [2013, 3]], 
    [[2009, 1], [2013, 2]], 
    [[2013, 2]], 
    [[2011, 1], [2013, 1]],
    [[2013, 1]], 
    [[2013, 1]], 
    [[2011, 1]], 
    [[2011, 1]]
    ]

I need to insert all of the missing years between min(year) and max(year), making sure the order is preserved. So, for example, taking the first sublist of A:

[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2013, 17]

should look like:

[min_year, 0]...[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2012, 0],[2013, 17],..[max_year, 0]

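In addition, if any sublist contains only a single item, the same process should be applied to it, so that the original value keeps its proper position and the remaining min-to-max (year, value) items are inserted correctly.

Any ideas?

Thanks.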

score 3 · Accepted Answer

How about:

import numpy as np

def np_fill(data, min_year, max_year):

    # Set up an array of [year, 0] rows, one block per sublist
    year_range = np.arange(min_year, max_year + 1)
    unit = np.dstack((year_range, np.zeros(max_year - min_year + 1)))
    # np.int was removed in NumPy 1.24; the builtin int is the replacement
    overall = np.tile(unit, (len(data), 1, 1)).astype(int)

    # Convert each sublist to an ndarray
    data = [np.array(line) for line in data]

    for num, line in enumerate(data):

        # Find the matching year indices and update the overall array
        index = np.searchsorted(year_range, line[:, 0])
        overall[num, index, 1] = line[:, 1]
    return overall

Running the code:

print(np_fill(A, 2008, 2013)[:2])

[[[2008    5]
  [2009    5]
  [2010    2]
  [2011    5]
  [2012    0]
  [2013   17]]

 [[2008    6]
  [2009    3]
  [2010    0]
  [2011    1]
  [2012    0]
  [2013    6]]]


print(np_fill(A, 2008, 2013).shape)
(18, 6, 2)

You have a duplicate entry for 2013 in the second row of A; I am not sure whether that is intentional.
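To check for repeated years programmatically, a minimal sketch could count the years in each sublist. The helper name `find_duplicate_years` and the sample data are illustrative assumptions, not part of the original code:

```python
from collections import Counter

def find_duplicate_years(data):
    """Return (sublist index, year) pairs for years that occur more than once in a sublist."""
    duplicates = []
    for num, line in enumerate(data):
        counts = Counter(year for year, _ in line)
        duplicates.extend(
            (num, year) for year, count in sorted(counts.items()) if count > 1
        )
    return duplicates

# Hypothetical sample: the second sublist repeats 2013
sample = [
    [[2008, 5], [2013, 17]],
    [[2008, 6], [2013, 6], [2013, 9]],
]
print(find_duplicate_years(sample))  # [(1, 2013)]
```

An empty result means every sublist lists each year at most once.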

Some timings, out of curiosity; the source code can be found here. If you spot a mistake, please let me know.

For start year/end year - (2008, 2013):

np_fill took 0.0454630851746 seconds.
tehsockz_fill took 0.00737619400024 seconds.
zeke_fill_fill took 0.0146050453186 seconds.

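This is somewhat expected: converting to numpy arrays takes a lot of time. To break even, it looks like the span of years needs to be about 30:

For start year/end year - (1985, 2013):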

np_fill took 0.049400806427 seconds.
tehsockz_fill took 0.0425939559937 seconds.
zeke_fill_fill took 0.0748357772827 seconds.

NumPy of course does progressively better from there. If for some reason you need a numpy array returned, the numpy algorithm is always faster.
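For anyone who wants to collect comparable numbers, here is a rough sketch using the standard `timeit` module. `loop_fill` is a hypothetical stand-in written for this example, not one of the functions timed above, and the sample data is made up, so the absolute numbers will differ:

```python
import timeit

def loop_fill(data, min_year, max_year):
    """Stand-in pure-Python fill: add the missing years, then sort (hypothetical helper)."""
    result = []
    for group in data:
        years = {year for year, _ in group}
        missing = [[year, 0] for year in range(min_year, max_year + 1) if year not in years]
        result.append(sorted(group + missing))
    return result

# Made-up sample with 18 sublists, the same count as A
sample = [[[2008, 5], [2011, 5], [2013, 17]], [[2013, 9]]] * 9

for start, end in [(2008, 2013), (1985, 2013)]:
    # Time 1000 calls for each span of years
    seconds = timeit.timeit(lambda: loop_fill(sample, start, end), number=1000)
    print("loop_fill (%d, %d) took %.6f seconds for 1000 runs." % (start, end, seconds))
```

Passing each candidate function through the same loop keeps the comparison like for like.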

score 3 · Accepted Answer

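minyear = 2008
maxyear = 2013
new_a = []
for group in A:
    # Years already present in this group (a set keeps the membership test cheap)
    years = {point[0] for point in group}
    # Work on a copy so the sublists of A are not modified in place
    filled = list(group)
    for year in range(minyear, maxyear + 1):
        if year not in years:
            filled.append([year, 0])
    new_a.append(sorted(filled))
print(new_a)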

This produces:

[   [[2008, 5], [2009, 5], [2010, 2], [2011, 5], [2012, 0], [2013, 17]],
    [[2008, 6], [2009, 3], [2010, 0], [2011, 1], [2012, 0], [2013, 6]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 9]],
    [[2008, 4], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 4]],
    [[2008, 0], [2009, 0], [2010, 3], [2011, 3], [2012, 0], [2013, 1]],
    [[2008, 2], [2009, 0], [2010, 0], [2011, 4], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 1], [2010, 1], [2011, 3], [2012, 0], [2013, 3]],
    [[2008, 0], [2009, 0], [2010, 1], [2011, 1], [2012, 0], [2013, 5]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 4]],
    [[2008, 0], [2009, 1], [2010, 0], [2011, 0], [2012, 0], [2013, 4]],
    [[2008, 1], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 3]],
    [[2008, 0], [2009, 1], [2010, 0], [2011, 0], [2012, 0], [2013, 2]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 2]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 0], [2012, 0], [2013, 1]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 0]],
    [[2008, 0], [2009, 0], [2010, 0], [2011, 1], [2012, 0], [2013, 0]]]
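The bounds above are hard-coded as minyear and maxyear. If they should instead come from the data, as the question's min(year) and max(year) suggest, one possible sketch is to flatten all the years first. The `sample` list here is a made-up stand-in for A:

```python
# Hypothetical sample in the same [[year, value], ...] layout as A
sample = [
    [[2008, 5], [2009, 5], [2013, 17]],
    [[2013, 9]],
    [[2010, 3], [2011, 3]],
]

# Collect every year across all sublists, then take the extremes
all_years = [year for group in sample for year, _ in group]
minyear = min(all_years)
maxyear = max(all_years)
print(minyear, maxyear)  # 2008 2013
```

Note that min() and max() raise ValueError on an empty list, so this assumes the data holds at least one item.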

score 3 · Accepted Answer

Here you go, hope you like it!

min_year = 2007 # for testing purposes I used these years
max_year = 2014

final_list = [] # the corrected lists are added to this list

for outer in A: # start by iterating through each outer list in A
    active_years = {} # dictionary that tracks which years are in each list, and their values

    for inner in outer: # iterate through each [year, value] pair and create a dictionary entry for it (print to see what it's doing)
        active_years[inner[0]] = inner[1] # new key-value pair: the key is the year (index 0 of inner), the value is index 1

    for year in range(min_year, max_year + 1): # now add all the other years to active_years and give them value 0
        if year not in active_years: # only add the years not in the dictionary already
            active_years[year] = 0

    new_outer = [] # this will be your new outer list
    for entry in sorted(active_years): # iterate through the keys in ascending order; a plain dict does not sort its keys
        new_outer.append([entry, active_years[entry]]) # build the new outer list, watch the brackets carefully
    final_list.append(new_outer) # add to the final_list

print(final_list) # presto
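The same dictionary idea can be written more compactly by walking the full year range and looking each year up with a default of 0. This is only a sketch: `fill_years` and `sample` are names invented for the example, and unlike the loop above it drops any year that falls outside the requested range:

```python
def fill_years(data, min_year, max_year):
    """Return a new list of lists with one [year, value] pair per year in the range."""
    filled = []
    for outer in data:
        # dict() accepts an iterable of 2-item lists, giving {year: value}
        active_years = dict(outer)
        filled.append(
            [[year, active_years.get(year, 0)] for year in range(min_year, max_year + 1)]
        )
    return filled

# Hypothetical sample: two sublists in the same layout as A
sample = [[[2008, 4], [2011, 1], [2013, 4]], [[2013, 9]]]
print(fill_years(sample, 2011, 2013))
# [[[2011, 1], [2012, 0], [2013, 4]], [[2011, 0], [2012, 0], [2013, 9]]]
```

Because the result is built by iterating over range(), the years come out in ascending order without a separate sort.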
