
I have a list of lists:

[['Id', 'fname', 'lname', 'gender', 'startdate'],
['100', 'John', 'Jackson', 'M', '08/09/2000'],
['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
['100', 'John', 'Jackson', 'M', '08/09/1995']]

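I want to remove the duplicate rows: where two rows have the same Id, drop the one with the earlier startdate, so that each unique Id keeps only the row with the most recent start date.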

[['Id', 'fname', 'lname', 'gender', 'startdate'],
['100', 'John', 'Jackson', 'M', '08/09/2000'],
['101', 'Jenny', 'Hobbs', 'F', '01/13/1995']]

Any help would be great.

4 Answers

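Sort the rows in ascending date order, then load them into a dictionary keyed by ID, so that more recent rows overwrite older ones. The only thing you need to do yourself is strip the header row before using it.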

import time

data = [['100', 'John', 'Jackson', 'M', '08/09/2000'],
['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
['100', 'John', 'Jackson', 'M', '08/09/1995']]

data = sorted(data, key=lambda x:time.strptime(x[4], '%m/%d/%Y'))   # sort data in ascending date order

keys = [x[0] for x in data]
print(keys)

d = dict(zip(keys, data))                # add to dictionary ... most recent values overwrite older ones

print(list(d.values()))

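This produces the following output (the order of the rows depends on the dict ordering of your Python version):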

[['100', 'John', 'Jackson', 'M', '08/09/2000'], ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995']]
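
The header handling mentioned above could look something like the sketch below. It assumes the first row of the table is always the header; the names table, header, rows, latest and result are only illustrative.

```python
import time

table = [['Id', 'fname', 'lname', 'gender', 'startdate'],
         ['100', 'John', 'Jackson', 'M', '08/09/2000'],
         ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
         ['100', 'John', 'Jackson', 'M', '08/09/1995']]

# Assumption: the first row is the header, so split it off before sorting
header, rows = table[0], table[1:]

# Sort ascending by date so the newest row for each Id comes last
rows = sorted(rows, key=lambda r: time.strptime(r[4], '%m/%d/%Y'))

# Later (newer) rows overwrite earlier ones that share the same Id
latest = dict((r[0], r) for r in rows)

# Put the header back and order the surviving rows by Id
result = [header] + sorted(latest.values(), key=lambda r: int(r[0]))
print(result)
```

Sorting the final rows by Id is optional; it just makes the result independent of dict ordering.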
Answered 2012-06-07T22:00:15.830

Similar to @Maria Zverina's answer, but more structured:

import time

data = [
    ['100', 'John', 'Jackson', 'M', '08/09/2000'],
    ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
    ['100', 'John', 'Jackson', 'M', '08/09/1995']
]

# sort by date, ascending
data.sort(key=lambda d: time.strptime(d[4], "%m/%d/%Y"))

# load into a dict, key on ID, later data overwrites earlier
latest = dict((d[0], d) for d in data)

# return to list, sorted by ID
data = sorted(latest.values(), key=lambda d: int(d[0]))

This returns:

# most recent data for each ID, sorted by ID:
[
    ['100', 'John', 'Jackson', 'M', '08/09/2000'],
    ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995']
]
Answered 2012-06-07T22:32:53.483

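Here is another solution. I simply add each ID to a set the first time I encounter it. The orig variable holds the original list of lists, and res is the list of lists with the duplicates removed. Note that this keeps the first row seen for each ID, so orig must already be ordered with the most recent start date first.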

mod_set = set()
res = []
for x in orig:
    if x[0] not in mod_set:
        res.append(x)
        mod_set.add(x[0])
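
Because the loop above keeps the first row it sees for each ID, one way to make it keep the most recent row is to sort newest-first beforehand. A minimal sketch, assuming orig has no header row and the dates are in mm/dd/yyyy form (the names newest_first and seen are only illustrative):

```python
import time

orig = [['100', 'John', 'Jackson', 'M', '08/09/1995'],
        ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
        ['100', 'John', 'Jackson', 'M', '08/09/2000']]

# Sort newest first so the first row seen for each ID is the one to keep
newest_first = sorted(orig,
                      key=lambda r: time.strptime(r[4], '%m/%d/%Y'),
                      reverse=True)

seen = set()
res = []
for row in newest_first:
    if row[0] not in seen:
        res.append(row)
        seen.add(row[0])

print(res)
```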
Answered 2012-06-07T22:15:03.240

Here is a small script that does what you want:

import time

mylist = [['100', 'John', 'Jackson', 'M', '08/09/2000'],
['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
['100', 'John', 'Jackson', 'M', '08/09/1995']]

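latest = {}   # avoid naming this "dict", which would shadow the built-in
for sublist in mylist:
    row_id, fname, lname, gender, startdate = sublist
    if row_id not in latest:
        latest[row_id] = [fname, lname, gender, startdate]
    else:
        olddate = latest[row_id][3]
        # dates are month/day/year, so parse with %m/%d/%Y before comparing
        if time.strptime(startdate, '%m/%d/%Y') > time.strptime(olddate, '%m/%d/%Y'):
            latest[row_id] = [fname, lname, gender, startdate]

print(latest)

Output: {'100': ['John', 'Jackson', 'M', '08/09/2000'], '101': ['Jenny', 'Hobbs', 'F', '01/13/1995']}

At the end, latest maps each unique ID to its most recent record.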
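
The same single-pass idea can also be wrapped in a function that keeps whole rows and needs no sorting. This is only a sketch: the function name keep_latest and its parameters id_col, date_col and fmt are made up for illustration, and it assumes the header row has already been removed.

```python
from datetime import datetime

def keep_latest(rows, id_col=0, date_col=4, fmt='%m/%d/%Y'):
    """Return one row per ID, keeping the row with the latest date."""
    latest = {}
    for row in rows:
        current = latest.get(row[id_col])
        # Keep the row if the ID is new or its date is later than the stored one
        if current is None or (datetime.strptime(row[date_col], fmt)
                               > datetime.strptime(current[date_col], fmt)):
            latest[row[id_col]] = row
    return list(latest.values())

rows = [['100', 'John', 'Jackson', 'M', '08/09/2000'],
        ['101', 'Jenny', 'Hobbs', 'F', '01/13/1995'],
        ['100', 'John', 'Jackson', 'M', '08/09/1995']]

print(keep_latest(rows))
```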

Answered 2012-06-07T22:26:56.627