
I have some code that gives me a list like this:

```
Name     id number   week numbers
Piata        4       6
Mali         2       20,5
Goerge       5       4
Gooki        3       24,64,6
Mali         5       45,9
Piata        6       1
Piata       12       2,7,8,27,16
etc.
```

using the following code:

```python
import csv
import datetime
from datetime import date

with open('d:/info.csv', 'r', newline='') as csvfile:
    filereader = csv.reader(csvfile, 'excel')
    read_header = False  # used to skip the header row
    start_date = date(year=2009, month=1, day=1)
    tdic = {}  # (name, id) -> set of week numbers
    for row in filereader:
        if not read_header:
            read_header = True
            continue

        # read the remaining rows
        name, id, firstseen = row[0], row[1], row[3]
        try:
            seen_date = datetime.datetime.strptime(firstseen, '%d/%m/%Y').date()
            deltadays = (seen_date - start_date).days
            deltaweeks = deltadays // 7 + 1  # floor division keeps the week number an int
            key = name, id
            currentvalue = tdic.get(key, set())
            currentvalue.add(deltaweeks)
            tdic[key] = currentvalue
        except ValueError:
            print('Date value error')
```
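To make the week arithmetic concrete, here is a minimal sketch of the same calculation as a standalone function. The name `week_number` is hypothetical; it assumes, as the code above does, that week 1 starts on 1 January 2009 and that dates are written `dd/mm/yyyy`. Floor division (`//`) keeps the result an integer in Python 3, where `/` would produce a float.

```python
from datetime import date, datetime

START_DATE = date(2009, 1, 1)  # first day of week 1, as in the question

def week_number(firstseen, start_date=START_DATE):
    """Hypothetical helper: convert a 'dd/mm/yyyy' string to a 1-based week number."""
    seen_date = datetime.strptime(firstseen, '%d/%m/%Y').date()
    deltadays = (seen_date - start_date).days
    # Days 0-6 map to week 1, days 7-13 to week 2, and so on.
    return deltadays // 7 + 1

print(week_number('01/01/2009'))  # -> 1
print(week_number('08/01/2009'))  # -> 2
```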

Now I want to convert this into a list that gives me, for each name, the number of ids and all of its week numbers, like this:

```
Name     number of ids   week numbers
Mali           2         20,5,45,9
Piata          3         1,6,2,7,8,27,16
Goerge         1         4
Gooki          1         24,64,6
```

Can anyone help me write the code for this part?


2 Answers


Given:

```python
tdict = {
    ('Mali', 5): {9, 45},
    ('Gooki', 3): {24, 64, 6},
    ('Goerge', 5): {4},
    ('Mali', 2): {20, 5},
    ('Piata', 4): {6},
    ('Piata', 6): {1},
    ('Piata', 12): {8, 16, 2, 27, 7},
}
```

then the following prints the result shown above:

```python
names = {}  # name -> (number of ids, set of weeks)
for (name, id), more_weeks in tdict.items():
    ids, weeks = names.get(name, (0, set()))
    ids = ids + 1
    weeks = weeks.union(more_weeks)
    names[name] = (ids, weeks)

for name, (ids, weeks) in names.items():
    print("%s, %s, %s" % (name, ids, ','.join(str(w) for w in sorted(weeks))))
```
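The same grouping can also be written with `collections.defaultdict`, which removes the need for the `names.get(...)` default. A minimal sketch, assuming `tdict` has the `(name, id) -> set of weeks` shape shown above; the function name `summarize` is hypothetical. Collecting the ids in a set (rather than incrementing a counter) also stays correct if the same id were ever seen twice for a name.

```python
from collections import defaultdict

def summarize(tdict):
    """Hypothetical helper: map each name to (number of distinct ids, set of weeks)."""
    ids_by_name = defaultdict(set)    # name -> distinct ids
    weeks_by_name = defaultdict(set)  # name -> union of week numbers
    for (name, id_), weeks in tdict.items():
        ids_by_name[name].add(id_)
        weeks_by_name[name] |= weeks
    return {name: (len(ids_by_name[name]), weeks_by_name[name]) for name in ids_by_name}

tdict = {('Mali', 5): {9, 45}, ('Mali', 2): {20, 5}, ('Piata', 4): {6}}
for name, (n_ids, weeks) in summarize(tdict).items():
    print(name, n_ids, ','.join(map(str, sorted(weeks))))
```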
answered 2013-04-22T14:34:14.247

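Since your CSV file appears to have a header row (which you are currently skipping), why not use a `DictReader` instead of the standard `reader` class? If you don't supply field names, `DictReader` assumes the first row contains them, which also saves you from having to skip the first row inside the loop.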
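A minimal illustration of that behaviour, using an in-memory file instead of `d:/info.csv`. The header names `name`, `id`, `firstseen` and the sample rows are assumptions for illustration only; the real file's header row determines the dictionary keys.

```python
import csv
import io

# Hypothetical CSV content; the first line is the header row.
sample = io.StringIO(
    "name,id,firstseen\n"
    "Piata,4,10/02/2009\n"
    "Mali,2,03/05/2009\n"
)

reader = csv.DictReader(sample)  # no fieldnames given, so they are read from the first row
print(reader.fieldnames)
rows = list(reader)              # only the data rows remain; the header is not repeated
for row in rows:
    print(row['name'], row['id'], row['firstseen'])
```

Note that every value comes back as a string, so `row['id']` is `'4'`, not `4`.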

This looks like a good opportunity to use `defaultdict` and `Counter` from the `collections` module.
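Briefly, `defaultdict(set)` creates an empty set the first time a missing key is accessed, and `Counter` keeps integer tallies. `Counter.update` iterates over its argument, so passing a bare string counts it character by character, which is why the code wraps `name` in a list. A small sketch with made-up values:

```python
from collections import Counter, defaultdict

weeks = defaultdict(set)
weeks['Mali'].add(20)     # missing key: an empty set is created first
weeks['Mali'].add(5)

by_char = Counter()
by_char.update('Mali')    # a string is iterated: counts 'M', 'a', 'l', 'i' separately

by_name = Counter()
by_name.update(['Mali'])  # a one-element list: counts the whole name once
by_name['Mali'] += 1      # an equivalent way to add one more

print(dict(weeks))
print(by_char)
print(by_name)
```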

```python
import csv
import datetime
from datetime import date
from collections import defaultdict, Counter


datedict = defaultdict(set)  # name -> set of week numbers
namecounter = Counter()      # name -> number of rows seen
with open('d:/info.csv', 'r', newline='') as csvfile:
    filereader = csv.DictReader(csvfile)
    start_date = date(year=2009, month=1, day=1)

    for row in filereader:
        # assumes the header row names these columns 'name', 'id' and 'firstseen'
        name, id, firstseen = row['name'], row['id'], row['firstseen']

        try:
            seen_date = datetime.datetime.strptime(firstseen, '%d/%m/%Y').date()
        except ValueError:
            print('Date value error')
            continue  # skip the row; otherwise seen_date would be undefined or stale

        deltadays = (seen_date - start_date).days
        deltaweeks = deltadays // 7 + 1

        datedict[name].add(deltaweeks)
        namecounter.update([name])  # without the list, update would count each character separately
```

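This assumes that each `(name, id)` pair is unique, i.e. appears in only one row. If that isn't the case, you can use another `defaultdict` for `namecounter` instead. I also moved the try-except statement so that it wraps only the date parsing, which makes it more explicit what you are testing.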
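For the case where the same `(name, id)` pair appears in several rows (as `Piata 12` does in the question, with five week numbers), one way to follow that suggestion is to keep a set of ids per name and take its length. A minimal sketch, assuming the same hypothetical `name`/`id`/`firstseen` header and an in-memory sample in place of the real file; `iddict` is an illustrative name.

```python
import csv
import io
from collections import defaultdict
from datetime import date, datetime

START_DATE = date(2009, 1, 1)

# Hypothetical sample: ('Piata', '12') occurs twice, so counting rows would overcount.
sample = io.StringIO(
    "name,id,firstseen\n"
    "Piata,4,10/02/2009\n"
    "Piata,12,08/01/2009\n"
    "Piata,12,15/02/2009\n"
    "Mali,2,03/05/2009\n"
)

datedict = defaultdict(set)  # name -> week numbers
iddict = defaultdict(set)    # name -> distinct ids (used instead of the Counter)

for row in csv.DictReader(sample):
    try:
        seen_date = datetime.strptime(row['firstseen'], '%d/%m/%Y').date()
    except ValueError:
        print('Date value error')
        continue
    datedict[row['name']].add((seen_date - START_DATE).days // 7 + 1)
    iddict[row['name']].add(row['id'])

for name in iddict:
    weeks = ','.join(str(w) for w in sorted(datedict[name]))
    print(name, len(iddict[name]), weeks)
```

Because ids are stored in a set, repeated rows for the same id only add week numbers, and `len(iddict[name])` gives the "number of ids" column directly.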

answered 2013-04-22T15:14:02.560