
Suppose I have two lists.

Person 1:

    2012-08      person 1             23
    2012-09      person 1             63 
    2012-10      person 1             99  
    2012-11      person 1             62 

and

Person 2:

    2012-08      person 2             45
    2012-09      person 2             69 
    2012-10      person 2             12  
    2012-11      person 2             53 

What would you suggest if I'd like to produce tabular data in the following layout:

    Date         Person 1       Person 2
    -------      ---------      ---------
    2012-08      23             45
    2012-09      63             69
    2012-10      99             12
    2012-11      62             53

UPDATE:

Here are the lists:

List1 = [(u'201206', u'Customer_1', 0.19048299999999993), (u'201207', u'Customer_1', 15.409000999998593), (u'201208', u'Customer_1', 71.1695730000299), (u'201209', u'Customer_1', 135.73918600011424), (u'201210', u'Customer_1', 235.26299999991522), (u'201211', u'Customer_1', 271.768984999485), (u'201212', u'Customer_1', 355.90968299883934), (u'201301', u'Customer_1', 508.39194049821526), (u'201302', u'Customer_1', 631.136656500077), (u'201303', u'Customer_1', 901.9127695088399), (u'201304', u'Customer_1', 951.9143960094264)]

List2 = [(None, None, None), (None, None, None), (None, None, None), (None, None, None), (None, None, None), (None, None, None), (None, None, None), (u'201301', u'Customer_2', 3.7276289999999657), (u'201302', u'Customer_2', 25.39122749999623), (u'201303', u'Customer_2', 186.77777299985306), (u'201304', u'Customer_2', 387.97834699805617)]

3 Answers

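Use itertools.izip() to combine the two input sequences while processing them:

import csv
import itertools

# file1 and file2 are assumed to be already-open CSV file objects
reader1 = csv.reader(file1)
reader2 = csv.reader(file2)

for row1, row2 in itertools.izip(reader1, reader2):
    # process row1 and row2 together
    pass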

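This works for lists too. izip() makes combining long lists efficient: it is the iterator version of the zip() function, which in Python 2 materializes the whole combined list in memory.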
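In Python 3, zip() is itself lazy and itertools.izip() no longer exists, so the same pairing can be written with plain zip(). The sketch below uses the data from the question and assumes both lists are sorted by date and cover exactly the same months. The helper name pair_rows is hypothetical; it also checks that the dates line up, because zip() pairs rows purely by position.

```python
# Rows from the question: (date, person, value)
person1 = [("2012-08", "person 1", 23), ("2012-09", "person 1", 63),
           ("2012-10", "person 1", 99), ("2012-11", "person 1", 62)]
person2 = [("2012-08", "person 2", 45), ("2012-09", "person 2", 69),
           ("2012-10", "person 2", 12), ("2012-11", "person 2", 53)]


def pair_rows(rows1, rows2):
    """Lazily pair rows position by position.

    Assumes both inputs are sorted by date and contain the same dates;
    raises ValueError if the dates get out of step.
    """
    for (date1, _, v1), (date2, _, v2) in zip(rows1, rows2):
        if date1 != date2:
            raise ValueError("dates out of step: %s vs %s" % (date1, date2))
        yield date1, v1, v2


table = list(pair_rows(person1, person2))
print("{:<10}{:<10}{:<10}".format("Date", "Person 1", "Person 2"))
for date, v1, v2 in table:
    print("{:<10}{:<10}{:<10}".format(date, v1, v2))
```

Note that zip() silently stops at the shorter input; if the lists may differ in length, itertools.zip_longest() pads the shorter one with a fill value instead.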

If you can restructure the functions that build your input lists into generators, use:

import csv
import itertools

def function_for_list1(inputfilename):
    # Python 2: csv files are opened in binary mode
    with open(inputfilename, 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            # process row
            yield row

def function_for_list2(inputfilename):
    with open(inputfilename, 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            # process row
            yield row

# somename and someothername are placeholders for the two input file paths
for row1, row2 in itertools.izip(function_for_list1(somename), function_for_list2(someothername)):
    # process row1 and row2 together
    pass

This arrangement lets you process gigabytes of data while only keeping in memory what is needed to handle a small set of rows at a time.

answered 2013-05-26T20:38:51.080
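A minimal Python 3 sketch of the same generator idea, assuming comma-separated input with the columns date, person, value. To keep it self-contained, io.StringIO stands in for the real files (with actual files you would use open(name, newline='') instead); the names read_rows, merged, file1_text and file2_text are hypothetical.

```python
import csv
import io

# Hypothetical stand-ins for the two CSV files (date, person, value)
file1_text = ("2012-08,person 1,23\n2012-09,person 1,63\n"
              "2012-10,person 1,99\n2012-11,person 1,62\n")
file2_text = ("2012-08,person 2,45\n2012-09,person 2,69\n"
              "2012-10,person 2,12\n2012-11,person 2,53\n")


def read_rows(f):
    """Yield (date, value) one row at a time instead of building a list."""
    for date, _person, value in csv.reader(f):
        yield date, int(value)


def merged(f1, f2):
    """Pair the two streams lazily; only the current rows are held in memory."""
    for (d1, v1), (d2, v2) in zip(read_rows(f1), read_rows(f2)):
        # assumes both files list the same dates in the same order
        if d1 != d2:
            raise ValueError("dates out of step: %s vs %s" % (d1, d2))
        yield d1, v1, v2


rows = list(merged(io.StringIO(file1_text), io.StringIO(file2_text)))
for date, v1, v2 in rows:
    print(date, v1, v2)
```

The list() call is only there to make the result easy to inspect; in a real pipeline you would iterate over merged(...) directly so the files are never fully loaded.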
l1 = [['2012-08', 'person 1', 23], ['2012-09', 'person 1', 63],
      ['2012-10', 'person 1', 99], ['2012-11', 'person 1', 62]]

l2 = [['2012-08', 'person 2', 45], ['2012-09', 'person 2', 69],
      ['2012-10', 'person 2', 12], ['2012-11', 'person 2', 53]]

# map each date to its value (the person column is dropped)
h1 = {x: z for x, y, z in l1}
h2 = {x: z for x, y, z in l2}

print("{:<10}{:<10}{:<10}".format("Date", "Person 1", "Person 2"))
print("{:<10}{:<10}{:<10}".format('-'*5, '-'*8, '-'*8))
for d in sorted(h1):
    print("{:<10} {:<10}{:<10}".format(d, h1[d], h2[d]))

Output:

Date      Person 1  Person 2  
-----     --------  --------  
2012-08    23        45        
2012-09    63        69        
2012-10    99        12        
2012-11    62        53        
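The dictionary approach can also cope with the lists from the question's UPDATE, where List2 has no data before 201301 and pads those months with (None, None, None). A minimal Python 3 sketch, assuming the placeholder rows can simply be skipped and a month missing from one list should show an empty cell; it uses only a few of the question's rows for brevity, and the helper names to_dict and cell are hypothetical.

```python
# A subset of List1 / List2 from the question's UPDATE
list1 = [("201206", "Customer_1", 0.19048299999999993),
         ("201207", "Customer_1", 15.409000999998593),
         ("201301", "Customer_1", 508.39194049821526),
         ("201302", "Customer_1", 631.136656500077)]
list2 = [(None, None, None), (None, None, None),
         ("201301", "Customer_2", 3.7276289999999657),
         ("201302", "Customer_2", 25.39122749999623)]


def to_dict(rows):
    """Map date -> value, skipping the (None, None, None) placeholder rows."""
    return {date: value for date, _name, value in rows if date is not None}


h1 = to_dict(list1)
h2 = to_dict(list2)

# Union of dates so a month present in only one list still appears;
# dict.get() returns None for the missing side instead of raising KeyError.
dates = sorted(set(h1) | set(h2))
table = [(d, h1.get(d), h2.get(d)) for d in dates]


def cell(v):
    """Format a value to two decimals, or an empty cell if it is missing."""
    return "" if v is None else "{:.2f}".format(v)


print("{:<10}{:<12}{:<12}".format("Date", "Customer_1", "Customer_2"))
for d, v1, v2 in table:
    print("{:<10}{:<12}{:<12}".format(d, cell(v1), cell(v2)))
```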
answered 2013-05-26T20:48:50.207

If Python isn't a requirement and the two CSV files are generated in a plain old bash script, you can combine join and awk (or even cut).

Example:

Suppose this file is called one:

2012-08 person1 23
2012-09 person1 63 
2012-10 person1 99  
2012-11 person1 62 

and this file is called two:

2012-08 person2 45
2012-09 person2 69 
2012-10 person2 12  
2012-11 person2 53 

Then the command

join one two | awk '{print $1 " " $3 " " $5}'

will output:

2012-08 23 45
2012-09 63 69
2012-10 99 12
2012-11 62 53

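It isn't hard to add a CSV header to the output or to choose a different delimiter.

Note one caveat: both files must be sorted on the join column for this to work correctly. You probably have that covered already, and since you mentioned that the two CSV files are very large, you probably don't want to read them all into memory at once. IMHO, plain Unix tools are well suited to this kind of job.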
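To show why join needs sorted input, here is a minimal Python 3 sketch of the same sorted merge: it walks both inputs once, advancing whichever side has the smaller key, so it only holds the current row of each in memory. It assumes each date appears at most once per input (the real join also handles repeated keys) and, like join's default behaviour, drops dates that appear on only one side. The function name merge_join is hypothetical.

```python
def merge_join(left, right):
    """Inner-join two iterables of (key, value) pairs, both sorted by key.

    Assumes keys are unique within each input; yields (key, left_value, right_value).
    """
    left, right = iter(left), iter(right)
    a = next(left, None)
    b = next(right, None)
    while a is not None and b is not None:
        if a[0] == b[0]:
            yield a[0], a[1], b[1]
            a = next(left, None)
            b = next(right, None)
        elif a[0] < b[0]:
            a = next(left, None)   # key only on the left: skip it
        else:
            b = next(right, None)  # key only on the right: skip it


one = [("2012-08", 23), ("2012-09", 63), ("2012-10", 99), ("2012-11", 62)]
two = [("2012-08", 45), ("2012-09", 69), ("2012-10", 12), ("2012-11", 53)]

joined = list(merge_join(one, two))
for key, v1, v2 in joined:
    print(key, v1, v2)
```

If either input were unsorted, the smaller-key comparison would skip rows that do have a partner, which is exactly the failure mode the caveat above warns about for join.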

answered 2013-05-26T20:48:54.473