Update: Solution

I managed to get the following code working:

import collections
import re
from lxml import etree

## Up here is code for getting an .xml input file from the user, opening that file, etc. ##
## (that elided code is what defines `root` and `out_file` used below) ##

## These containers are filled in by the loop, so they must exist before it starts ##
ordercounter = collections.Counter()   # total shopping list across all orders
ordersdictsdict = {}                   # order name -> Counter of that order's items
allordernames = []

## This for loop goes over each order in the xml file ##
for order in root.xpath('//order'):
    ## This part looks through the order currently being iterated and puts its items into a list ##
    itemlist = []
    for element in order.iter('items'):
        itemlist.append(element.get('type').upper())
    ## This part 'sanitizes' the order name from the .xml file for use as a key ##
    for element in order.iter('order'):
        ordername = element.get('name')
        strippedordername = re.sub(r'[/()!@#$%^&*]', '', ordername)
        allordernames.append(strippedordername)
        print(strippedordername)
        #print(itemlist)
        ## This bit compiles a shopping list of items in a special dict subclass called a Counter. ##
        ordercounter.update(itemlist)
        ## This part makes a dict with order names for its keys and their corresponding Counter of items as its values ##
        ordersdictsdict[strippedordername] = collections.Counter(itemlist)

## Give every per-order Counter an explicit 0 entry for items it does not contain ##
zeros = dict((k, 0) for k in ordercounter.keys())
for cntr in ordersdictsdict.values():
    cntr.update(zeros)

#print(ordercounter)
#print(ordersdictsdict)
key_order = list(ordercounter.keys())
print(key_order)
## The with block closes the file on exit, so no explicit close is needed ##
with open(out_file, 'w') as fout:
    fout.write('Order,' + ','.join(key_order) + '\n')
    fout.write('Totals,' + ','.join(str(ordercounter[k]) for k in key_order) + '\n')
    for ordername, dct in ordersdictsdict.items():
        fout.write(ordername + ',' + ','.join(str(dct[k]) for k in key_order) + '\n')

The output ends up looking like this:

Order,Spam,Eggs,Baked Beans,Sausage
Totals,13,1,1,1
Order for Joe,2,1,0,1
Order for Jill,11,0,1,0
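
For readers who want to try the counting step without lxml, here is a minimal sketch using the standard library's xml.etree.ElementTree. The XML layout (an `order` element with a `name` attribute containing `items` elements with a `type` attribute) is inferred from the attribute lookups in the code above; the sample document, the shortened orders, and the helper name `count_orders` are assumptions made for illustration only.

```python
import collections
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample input; the real file layout is an assumption.
SAMPLE_XML = """
<orders>
  <order name="Order for Joe">
    <items type="Spam"/><items type="Egg"/><items type="Sausage"/><items type="Spam"/>
  </order>
  <order name="Order for Jill">
    <items type="Spam"/><items type="Baked Beans"/><items type="Spam"/>
  </order>
</orders>
"""


def count_orders(xml_text):
    """Return (totals, per_order): a Counter of all items and one Counter per order."""
    root = ET.fromstring(xml_text.strip())
    totals = collections.Counter()
    per_order = collections.OrderedDict()  # keeps orders in document order
    for order in root.iter('order'):
        items = [element.get('type') for element in order.iter('items')]
        totals.update(items)
        per_order[order.get('name')] = collections.Counter(items)
    return totals, per_order


totals, per_order = count_orders(SAMPLE_XML)
for name, counts in per_order.items():
    print(name, dict(counts))
```

A Counter returns 0 for a missing key, so `per_order['Order for Joe']['Baked Beans']` can be read directly when writing the rows, without padding each Counter with zeros first.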

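What I have

My script takes an input XML file and parses it, looking for the order name and then the order contents. One XML file can contain multiple orders. I then have a Counter that tallies all the items across all orders and gives me a total shopping list.

Given these two sample orders: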

Order for Joe: Spam, Egg, Sausage, Spam
Order for Jill: Spam, Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans, Spam, Spam, Spam, Spam

The Counter looks like this: Counter({'Spam': 13, 'Baked Beans': 1, 'Egg': 1, 'Sausage': 1})

I then write it to a CSV file so that it looks like this:

Item,Count
Spam,13
Baked Beans,1
Egg,1
Sausage,1

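What I want

While the total shopping list is nice, I want to extend my output CSV file to include a shopping list for each order name. I don't care whether the order names are rows or columns. I also don't really care whether the cell for an item that is not in a given order is a 0 or blank, but I will use 0 in my examples.

Example desired output with order names as rows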

Order Name,Spam,Baked Beans,Egg,Sausage
Totals,13,1,1,1
Order for Joe,2,0,1,1
Order for Jill,11,1,0,0

Example desired output with order names as columns

Item,Totals,Order for Joe,Order for Jill
Spam,13,2,11
Baked Beans,1,0,1
Egg,1,1,0
Sausage,1,1,0
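
The column-oriented layout above can be produced with one row per item. This is only a sketch under stated assumptions: the `orders` dict is hard-coded from the two sample orders, the helper name `write_items_as_rows` is made up for this example, and rows are sorted by descending total and then by name, which is a choice made here to get a stable order.

```python
import csv
import io
from collections import Counter

# Sample data taken from the two example orders in the question.
orders = {
    'Order for Joe': Counter(['Spam', 'Egg', 'Sausage', 'Spam']),
    'Order for Jill': Counter(['Spam'] * 11 + ['Baked Beans']),
}
# sum() needs an empty Counter as its start value, otherwise it starts from the int 0.
totals = sum(orders.values(), Counter())


def write_items_as_rows(orders, totals, fout):
    """Write one row per item: item name, grand total, then one count per order."""
    names = list(orders)
    writer = csv.writer(fout, lineterminator='\n')
    writer.writerow(['Item', 'Totals'] + names)
    # Highest total first; ties broken alphabetically so the output is stable.
    for item in sorted(totals, key=lambda k: (-totals[k], k)):
        writer.writerow([item, totals[item]] + [orders[n][item] for n in names])


buf = io.StringIO()  # stands in for a real output file
write_items_as_rows(orders, totals, buf)
print(buf.getvalue())
```

Because the item rows are driven by `totals`, nothing is hard-coded: a different input file with different items simply yields different rows.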

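Notes

I want this script to work for any input file. Of course, if the input contains only one order, then Totals will match that order. I have to build a totals Counter first (so that I have every possible item across the relevant orders) and then fill in the counts for each order in the CSV. In other words, I can't start my CSV file by hard-coding the item names, because the next input file may have different items in its orders.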

3 Answers

Why not use a Counter for each line of the input file?

from collections import Counter

d = {}
#*1* Alternatively, could use: d = defaultdict(Counter)
with open(inputfile) as input_file:
    for line in input_file:
        if ':' not in line:
            continue  # skip blank or malformed lines
        for_who, items = line.rstrip('\n').split(':', 1)
        # strip the spaces around each item so ' Spam' and 'Spam' count as the same key
        d[for_who] = Counter(item.strip() for item in items.split(','))
        #Alternatively, if using defaultdict at *1*:
        #    d[for_who].update(item.strip() for item in items.split(','))
        #This allows "joe" to register multiple shopping lists which get summed into 1

#get totals by `sum`ming your Counters; the empty Counter() start value is
#required because sum() otherwise starts from the int 0
totals = sum(d.values(), Counter())

#Now add a 0-dict to each of the dictionaries just to make sure they have all the keys
zeros = dict((k, 0) for k in totals)
for cntr in d.values():
    cntr.update(zeros)

key_order = list(totals.keys())  #list for py2k
with open(output_file, 'w') as fout:
    fout.write('Order,' + ','.join(key_order) + '\n')
    fout.write('Totals,' + ','.join(str(totals[k]) for k in key_order) + '\n')
    for person, dct in d.items():
        fout.write(person + ',' + ','.join(str(dct[k]) for k in key_order) + '\n')

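If your item names can contain commas, you will need to handle quoting more carefully (think of the csv module for that sort of thing), but this should give you a good starting point.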
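
To illustrate the quoting concern, here is a small sketch showing how csv.writer quotes a field that contains a comma, so the row still reads back as three columns. The item name 'Eggs, scrambled' is invented for the example.

```python
import csv
import io

buf = io.StringIO()  # stands in for a real output file
writer = csv.writer(buf, lineterminator='\n')

# 'Eggs, scrambled' is a made-up item name containing a comma.
writer.writerow(['Order', 'Spam', 'Eggs, scrambled'])
writer.writerow(['Order for Joe', 2, 1])
text = buf.getvalue()
print(text)

# Reading it back shows the quoted field survives as one column.
rows = list(csv.reader(io.StringIO(text)))
print(rows[0])
```

With plain `','.join(...)` the same header would be split into four columns by any CSV reader, which is why the csv module is the safer choice once item names are not under your control.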

answered 2012-12-13T15:50:52.140

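You can use csv.DictWriter to manage your output.

You would assemble a list of Counters, one for each individual order, plus one Counter that holds the totals.

As you read the input, process it as follows:

  1. Use .update to add each item in the order to the "totals" dictionary
  2. Add each item in the order to an "order" dictionary by creating a new Counter
  3. Add an "Order Name" key to each Counter, holding the order name
  4. Create your DictWriter instance with totals.keys() as the field names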
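
The steps above could be sketched roughly as follows. The `raw_orders` list is hard-coded from the question's sample orders, and the sketch makes two choices of its own: the per-order Counters are converted to plain dicts before the "Order Name" string is added, and "Order Name" is put in front of the item names in `fieldnames`, since DictWriter only writes the keys listed there. `restval=0` fills in the items an order does not contain.

```python
import csv
import io
from collections import Counter

# Sample input taken from the question; in practice this comes from parsing the file.
raw_orders = [
    ('Order for Joe', ['Spam', 'Egg', 'Sausage', 'Spam']),
    ('Order for Jill', ['Spam'] * 11 + ['Baked Beans']),
]

totals = Counter()
rows = []
for name, items in raw_orders:
    totals.update(items)            # step 1: add to the running totals
    row = dict(Counter(items))      # step 2: a fresh count for this order
    row['Order Name'] = name        # step 3: label the row
    rows.append(row)

item_names = sorted(totals)         # column order is a choice made for this sketch
total_row = dict(totals)
total_row['Order Name'] = 'Totals'

buf = io.StringIO()  # stands in for a real output file
writer = csv.DictWriter(
    buf,
    fieldnames=['Order Name'] + item_names,  # step 4, plus the label column
    restval=0,                               # value written for missing items
    lineterminator='\n',
)
writer.writeheader()
writer.writerow(total_row)
writer.writerows(rows)
print(buf.getvalue())
```

One caveat of this layout: an item that happened to be called "Order Name" would collide with the label column, so a real script might want a label that cannot appear as an item.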
answered 2012-12-13T15:51:31.337

I would suggest using nested collections.defaultdict objects whose counts start at 0.

Suppose your input file looks like this:

Order for Joe: Spam, Egg, Sausage, Spam
Order for Jill: Spam, Spam, Spam, Spam, Spam, Spam, Spam, Baked Beans, Spam, Spam, Spam, Spam

Then you can get the totals and the individual order counts like this:

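import collections

# defaultdict needs a callable factory, hence the lambda for the inner dict
answer = collections.defaultdict(lambda: collections.defaultdict(int))
with open('path/to/input') as infile:
    for line in infile:
        if not line.strip():
            continue  # skip blank lines
        name, _, orders = line.partition(":")
        name = name.rpartition(' ')[-1]  # 'Order for Joe' -> 'Joe'
        for order in orders.strip().split(','):
            order = order.strip()  # drop the space that follows each comma
            answer['total'][order] += 1
            answer[name][order] += 1
# the output file must be opened for writing
with open('path/to/output', 'w') as outfile:
    keys = sorted(answer['total'])
    outfile.write("Order Name,%s\n" % ','.join(keys))
    # counts are ints, so convert them to str before joining
    outfile.write('total,%s\n' % ','.join(str(answer['total'][k]) for k in keys))
    for name, orders in answer.items():
        if name != 'total':
            outfile.write('%s,%s\n' % (name, ','.join(str(orders[k]) for k in keys)))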
answered 2012-12-13T15:52:09.923