
I want to parse a CSV file and aggregate two of its columns.

Data in the CSV file:

'IP Address', Severity
10.0.0.1, High
10.0.0.1, High
10.0.0.1, Low
10.0.0.1, Medium
10.0.0.2, Medium
10.0.0.2, High
10.0.0.2, Low
10.0.0.3, Medium
10.0.0.3, High
10.0.0.3, Medium

I would like output along these lines:

'IP Address', Severity
10.0.0.1, High:2, Medium:1, Low:1
10.0.0.2, High:1, Medium:1, Low:1
10.0.0.3, High:1, Medium:2, Low:0

Or (ideally):

'IP Address', High, Medium, Low
10.0.0.1, 2, 1, 1
10.0.0.2, 1, 1, 1
10.0.0.3, 1, 2, 0

The closest I have found is this: Parse CSV file and aggregate the values

I can't seem to aggregate the string (Severity) column.

How can I produce this output?

Any help is appreciated.

2 Answers

Here is my solution, ag.py:

import collections
import csv
import sys

# Map each IP address to a Counter of its severity levels
output = collections.defaultdict(collections.Counter)

with open(sys.argv[1]) as infile:
    reader = csv.reader(infile)
    next(reader)  # Skip header line
    for ip, level in reader:
        level = level.strip()  # Remove surrounding spaces
        output[ip][level] += 1

print("'IP Address',High,Medium,Low")
for ip, count in sorted(output.items()):
    # A Counter returns 0 for a level that never occurred for this IP
    print('{0},{1[High]},{1[Medium]},{1[Low]}'.format(ip, count))

To run the solution, issue the following command:

python ag.py data.csv

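Discussion

  • output is a dictionary whose keys are IP addresses and whose values are collections.Counter objects.
  • Each Counter counts the "High", "Medium", and "Low" entries for one particular IP.
  • My solution prints to standard output; you can modify it to write to a file instead.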
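
The points above can be sketched in a self-contained form. This is only an illustration under stated assumptions, not the answer's own code: the helper names `aggregate` and `write_wide`, the `LEVELS` tuple, and the in-memory sample are all made up for the example. It shows that a `Counter` returns 0 for a level it never saw, and that the same rows can go to any file object through `csv.writer`.

```python
import collections
import csv
import io

LEVELS = ("High", "Medium", "Low")  # assumed fixed set of severities

def aggregate(lines):
    """Count severities per IP; `lines` is any iterable of CSV text lines."""
    output = collections.defaultdict(collections.Counter)
    reader = csv.reader(lines)
    next(reader)  # skip header line
    for ip, level in reader:
        output[ip][level.strip()] += 1
    return output

def write_wide(output, outfile):
    """Write one row per IP with one column per severity level."""
    writer = csv.writer(outfile, lineterminator="\n")
    writer.writerow(["'IP Address'", *LEVELS])
    for ip, count in sorted(output.items()):
        # A Counter yields 0 for a missing key, so no special-casing is needed
        writer.writerow([ip] + [count[level] for level in LEVELS])

# Hypothetical sample: 10.0.0.3 has no "Low" entries
sample = """'IP Address', Severity
10.0.0.3, Medium
10.0.0.3, High
10.0.0.3, Medium
"""
counts = aggregate(sample.splitlines())
buffer = io.StringIO()  # stands in for open('out.csv', 'w', newline='')
write_wide(counts, buffer)
print(buffer.getvalue())
```

Passing a real file opened with `open('out.csv', 'w', newline='')` in place of `buffer` would write the same rows to disk; `out.csv` is a placeholder name.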
Answered 2013-07-08T14:30:11.147

import csv
from collections import defaultdict

# Python 3: newline='' lets the csv module handle line endings itself
with open('text.txt', newline='') as f, open('ofile.csv', 'w', newline='') as g:
    reader, writer = csv.reader(f), csv.writer(g)
    results = defaultdict(list)
    next(reader)  # skip header line
    for ip, severity in reader:
        results[ip].append(severity)  # severity keeps its leading space, e.g. " High"
    writer.writerow(["'IP Address'", " High", " Medium", " Low"])  # write headers
    for ip, severities in sorted(results.items()):
        # Match on the space-prefixed values exactly as they appear in the input
        writer.writerow([ip] + [severities.count(t) for t in [" High", " Medium", " Low"]])

This produces:

'IP Address', High, Medium, Low
10.0.0.1,2,1,1
10.0.0.2,1,1,1
10.0.0.3,1,2,0
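
The code above matches on " High" with a leading space because `csv.reader` keeps the space that follows each comma in the input. One possible alternative is the reader's `skipinitialspace=True` option, which drops whitespace immediately after a delimiter. The sketch below is a variation, not the answer's code, and it uses a small in-memory sample (an assumption) rather than `text.txt`.

```python
import csv
from collections import defaultdict

# Hypothetical sample in the same shape as the question's data
sample = [
    "'IP Address', Severity",
    "10.0.0.1, High",
    "10.0.0.1, Low",
    "10.0.0.2, Medium",
]

# Default behaviour: the space after the comma stays in the field
print(list(csv.reader(sample))[1])  # ['10.0.0.1', ' High']

# skipinitialspace=True drops whitespace immediately after each delimiter
reader = csv.reader(sample, skipinitialspace=True)
next(reader)  # skip header line
results = defaultdict(list)
for ip, severity in reader:
    results[ip].append(severity)

rows = [[ip] + [severities.count(t) for t in ["High", "Medium", "Low"]]
        for ip, severities in sorted(results.items())]
print(rows)  # [['10.0.0.1', 1, 0, 1], ['10.0.0.2', 0, 1, 0]]
```

With the spaces gone, the header and the lookup values can be written without the leading space as well.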
Answered 2013-07-08T12:48:48.450