python - 使用 Python 解析 CSV

Question

我有以下 csv 文件，其中包含三个字段 Vulnerability Title、Vulnerability Severity Level、Asset IP Address（显示漏洞名称、漏洞级别和存在该漏洞的 IP 地址）。我正在尝试打印一份报告，该报告将在其旁边的严重性列中列出漏洞以及具有该漏洞的 IP 地址的最后一列列表。

Vulnerability Title Vulnerability Severity Level    Asset IP Address
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.65.164
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.103.64.10
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.30.81
TLS/SSL Server Supports RC4 Cipher Algorithms (CVE-2013-2566)   4   10.10.50.82
TLS/SSL Server Supports Weak Cipher Algorithms  6   10.103.65.164
Weak Cryptographic Key  3   10.103.64.10
Unencrypted Telnet Service Available    4   10.10.30.81
Unencrypted Telnet Service Available    4   10.10.50.82
TLS/SSL Server Supports Anonymous Cipher Suites with no Key Authentication  6   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.100
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.65.164
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.103.64.10
TLS/SSL Server Supports The Use of Static Key Ciphers   3   10.10.30.81

我想重新创建一个 csv 文件，该文件使用漏洞标题选项卡作为键，并创建第二个名为漏洞严重级别的选项卡，最后一个选项卡将包含所有具有漏洞的 IP 地址

import csv
from pprint import pprint
from collections import defaultdict
import glob
x= glob.glob("/root/*.csv")

d = defaultdict()
n = defaultdict()
for items in x:
        with open(items, 'rb') as f:
                reader = csv.DictReader(f, delimiter=',')
                for row in reader:
                        a = row["Vulnerability Title"]
                        b = row["Vulnerability Severity Level"], row["Asset IP Address"]
                        c = row["Asset IP Address"]
        #               d = row["Vulnerability Proof"]
                        d.setdefault(a, []).append(b)
        f.close()
pprint(d)
with open('results/ipaddress.csv', 'wb') as csv_file:
        writer = csv.writer(csv_file)
        for key, value in d.items():
                for x,y in value:
                        n.setdefault(y, []).append(x)
#                       print x
                        writer.writerow([key,n])

with open('results/ipaddress2.csv', 'wb') as csv2_file:
        writer = csv.writer(csv2_file)
        for key, value in d.items():
             n.setdefault(value, []).append(key)
             writer.writerow([key,n])

因为我不能很好地解释。让我试着简化

假设我有以下 csv

Car model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James
Chevy   Green   Tom

我正在尝试按以下方式创建此 csv：

Car model   owner
Honda   Blue    James
Toyota  Blue    Tom
Chevy   Green   James,Tom

两种解决方案都是正确的。这也是我的最终脚本

import csv
import pandas as pd

df = pd.read_csv('test.csv', names=['Vulnerability Title', 'Vulnerability Severity Level','Asset IP Address'])
#print df
grouped = df.groupby(['Vulnerability Title','Vulnerability Severity Level'])

groups = grouped.groups
#print groups
new_data = [k + (v['Asset IP Address'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['Vulnerability Title' ,'Vulnerability Severity Level', 'Asset IP Address'])

print new_df
new_df.to_csv('final.csv')

谢谢你

score 1 · Accepted Answer

以您的汽车为例回答。本质上，我正在创建一个以汽车品牌为键的字典和一个二元组。元组的第一个元素是颜色，第二个元素是所有者列表。）：

import csv

car_dict = {}
with open('<file_to_read>', 'rb') as fi:
    reader = csv.reader(fi)
    for f in reader:
        if f[0] in car_dict:
            car_dict[f[0]][1].append(f[2]) 
        else:
            car_dict[f[0]] = (f[1], [f[2]])

with open('<file_to_write>', 'wb') as ou:
    for k in car_dict:
        out_string ='{}\t{}\t{}\n'.format(k, car_dict[k][0], ','.join(car_dict[k][1]))
        ou.write(out_string)

score 1 · Accepted Answer

操作结构化日期时，尤其是大型数据集。我想建议你使用pandas。

对于您的问题，我将为您举一个 pandas groupby 功能的示例作为解决方案。假设你有数据：

data = [['vt1', 3, '10.0.0.1'], ['vt1', 3, '10.0.0.2'], 
        ['vt2', 4, '10.0.10.10']]

pandas 的操作日期非常繁琐：

import pandas as pd

df = pd.DataFrame(data=data, columns=['title', 'level', 'ip'])
grouped = df.groupby(['title', 'level'])

然后

groups = grouped.groups

将是您几乎需要的字典。

print(groups)
{('vt1', 3): [0, 1], ('vt2', 4): [2]}

[0,1]表示行标签。实际上，您可以迭代这些组以应用您想要的任何操作。例如，如果要将它们保存到 csv 文件中：

new_data = [k + (v['ip'].tolist(),) for k, v in grouped]
new_df = pd.DataFrame(new_data, columns=['title', 'level', 'ips'])

现在让我们看看 new_df 是什么：

  title  level                   ips
0   vt1      3  [10.0.0.1, 10.0.0.2]
1   vt2      4          [10.0.10.10]

这就是你需要的。最后，保存到文件：

new_df.to_csv(filename)

我强烈建议你学习 pandas 数据操作。您可能会发现这更容易和更清洁。

python - 使用 Python 解析 CSV

2 回答 2

Related

Reference