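If you only want to join each CSV row based on an ID, then don't use a DictReader. Dictionary keys must be unique, but you are producing rows with multiple EXECUTION_STATUS and RELEASE etc. columns.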
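To see why duplicate column names are a problem, here is a minimal sketch (written for Python 3, with made-up in-memory data standing in for a merged file) of what DictReader does when a header repeats:

```python
import csv
import io

# Hypothetical merged data in which the RELEASE column appears twice.
merged = io.StringIO(
    "TEST_ID,RELEASE,RELEASE\n"
    "FC/B_019.config,FR1.1,FR2.0\n"
)

rows = list(csv.DictReader(merged))

# Only one RELEASE key survives; the later column overwrites the earlier one.
print(rows[0])
```

The FR1.1 value is silently lost, which is why plain lists are used for the rows instead.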
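Moreover, how will you handle ids for which one or two of the input CSV files have no entry?
Use regular readers and store each row keyed by filename. Make fieldnames a list as well: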

    import csv
    from collections import defaultdict

    result = defaultdict(dict)
    filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
    lengths = {}
    fieldnames = ["TEST_ID"]

    for csvfile in filenames:
        # Python 2: binary mode; on Python 3 use open(csvfile, newline='')
        with open(csvfile, 'rb') as infile:
            reader = csv.reader(infile)
            headers = next(reader, [])  # read first line, headers
            fieldnames.extend(headers[1:])  # all but the first column name
            lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
            for row in reader:
                result[row[0]][csvfile] = row[1:]  # all but the first column

    # Python 2: binary mode; on Python 3 use open("out.csv", "w", newline='')
    with open("out.csv", "wb") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(fieldnames)
        for id_ in sorted(result):
            row = [id_]
            data = result[id_]
            for filename in filenames:
                # use the stored columns, or blanks if this file had no such id
                row.extend(data.get(filename) or [''] * lengths[filename])
            writer.writerow(row)

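This code stores rows per filename, so that you can later build a whole row from each file but still fill in blanks if the row was missing in that file.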
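The backfill step can be looked at in isolation. This is only a sketch with hand-written data: the id `FC/B_022.config` and the `lengths` values are made up for illustration, and the id is assumed to be present in the first file but missing from the second.

```python
filenames = ("FR1.1.csv", "FR2.0.csv")
lengths = {"FR1.1.csv": 3, "FR2.0.csv": 3}  # non-id columns per file

# Stored data for one hypothetical id that FR2.0.csv does not contain.
data = {"FR1.1.csv": ["FR1.1", "COMPILE_PASSED", "EXECUTION_PASSED"]}

row = ["FC/B_022.config"]
for filename in filenames:
    # Missing file -> data.get() returns None -> pad with empty strings.
    row.extend(data.get(filename) or [''] * lengths[filename])

print(row)
```

The row always ends up with the same number of columns as the header, whichever files contributed to it.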
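The alternative would be to make the column names unique by appending a number or the filename to each; that way your DictReader approach could work too.
The above gives: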
TEST_ID, RELEASE , COMPILE_STATUS , EXECUTION_STATUS, RELEASE , COMPILE_STATUS , EXECUTION_STATUS, RELEASE , COMPILE_STATUS , EXECUTION_STATUS
FC/B_019.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_020.config , FR1.1 , COMPILE_PASSED , EXECUTION_PASSED, FR2.0 , COMPILE_PASSED , EXECUTION_PASSED, FR2.5 , COMPILE_PASSED , EXECUTION_PASSED
FC/B_021.config , FR1.1 , COMPILE_FAILED , EXECUTION_FAILED, FR2.0 , COMPILE_FAILED , EXECUTION_FAILED, FR2.5 , COMPILE_FAILED , EXECUTION_FAILED
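As for the alternative of making the column names unique, a rough sketch could look like the following. The helper `read_with_suffix` and the `NAME_suffix` naming scheme are my own invention for illustration, and it assumes the id is the first column and is named TEST_ID:

```python
import csv
import io

def read_with_suffix(fileobj, suffix, key="TEST_ID"):
    """Return {id: row_dict}, renaming every non-key column to 'NAME_suffix'."""
    reader = csv.reader(fileobj)
    headers = next(reader, [])
    names = [h if h == key else "{}_{}".format(h, suffix) for h in headers]
    # Skip blank lines; the first column is assumed to hold the id.
    return {row[0]: dict(zip(names, row)) for row in reader if row}

# Hypothetical in-memory stand-in for FR1.1.csv
sample = io.StringIO(
    "TEST_ID,RELEASE,COMPILE_STATUS\n"
    "FC/B_019.config,FR1.1,COMPILE_PASSED\n"
)
rows = read_with_suffix(sample, "FR1.1")
print(rows["FC/B_019.config"])
```

With distinct names per file, the per-id dictionaries from each file can be merged with `dict.update()` without one file's columns overwriting another's.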
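If you need to base the output order on one of the input files, then omit that input file from the first reading loop; instead, read that file while writing the output and use its first column to look up the data from the other files: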

    import csv
    from collections import defaultdict

    result = defaultdict(dict)
    filenames = ("FR2.0.csv", "FR2.5.csv")
    lengths = {}
    fieldnames = []

    for csvfile in filenames:
        # Python 2: binary mode; on Python 3 use open(csvfile, newline='')
        with open(csvfile, 'rb') as infile:
            reader = csv.reader(infile)
            headers = next(reader, [])  # read first line, headers
            fieldnames.extend(headers[1:])  # all but the first column name
            lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
            for row in reader:
                result[row[0]][csvfile] = row[1:]  # all but the first column

    with open("FR1.1.csv", "rb") as infile, open("out.csv", "wb") as outfile:
        reader = csv.reader(infile)
        headers = next(reader, [])  # read first line, headers

        writer = csv.writer(outfile)
        writer.writerow(headers + fieldnames)

        # rows are sorted (TEST_ID first); drop sorted() to keep the file's own order
        for row in sorted(reader):
            data = result[row[0]]
            for filename in filenames:
                row.extend(data.get(filename) or [''] * lengths[filename])
            writer.writerow(row)

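This does mean that any extra TEST_ID values in the other two files are ignored.
If you want to preserve all TEST_IDs, then I'd use collections.OrderedDict(); new TEST_IDs found in the later files are tacked onto the end: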

    import csv
    from collections import OrderedDict

    result = OrderedDict()  # no default factory, so entries are created explicitly below
    filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
    lengths = {}
    fieldnames = ["TEST_ID"]

    for csvfile in filenames:
        # Python 2: binary mode; on Python 3 use open(csvfile, newline='')
        with open(csvfile, 'rb') as infile:
            reader = csv.reader(infile)
            headers = next(reader, [])  # read first line, headers
            fieldnames.extend(headers[1:])  # all but the first column name
            lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
            for row in reader:
                if row[0] not in result:
                    result[row[0]] = {}
                result[row[0]][csvfile] = row[1:]  # all but the first column

    with open("out.csv", "wb") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(fieldnames)
        for id_ in result:  # iterates in insertion order
            row = [id_]
            data = result[id_]
            for filename in filenames:
                row.extend(data.get(filename) or [''] * lengths[filename])
            writer.writerow(row)

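The OrderedDict maintains entries in insertion order; so FR1.1.csv sets the order for all keys, but any ids from FR2.0.csv that were not found in the first file are appended to the end of the dictionary, and so on.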
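A small sketch of that ordering behaviour, using hand-written id lists in place of the real files (the ids and their order are made up for illustration):

```python
from collections import OrderedDict

result = OrderedDict()

# Hypothetical first-column values, in the order each file lists them.
first = ["FC/B_021.config", "FC/B_019.config"]
second = ["FC/B_019.config", "FC/B_030.config"]

for source, ids in (("FR1.1.csv", first), ("FR2.0.csv", second)):
    for id_ in ids:
        # Existing keys keep their position; unseen keys go to the end.
        result.setdefault(id_, {})[source] = True

order = list(result)
print(order)
```

The id that only appears in the second list lands after both ids from the first list, regardless of how they would sort alphabetically.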
For Python versions < 2.7, either install a backport (see OrderedDict for older versions of python) or track the ID order manually:

    import csv
    from collections import defaultdict

    result = defaultdict(dict)
    filenames = ("FR1.1.csv", "FR2.0.csv", "FR2.5.csv")
    lengths = {}
    fieldnames = ["TEST_ID"]
    ids, seen = [], set()

    for csvfile in filenames:
        with open(csvfile, 'rb') as infile:
            reader = csv.reader(infile)
            headers = next(reader, [])  # read first line, headers
            fieldnames.extend(headers[1:])  # all but the first column name
            lengths[csvfile] = len(headers) - 1  # keep track of how many items to backfill
            for row in reader:
                id_ = row[0]
                # track ordering
                if id_ not in seen:
                    seen.add(id_)
                    ids.append(id_)
                result[id_][csvfile] = row[1:]  # all but the first column

    with open("out.csv", "wb") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(fieldnames)
        for id_ in ids:  # first-seen order
            row = [id_]
            data = result[id_]
            for filename in filenames:
                row.extend(data.get(filename) or [''] * lengths[filename])
            writer.writerow(row)

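The scripts above are written for Python 2, where the csv module expects files opened in binary mode. On Python 3 the files should be opened in text mode with `newline=''` instead. A sketch of the same approach for Python 3.7+ might look like this; the function name `merge_csvs` is made up here, and it relies on plain dicts preserving insertion order, so no OrderedDict is needed:

```python
import csv

def merge_csvs(filenames, outname):
    """Join CSV files on their first column, keeping first-seen id order."""
    result = {}   # id -> {filename: remaining columns}
    lengths = {}  # filename -> number of non-id columns
    fieldnames = ["TEST_ID"]

    for csvfile in filenames:
        with open(csvfile, newline='') as infile:
            reader = csv.reader(infile)
            headers = next(reader, [])
            fieldnames.extend(headers[1:])
            lengths[csvfile] = max(len(headers) - 1, 0)
            for row in reader:
                if row:  # skip blank lines
                    result.setdefault(row[0], {})[csvfile] = row[1:]

    with open(outname, "w", newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerow(fieldnames)
        for id_, data in result.items():
            row = [id_]
            for filename in filenames:
                # backfill blanks for files that lack this id
                row.extend(data.get(filename) or [''] * lengths[filename])
            writer.writerow(row)
```

Usage would be along the lines of `merge_csvs(("FR1.1.csv", "FR2.0.csv", "FR2.5.csv"), "out.csv")`.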