python - Iterate over two files to create a new file that appends fields from the second file to the fields of the first

Question

I am new to Python. I tried to use the logic from the answers by @mgilson, @endolith, and @zackbloom (Zack's example).

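A bunch of blank columns end up in front of the first field of the primary record.
My out_file is empty (most likely because the columns in the two files never match).

How can I fix this? The end result should look like this: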

('PUDO_id','Load_id','carrier_id','PUDO_from_company','PUDOItem_id';'PUDO_id';'PUDOItem_make')              
('1','1','14','FMH MATERIAL HANDLING SOLUTIONS','1','1','CROWN','TR3520 / TWR3520','TUGGERS')
('2','2','7','WIESE USA','2','2','CAT','NDC100','3','2','CAT','NDC100','4','2',' 2 BATTERIES')

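Note: in line 3 of the output, 3 rows from the sub file are appended to the array, whereas the first 2 lines each have only 1 row from the sub file appended. A sub row is appended whenever the comparison of pri[0] with sub[1] is TRUE.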

Here is my code, based on @Zack Bloom's:

def build_set(filename):
    # A set stores a collection of unique items.  Both adding items and searching for them
    # are quick, so it's perfect for this application.
    found = set()

    with open(filename) as f:
        for line in f:
                # Tuples, unlike lists, cannot be changed, which is a requirement for anything
                # being stored in a set.
                line = line.replace('"','')
                line = line.replace("'","")
                line = line.replace('\n','')
                found.add(tuple(sorted(line.split(';'))))
    return found

set_primary_records = build_set('C:\\temp\\oz\\loads_pudo.csv')
set_sub_records     = build_set('C:\\temp\\oz\\pudo_items.csv')
record                  = []

with open('C:\\temp\\oz\\loads_pudo_out.csv', 'w') as out_file:
   # Using with to open files ensures that they are properly closed, even if the code
   # raises an exception.

    for pri in set_primary_records :
        for sub in set_sub_records :
            #out_file.write(" ".join(res) + "\n")
            if sub[1] == pri [0] :
                record = pri.extend(sub)
            out_file.write(record + '\n')

Sample source data (primary records):

PUDO_id;"Load_id";"carrier_id";"PUDO_from_company"              
1;"1";"14";"FMH MATERIAL HANDLING SOLUTIONS"                
2;"2";"7";"WIESE USA"

Sample source data (sub records):

PUDOItem_id;"PUDO_id";"PUDOItem_make"
1;"1";"CROWN";"TR3520 / TWR3520";"TUGGERS"
2;"2";" CAT";"NDC100"
3;"2";"CAT";"NDC100"
4;"2";" 2 BATTERIES"
5;"11";"MIDLAND"

score 1 · Accepted Answer

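The extend attribute is not available on the tuples created by build_set: tuples are immutable, so they have no extend method. They can, however, be concatenated with + and sliced, the same way ordinary Python strings can.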
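A minimal sketch of the point above, using two short made-up tuples in place of the real records. It also shows a second problem in the question's line `record = pri.extend(sub)`: even on a list, where extend does exist, the method modifies the list in place and returns None, so the assignment would not capture the combined record.

```python
pri = ('1', '1', '14')      # illustrative stand-in for a primary record
sub = ('1', '1', 'CROWN')   # illustrative stand-in for a sub record

# Tuples have no extend() method at all.
has_extend = hasattr(pri, 'extend')

# list.extend() mutates the list in place and returns None,
# so assigning its return value loses the data.
as_list = list(pri)
result = as_list.extend(sub)

# Concatenation with + builds a new tuple and leaves the originals untouched.
record = pri + sub
```

Here `has_extend` is False, `result` is None, and `record` holds all six fields in order.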

For example:

with open('C:\\temp\\oz\\loads_pudo_out.csv', 'w') as out_file:
    for pri in set_primary_records :
        for sub in set_sub_records :
            if sub[1] == pri[0] :
                record = pri + sub
                out_file.write(str(record)[1:-1] + '\n')

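This is the same code as above, modified only to use tuple concatenation. On the write line, the record is converted to a string and the opening and closing parentheses are stripped before '\n' is appended. There may be a better or prettier way to do this, but I am new to Python as well.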
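One possibly tidier option for the write step is the standard library's csv module, which handles quoting and separators itself. The sketch below compares the two approaches on a made-up record; it writes to an in-memory buffer rather than a real file, and the choice of QUOTE_ALL with a comma delimiter is an assumption about the desired layout, not something the question specifies.

```python
import csv
import io

# Illustrative record, shaped like one joined primary + sub row.
record = ('1', '1', '14', 'FMH MATERIAL HANDLING SOLUTIONS', '1', '1', 'CROWN')

# The approach above: the tuple's repr with the outer parentheses sliced off.
repr_line = str(record)[1:-1]

# Alternative: let csv.writer format the row. QUOTE_ALL wraps every field
# in double quotes; lineterminator='\n' keeps line endings predictable.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator='\n')
writer.writerow(record)
csv_line = buffer.getvalue()
```

With a real file, the same writer would be built from a file opened with `newline=''`, as the csv documentation recommends. The repr-based line uses single quotes and a space after each comma, while the csv line uses double quotes and no spaces, so which one fits depends on what will read the file afterwards.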

Edit: to get your expected output, a few changes are needed:

# On this line, remove the sorted() call: sorting reorders the fields,
# so pri[0] and sub[1] would no longer be guaranteed to hold the ids.
found.add(tuple(line.split(';')))

...

with open('C:\\temp\\loads_out.csv', 'w') as out_file:
    for pri in set_primary_records:
        record = pri                        # record tuple is set in main loop
        for sub in set_sub_records:
            if sub[1] == pri[0]:
                record += sub               # for each match, sub appended to record
        out_file.write(str(record) + '\n')  # removed stripping of brackets
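Because sets are unordered, the loop above may write the records, and the sub rows inside each record, in a different order from the source files. The following is a sketch of one alternative, not the method used above: it keeps rows in lists to preserve file order and groups the sub rows by their second field in a dictionary so each primary row needs only one lookup. The names `read_rows` and `merge_records` are hypothetical, and the sample data from the question is embedded as strings so the example runs without the original files.

```python
import csv
import io
from collections import defaultdict

# Sample data from the question, embedded so the sketch is self-contained.
PRIMARY = '''PUDO_id;"Load_id";"carrier_id";"PUDO_from_company"
1;"1";"14";"FMH MATERIAL HANDLING SOLUTIONS"
2;"2";"7";"WIESE USA"
'''

SUB = '''PUDOItem_id;"PUDO_id";"PUDOItem_make"
1;"1";"CROWN";"TR3520 / TWR3520";"TUGGERS"
2;"2";" CAT";"NDC100"
3;"2";"CAT";"NDC100"
4;"2";" 2 BATTERIES"
5;"11";"MIDLAND"
'''

def read_rows(text):
    """Parse semicolon-delimited text into a list of tuples, in file order."""
    reader = csv.reader(io.StringIO(text), delimiter=';')
    return [tuple(row) for row in reader if row]

def merge_records(primary_rows, sub_rows):
    """Append every sub row whose second field equals the primary's first field."""
    subs_by_key = defaultdict(list)
    for sub in sub_rows:
        subs_by_key[sub[1]].append(sub)

    merged = []
    for pri in primary_rows:
        record = pri
        for sub in subs_by_key.get(pri[0], []):
            record += sub          # tuple concatenation, as in the answer
        merged.append(record)
    return merged

merged = merge_records(read_rows(PRIMARY), read_rows(SUB))
```

With this data, `merged` has three records: the header row joined with the sub header (both contain 'PUDO_id' in the compared position), record 1 with one sub row, and record 2 with three. The sub row with PUDO_id 11 matches no primary row and is dropped, as in the loop above.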
