Some sample data:

title1|title2|title3|title4|merge
test|data|here|and
test|data|343|AND
",3|data|343|and

This is my attempt at coding it:

import csv
import StringIO

storedoutput = StringIO.StringIO()
fields = ('title1', 'title2', 'title3', 'title4', 'merge')
with open('file.csv', 'rb') as input_csv:
    reader = csv.DictReader(input_csv, fields, delimiter='|')
    for counter, row in enumerate(reader):
        counter += 1
        #print row
        if counter != 1:
            for field in fields:
                if field == "merge":
                    row['merge'] = ("%s%s%s" % (row["title1"], row["title3"], row["title4"]))
                    print row
                    storedoutput.writelines(','.join(map(str, row)) + '\n')

contents = storedoutput.getvalue()
storedoutput.close()

print "".join(contents)

with open('file.csv', 'rb') as input_csv:
    input_csv = input_csv.read().strip()

output_csv = []
output_csv.append(contents.strip())

if "".join(output_csv) != input_csv:
    with open('file.csv', 'wb') as new_csv:
        new_csv.write("".join(output_csv))

The output should be:

title1|title2|title3|title4|merge
test|data|here|and|testhereand
test|data|343|AND|test343AND
",3|data|343|and|",3343and

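For reference, when you run this code, the first print shows the rows as I want them to appear in the output CSV. The second print, however, shows the header row x times, where x is the number of rows.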

Any input, corrections, or working code would be greatly appreciated.

3 Answers

The double quote in the last line is definitely messing up csv.DictReader(). This works:

new_lines = []
with open('file.csv', 'rb') as f:
    # keep the header line unchanged
    new_lines.append(f.next().strip())
    for line in f:
        # strip the newline and split the fields
        line = line.strip().split('|')
        # extract the field data you want
        title1, title3, title4 = line[0], line[2], line[3]
        # join the field data into one string and append it to the rest
        line.append(''.join([title1, title3, title4]))
        # save the new line for later
        new_lines.append('|'.join(line))

with open('file.csv', 'w') as f:
    # join the lines into one long string and write it back to the file
    f.write('\n'.join(new_lines))
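To see why the stray quote trips up the csv module, here is a minimal sketch. It assumes Python 3 (the code above is Python 2) and reads the sample data from an in-memory string instead of file.csv; the names SAMPLE and parse are made up for illustration.

```python
import csv
import io

# Sample data from the question, held in memory instead of file.csv.
SAMPLE = (
    'title1|title2|title3|title4|merge\n'
    'test|data|here|and\n'
    'test|data|343|AND\n'
    '",3|data|343|and\n'
)


def parse(text, **reader_options):
    """Return all rows of pipe-delimited text as lists of fields."""
    return list(csv.reader(io.StringIO(text), delimiter='|', **reader_options))


# Default settings: the leading " opens a quoted field that is never closed,
# so the delimiters that follow it are swallowed into that one field.
default_rows = parse(SAMPLE)

# QUOTE_NONE: the " is treated as an ordinary character.
plain_rows = parse(SAMPLE, quoting=csv.QUOTE_NONE)

print(len(default_rows[-1]), len(plain_rows[-1]))
```

With the default settings the last row comes back as a single field, while with csv.QUOTE_NONE it splits into the expected four. Splitting on '|' by hand, as above, sidesteps quoting altogether.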
answered 2013-10-19T02:31:38.190
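I think we can make this a lot simpler. Dealing with the rogue " is, I admit, a bit of a nuisance, because you have to work hard to tell Python that you don't want to worry about it.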

import csv

with open('file.csv', 'rb') as input_csv, open("new_file.csv", "wb") as output_csv:
    reader = csv.DictReader(input_csv, delimiter='|', quoting=csv.QUOTE_NONE)
    writer = csv.DictWriter(output_csv, reader.fieldnames, delimiter="|",quoting=csv.QUOTE_NONE, quotechar=None)

    merge_cols = "title1", "title3", "title4"

    writer.writeheader()

    for row in reader:
        row["merge"] = ''.join(row[col] for col in merge_cols)
        writer.writerow(row)

This produces:

$ cat new_file.csv 
title1|title2|title3|title4|merge
test|data|here|and|testhereand
test|data|343|AND|test343AND
",3|data|343|and|",3343and

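For anyone on Python 3, the same idea can be sketched against in-memory text. This is only an illustration under stated assumptions: the function name add_merge_column, the SAMPLE string, and the lineterminator="\n" setting are mine, not part of the code above.

```python
import csv
import io

# Sample data from the question, held in memory instead of file.csv.
SAMPLE = (
    'title1|title2|title3|title4|merge\n'
    'test|data|here|and\n'
    'test|data|343|AND\n'
    '",3|data|343|and\n'
)


def add_merge_column(text, merge_cols=("title1", "title3", "title4"),
                     quotechar=None):
    """Return text with the "merge" column filled in from merge_cols."""
    reader = csv.DictReader(io.StringIO(text), delimiter="|",
                            quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # quotechar=None tells the writer that " is not special, so it does not
    # try (and fail) to escape it while quoting is switched off.
    writer = csv.DictWriter(out, reader.fieldnames, delimiter="|",
                            quoting=csv.QUOTE_NONE, quotechar=quotechar,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row["merge"] = "".join(row[col] for col in merge_cols)
        writer.writerow(row)
    return out.getvalue()


result = add_merge_column(SAMPLE)
print(result)
```

Both settings matter here: quoting=csv.QUOTE_NONE on the reader keeps the rogue " from opening a quoted field, and quotechar=None on the writer lets that same character be written back out unescaped. Calling add_merge_column(SAMPLE, quotechar='"') should instead fail with csv.Error, since the writer would need an escapechar for the quote.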
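Note that I declined to update the original file, even though that is what you wanted. Why? It's a bad idea, because that way you can clobber your data while you are still working on it.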

How can I be so sure? Because that's exactly what I did the first time I ran your code, and I should know better. ;^)
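If the original really does have to be updated, one way to lower the risk is to write the new contents to a temporary file first and only swap it in once the write has succeeded. A minimal sketch, assuming Python 3; replace_file_safely is a hypothetical helper name, not something the csv module provides.

```python
import os
import tempfile


def replace_file_safely(path, new_text):
    """Write new_text to a temporary file, then swap it in for path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", newline="") as tmp:
            tmp.write(new_text)
        # The original file is untouched until this call.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temporary file and leave the original alone.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Used as replace_file_safely('file.csv', new_contents), a failure while writing leaves file.csv exactly as it was.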

answered 2013-10-19T02:12:46.000
import csv
import StringIO

stored_output = StringIO.StringIO()

with open('file.csv', 'rb') as input_csv:
    reader = csv.DictReader(input_csv, delimiter='|', quoting=csv.QUOTE_NONE)
    writer = csv.DictWriter(stored_output, reader.fieldnames, delimiter="|",quoting=csv.QUOTE_NONE, quotechar=None)

    merge_cols = "title1", "title3", "title4"

    writer.writeheader()

    for row in reader:
        row["merge"] = ''.join(row[col] for col in merge_cols)
        writer.writerow(row)

    contents = stored_output.getvalue()
    stored_output.close()
    print contents

with open('file.csv', 'rb') as input_csv:
    input_csv = input_csv.read().strip()

if input_csv != contents.strip():
    with open('file.csv', 'wb') as new_csv:
        new_csv.write(contents)
answered 2013-10-19T02:46:53.010