python - Creating a file that conforms to a standard

Question

I have a comma-separated file. The lines look like this...

1,2,3,4,5
6,7,8
9,10
11,12,13,14,15

I need exactly 5 columns in every row, so the new file would be...

1,2,3,4,5
6,7,8,,
9,10,,,
11,12,13,14,15

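In other words, if a line has fewer than 4 commas, append the required number at the end. I was told there is a Python module that does this. Where can I find such a module? Or is awk better suited to this kind of task?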

score 2 · Accepted Answer

The module you are looking for is the csv module. You still need to make sure each row meets your minimum length requirement:

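import csv

# Python 3: newline='' lets the csv module handle line endings itself
with open('faultyfile.csv', newline='') as src, open('output.csv', 'w', newline='') as dst:
    reader = csv.reader(src)
    writer = csv.writer(dst, dialect=reader.dialect)
    for row in reader:
        # Pad short rows with empty fields up to 5 columns
        if len(row) < 5:
            row.extend([''] * (5 - len(row)))
        writer.writerow(row)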
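
The same padding logic can be wrapped in a small generator so that the target width becomes a parameter. This is only a minimal sketch, not part of the original answer: the name pad_rows and the width parameter are made up for illustration, and it assumes Python 3. It runs on in-memory data rather than real files.

```python
import csv
import io


def pad_rows(rows, width=5):
    """Yield each row padded with empty strings up to `width` fields.

    Hypothetical helper: rows that already have `width` or more
    fields are passed through unchanged.
    """
    for row in rows:
        # A negative multiplier yields an empty list, so long rows are untouched.
        yield row + [''] * (width - len(row))


# Demonstrate on in-memory data instead of real files.
source = io.StringIO("1,2,3,4,5\n6,7,8\n9,10\n11,12,13,14,15\n")
target = io.StringIO()
writer = csv.writer(target, lineterminator='\n')
writer.writerows(pad_rows(csv.reader(source)))
padded_text = target.getvalue()
print(padded_text, end='')
```

Because pad_rows is a generator, rows are padded one at a time as the writer consumes them, so the whole file never has to be held in memory.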

score 2 · Accepted Answer

If you don't mind using awk, it's easy:

$ cat data.txt 
1,2,3,4,5
6,7,8
9,10
11,12,13,14,15

$ awk -F, 'BEGIN {OFS=","} {print $1,$2,$3,$4,$5}' data.txt 
1,2,3,4,5
6,7,8,,
9,10,,,
11,12,13,14,15
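
The awk one-liner works because referencing a field past the last one on a line yields an empty string, so printing $1 through $5 always emits five fields. It also means any fields beyond the fifth are silently dropped. A rough Python equivalent of that behavior, as a sketch only (the function name first_n_fields and its parameters are hypothetical):

```python
def first_n_fields(line, n=5, sep=','):
    """Mimic awk's `print $1,...,$n`: missing fields become empty
    strings, and fields beyond the n-th are dropped."""
    fields = line.rstrip('\n').split(sep)
    # Pad generously, then cut back to exactly n fields.
    fields = (fields + [''] * n)[:n]
    return sep.join(fields)


for sample in ["1,2,3,4,5", "6,7,8", "9,10", "a,b,c,d,e,f"]:
    print(first_n_fields(sample))
```

If rows longer than five fields must be preserved, this truncating behavior is not what you want, and a padding-only approach is the safer choice.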

score 1 · Accepted Answer

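def correct_file(fname):
    # Read every line, padding each one to 4 commas (5 columns)
    with open(fname) as f:
        data = [line.rstrip('\n') + (4 - line.count(',')) * ',' + '\n' for line in f]
    # Rewrite the file in place with the padded lines
    with open(fname, 'w') as f:
        f.writelines(data)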

As noted in the comments, this reads the whole file into memory when you really don't need to. To do it all in one pass:

import shutil

def correct_file(fname):
    with open(fname, 'r') as fin, open('temp', 'w') as fout:
        for line in fin:
            # rstrip('\n') is safe even if the last line lacks a newline
            new = line.rstrip('\n') + (4 - line.count(',')) * ',' + '\n'
            fout.write(new)
    shutil.move('temp', fname)

This will clobber any file named temp in the current directory. Of course, you can always use the tempfile module to get around that...

For a slightly more verbose but bulletproof(?) version:

import shutil
import tempfile
import atexit
import os

def try_delete(fname):
    try:
        os.unlink(fname)
    except OSError:
        if os.path.exists(fname):
            print("Couldn't delete existing file", fname)

def correct_file(fname):
    with open(fname, 'r') as fin, tempfile.NamedTemporaryFile('w', delete=False) as fout:
        # Need a closure here: bind the name as a default argument
        atexit.register(lambda f=fout.name: try_delete(f))
        for line in fin:
            new = line.rstrip('\n') + (4 - line.count(',')) * ',' + '\n'
            fout.write(new)
    shutil.move(fout.name, fname)  # This should get rid of the temporary file ...
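
On Python 3, one way to tighten this is to create the temporary file in the same directory as the target and swap it in with os.replace, which avoids a copy across filesystems. The following is a sketch under those assumptions, not the answerer's code; pad_file_in_place and its width parameter are names invented here.

```python
import os
import tempfile


def pad_file_in_place(fname, width=5):
    """Rewrite `fname` so every line has at least `width` comma-separated fields."""
    directory = os.path.dirname(os.path.abspath(fname))
    # delete=False: we move the file ourselves once it is complete.
    with open(fname) as fin, tempfile.NamedTemporaryFile(
            'w', dir=directory, delete=False) as fout:
        tmp_name = fout.name
        try:
            for line in fin:
                line = line.rstrip('\n')
                missing = (width - 1) - line.count(',')
                # A negative count produces an empty string, so long lines are kept.
                fout.write(line + ',' * missing + '\n')
        except Exception:
            # Clean up the partial temporary file, then re-raise.
            fout.close()
            os.unlink(tmp_name)
            raise
    # Swap the finished file into place only after it is closed.
    os.replace(tmp_name, fname)
```

One caveat: the replacement file carries the permissions of the temporary file rather than those of the original, so restore them afterwards if that matters.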

score 1 · Accepted Answer

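with open('somefile.txt') as f:
    rows = []
    for line in f:
        # Strip the newline so it does not end up in the last field
        rows.append(line.rstrip('\n').split(','))

# Pad every row to the length of the longest one
max_cols = len(max(rows, key=len))
for row in rows:
    row.extend([''] * (max_cols - len(row)))

print("\n".join(",".join(r) for r in rows))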

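If you are sure each row should always be n items long (5 in this case), and you always know that before opening the file, it is more memory-efficient to do something like this: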

with open("f1", "r") as f1:
    with open("f2", "w") as f2:
        for line in f1:
            # Drop the newline first so the commas land before it
            line = line.rstrip("\n")
            f2.write(line + "," * (4 - line.count(",")) + "\n")

score 0 · Accepted Answer

This might work for you (GNU sed):

 sed ':a;s/,/&/4;t;s/$/,/;ta' file
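
For readers unfamiliar with sed, the command loops on each line: s/,/&/4 succeeds only when a fourth comma exists (it replaces that comma with itself), in which case t ends processing of the line. Otherwise s/$/,/ appends one comma and ta jumps back to the label a. The same loop in Python, as an illustrative sketch only (the function name pad_like_sed is hypothetical):

```python
def pad_like_sed(line, commas=4):
    """Append commas one at a time until the line contains `commas` of them."""
    while line.count(',') < commas:  # s/,/&/4 fails: no 4th comma yet
        line += ','                  # s/$/,/ then branch back to :a
    return line


for sample in ["1,2,3,4,5", "6,7,8", "9,10"]:
    print(pad_like_sed(sample))
```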

Answered 2012-09-20T17:28:11.773
