python - Removing commas from a large CSV (1GB)

Question

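I have a large CSV file (1GB) that I want to remove the commas from. The data are all positive integers. Methods I have tried include dlmwrite with a space as the delimiter, but the output then comes out in decimal format. I have also tried the fprintf command, but then I lose the shape of the matrix (i.e. all the data ends up in a single row or column).

So,

Is there a simple way to read from a CSV (input.txt) like this: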

1, 2, 3, 4, 5
2, 3, 4, 5, 6

and then write it out to a text file (output.txt) in this form:

1 2 3 4 5
2 3 4 5 6

score 11 · Accepted Answer

In Python, if the format really is that simple (and there is already a space after every comma):

with open("infile.csv") as infile, open("outfile.csv", "w") as outfile:
    for line in infile:
        outfile.write(line.replace(",", ""))

If you can't be sure about the spaces:

import re
with open("infile.csv") as infile, open("outfile.csv", "w") as outfile:
    for line in infile:
        outfile.write(re.sub(r"\s*,\s*", " ", line))
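
One caveat with the regex above: `\s` also matches the newline, so a line that ends in a trailing comma would have its line break replaced by a space. The sketch below is a variant that only touches spaces and tabs around each comma. The names `commas_to_spaces` and `convert` are made up for illustration, and it assumes the fields themselves never contain commas.

```python
import re

# Match a comma plus any surrounding spaces/tabs, but never the newline.
_SEP = re.compile(r"[ \t]*,[ \t]*")

def commas_to_spaces(line: str) -> str:
    """Replace each comma (and adjacent spaces/tabs) with one space."""
    return _SEP.sub(" ", line)

def convert(src: str, dst: str) -> None:
    # Stream line by line so a 1 GB file never has to fit in memory.
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write(commas_to_spaces(line))
```

Calling `convert("infile.csv", "outfile.csv")` would then behave like the loop above, except that line breaks are always preserved.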

score 1 · Accepted Answer

Personally, I like to use sed, a command-line stream editor that can replace strings.

It is available on Linux, and can also be used on Windows by installing it through Cygwin.

With

sed -i 's/,/ /g' filename

every comma in the file is replaced with a space.
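
If sed is not available, the same in-place substitution can be approximated in Python. This is a rough sketch, not how sed itself is implemented: it streams to a temporary file in the same directory and then swaps it over the original with `os.replace`. The helper name `replace_commas_in_place` is hypothetical. Like the sed command, it turns every comma into exactly one space, so input such as `1, 2` ends up with two spaces between fields.

```python
import os
import tempfile

def replace_commas_in_place(path: str) -> None:
    """Rewrite `path` with every comma replaced by a space."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:  # line by line keeps memory use flat
                tmp.write(line.replace(",", " "))
        os.replace(tmp_path, path)  # swap the new file over the original
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temp file on failure
        raise
```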

score 0 · Accepted Answer

Python has a csv module for CSV file I/O.

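import csv
with open("input.txt", newline="") as infile:
    with open("output.txt", "w") as outfile:
        # skipinitialspace drops the space that follows each comma in the input,
        # so the joined fields are separated by a single space
        for row in csv.reader(infile, skipinitialspace=True):
            outfile.write(' '.join(row) + '\n')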
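
Note that csv.reader splits only on the comma by default, so with the sample input from the question every field after the first keeps its leading space; passing `skipinitialspace=True` strips it. A small check on an in-memory copy of the sample (the variable names are only illustrative):

```python
import csv
import io

sample = "1, 2, 3, 4, 5\n2, 3, 4, 5, 6\n"

# Default dialect: the space after each comma stays attached to the field.
default_rows = list(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops whitespace that immediately follows the delimiter.
clean_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

# Join the cleaned fields with single spaces, one output line per input row.
converted = "".join(" ".join(row) + "\n" for row in clean_rows)
print(converted, end="")
```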

Similarly, MATLAB has a csvread function:

M = csvread('input.txt');
dlmwrite('output.txt', M, 'delimiter', ' ', 'precision', '%ld');

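But this code has problems. First, you read the file in one big chunk rather than line by line: you may run out of memory. Second, csvread always returns a double array, so precision may be lost when reading large integers. Finally, if input.txt has a variable number of columns, the matrix M is zero-padded.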
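
The precision point can be checked from Python, since a Python float is, on mainstream platforms, the same IEEE 754 double type that csvread returns. A minimal illustration (the value 2**53 + 1 is just a convenient example, not taken from the question's data):

```python
# A double has a 53-bit significand, so not every integer above 2**53 is representable.
big = 2**53 + 1                 # an odd integer just past the exact range

as_double = float(big)          # what a double-based reader would store
round_tripped = int(as_double)  # convert back to an exact integer

# The value that comes back is no longer the one that went in.
print(big, round_tripped, big == round_tripped)
```

Integers up to 2**53 survive the round trip unchanged, so this only matters for values with more than about 15 digits.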

I strongly recommend the Python solution!

score 0 · Accepted Answer

You can use fgetl to read from a file identifier line by line, like this:

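fid = fopen('file.csv');
if fid == -1
    return                      % could not open the file
end
sl = fgetl(fid);                % read the first line
while ischar(sl)                % fgetl returns -1 (not a char array) at end of file
    icol = find(sl == ',');     % indices of the commas in the current line
    % process sl here, before it is overwritten by the next read
    sl = fgetl(fid);            % read the next line
end

fclose(fid);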

Inside the loop, you can replace each , in sl with a space (for example sl(icol) = ' ') and write the line back to disk.
