csv - Command to convert "" to \" in a CSV

Question

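From an external source, I receive huge CSV files (about 16GB) whose fields are optionally enclosed in double quotes ("). Fields are separated by semicolons (;). When a field's content contains a double quote, it is escaped as two double quotes.

Currently, I import these files into a MySQL database, which understands the "" escape.

I am considering migrating to Amazon Redshift, but it (or perhaps PostgreSQL in general) requires quotes to be escaped with a backslash, as \".

I am now looking for the fastest command-line tool (awk or sed, perhaps?) and the exact syntax to convert the files.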

Sample input:

"""start of line";"""beginning "" middle and end """;"end of line"""
12345;"Tell me an ""intelligent"" joke; I tell you one in return"
54321;"Your mom is ""nice"""
"";"";""
"However, if;""Quotes""; are present"

Sample output:

"\"start of line";"\"beginning \" middle and end \"";"end of line\""
12345;"Tell me an \"intelligent\" joke; I tell you one in return"
54321;"Your mom is \"nice\""
"";"";""
"However, if;\"Quotes\"; are present"

Edit: added more tests.

score 3 · Accepted Answer

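There are a few edge cases to watch for:

What if a double quote is at the start of a string?
What if that string is the first field?
Fields containing an empty string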

sed -r '
    # at the start of a line or the start of a field, 
    # replace """ with "\"
    s/(^|;)"""/\1"\\"/g

    # replace any doubled double-quote with an escaped double-quote.
    # this affects any "inner" quote pair as well as end of field or end of line
    # if there is an escaped quote from the previous command, don't be fooled by
    # the quote that follows it.
    s/([^\\])""/\1\\"/g

    # the above step will destroy empty strings. fix them here. this uses a
    # conditional loop: if there are 2 consecutive empty fields, they will
    # share a delimiter, so we have to process the line more than once
    :fix_empty_fields
    s/(^|;)\\"($|;)/\1""\2/g
    tfix_empty_fields
' <<'END'

"""start of line";"""beginning "" middle and end """;"end of line"""
"";"";"";"""";"""""";"";""

END

Output:

"\"start of line";"\"beginning \" middle and end \"";"end of line\""
"";"";"";"\"";"\"\"";"";""

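sed is an efficient tool, but a 16GB file will still take a while. You should also have at least 16GB of free disk space to write the updated file (even sed's -i in-place editing uses a temporary file behind the scenes).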
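Since the converted copy has to be written somewhere anyway, one option is to wrap the three substitutions in a small shell function that reads stdin and writes stdout, so the result can be redirected to a new file or piped onward. This is only a minimal sketch: the function name fix_quotes is hypothetical, and it assumes a sed that accepts -E (GNU sed treats it the same as the -r used above).

```shell
#!/bin/sh
# fix_quotes: hypothetical helper wrapping the sed script from this answer.
# Reads semicolon-separated CSV on stdin, writes the converted CSV to stdout.
fix_quotes() {
    sed -E '
        s/(^|;)"""/\1"\\"/g
        s/([^\\])""/\1\\"/g
        :fix_empty_fields
        s/(^|;)\\"($|;)/\1""\2/g
        tfix_empty_fields
    '
}

# Quick check on two sample lines from the question.
printf '%s\n' '"";"";""' '54321;"Your mom is ""nice"""' | fix_quotes
# prints:
# "";"";""
# 54321;"Your mom is \"nice\""
```

On a real file this would be run along the lines of `fix_quotes < input.csv > output.csv` (file names are placeholders), which leaves the original untouched but still needs room for the full output.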

References: GNU sed manual, sed looping commands

score 0 · Accepted Answer

This line should work:

sed 's/""/\\"/g' file

Answered 2013-05-29T14:31:22.517

score 0 · Accepted Answer

With sed:

sed 's/""/\\"/g' input_file

Test:

$ cat n.txt 
12345;"Tell me an ""intelligent"" joke; I tell you one in return"
54321;"Your mom is ""nice"""

$ sed 's/""/\\"/g' n.txt 
12345;"Tell me an \"intelligent\" joke; I tell you one in return"
54321;"Your mom is \"nice\""
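The test above covers only two of the question's sample lines. As a quick sketch (not part of the original answer; the wrapper name naive_fix is hypothetical), the same one-liner can be run against the empty-field line and the first sample line from the question to see how it treats those cases:

```shell
#!/bin/sh
# naive_fix: hypothetical wrapper around the one-line substitution shown above.
naive_fix() {
    sed 's/""/\\"/g'
}

# Two further sample lines taken from the question.
printf '%s\n' \
    '"";"";""' \
    '"""start of line";"""beginning "" middle and end """;"end of line"""' \
    | naive_fix
# prints:
# \";\";\"
# \""start of line";\""beginning \" middle and end \"";"end of line\""
```

Empty fields come out as \" and a field that begins with a quote starts with \"" rather than "\", so both lines differ from the question's sample output. Whether that matters depends on whether the real data contains such fields.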

score 0 · Accepted Answer

As you suggested in your post, I would use sed:

$ sed 's@""@\\"@g' input
12345;"Tell me an \"intelligent\" joke; I tell you one in return"
54321;"Your mom is \"nice\""

score 0 · Accepted Answer

I would go with sed:

$ sed 's:"":\\":g' your_csv.csv

When tested against the following:

new """
test ""
"hows "" this "" "

I get:

new \""
test \"
"hows \" this \" "
