regex - How do I replace double quotes within csv fields but not add a trailing double quote to each line?

Question

I have a CSV file named data_export_20130206-F.csv. Some of its fields contain double quotes ("), which makes the file very messy to parse.

The file looks roughly like this (but with more fields):

"stuff","zipcode"
"<?xml version="1.0" encoding="utf-8" ?>","90210"

I want to "escape" the quotes that are within the fields so it will look like this (Note: the quotes within the xml have been doubled):

"stuff","zipcode"
"<?xml version=""1.0"" encoding=""utf-8"" ?>","90210"

But when I run this:

cat data_export_20130206-F.csv| sed -E 's@([^,])(\")([^,])@\1""\3@g'

it adds an additional double quote at the end of each line, which makes the document invalid:

"stuff","zipcode""
"<?xml version=""1.0"" encoding=""utf-8"" ?>","90210""

How do I replace double quotes within csv fields but not add a trailing double quote to each line?

score 0 · Accepted Answer

Another approach is to strip the extra outer double quote in a second pass:

sed -E 's@([^,])(\")([^,])@\1""\3@g' data_export_20130206-F.csv | sed 's,"\("$\),\1,'

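Or simply squeeze all repeated quotes with tr (but this breaks if any field ends with a quote, and because tr -s squeezes every run of quotes it also collapses the doubled quotes inside the fields, so check the output):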

sed -E 's@([^,])(\")([^,])@\1""\3@g' data_export_20130206-F.csv | tr -s '"'

If for some reason you still lose the newlines, add them back in the replacement:

sed -E 's@([^,])(\")([^,])@\1""\3@g' data_export_20130206-F.csv | sed 's,""$,"\n,'

score 0 · Accepted Answer

This is a fragile solution, but it works for the input you provided.

perl -pe 's/(?:^"|"(?=,)|"$|(?<=,)")//g;s/"/""/g;s/^/"/;s/$/"/;s/(?:(?=,)|(?<=,))/"/g' FILENAME

Note that commas inside quoted fields will break this. Given your input, it produces the following output:

"stuff","zipcode"
"<?xml version=""1.0"" encoding=""utf-8"" ?>","90210"

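A similar idea can be sketched with awk instead of perl: remove the outermost quotes, split each line on the three-character sequence "," and double whatever quotes remain inside each field. This is only a sketch, not the answer's own method. The function name requote_fields is hypothetical, and the sketch assumes every field is quoted and that no field contains the sequence "," itself.

```shell
# Hypothetical helper: re-quote a CSV where every field is already quoted.
# Assumption: the sequence "," never appears inside a field.
requote_fields() {
  awk '{
    sub(/\r$/, "")                 # drop a CR left by CRLF line endings
    sub(/^"/, ""); sub(/"$/, "")   # remove the outermost quotes
    n = split($0, f, /","/)        # split on the "," delimiter
    line = ""
    for (i = 1; i <= n; i++) {
      gsub(/"/, "\"\"", f[i])      # double every remaining quote
      line = line (i > 1 ? "," : "") "\"" f[i] "\""
    }
    print line
  }' "$@"
}
```

It could be called as requote_fields FILENAME, or used at the end of a pipeline. Like the perl one-liner it is fragile: a line holding a single empty field, or a field containing ",", will not come out right.
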
score 0 · Accepted Answer

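Make sure there is no trailing whitespace after the final double quote, otherwise your substitution will match it. You can also trim the trailing whitespace with sed: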

sed 's/\s\+$//' x.csv | sed -E 's@([^,])(\")([^,])@\1""\3@g'
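
The two steps can also be combined into a single sed call. The following is a minimal sketch, assuming the stray characters are ordinary whitespace or a carriage return from CRLF line endings; the function name escape_inner_quotes is hypothetical, and [[:space:]] is used in place of \s so that it does not depend on GNU extensions.

```shell
# Hypothetical helper: trim trailing whitespace (including a CR from CRLF
# files), then double each quote that has a non-comma character on both sides.
escape_inner_quotes() {
  sed -E -e 's/[[:space:]]+$//' -e 's/([^,])"([^,])/\1""\2/g' "$@"
}
```

It could be called as escape_inner_quotes data_export_20130206-F.csv. The second expression is the pattern from the question, so it keeps the same limitation: each match consumes the character after the quote, so two quotes that are adjacent or separated by a single character are not both doubled.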
