I have a CSV file named data_export_20130206-F.csv. Some of its fields contain double quotes ("), which makes it very messy to parse.

The file looks something like this (but with more fields):

"stuff","zipcode"
"<?xml version="1.0" encoding="utf-8" ?>","90210"

I want to "escape" the quotes inside the fields so that it looks like this (note that the quotes within the XML have been doubled):

"stuff","zipcode"
"<?xml version=""1.0"" encoding=""utf-8"" ?>","90210"

I tried this:

cat data_export_20130206-F.csv| sed -E 's@([^,])(\")([^,])@\1""\3@g'

Unfortunately, it adds an extra double quote at the end of each line, which makes the document invalid:

"stuff","zipcode""
"<?xml version=""1.0"" encoding=""utf-8"" ?>","90210""

How do I double the quotes inside CSV fields without adding a trailing double quote to each line?

3 Answers

Another approach is to strip the extra outer double quote in a second pass:

sed -E 's@([^,])(\")([^,])@\1""\3@g' data_export_20130206-F.csv | sed 's,"\("$\),\1,'

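Or simply squeeze all repeated quotes with tr. Note that tr -s collapses every run of quotes, so it also undoes the doubled quotes inside the fields and breaks any field that ends with a quote: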

sed -E 's@([^,])(\")([^,])@\1""\3@g' data_export_20130206-F.csv | tr -s '"'

If for some reason the newlines still get stripped, add them back in the replacement:

sed -E 's@([^,])(\")([^,])@\1""\3@g' data_export_20130206-F.csv | sed 's,""$,"\n,'
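To compare the two second-pass variants above, here is a minimal sketch that runs both on one sample line. The sample line and the variable names are made up for illustration; the line imitates the output of the first sed pass, with doubled inner quotes plus one stray trailing quote.

```shell
# Sample line: doubled inner quotes plus a stray quote at the end of the line.
line='"<?xml version=""1.0"" ?>","90210""'

# Second-pass sed: collapse only the doubled quote at the end of the line.
fixed_with_sed=$(printf '%s\n' "$line" | sed 's/""$/"/')

# tr -s: squeeze every run of quotes, wherever it occurs in the line.
fixed_with_tr=$(printf '%s\n' "$line" | tr -s '"')

printf 'sed: %s\ntr:  %s\n' "$fixed_with_sed" "$fixed_with_tr"
```

The sed variant leaves the doubled quotes inside the field alone, while the tr variant also squeezes them back to single quotes. Both would mangle a field that legitimately ends with an escaped quote.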
answered 2013-02-07T23:48:58.890

This is a fragile solution, but it works for the input you provided.

perl -pe 's/(?:^"|"(?=,)|"$|(?<=,)")//g;s/"/""/g;s/^/"/;s/$/"/;s/(?:(?=,)|(?<=,))/"/g' FILENAME

Note that commas inside the quoted fields will break this. Given your input, it produces the following output:

"stuff","zipcode"
"<?xml version=""1.0"" encoding=""utf-8"" ?>","90210"
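If commas inside fields are a concern, one possible alternative is to split each line on the three-character delimiter "," instead of on bare commas. The sketch below is not the Perl one-liner above but a different technique using awk. It assumes that every field is quoted, that records never span multiple lines, that nothing follows the closing quote except an optional carriage return, and that the sequence "," never occurs inside a field. The function name escape_fields is hypothetical.

```shell
# Hypothetical helper: re-quote a CSV stream where every field is wrapped in
# double quotes and the three-character sequence "," only ever separates fields.
escape_fields() {
  awk '
    {
      sub(/\r$/, "")                        # drop a trailing carriage return, if present
      if ($0 == "") { print; next }         # pass empty lines through unchanged
      body = substr($0, 2, length($0) - 2)  # strip the outer pair of quotes
      n = split(body, field, "\",\"")       # split on the "," delimiter
      out = ""
      for (i = 1; i <= n; i++) {
        gsub(/"/, "\"\"", field[i])         # double every quote left inside the field
        sep = (i > 1) ? "\",\"" : ""
        out = out sep field[i]
      }
      print "\"" out "\""                   # restore the outer pair of quotes
    }
  '
}

# Example: feed one sample line through the helper.
printf '%s\n' '"<?xml version="1.0" encoding="utf-8" ?>","90210"' | escape_fields
```

To process the real file, redirect it into the function, for example escape_fields < data_export_20130206-F.csv. If the data can violate any of the assumptions listed above, a real CSV parser is the safer choice.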
answered 2013-02-08T12:01:16.817

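Make sure there is no trailing whitespace after the final ", otherwise your substitution will match it. You can also use sed to trim the trailing whitespace: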

sed 's/\s\+$//' x.csv | sed -E 's@([^,])(\")([^,])@\1""\3@g'
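The same idea can be wrapped in a small function that trims trailing whitespace with the POSIX character class [[:space:]], which also covers a carriage return left over from Windows line endings. This is a sketch under assumptions: the function name escape_inner_quotes is hypothetical, and whether the original file really contains trailing whitespace or carriage returns is a guess that has to be checked against the actual data.

```shell
# Hypothetical helper: trim trailing whitespace (spaces, tabs, carriage returns),
# then double every quote that has a non-comma character on both sides.
escape_inner_quotes() {
  sed -E 's/[[:space:]]+$//' | sed -E 's@([^,])"([^,])@\1""\2@g'
}

# Example: a line with Windows line endings and stray trailing spaces.
printf '"<?xml version="1.0" encoding="utf-8" ?>","90210"  \r\n' | escape_inner_quotes
```

Once the trailing characters are gone, the closing quote of the last field has nothing after it for the third part of the pattern to match, so it is left alone.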
answered 2013-02-07T23:46:26.553