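This is a special case of the general CSV parsing problem. A general solution is provided by Lorance Stinson (google Stinson awk CSV parser), but IMHO the simplest way to handle this specific problem is to convert the newlines inside double quotes to some other character, do whatever you need to the file in its one-line-per-record form, and then convert back, e.g.: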
$ cat file
"Test_data1" "Test_data2" "1s" "452" "Test
data643" "
" "4d" "System" "Institute"
"Test_data3" "Test_data4" "2s" "563" "Test
data754" "
" "5d" "Non System" "Association"
To convert to single lines:
$ awk -v FS= '{for (i=1;i<=NF;i++) if ($i=="\"") inQ=!inQ; ORS=(inQ?"♥":"\n") }1' file
"Test_data1" "Test_data2" "1s" "452" "Test♥data643" "♥" "4d" "System" "Institute"
"Test_data3" "Test_data4" "2s" "563" "Test♥data754" "♥" "5d" "Non System" "Association"
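The quote tracking in the awk command above can be spelled out as follows. This is a sketch rather than the command from the answer: the function name join_quoted is hypothetical, and instead of splitting each line into characters with an empty FS (behavior POSIX leaves unspecified, although gawk and mawk support it) it counts the quotes on each line with gsub and flips the state when the count is odd. The placeholder is assumed to be a control-C character, written as the octal escape \003.

```shell
# join_quoted [FILE...]: print one record per line, replacing newlines
# that fall inside double quotes with a control-C placeholder.
# (Hypothetical helper name; reads stdin when no FILE is given.)
join_quoted() {
    awk -v sep='\003' '
    {
        # gsub returns how many quotes are on this line; "&" puts each one back unchanged.
        # An odd count means this line opens or closes a quoted field.
        if (gsub(/"/, "&") % 2) inQ = !inQ

        # While a quoted field is still open, end the line with the
        # placeholder instead of a newline so the record stays on one line.
        ORS = (inQ ? sep : "\n")
        print
    }' "$@"
}

# Usage: join_quoted file | tr '\003' '\n'    # round trip, reproduces the input
```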
Converting back is a simple tr:
$ awk -v FS= '{for (i=1;i<=NF;i++) if ($i=="\"") inQ=!inQ; ORS=(inQ?"♥":"\n") }1' file | tr '♥' '\n'
"Test_data1" "Test_data2" "1s" "452" "Test
data643" "
" "4d" "System" "Institute"
"Test_data3" "Test_data4" "2s" "563" "Test
data754" "
" "5d" "Non System" "Association"
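The above uses a control-C character (displayed here as ♥) as the replacement for newlines inside quotes; pick any character you like that cannot occur in the data (or a string, if you want to convert back to newlines with awk or sed instead of tr). Whatever you hand to tr should be a single byte, since byte-oriented implementations such as GNU tr do not handle multibyte characters.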
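If a string is used as the placeholder, converting back with awk might look like the sketch below. The placeholder @NL@ and the function name restore_newlines are assumptions chosen for illustration; any string that cannot occur in the data would do.

```shell
# Hypothetical multi-character placeholder. It is used as a regular
# expression by gsub below, so avoid regex metacharacters in it.
sep='@NL@'

# restore_newlines [FILE...]: turn every placeholder back into a newline.
restore_newlines() {
    awk -v sep="$sep" '{ gsub(sep, "\n") } 1' "$@"
}

# Usage, with the forward step writing the same string as ORS:
#   awk -v FS= '{for (i=1;i<=NF;i++) if ($i=="\"") inQ=!inQ; ORS=(inQ?"@NL@":"\n") }1' file | restore_newlines
```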
To do anything with the original file, just insert your command between the awk and the tr, e.g. a reverse sort:
$ awk -v FS= '{for (i=1;i<=NF;i++) if ($i=="\"") inQ=!inQ; ORS=(inQ?"♥":"\n") }1' file | sort -r | tr '♥' '\n'
"Test_data3" "Test_data4" "2s" "563" "Test
data754" "
" "5d" "Non System" "Association"
"Test_data1" "Test_data2" "1s" "452" "Test
data643" "
" "4d" "System" "Institute"
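The pattern above (join, process, restore) can be wrapped in a small shell function so that any line-oriented command can be dropped in the middle. This is only a sketch: per_record is a hypothetical name, the placeholder is assumed to be control-C (octal \003), and the awk step relies on an empty FS splitting each line into characters, which gawk and mawk do but POSIX does not guarantee.

```shell
# per_record CMD [ARGS...]: read records on stdin, run CMD on them with
# one record per line, then restore the newlines inside quoted fields.
per_record() {
    # Join: replace newlines inside double quotes with control-C.
    awk -v FS= '{for (i=1;i<=NF;i++) if ($i=="\"") inQ=!inQ; ORS=(inQ?"\003":"\n") }1' |
    # Process: whatever command the caller passed in.
    "$@" |
    # Restore: turn the placeholder back into newlines.
    tr '\003' '\n'
}

# Usage:
#   per_record sort -r < file
#   per_record grep 'Non System' < file
```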