bash - Remove line breaks inside CSV records while preserving rows

Question

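I have an exported CSV in which some rows contain a line feed (ASCII 012) in the middle of a record. I need to replace it with a space while keeping the newline that ends each record, so that the file can be loaded.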

Most rows are fine, but a few look like this:

Input:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :
Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

Output:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

I have been looking at Awk, but I can't quite work out how to keep the real row breaks.

Another example:

Input:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working
Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

Output:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

score 4 · Accepted Answer

One way using GNU awk:

awk -f script.awk file.txt

Contents of script.awk:

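BEGIN {
    # Split fields on either delimiter used in the samples: "," or "~".
    FS = "[,~]"
}

# Fewer than 21 fields: this line is a fragment of a broken record.
# Append it to the stored fragment (joined with OFS, a space) and
# add its field count to the running total.
NF < 21 {
    line = (line ? line OFS : line) $0
    fields = fields + NF
}

# The stored fragments now add up to a whole record: print and reset.
fields >= 21 {
    print line
    line = ""
    fields = 0
}

# A line that already has exactly 21 fields is complete: print it as is.
NF == 21 {
    print
}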

Alternatively, here is the one-liner:

awk -F "[,~]" 'NF < 21 { line = (line ? line OFS : line) $0; fields = fields + NF } fields >= 21 { print line; line=""; fields=0 } NF == 21 { print }' file.txt

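Explanation:

I made an observation about your expected output: it seems every line should contain 21 fields. So, if a line contains fewer than 21 fields, store the line and its field count. When we loop onto the next line, that line is joined to the stored line with a space, and the field counts are added together. If this total is greater than or equal to 21 (the fields of a broken line will add up to 22), print the stored line. Otherwise, if the line contains exactly 21 fields (NF == 21), print it. HTH.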
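To see where the extra field comes from, here is a small sketch using made-up 3-field records rather than the asker's data. A line break inside a field splits that field in two, so the two halves together report one more field than a complete record does (22 instead of 21 in the real file). The function name count_fields is hypothetical.

```shell
# count_fields (hypothetical name): print NF for each input line,
# splitting on "," or "~", the same FS as the script above.
count_fields() {
    awk -F '[,~]' '{ print NF }' "$@"
}

# Made-up sample: two 3-field records; the first is broken inside its quoted field.
printf '1~"a\nb"~2\n3~"c"~4\n' | count_fields
# prints 2, 2 and 3, one per line: the broken halves sum to 3 + 1 = 4
```

Running this against a real export is a quick way to confirm the expected field count before hard-coding 21.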

score 2 · Accepted Answer

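Try this one-liner:

awk '{if(t){print;t=0;next;}x=$0;n=gsub(/"/,"",x);if(n%2){printf "%s ",$0;t=1;}else print $0}' file

The idea: count the number of " characters in a line. If the count is odd, join the line with the next one; otherwise treat the current line as a complete record.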
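The one-liner joins only a single following line, so a field containing two or more line breaks would still be split. What follows is a minimal sketch of the same quote-counting idea that keeps accumulating lines until the running quote count is even. It assumes quotes are balanced within each record, and the function name join_quoted is made up for illustration.

```shell
# join_quoted (hypothetical name): join physical lines until the number of
# double quotes seen so far in the record is even.
join_quoted() {
    awk '
    {
        # Append the current line to any pending partial record.
        rec = (pending ? rec " " $0 : $0)
        tmp = $0
        quotes += gsub(/"/, "", tmp)   # gsub returns the number of replacements
        if (quotes % 2) {              # odd: still inside a quoted field
            pending = 1
        } else {                       # even: the record is complete
            print rec
            pending = 0; quotes = 0; rec = ""
        }
    }
    END { if (pending) print rec }     # flush an unterminated record
    ' "$@"
}

# Made-up sample: the first record has two embedded line breaks.
printf '1~"a\nb\nc"~2\n3~"d"~4\n' | join_quoted
# prints:
# 1~"a b c"~2
# 3~"d"~4
```

Unlike the field-count approach, this does not depend on knowing how many columns a record has, but it does rely on every multi-line value being quoted.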

score 2 · Accepted Answer

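I think sed is a good fit here. I assume every complete record ends with a double quote ("), so a line that does not end with one is treated as a broken record and is joined with the line that follows it.

Here is the code: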

cat data | sed -e '/[^"]$/N' -e 's/\n//g'

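The first expression, -e '/[^"]$/N', matches the abnormal case and appends the next line to the pattern space without clearing it. Then -e 's/\n//g' removes the embedded newline.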
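The command above joins at most one following line per broken record and drops the newline without inserting a space. As a sketch only, under the same assumption that every complete record ends with a double quote, a loop can keep appending lines and substitute a space for each embedded newline. The function name join_lines is made up for illustration.

```shell
# join_lines (hypothetical name): while the pattern space does not end with a
# double quote and more input remains, append the next line and turn the
# embedded newline into a space.
join_lines() {
    sed -e ':a' -e '/[^"]$/{' -e '$!{' -e 'N' -e 's/\n/ /' -e 'ba' -e '}' -e '}' "$@"
}

# Made-up sample: the first record is spread over three physical lines.
printf 'x~"a\nb\nc"\ny~"d"\n' | join_lines
# prints:
# x~"a b c"
# y~"d"
```

The `$!` guard skips N on the last input line, so a final line that does not end with a quote is still printed. A fragment that happens to end with a double quote in the middle of a record would be misread as complete, so this remains a heuristic.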
