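I have an exported CSV in which some rows contain a newline character (ASCII 012 octal, LF) in the middle of a record. I need to replace it with a space while keeping the newline at the end of each record, so that the file can be loaded.

Most rows are fine, but some look like this:

Input: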

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :
Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

Output:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

I have been looking at Awk, but I can't work out how to keep the real line breaks.

Another example:

Input:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working
Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

Output:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

3 Answers

4

One way, using GNU awk:

awk -f script.awk file.txt

Contents of script.awk:

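BEGIN {
    # Split fields on either delimiter used in the samples (comma or tilde).
    FS = "[,~]"
}

# Fragment of a broken record: append it to the buffer and add up its fields.
NF < 21 {
    line = (line ? line OFS : line) $0
    fields = fields + NF
}

# The buffered fragments now make up a full record: print it and reset.
fields >= 21 {
    print line
    line=""
    fields=0
}

# Complete record on a single line: print it unchanged.
NF == 21 {
    print
}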

Alternatively, here is the same thing as a one-liner:

awk -F "[,~]" 'NF < 21 { line = (line ? line OFS : line) $0; fields = fields + NF } fields >= 21 { print line; line=""; fields=0 } NF == 21 { print }' file.txt

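Explanation:

I made an observation about your expected output: every record appears to contain 21 fields. So, if a line contains fewer than 21 fields, the script stores the line and its field count. When the next line is read, it is appended to the stored line with a space, and its field count is added to the running total. Once that total is greater than or equal to 21 (the fragments of a broken record add up to 22, because the field that was split is counted once in each fragment), the stored line is printed. Otherwise, if a line contains exactly 21 fields (NF == 21), it is printed as is. HTH.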
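The 21-field observation is worth checking against the real file before relying on it. A minimal sketch, assuming the same comma-or-tilde separator as the script; the function name field_histogram is hypothetical. It prints each distinct field count followed by the number of lines that have it:

```shell
# Hypothetical helper: histogram of fields-per-line for a file.
# Output format: "<field count> <number of lines>", sorted by field count.
field_histogram() {
    awk -F '[,~]' '
        { count[NF]++ }                               # tally lines by their field count
        END { for (n in count) print n, count[n] }    # awk array order is unspecified
    ' "$1" | sort -n
}

# Usage: field_histogram file.txt
```

If nearly every line reports 21 and the remaining counts pair up to 22, the assumption probably holds. Other counts suggest that a quoted field contains the delimiter itself, in which case counting fields is not a reliable way to find broken records.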

answered 2012-09-25T07:49:54.487
2

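Try this one-liner:

awk '{if(t){print;t=0;next}x=$0;n=gsub(/"/,"",x);if(n%2){printf "%s ",$0;t=1}else print}' file

The idea: count the number of " characters on a line. If the count is odd, the line ends inside a quoted field, so join it with the next line; otherwise treat the current line as a complete record.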
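The one-liner joins exactly one following line, so it assumes a record is never split into more than two pieces. A sketch of the same quote-counting idea that keeps a running total across lines and therefore tolerates several embedded newlines in one field; the function name join_by_quotes is hypothetical, and the sketch assumes double quotes only ever appear in pairs around or inside a field:

```shell
# Hypothetical helper: join physical lines until the double quotes balance.
join_by_quotes() {
    awk '{
        tmp = $0
        quotes += gsub(/"/, "", tmp)   # gsub returns how many quotes it replaced
        if (quotes % 2) {
            printf "%s ", $0           # odd total: still inside a quoted field
        } else {
            print                      # even total: the record is complete
            quotes = 0
        }
    }' "$1"
}

# Usage: join_by_quotes file > fixed_file
```

Using printf with an explicit "%s " format keeps a literal % in the data from being interpreted as a format specifier.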

answered 2012-09-25T13:28:20.627
2

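I think sed is the tool for this. I assume every complete record ends with a double quote, so a line that ends with any other character is treated as broken and is joined with the line that follows it.

Here is the code:

sed -e '/[^"]$/N' -e 's/\n/ /g' data

The first expression, -e '/[^"]$/N', matches the broken lines and appends the next line to the pattern space without clearing it. The second, -e 's/\n/ /g', then replaces the embedded newline with a space.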
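The command above reads at most one extra line per record. A sketch of a looping variant that keeps appending lines until the pattern space ends with a double quote, under the same assumption that every complete record ends with one; the function name join_with_sed is hypothetical:

```shell
# Hypothetical helper: join lines until the pattern space ends with a double quote.
join_with_sed() {
    # :a        label to loop back to
    # /[^"]$/   the pattern space does not end with a double quote
    # $!{...}   only on lines other than the last, so N never runs out of input
    # N         append the next input line to the pattern space
    # s/\n/ /   replace the embedded newline with a space
    # ba        go back to :a and test the joined text again
    sed -e ':a' -e '/[^"]$/{' -e '$!{' -e 'N' -e 's/\n/ /' -e 'ba' -e '}' -e '}' "$1"
}

# Usage: join_with_sed data > fixed_data
```

Trailing whitespace after the final quote of a record would defeat the test, so it may be worth stripping it first.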

answered 2012-09-25T01:11:24.940