I'm trying to find a way to selectively remove newlines from a file. Removing all of them is no problem, but I need to keep some.

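Here is an example of the bad input file. Note that the rows for permit IDs COO789 and COO012 have embedded newlines in the description field that I need to remove.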

"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians
Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race 
weekend",,"05/11/2013","05/11/2013"

Here is an example of how I need the file to look:

"Permit Number/Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"

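Note: I did simplify the file by removing some extra columns. The logic should be able to accommodate any number of columns, though. Here is the actual full header row with all the columns. Technically, I expect to find the "extra" newlines in the Description and Location columns.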

"Permit Number/Id","Permit Name","Description","Start Date","End Date","Custom Status","Owner Name","Total Expected Attendance","Location"

I have tried sed, cut, tr, nawk, etc. I'm open to any solution that can do this, as long as it can be called from within a unix script.

Thanks!!!

2 Answers

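If you must remove newlines only from the "Description" and "Location" fields, you will need a proper CSV parser (think Text::CSV). You could also do this fairly easily with GNU awk, but unfortunately you don't have access to gawk on Solaris. The next best solution, then, is to join every line that doesn't start with a double quote to the previous line. You can do this with sed. I wrote this with compatibility in mind: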

sed -e :a -e '$!N; s/ *\n\([^"]\)/ \1/; ta' -e 'P;D' file

Results:

"Permit Id","Permit Name","Description","Start Date","End Date"
"COO123","Music Festival",,"02/12/2013","02/12/2013"
"COO456","Race Weekend",,"02/23/2013","02/23/2013"
"COO789","Basketball Final 8 Championships - Media vs. Politicians Skills Competition",,"02/22/2013","02/22/2013"
"COO012","Dragonboat race weekend",,"05/11/2013","05/11/2013"
answered 2013-02-12T16:03:52.877
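As a complement to the sed one-liner in the answer above: since the logic has to cope with any number of columns, another option that stays within plain awk is to count double quotes. A record with an odd number of quotes must still be inside a quoted field, so the next physical line is appended to it. This is only a sketch, not a full CSV parser. It assumes a POSIX-style awk with gsub (on Solaris that likely means nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk), it assumes any literal quote inside a field is doubled (`""`), and the function name `join_quoted_newlines` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: join physical lines until the double quotes in the
# pending record balance, replacing each embedded newline with one space.
join_quoted_newlines() {
    awk '
    {
        if (buf == "") {
            # No record is pending, so this line starts a new one
            buf = $0
        } else {
            # Continuation line: drop trailing blanks left at the break,
            # then glue the line on with a single space
            sub(/ +$/, "", buf)
            buf = buf " " $0
        }
        # gsub returns the number of replacements, which here is the
        # number of double quotes in the pending record
        tmp = buf
        if (gsub(/"/, "", tmp) % 2 == 0) {
            # Even count: every quoted field is closed, emit the record
            print buf
            buf = ""
        }
    }
    # Flush an unterminated record rather than silently dropping it
    END { if (buf != "") print buf }
    ' "$@"
}
```

It could be called as `join_quoted_newlines file > fixed.csv`. Unlike the "does the next line start with a quote" test, this does not depend on which column contains the stray newline, but it will misjoin lines if the file contains unbalanced quotes.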
sed ':a;N;$!ba;s/ \n/ /g'

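This reads the entire file into the pattern space, then removes every newline that comes directly after a space, assuming all of the erroneous newlines fit this pattern. If they don't, when should a newline be removed?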
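One portability note on this one-liner: GNU sed accepts commands separated by semicolons after a label, but some other sed implementations treat everything after `:` up to the end of the line as the label name, so `:a;N;$!ba` may fail there. A more conservative spelling puts each command in its own `-e` expression. The sketch below assumes the local sed understands `\n` in a regular expression, and the wrapper name `strip_newlines_after_space` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the same logic, one sed command per -e
# so that the label "a" ends where its expression ends.
strip_newlines_after_space() {
    # :a      define label a
    # N       append the next input line to the pattern space
    # $!ba    if this is not the last line, branch back to a
    # s/../   once everything is loaded, turn each "space + newline" into a space
    sed -e ':a' -e 'N' -e '$!ba' -e 's/ \n/ /g' "$@"
}
```

It could be called as `strip_newlines_after_space file > fixed.csv`. Because the whole file is held in the pattern space, very large files may run into implementation limits on some systems.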

answered 2013-02-12T14:49:49.400