1

I have a flat file whose lines look like this:

KEYWORD|DATA STRING HERE|32|50135|ANOTHER DATA STRING
KEYWORD|STRING OF DATA|1333|552555666|ANOTHER STRING
KEYWORD|STRING OF MORE DATA|4522452|5345245245|REALLY REALLY REALLY REALLY
LONGSTRING THAT INSERTED A LINE BREAK WHEN I WAS EXTRACTING FROM SQLPLUS/ORACLE
KEYWORD|.....

How would I go about removing the line break so that

KEYWORD|STRING OF MORE DATA|4522452|5345245245|REALLY REALLY REALLY REALLY
LONGSTRING THAT INSERTED A LINE BREAK WHEN I WAS EXTRACTING FROM SQLPLUS/ORACLE

becomes

KEYWORD|STRING OF MORE DATA|4522452|5345245245|REALLY REALLY REALLY REALLY LONGSTRING THAT INSERTED A LINE BREAK WHEN I WAS EXTRACTING FROM SQLPLUS/ORACLE

This is in an HP-UNIX environment, and I can move the file to another system (a Windows box with PowerShell and Ruby installed).

4

6 Answers

2

I don't know which tools you are using, but you can use this regex to match every \n (or maybe \r) that isn't followed by KEYWORD and replace it with a space.


Regex: \r(?!KEYWORD) (With global modifier)
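
As a rough illustration of the lookahead idea, here is a minimal Ruby sketch. The helper name join_broken_lines is made up for this example, and the pattern is an assumption that differs slightly from the regex above: it accepts both \n and \r\n line endings, and it adds \z to the lookahead so the file's final newline is left alone.

```ruby
# Hypothetical helper: replace every line break that is NOT followed by the
# keyword (or by the end of the text) with a single space.
def join_broken_lines(text, keyword = 'KEYWORD')
  # \r?\n  matches Unix and Windows line endings
  # (?!..) is a negative lookahead; \z keeps the trailing newline intact
  text.gsub(/\r?\n(?!#{Regexp.escape(keyword)}|\z)/, ' ')
end

sample = "KEYWORD|A|1\nKEYWORD|B|2|REALLY\nLONGSTRING\nKEYWORD|C|3\n"
puts join_broken_lines(sample)
```

This reads the whole text into memory, which should be fine for modest extracts but may not suit very large files.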

answered 2012-11-27T12:30:08.073
2

Ruby's Array has a nice method called slice_before, inherited from Enumerable, that comes to the rescue here:

require 'pp'

text = 'KEYWORD|DATA STRING HERE|32|50135|ANOTHER DATA STRING
KEYWORD|STRING OF DATA|1333|552555666|ANOTHER STRING
KEYWORD|STRING OF MORE DATA|4522452|5345245245|REALLY REALLY REALLY REALLY
LONGSTRING THAT INSERTED A LINE BREAK WHEN I WAS EXTRACTING FROM SQLPLUS/ORACLE
KEYWORD|.....'

pp text.split("\n").slice_before(/^KEYWORD/).map{ |a| a.join(' ') }

=> ["KEYWORD|DATA STRING HERE|32|50135|ANOTHER DATA STRING",
 "KEYWORD|STRING OF DATA|1333|552555666|ANOTHER STRING",
 "KEYWORD|STRING OF MORE DATA|4522452|5345245245|REALLY REALLY REALLY REALLY LONGSTRING THAT INSERTED A LINE BREAK WHEN I WAS EXTRACTING FROM SQLPLUS/ORACLE",
 "KEYWORD|....."]

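This code simply splits your text on line breaks, then uses slice_before to break the resulting array into sub-arrays, one for each block of text starting with /^KEYWORD/. It then walks the resulting sub-arrays and joins each one with a single space. Any line that was not split is left alone; the ones that were broken are rejoined.

For real-world use you will probably want to replace pp with a regular puts.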
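
For an actual file rather than an inline string, the same idea might look like the sketch below. The method name repair_file and both path arguments are hypothetical, and the chomp: keyword assumes Ruby 2.4 or newer.

```ruby
# Hypothetical helper: read `input_path` line by line and write the repaired
# records to `output_path`. Assumes every real record starts with KEYWORD.
def repair_file(input_path, output_path, marker = /^KEYWORD/)
  File.open(output_path, 'w') do |out|
    # Enumerator over the lines, with the trailing newline (or CRLF) removed
    lines = File.foreach(input_path, chomp: true)
    # Start a new group at each marker line, then rejoin each group with spaces
    lines.slice_before(marker).each { |group| out.puts group.join(' ') }
  end
end

# Example invocation: ruby repair.rb broken.txt fixed.txt
repair_file(ARGV[0], ARGV[1]) if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
```

Because the groups are written out as they are produced, this version does not need to hold the whole file in memory.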

As for moving the file to Windows in order to use Ruby: why? Install Ruby on HP-Unix and run it there. It's a more natural fit.

answered 2012-11-27T14:30:44.530
1

The PowerShell way:

[System.IO.File]::ReadAllText( "c:\myfile.txt" ) -replace "`r`n(?!KEYWORD)", ' '
answered 2012-11-27T15:01:16.753
1

This might work for you (GNU sed):

sed ':a;$!{N;/\n.*|/!{s/\n/ /;ba}};P;D' file

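Keep two lines in the pattern space; if the second line does not contain a |, replace the newline with a space and repeat until it does or the end of the file is reached.

This assumes the last field is the one that overflows; otherwise use KEYWORD, like this: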

sed ':a;$!{N;/\nKEYWORD/!{s/\n/ /;ba}};P;D' file
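
For readers less familiar with sed, the first heuristic (a line without a | is the tail of the previous record) can be sketched in Ruby. The method name merge_continuations is made up for this illustration, and like the sed command it assumes only the last field ever overflows.

```ruby
# Hypothetical helper: any line that lacks the delimiter is treated as the
# continuation of the record before it.
def merge_continuations(lines, delimiter = '|')
  lines.each_with_object([]) do |line, records|
    if records.empty? || line.include?(delimiter)
      records << line.dup          # a new record starts here
    else
      records.last << ' ' << line  # no delimiter: append to the previous record
    end
  end
end

sample = ['KEYWORD|A|1|LONG', 'TAIL ONE', 'TAIL TWO', 'KEYWORD|B|2|SHORT']
puts merge_continuations(sample)
```

If the overflowing text can itself contain a |, this heuristic breaks down and matching on KEYWORD is the safer choice.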
answered 2012-11-27T14:54:17.940
1

This short awk one-liner should do the job:

awk '/^KEYWORD/{if(NR>1)print "";printf "%s",$0;next}{printf " %s",$0}END{print ""}' file
answered 2012-11-27T13:34:35.907
0

You can use sed or awk (preferred) for this:

  • sed -n 's|\r||g;$!{1{x;d};H};${H;x;s|\n\(KEYWORD\)|\r\1|g;s|\n| |g;s|\r|\n|g;p}' file.txt

  • awk 'BEGIN{ORS="";}NR==1{print;next;}/^KEYWORD/{print"\n";print;next;}{print" ";print;}END{print"\n"}' file.txt


Note: each command (sed, awk) must be written on a single line.

answered 2012-11-27T12:56:42.763