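I have a corrupted text file in which I need to replace \x20*[\n\r]+ with \xa0 whenever the next line (if there is one) does not start with the specific pattern DATA\t. If such a line starts with spaces (\x20+), those should be removed as well.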

Can I do this with sed? The text file is about 1 MB in size.


Sample data:

DATA     132942, "I love you", 2398, "Hi how are you"
DATA     78793, "It is 
me", 4322, "My name is Frank"
DATA     24121, "Where
   are
you", 52432, "I am

here"
DATA     43242, "End of story", 432432, "The end"

=>

DATA     132942, "I love you", 2398, "Hi how are you"
DATA     78793, "It is me", 4322, "My name is Frank"
DATA     24121, "Where are you", 52432, "I am here"
DATA     43242, "End of story", 432432, "The end"

3 Answers

cat input.txt | sed '{:q;N;s/\x20*[\n\r]\+/\xa0/g;t q}' | sed 's/\xa0DATA/\nDATA/g'
answered 2013-09-11T19:46:36.417

One way to do this in Ruby:

ruby -e 'puts File.read(ARGV.shift).gsub(/ *\r?\n *(?!DATA[[:space:]])/, " ").gsub(/ +$/m, "")' file

Output:

DATA    132942, "I love you", 2398, "Hi how are you"
DATA    78793, "It is me", 4322, "My name is Frank"
DATA    24121, "Where are you", 52432, "I am here"
DATA    43242, "End of story", 432432, "The end"
answered 2013-09-11T19:42:25.557

This might work for you (GNU sed):

sed ':a;$!N;/\nDATA/!s/\s*\n\s*/ /;ta;P;D' file
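For readers who want to see what the one-liner does step by step, here is the same script spread over several lines with comments. This is only a sketch: the function name join_records is made up for illustration, and it assumes GNU sed, since \s is a GNU extension.

```shell
#!/bin/sh
# join_records is a hypothetical helper name; it needs GNU sed because of \s.
join_records() {
  sed '
    :a
    # Append the next input line to the pattern space (skipped on the last line).
    $!N
    # If the appended line does not start with DATA, replace the line break
    # and any surrounding whitespace with a single space.
    /\nDATA/!s/\s*\n\s*/ /
    # A join happened: loop back and try to pull in another line.
    ta
    # Otherwise print the finished record and restart with the DATA line
    # that is still waiting in the pattern space.
    P
    D
  ' "$@"
}

# Small demonstration on a record broken across indented and blank lines.
printf '%s\n' 'DATA     24121, "Where' '   are' 'you", 52432, "I am' '' 'here"' \
  'DATA     43242, "End of story", 432432, "The end"' | join_records
```

Because only one record is held in the pattern space at a time, the file is never loaded into memory as a whole, which should be comfortable for a 1 MB input. Note that the test /\nDATA/ matches any continuation line beginning with DATA; if the marker must be followed by a tab or space, the pattern would need tightening.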
answered 2013-09-11T20:33:32.813