2

I'm trying to do something but I'm not sure how to do it. I have a 23 MB file that looks like this:

: (blah  
  :aaaaaaaaaaaaaaaaaaaa  
  (bbbbbbbbbbbbbbbbbbbb
: (bloh
  cccccccc
  dddddddd

...

And so on. What I'd like to do is remove every line break ("\n") except when the "\n" is followed by ": (".
So the final file would be:

: (blah  :aaaaaaaaaaaaaaaaaaaa (bbbbbbbbbbbbbbbbbbbb        
: (bloh  cccccccc  dddddddd
...

I have several ideas for how to do it. The first one is:
- remove all "\n" with sed
- replace every ": (" with "\n: ("
but the problem is that the file is 23 MB, and I don't know how to handle a single-line file of 23 MB.

A second idea, though I don't know how to implement it at all, is:
- remove every "\n" except when it matches the pattern "\n: ("
I'm limited to bash, perl, sed, grep and awk as resources.
I'd really love to have your input.

Have a nice day.

4

5 Answers

7

We can do most of the work by defining awk's record and field separator variables:

awk 'NR==1 {next} {$1=$1;  print ": (" $0}' RS=': \(' FS='\n' OFS="" filename

Because the file begins with the record separator we defined, we skip the empty first record.

The same program, more readable:

awk '
    BEGIN {FS="\n"; OFS=""; RS=": \("; prefix=": ("}
    NR==1 {next} 
    {$1=$1; print prefix $0}
' filename
answered 2013-06-09T13:27:48.553
4

This might work for you (GNU sed):

sed -r ':a;$!N;s/\n([^:])/\1/;ta;P;D' file

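It removes every "\n" except where the "\n" is immediately followed by ":" (that is, where it matches the pattern "\n:").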
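
The one-liner is dense, so here is the same script spread over several lines with a comment per command. This is only a sketch of how I read it: it assumes GNU sed (for -r), and join_with_sed and the sample data are hypothetical names made up for the demonstration.

```shell
#!/bin/sh
# join_with_sed: hypothetical wrapper name, used only in this sketch.
join_with_sed() {
    sed -r '
        # Loop label.
        :a
        # Unless this is the last input line, append the next line
        # to the pattern space, separated by a newline.
        $!N
        # Drop the newline when the character after it is not ":".
        s/\n([^:])/\1/
        # If the substitution succeeded, go back to the label and keep joining.
        ta
        # Otherwise the next line starts with ":", so print the finished
        # record (everything up to the first newline) ...
        P
        # ... delete it, and restart with what is left in the pattern space.
        D
    ' "$1"
}

# Small demonstration on made-up data shaped like the sample in the question.
sample=$(mktemp)
printf ': (blah\n  :aaa\n  (bbb\n: (bloh\n  ccc\n  ddd\n' > "$sample"
join_with_sed "$sample"
rm -f "$sample"
```

Note that the check is on the single character after the newline, so only lines that begin with ":" in column one start a new record; the indented "  :aaa" line is still joined.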

answered 2013-06-09T13:51:27.417
4

One way with awk:

$ awk '/^: [(]/&&NR>1{printf "%s",ORS}{printf "%s",$0}END{printf "%s",ORS}' file
: (blah  :aaaaaaaaaaaaaaaaaaaa  (bbbbbbbbbbbbbbbbbbbb
: (bloh  cccccccc  dddddddd
answered 2013-06-09T13:06:32.510
2

I found another solution for GNU sed.

sed  -n ':k;N;/\n:\s*(/{$!P;$p;D};s/\n/ /;$p;bk' file
answered 2013-06-09T21:38:50.793
1

You did mention perl, so...

perl -pe 'print "\n" if $.>1 && /^: \(/; chomp if ! eof' file

or, for v5.10 and later:

perl -pE 'say "" if $.>1 && /^: \(/; chomp if ! eof' file
answered 2013-06-09T13:51:26.173