I have a text file with several hundred terms in the following format:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name2  
xref: type1:aba  
xref: type3:fee 

I need to extract all terms with an xref of type1 and write them to a new file in the same format. I was planning to use a regular expression like this:

/\[Term\](.*)type1(.*)[^\[Term\]]/g

to find the corresponding terms, but I don't know how to match a regex across multiple lines. Should I read the original text file as a single string or line by line? Any help would be very much appreciated.

2 Answers

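Try this regex:

/(?s)\[Term\](?:(?!\[Term\]).)*?xref: type1.*?(?=\[Term\]|\z)/g

Compared with yours, this regex has the following notable changes:

  • (?s) turns on "dot matches newline"
  • .*? is a non-greedy expression. Using .* would consume everything up to the last [Term] in the file
  • (?:(?!\[Term\]).)*? before the xref keeps the scan inside one term, so a term without type1 cannot start a match that runs into a later term
  • Removed the unnecessary groups around .*?
  • Added a small refinement to match an xref line, rather than type1 anywhere
  • Removed the incorrect syntax for the following-term marker ([^\[Term\]] is a character class that excludes single characters, not the string [Term])
  • Added a lookahead to match, but not include, the next [Term] marker; the \z alternative lets the last term in the file match as well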
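On the question of reading the file as one string or line by line: a regex that spans lines needs the whole text in one string. The following is a minimal sketch of that, not a definitive implementation. The helper name `extract_terms` and the script and file names in the usage comment are made up for illustration. The pattern stays inside one [Term] block and accepts end of input, so the last term is found too, and it requires a colon after the type so that type1 does not also match type10.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): return every [Term] block
# in $text that carries an xref of the given type.
sub extract_terms {
    my ($text, $type) = @_;
    # /s lets "." match newlines; (?!\[Term\]) keeps the scan inside one block;
    # \z lets the last block match even though no [Term] follows it.
    my @terms = $text =~
        /\[Term\](?:(?!\[Term\]).)*?xref:[ \t]*\Q$type\E:.*?(?=\[Term\]|\z)/gs;
    return @terms;
}

# Usage (placeholder names): perl extract_terms.pl infile outfile
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<', $in or die "Cannot read $in: $!";
    my $text = do { local $/; <$ifh> };    # slurp the whole file as one string
    close $ifh;
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    print {$ofh} extract_terms($text, 'type1');
    close $ofh or die "Cannot close $out: $!";
}
```

Each match keeps the blank line that follows the block, so the output file has the same layout as the input.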
answered 2013-07-22T11:47:23.977

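Another approach could be to use the $/ variable (the input record separator) to split the input into blocks at blank lines, split each block on newlines, and then run the regex on each line. As soon as one line matches, print the block and read the next one. A one-liner example: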

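perl -ne '
    # Paragraph mode: with an empty $/ each read returns one blank-line-separated block
    BEGIN { $/ = q|| }
    my @lines = split /\n/;
    for my $line ( @lines ) {
        # On the first matching line, print the whole block and stop scanning it
        if ( $line =~ m/xref:\s*type1/ ) {
            printf qq|%s|, $_;
            last;
        }
    }
' infile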

Assuming an input file like this:

[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name1
xref: type6:aba
xref: type3:fee

[Term]
id: id2
name: name1
xref: type1:aba
xref: type3:fee

[Term]
id: id2
name: name1
xref: type4:aba
xref: type3:fee

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee

It yields:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee 

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee

As you can see, only the blocks that contain an xref: type1 line are printed.
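The same paragraph-mode idea can be written as a small script that writes the matching blocks to a new file, which is what the question asks for. This is only a sketch under a few assumptions: the helper name `matching_blocks` and the file names are made up for illustration, blocks are separated by truly empty lines, and the type is followed by a colon as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: read blank-line-separated blocks from a filehandle
# and return those that contain an xref line of the given type.
sub matching_blocks {
    my ($fh, $type) = @_;
    local $/ = '';    # paragraph mode: each read returns one block
    # /m makes ^ match at the start of every line inside the block
    my @blocks = grep { /^xref:[ \t]*\Q$type\E:/m } <$fh>;
    return @blocks;
}

# Usage (placeholder names): perl filter_terms.pl infile outfile
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;
    open my $ifh, '<', $in  or die "Cannot read $in: $!";
    open my $ofh, '>', $out or die "Cannot write $out: $!";
    print {$ofh} matching_blocks($ifh, 'type1');
    close $ofh or die "Cannot close $out: $!";
    close $ifh;
}
```

Matching once against the whole block with /m replaces the inner loop over lines; the result is the same because a block is printed as soon as any of its lines matches.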

answered 2013-07-22T12:04:22.970