regex - Using Perl regex to find and extract matches over multiple lines

Question

I have a text file with several hundred terms in the following format:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name2  
xref: type1:aba  
xref: type3:fee

I need to extract all terms with an xref of type1 and write them to a new file in the same format. I was planning to use a regular expression like this:

/\[Term\](.*)type1(.*)[^\[Term\]]/g

to find the matching terms, but I don't know how to make a regex match across multiple lines. Should I read the original text file into a single string, or line by line? Any help would be very much appreciated.

score 4 · Accepted Answer

Try this regex:

/(?s)\[Term\].*?xref: type1.*?(?=\[Term\])/g

This regex makes the following notable changes:

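(?s) turns on "dot matches newline" mode
.*? is a non-greedy expression. Using .* would consume everything up to the last [Term] in the file
Removed the unnecessary capturing parentheses around the .* parts
Added a slight improvement to match xref: type1 rather than type1 anywhere
Removed the incorrect syntax for the trailing term marker: [^\[Term\]] is a negated character class that matches a single character other than [, T, e, r, m or ], not "anything except [Term]"
Added a lookahead to match up to, but not including, the next [Term] marker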
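A minimal Perl sketch of the read-everything-into-one-string approach follows. It embeds a small sample in the question's format instead of reading a real file (in practice you would slurp the file, for example by reading it with $/ set to undef). The pattern is an adjusted variant of the one above, not the answerer's exact regex: with only a lookahead for the next [Term], the last term in the file can never match because nothing follows it, and the lazy .*? before xref: type1 can run from a term that has no type1 xref into the following term. Replacing those dots with the tempered token (?:(?!\[Term\]).), which never consumes the start of a [Term] marker, keeps every match inside a single term and makes the trailing lookahead unnecessary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample in the question's format (not real data).
# With a real file you would slurp it instead, e.g.:
#   my $text = do { local $/; <$fh> };
my $text = <<'END';
[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name2
xref: type6:aba
xref: type3:fee

[Term]
id: id3
name: name3
xref: type1:aba
xref: type3:fee
END

# (?s) lets . match newlines. The tempered token (?:(?!\[Term\]).) consumes
# any character except the start of a [Term] marker, so a match cannot
# spill into the next term, and the last term matches with nothing after it.
my @matches =
    $text =~ /(?s)\[Term\](?:(?!\[Term\]).)*?xref: type1(?:(?!\[Term\]).)*/g;

# Each match is one complete term block, trailing blank line included.
print for @matches;
```

Run as-is, it prints the id1 and id3 terms; the id2 term, whose only xrefs are type6 and type3, is skipped.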

score 2 · Accepted Answer

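Another approach is to set the $/ variable so that the input is read in blocks separated by blank lines, split each block on newlines, and run the regex against each line. As soon as one line matches, print the whole block and move on to the next one. A one-liner example: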

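perl -ne '
    # Paragraph mode: an empty $/ makes each record one blank-line-separated block
    BEGIN { $/ = q|| }
    my @lines = split /\n/;
    for my $line ( @lines ) {
        # Print the whole block as soon as one of its lines is a type1 xref
        if ( $line =~ m/xref:\s*type1/ ) {
            print;
            last;
        }
    }
' infile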

Given the following input file:

[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name1
xref: type6:aba
xref: type3:fee

[Term]
id: id2
name: name1
xref: type1:aba
xref: type3:fee

[Term]
id: id2
name: name1
xref: type4:aba
xref: type3:fee

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee

It produces:

[Term]  
id: id1  
name: name1  
xref: type1:aab  
xref: type2:cdc  

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee 

[Term]  
id: id2  
name: name1  
xref: type1:aba  
xref: type3:fee

As you can see, only the blocks that contain an xref: type1 line are printed.
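Since the question also asks for the matching terms to be written to a new file, here is a sketch of the same paragraph-mode idea as a standalone script. To stay self-contained it reads from and writes to in-memory filehandles; terms.txt and type1_terms.txt in the comments are placeholder names, not anything from the original post. Instead of splitting each block into lines, it tests the whole block once with the /m modifier, so ^ anchors at the start of every line inside the block. The trailing colon in xref:\s*type1: is a small assumption on my part that keeps ids such as type10 from matching.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample data. For real use, replace the in-memory handles with
# file paths (hypothetical names):
#   open my $in,  '<', 'terms.txt'       or die ...;
#   open my $out, '>', 'type1_terms.txt' or die ...;
my $data = <<'END';
[Term]
id: id1
name: name1
xref: type1:aab
xref: type2:cdc

[Term]
id: id2
name: name1
xref: type6:aba
xref: type3:fee

[Term]
id: id3
name: name1
xref: type1:aba
xref: type3:fee
END

open my $in, '<', \$data or die "cannot open input: $!";
my $result = '';
open my $out, '>', \$result or die "cannot open output: $!";

{
    # Paragraph mode, limited to this block: each read returns one term
    local $/ = '';
    while ( my $block = <$in> ) {
        # /m lets ^ match at the start of each line within the block
        print {$out} $block if $block =~ /^xref:\s*type1:/m;
    }
}

close $in;
close $out;
print $result;
```

Because each record keeps its trailing blank line, the output file stays in the same blank-line-separated format as the input.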
