regex - Multiline match with irregular new line

Question

I have a text file with many entries like this:

[...]
Wind: 83,476,224
Solution: (category,runs)~
0.235,6.52312667,~
0.98962,14.33858333,~
sdasd,cccc,~
0.996052905,sdsd
EnterValues: 656,136,1
Speed: 48,32
State: 2,102,83,476,224
[...]

From the part above I would like to extract:

Solution: (category,runs)~
0.235,6.52312667,~
0.98962,14.33858333,~
sdasd,cccc,~
0.996052905,sdsd

It would be simple if EnterValues: always followed Solution:, but unfortunately it doesn't. Sometimes it is Speed:, sometimes something different. I don't know how to construct the end of the regex (I assume it should be something like this: Solution:.*?(?<!~)\n).

My file uses \n as the newline delimiter.

score 1 · Accepted Answer

As I see it, you first read the whole file into memory, which is not good practice. Try using the flip-flop operator instead:

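while ( <$fh> ) {
   # true from the "Solution:" line through the first line not ending in "~"
   if ( /Solution:/ ... !/~$/ ) {
      print;   # $_ already ends with a newline, so do not add another
   }
}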

I can't test it right now, but I think this should work.
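For anyone who wants to try the flip-flop idea without a file on disk, here is a minimal, self-contained sketch. It assumes the sample data from the question, reads it through an in-memory file handle, and collects the lines in an array named @block (a hypothetical name chosen for illustration). It also uses the two-dot form of the operator rather than the three-dot form above: with two dots the end condition is checked on the "Solution:" line itself, so a one-line Solution entry that does not end in ~ would close the range immediately. Treat that as a variation, not as the answer's exact code.

```perl
use strict;
use warnings;

# Sample input taken from the question; in real use, read from a file instead.
my $input = <<'END';
Wind: 83,476,224
Solution: (category,runs)~
0.235,6.52312667,~
0.98962,14.33858333,~
sdasd,cccc,~
0.996052905,sdsd
EnterValues: 656,136,1
Speed: 48,32
State: 2,102,83,476,224
END

open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @block;   # hypothetical name: collects the lines of the Solution block
while ( my $line = <$fh> ) {
    # The range is true from the line that starts with "Solution:" through
    # the first line that does not end in "~" (both ends included).
    if ( $line =~ /^Solution:/ .. $line !~ /~$/ ) {
        push @block, $line;
    }
}
close $fh;

print @block;
```

Because the file is processed one line at a time, memory use stays small even for large inputs.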

score 1 · Accepted Answer

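What you need is a "record separator" with regex capability. Unfortunately, you cannot use $/ for this, because it cannot be a regex. You can, however, read the whole file into a single string and split that string with a regex: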

use strict;
use warnings;
use Data::Dumper;

my $str = do { 
    local $/;   # disable input record separator
    <DATA>;     # slurp the file
};
my @lines = split /^(?=\pL+:)/m, $str;  # lines begin with letters + colon
print Dumper \@lines;

__DATA__
Wind: 83,476,224
Solution: (category,runs)~
0.235,6.52312667,~
0.98962,14.33858333,~
sdasd,cccc,~
0.996052905,sdsd
EnterValues: 656,136,1
Speed: 48,32
State: 2,102,83,476,224

Output:

$VAR1 = [
          'Wind: 83,476,224
',
          'Solution: (category,runs)~
0.235,6.52312667,~
0.98962,14.33858333,~
sdasd,cccc,~
0.996052905,sdsd
',
          'EnterValues: 656,136,1
',
          'Speed: 48,32
',
          'State: 2,102,83,476,224
'
        ];

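I assume you will do some kind of post-processing on these values, but I will leave that to you. One way to go from here is to split the values on newlines.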
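As a rough illustration of that post-processing step, the sketch below takes one record (the Solution entry from the sample data), splits it on newlines, and then splits each data row on commas. It assumes that a trailing "~" (optionally preceded by a comma) is only a continuation marker and not part of the data; the variable names $record, $header, @rows and @parsed are hypothetical.

```perl
use strict;
use warnings;

# One record as produced by the split above (sample data from the question).
my $record = "Solution: (category,runs)~\n"
           . "0.235,6.52312667,~\n"
           . "0.98962,14.33858333,~\n"
           . "sdasd,cccc,~\n"
           . "0.996052905,sdsd\n";

# Break the record into lines; the first one is the "Solution:" header.
my ( $header, @rows ) = split /\n/, $record;

# Assumption: a trailing "~" (optionally preceded by a comma) only marks
# a continuation and is not part of the data, so it is stripped.
my @parsed;   # hypothetical name: one array ref of fields per data row
for my $row (@rows) {
    $row =~ s/,?~$//;
    push @parsed, [ split /,/, $row ];
}

print join( ' | ', @$_ ), "\n" for @parsed;
```

Each element of @parsed is then an array reference holding the comma-separated fields of one row.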

score 0 · Accepted Answer

You can match from Solution: up to the next word followed by a colon:

my ($solution) = $text =~ /(Solution:.*?) \w+: /xs;
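A slightly stricter variant of the same idea, offered as a sketch and not as the answer's own code: it anchors both the start and the end of the match to line beginnings, and it also accepts end of input as a terminator, so a Solution block at the very end of the file would still be captured. The sample text is the data from the question.

```perl
use strict;
use warnings;

# Sample input taken from the question.
my $text = <<'END';
Wind: 83,476,224
Solution: (category,runs)~
0.235,6.52312667,~
0.98962,14.33858333,~
sdasd,cccc,~
0.996052905,sdsd
EnterValues: 656,136,1
Speed: 48,32
State: 2,102,83,476,224
END

# /m lets ^ match at the start of every line, /s lets . match newlines.
# The lookahead stops the lazy match before the next line that begins
# with "Word:" or at the end of the input, without consuming either.
my ($solution) = $text =~ /^(Solution:.*?)(?=^\w+:|\z)/ms;

print $solution if defined $solution;
```

Requiring the terminating word to sit at the start of a line avoids stopping early if a data row ever contains a colon after some word characters.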
