2

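I'm looking for a pattern that matches this:

(words words words words) | 1234.5678% | (1234)

I want to capture (words words words words) as $1 and (1234) as $2.

The input file looks like this: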

Header Crap | More Header Crap|Header Crap | More Header Crap|(words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678%        |   (1234)(words words words words) | 1234.5678% | (1234) | (words words words words) |   1234.5678% | (1234)(words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678% | (1234)

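I think the problem has to do with the input. It arrives as one big chunk (i.e. $_ is one long string of data that has to be parsed to find the matches).

Things I've tried: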

while ($_ =~ /(.*)\|{1}\d*?\.{1}\d*?%{1}\|{1}(\d*)/ {
do stuff with $1 and $2
}

@matches = $_ =~ /(.*)\|{1}\d*?\.{1}\d*?%{1}\|{1}(\d*)/

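Plus a whole bunch of other similar variations. I'm just looking for a pointer in the right direction. Any help would be greatly appreciated!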


5 Answers

1

Use a non-greedy quantifier here:

while ($_ =~ /(.*?)\|{1}\d*?\.{1}\d*?%{1}\|{1}(\d*)/) {
                 ^

I don't know whether your parentheses are literal or not, but if they are, you need to escape them:

while ($_ =~ /(\(.*?\))\|{1}\d*?\.{1}\d*?%{1}\|{1}(\(\d*\))/) {
               ^^   ^^                              ^^  ^^

As @Tim mentioned, the {1} quantifiers aren't needed (shown here without the literal parentheses again):

while ($_ =~ /(.*?)\|\d*?\.\d*?%\|(\d*)/) {
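To see why the non-greedy quantifier matters when the whole input arrives as one string, here is a minimal sketch. The sample string and the helper name `first_capture` are assumptions made up for illustration, not the asker's real data. Note that the sample has no spaces around the pipes, because the patterns above don't allow any.

```perl
use strict;
use warnings;

# Hypothetical sample: two records packed into one string, no spaces around the pipes.
my $data = '(a b)|1.5%|(12)(c d)|2.5%|(34)';

# Returns the first capture group of the first match, or undef when nothing matches.
sub first_capture {
    my ($regex, $text) = @_;
    return $text =~ $regex ? $1 : undef;
}

my $greedy = first_capture(qr/(.*)\|\d*?\.\d*?%\|(\d*)/,  $data);
my $lazy   = first_capture(qr/(.*?)\|\d*?\.\d*?%\|(\d*)/, $data);

print "greedy: $greedy\n";   # greedy: (a b)|1.5%|(12)(c d)
print "lazy:   $lazy\n";     # lazy:   (a b)
```

The greedy `.*` runs to the last place where the rest of the pattern can still match, so it swallows the first record; the lazy `.*?` stops at the first such place.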
answered 2013-07-26T22:06:33.810
1

Delimited fields like these are usually easier to parse with Text::CSV.

Like this, for example:

use Text::CSV;
use String::Util 'trim';

my $csv = Text::CSV->new({
    sep_char => '|'
});

$csv->parse('(words words words words) | 1234.5678% | (1234)');
foreach ($csv->fields) {
    my $field = trim $_;
    print "$field\n";
}
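If installing Text::CSV and String::Util isn't an option, a similar result can be sketched with core Perl only. This swaps in plain `split` for the module, so it has no support for quoted or escaped fields; the helper name `split_record` is hypothetical.

```perl
use strict;
use warnings;

# Split one pipe-delimited record into trimmed fields using only core Perl.
sub split_record {
    my ($record) = @_;
    my @fields = split /\|/, $record, -1;   # -1 keeps trailing empty fields
    s/^\s+|\s+$//g for @fields;             # trim leading/trailing whitespace in place
    return @fields;
}

my @fields = split_record('(words words words words) | 1234.5678% | (1234)');
print "$_\n" for @fields;
```

For the sample record this prints the three fields, one per line, with the surrounding spaces removed.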
answered 2013-07-26T22:19:35.150
1

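It turns out the regex wasn't really the problem. binmode appears to be the answer. I'm moving from a Linux to a Windows environment (my fault for not mentioning that above :( ) and needed to deal with odd line-ending issues. This is basically what I ended up using: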

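if (open FILE1, $_) {
    binmode(FILE1);    # binmode takes the filehandle, not the file name in $_
    @file = <FILE1>;
    foreach (@file) {
        # First and third pipe-delimited fields of a line ending in "|"
        if ($_ =~ /(.*?)\|.*?\|(.*?)\|\n/g) {
            print "$1\n $2\n";
        }
    }
}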
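For anyone hitting the same line-ending problem, here is a minimal sketch of one way to make the match independent of LF vs. CRLF: read in binary mode and strip either ending explicitly before matching. The sample data and the in-memory filehandle are assumptions so the snippet runs on its own; a real script would open the actual file.

```perl
use strict;
use warnings;

# Hypothetical sample with Windows (CRLF) line endings, read from an in-memory file.
my $data = "(a b)|1.5%|(12)|\r\n(c d)|2.5%|(34)|\r\n";

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
binmode($fh);                      # read bytes as-is, no line-ending translation

my @pairs;
while (my $line = <$fh>) {
    $line =~ s/\r?\n\z//;          # strip LF or CRLF, whichever is present
    if ($line =~ /^(.*?)\|.*?\|(.*?)\|$/) {
        push @pairs, [$1, $2];     # first and third fields
    }
}
close $fh;

print "$_->[0] $_->[1]\n" for @pairs;
```

Because the line ending is removed first, the pattern no longer needs a literal `\n` and behaves the same on both platforms.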

Thanks for all the help!

answered 2013-07-29T16:23:01.460
0

You can use this pattern:

/(\(\w+ \w+ \w+ \w+\)) *\| *\d+(?:\.\d+)?% *\| *(\(\d+\))/

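What is particular about this pattern is that it accepts any number of spaces around the pipe |.

For a more general pattern, you can replace the four \w+ with [^)]+: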

/(\([^)]+\)) *\| *\d+(?:\.\d+)?% *\| *(\(\d+\))/

Example:

#!/usr/bin/perl

use strict;

my $string = 'Header Crap | More Header Crap|Header Crap | More Header Crap|(words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678%        |   (1234)(words words words words) | 1234.5678% | (1234) | (words words words words) |   1234.5678% | (1234)(words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678% | (1234) | (words words words words) | 1234.5678% | (1234)';

while($string =~ /(\([^)]+\)) *\| *\d+(?:\.\d+)?% *\| *(\(\d+\))/g) {
    print $1 . " " . $2 . "\n";
}
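The question also tried assigning the match to an array. As a sketch of that approach with the same pattern: in list context, a /g match returns every capture from every match as one flat list, which can then be regrouped into pairs. The shortened sample string and the variable names are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical shortened sample in the same shape as the question's input.
my $string = '(a b c d) | 1.5% | (12) | (e f g h) |   2.5% | (34)(i j k l) | 3.5% | (56)';

# In list context, /g returns every capture from every match as one flat list.
my @captures = $string =~ /(\([^)]+\)) *\| *\d+(?:\.\d+)?% *\| *(\(\d+\))/g;

# Regroup the flat list into [words, number] pairs.
my @pairs;
while (my ($words, $number) = splice(@captures, 0, 2)) {
    push @pairs, [$words, $number];
}

print "$_->[0] $_->[1]\n" for @pairs;
```

This is handy when the pairs are needed later rather than processed inside the loop.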
answered 2013-07-26T22:08:55.290
0
use strict;
use warnings;
use 5.014;  

my $str = <<END_OF_STRING;
Header Crap | More Header Crap|Header Crap | More Header
Crap|(words words 1 words words) | 1234.5678% | (1234 1) | 
(words words 2 words words) | 1234.5678% |(1234 2)(words words 3 words words) 
| 1234.5678% | (1234 3) | (words words 4 words words) |  
1234.5678% | (1234 4)(words words 5 words words) | 
1234.5678% | (1234 5) | (words words 6 words words) | 1234.5678% |
(1234 6) | (words words 7 words words) | 1234.5678% | (1234 7) | 
(words words 8 words words) | 1234.5678% | (1234 8)
END_OF_STRING

my $paren_clause = <<END_OF_CLAUSE;
(
    [(]     #An opening parenthesis
    [^)]+   #followed by not a closing parenthesis, one or more times
    [)]     #followed by a closing parenthesis.
)
END_OF_CLAUSE

my $not_paren_clause = "[^(]+";  #Not an opening parenthesis, one or more times

my $pattern = <<END_OF_PATTERN;
    $paren_clause 
    $not_paren_clause
    $paren_clause
END_OF_PATTERN

while ($str =~ /$pattern/xmsg) {
    say "$1 $2";
}

--output:--
(words words 1 words words) (1234 1)
(words words 2 words words) (1234 2)
(words words 3 words words) (1234 3)
(words words 4 words words) (1234 4)
(words words 5 words words) (1234 5)
(words words 6 words words) (1234 6)
(words words 7 words words) (1234 7)
(words words 8 words words) (1234 8)
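
The same building-block idea can also be sketched with qr// instead of heredocs, which compiles each piece as a regex and keeps its own /x flag when it is interpolated. This is an alternative formulation, not the code above; the shortened sample string is an assumption.

```perl
use strict;
use warnings;

# A parenthesized clause: "(" then one or more non-")" characters, then ")".
my $paren_clause = qr/
    [(]      # an opening parenthesis
    [^)]+    # one or more characters that are not a closing parenthesis
    [)]      # a closing parenthesis
/x;

# Everything between two parenthesized clauses (may include newlines).
my $not_paren_clause = qr/[^(]+/;

my $pattern = qr/($paren_clause) $not_paren_clause ($paren_clause)/x;

# Hypothetical shortened sample with a line break in the middle of a record.
my $str = "(words 1) | 1234.5678% |\n(1234 1)(words 2) | 1234.5678% | (1234 2)";

my @rows;
while ($str =~ /$pattern/g) {
    push @rows, "$1 $2";
}
print "$_\n" for @rows;
```

Here the capture groups live in the final pattern rather than in the clause itself, so the clause can be reused without adding extra groups.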
answered 2013-07-27T03:25:03.093