perl - Interpreting a Perl regular expression

Question

if ( $_ =~ /^(\d+)_[^,]+,"",(.+)"NR"(.+)"0","",""/ )                    
{ }
elsif ( $_ =~ /^[^_]+_[^,]+,"([\d\/]+)","[^"]+","[^"]+","[^"]+","[^"]+","[^"]+",
               "[^"]+","[^"]+","[^"]+","[^"]+","[^"]+","[^"]+","[^"]+",.+/x    )

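In the first one, it's a digit repeated one or more times, then _, then any character other than a comma repeated one or more times. But what does ,"", do? It looks as if either the space or the comma is some kind of escape character. I'm a little confused and have no way to test it on this machine. Are commas common in regular expressions? Also, the ^ at the beginning: is it an anchor, or does it negate the whole thing?

The second statement is even worse.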

score 6 · Accepted Answer

The CPAN module YAPE::Regex::Explain can be used to parse and explain Perl regular expressions you don't understand. Here is the output for your first regex:

(?-imsx:^(\d+)_[^,]+,"",(.+)"NR"(.+)"0","","")

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  ^                        the beginning of the string
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d+                      digits (0-9) (1 or more times (matching
                             the most amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  _                        '_'
----------------------------------------------------------------------
  [^,]+                    any character except: ',' (1 or more times
                           (matching the most amount possible))
----------------------------------------------------------------------
  ,"",                     ',"",'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  "NR"                     '"NR"'
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    .+                       any character except \n (1 or more times
                             (matching the most amount possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  "0","",""                '"0","",""'
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

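You can also use the module to parse your second regex (I won't dump the output here, because the explanation would be long and very redundant). If you want to give it a try, run this: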

use strict;
use warnings;
use YAPE::Regex::Explain;

my $re = qr/^[^_]+_[^,]+,"([\d\/]+)","[^"]+","[^"]+","[^"]+","[^"]+","[^"]+",
           "[^"]+","[^"]+","[^"]+","[^"]+","[^"]+","[^"]+","[^"]+",.+/x;

print YAPE::Regex::Explain->new( $re )->explain;

score 2 · Accepted Answer

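Everything is as you said.
,"", matches a comma, followed by two double quotes, followed by a comma.
Commas have no special meaning in regex patterns.
^ is an anchor (start of string). It negates only when it is the first character of a character class ([^...]).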
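
As a rough illustration of the points above, here is a small sketch that runs the first pattern from the question against a sample line. The sample line is made up for this example (it is not from the question), so treat the captured values as illustrative only.

```perl
use strict;
use warnings;

# Hypothetical input, invented to fit the first pattern in the question.
my $line = '123_abc,"","x","NR","y","0","",""';

# Outside a character class, ^ anchors the match at the start of the string.
my $starts_with_digits = ($line       =~ /^\d+_/) ? 1 : 0;
my $not_at_start       = ("x$line"    =~ /^\d+_/) ? 1 : 0;

# As the first character inside [...], ^ negates the class:
# [^,]+ means "one or more characters that are not a comma".
my ($first_field) = $line =~ /^([^,]+)/;

# Commas and double quotes are ordinary characters, so ,"", matches literally.
my $has_empty_quoted_field = ($line =~ /,"",/) ? 1 : 0;

# The full first pattern; the (.+) groups are greedy and also swallow
# the surrounding quotes and commas.
my @captures = $line =~ /^(\d+)_[^,]+,"",(.+)"NR"(.+)"0","",""/;

print "anchored match: $starts_with_digits, shifted input: $not_at_start\n";
print "first field: $first_field\n";
print "captures: ", join(' | ', @captures), "\n";
```

Note that the two (.+) groups capture raw text including delimiters, which is one reason splitting the line into fields first tends to be easier to reason about.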

A better approach is to use Text::CSV_XS to parse the line into fields, and then match against the resulting values.

if (   my ($num) = $row->[0] =~ /^(\d+)_[^,]+\z/
   and $row->[1] eq ""
   and ...
) {
   ...
}
elsif (... ) {
   ...
}
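
To make the outline above a little more concrete, here is a minimal sketch of how the first branch might look with Text::CSV_XS. The sample line and the assumption that "NR" sits in the fourth field are invented for illustration; the original regex does not pin down which field holds it, so adjust the indexes to the real data.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# Hypothetical input line; the real data layout is not shown in the question.
my $line = '123_abc,"","x","NR","y","0","",""';

my $csv = Text::CSV_XS->new({ binary => 1 })
    or die "Cannot create CSV parser: " . Text::CSV_XS->error_diag . "\n";

# Split the line into fields; quoted empty strings come back as "".
$csv->parse($line)
    or die "Parse failed: " . $csv->error_diag . "\n";
my $row = [ $csv->fields ];

# Match only the first field, so the pattern no longer has to deal with
# quotes and separators. $num is undef if the field does not match.
my ($num) = $row->[0] =~ /^(\d+)_[^,]+\z/;

# Assumption: the second field must be empty and "NR" is the fourth field.
my $is_nr_row = ( defined $num
              and $row->[1] eq ""
              and $row->[3] eq "NR" ) ? 1 : 0;

print "fields: ", scalar(@$row), "\n";
print "num: $num, NR row: $is_nr_row\n" if defined $num;
```

Comparing individual fields with eq keeps each condition short and avoids the greedy (.+) groups spanning several columns.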
