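I'm new to Perl and recently ran into the following problem.

I have a string of the form "$num1 $num2 $num3 $num4", where $num1, $num2, $num3 and $num4 are real numbers, written either in scientific notation or in ordinary decimal format.

Now I want to use a regular expression to extract the 4 numbers from the string.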

$real_num = '\s*([+-]?[0-9]+\.?[0-9]*([eE][+-]?[0-9]+)?)';
while (<FP>) {
    if (/$real_num$real_num$real_num$real_num/) {
        print $1; print $2; print $3; print $4;
    }
}

How can I get $num1, $num2, $num3 and $num4 from $1, $2, $3 and $4? Because the $real_num regex contains a necessary pair of parentheses, $1, $2, $3 and $4 are not what I expect.



Use a non-capturing group, (?:...), for the optional exponent:

$real_num = '\s*([+-]?[0-9]+\.?[0-9]*(?:[eE][+-]?[0-9]+)?)';
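To see the difference, here is a minimal sketch that applies the non-capturing version to a made-up sample line (the sample string and the variable names $line and @nums are assumptions for illustration). In list context the match returns only the four outer groups:

```perl
use strict;
use warnings;

# (?:...) groups the exponent without creating a capture group,
# so each copy of $real_num contributes exactly one capture.
my $real_num = '\s*([+-]?[0-9]+\.?[0-9]*(?:[eE][+-]?[0-9]+)?)';

my $line = "1.5 -2 3e10 4.25E-3";    # hypothetical input line

# In list context a successful match returns ($1, $2, $3, $4).
my @nums = $line =~ /$real_num$real_num$real_num$real_num/;

print join(", ", @nums), "\n" if @nums == 4;
```

With the original pattern the inner parentheses are numbered too, so the four numbers would land in $1, $3, $5 and $7 instead.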

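Now, the problem is that /$real_num$real_num$real_num$real_num/ is fragile when a line does not contain exactly four numbers (with more than four, for example, it silently captures only the first four). That may not be the case right now, but you should keep it in mind. A split would be a better option: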

while (<FP>) {
    my @numbers = split /\s+/; #<-- an array with the parsed numbers
}
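A minimal sketch of how the split approach might be combined with a sanity check on the field count. The sample lines are invented, and split ' ' is used here instead of /\s+/ because it also skips leading whitespace; treat both as assumptions rather than part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the lines read from FP.
my @lines = ("1 2.5 -3e4 4\n", "  7 8 9\n");

my @rows;
for my $line (@lines) {
    # split ' ' splits on runs of whitespace and ignores leading whitespace,
    # so an indented line does not produce an empty first field.
    my @numbers = split ' ', $line;
    if (@numbers == 4) {
        push @rows, \@numbers;
    } else {
        warn "expected 4 fields, got " . scalar(@numbers) . "\n";
    }
}
print "@$_\n" for @rows;
```

Only the first sample line has four fields, so only that row is kept; the second one triggers the warning.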


use strict;
use warnings;
use feature 'say';
use Scalar::Util qw/looks_like_number/;

while (<DATA>) {
    my @numbers = split /\s+/;
    # replace anything that is not a number with undef
    @numbers = map { looks_like_number($_) ? $_ : undef } @numbers;
    say "@numbers";
}

__DATA__
1 2 NaN 4 -1.23
5 6 f 8 1.32e12

Output:

1 2 NaN 4 -1.23
5 6  8 1.32e12

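  1. Are you sure your lines contain only numbers, or do they also contain other data (or might some lines contain no numbers at all, only other data)?
  2. Are you sure that all numbers are separated from each other and/or from other data by at least one space? If not, how are they separated? (For example, the output of portsnap fetch produces lots of numbers like 3690....3700...., with decimal points and no whitespace at all separating them.)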


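If you are sure that your lines contain only numbers and that they are separated by whitespace, a plain split is enough:

my @numbers = split /\s+/;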

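If you are not sure that your lines contain only numbers, but you are sure that every number is separated from other numbers or other data by at least one space, then the next line of code is quite a good way to extract the numbers correctly, by letting Perl itself recognize the many different legal number formats. (This assumes you don't want to convert the other data values to NaN.) The resulting @numbers will hold all the numbers identified in the current input line.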

my @numbers = grep { 1*$_ eq $_ } m/(\S*\d\S*)/g;
# we could do simply a split, but this is more efficient because when
# non-numeric data is present, it will only perform the number
# validation on data pieces that actually do contain at least one digit
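One thing worth checking before relying on this line: the comparison 1*$_ eq $_ keeps a token only if it survives a round trip through Perl's numeric formatting unchanged. The sketch below, run on an invented sample line, suggests that tokens such as 1.0 or 1.32e12 are dropped, and that under use warnings the multiplication warns on non-numeric tokens (silenced locally here).

```perl
use strict;
use warnings;

my $line = "abc 200 1.0 x9 -4.5 1.32e12 2206";    # hypothetical input

my @numbers = do {
    # 1*"x9" would trigger an "isn't numeric" warning; silence it here.
    no warnings 'numeric';
    grep { 1*$_ eq $_ } $line =~ m/(\S*\d\S*)/g;
};

print "@numbers\n";    # prints: 200 -4.5 2206
```

If numbers in those other notations need to be kept, a regex test or looks_like_number may be a better fit than the round-trip comparison.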

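You can determine whether at least one number was present by checking the truth value of the expression @numbers >= 1, and whether exactly four were present by using the condition @numbers == 4, and so on.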
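A tiny sketch of those checks, assuming @numbers already holds the extracted values (the values below are placeholders):

```perl
use strict;
use warnings;

my @numbers = (3.5, 12, -7, 1e3);    # hypothetical result of the extraction

# An array in numeric context evaluates to its element count.
my $has_any  = @numbers >= 1;
my $has_four = @numbers == 4;

print "at least one number\n"  if $has_any;
print "exactly four numbers\n" if $has_four;
```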

If your numbers run into each other, e.g. 5.17e+7-4.0e-1, then you are in for a harder time. That is the only case in which you need a complicated regex.
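For the run-together case, one possible approach is a global match with a number pattern that does not rely on whitespace. This is only a sketch: the pattern $num below is an assumption, and splitting is inherently ambiguous on some inputs (for example, 12 immediately followed by 34 reads as 1234).

```perl
use strict;
use warnings;

# Optional sign, digits with optional fraction (or a bare fraction),
# then an optional exponent with an optional sign.
my $num = qr/[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/;

my $text = "5.17e+7-4.0e-1";    # the example from the paragraph above

# With no capture groups, a /g match in list context returns whole matches.
my @numbers = $text =~ /$num/g;
print "$_\n" for @numbers;
```

On this input the match yields 5.17e+7 and -4.0e-1 as two separate numbers.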


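Note 2: The highest-voted answer has a problem, caused by a subtlety in how map behaves when it stores undef values. The output of that program, when used to extract the numbers from the first line of a data file (an HTTP log file, for example), illustrates this. The output looks correct, but the array actually contains many empty elements, and the first number is not stored in $numbers[0] as you would expect. In fact, this is the full output: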

$ head -1 http | perl prog1.pl
Use of uninitialized value $numbers[0] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[1] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[2] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[3] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[4] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[5] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[6] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[7] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[10] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[11] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[12] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[13] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[14] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[15] in join or string at prog1.pl line 8, <> line 1.
Use of uninitialized value $numbers[16] in join or string at prog1.pl line 8, <> line 1.
        200 2206
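
The effect described in this note can be reproduced on a small scale. The sketch below uses an invented input line, not the actual log; it shows that map with undef keeps one slot per field, whereas grep drops the non-numeric fields so the numbers start at index 0.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

my $line   = "GET /index.html 200 2206";    # hypothetical log-like line
my @fields = split ' ', $line;

# map returns one value per input field, so non-numbers become undef slots.
my @mapped = map { looks_like_number($_) ? $_ : undef } @fields;

# grep returns only the fields that pass the test.
my @kept = grep { looks_like_number($_) } @fields;

printf "map:  %d elements, first is %s\n",
    scalar(@mapped), defined $mapped[0] ? $mapped[0] : 'undef';
printf "grep: %d elements, first is %s\n", scalar(@kept), $kept[0];
```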


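My solution, however, produces correct results both visually and in the actual array contents, i.e. $numbers[0], $numbers[1], etc. really are the first and second numbers contained in the line of the data file.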

while (<>) {
    my @numbers = m/(\S*\d\S*)/g;
    @numbers = grep { $_ eq 1*$_ } @numbers;
    print "@numbers\n";
}

$ head -1 http | perl prog2.pl

200 2206

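In addition, the use of the slow library function makes the other solution run 50% slower. When the programs are run on 10,000 lines of data, the output is otherwise identical.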
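
The size of that gap will depend on the Perl version and the data, so it may be worth measuring locally. A rough sketch using the core Benchmark module follows; the sample line, the sub names and the one-second duration are all assumptions, and no particular result is implied.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Scalar::Util qw(looks_like_number);

my $line = "GET /index.html 200 2206 17 42";    # hypothetical input line

sub with_library {
    return grep { looks_like_number($_) } split ' ', $line;
}

sub with_roundtrip {
    no warnings 'numeric';    # 1*$_ would warn on partly numeric tokens
    return grep { $_ eq 1*$_ } $line =~ m/(\S*\d\S*)/g;
}

# Run each variant for about one CPU second and print a comparison table.
cmpthese(-1, {
    looks_like_number => \&with_library,
    roundtrip         => \&with_roundtrip,
});
```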

my $number = '([-+]?(?:\d+\.\d+|\.\d+|\d+)(?:[Ee][-+]?\d+)?)';

my $type = shift;

if ($type eq 'all') {
    while (<>) {
        my @all_numbers = m/$number/g;
        # finds legal numbers whether space separated or not
        # this can be great, but it also means the string
        # 120.120.120.120 (an IP address) will return
        # 120.120, .120, and .120
        print "@all_numbers\n";
    }
} else {
    while (<>) {
        my @ss_numbers = grep { m/^$number$/ } split /\s+/;
        # finds only space separated numbers
        print "@ss_numbers\n";
    }
}


$ prog-jkm2.pl all < input # prints all numbers
$ prog-jkm2.pl < input # prints just space-separated numbers

The only code the OP probably needs:

my $number = '(-?(?:\d+\.\d+|\.\d+|\d+)(?:[Ee][-+]?\d+)?)';
my @numbers = grep { m/^$number$/ } split /\s+/;
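
As a quick check, the two lines above might be exercised like this. The sample line is borrowed from the earlier answer's test data, and the pattern used here accepts an exponent with or without a sign so that 1.32e12 passes; both choices are assumptions for illustration.

```perl
use strict;
use warnings;

my $number = '(-?(?:\d+\.\d+|\.\d+|\d+)(?:[Ee][-+]?\d+)?)';

my $line = "5 6 f 8 1.32e12\n";    # sample line from the earlier test data

# Keep only the whitespace-separated fields that are entirely a number.
my @numbers = grep { m/^$number$/ } split /\s+/, $line;

print "@numbers\n";    # prints: 5 6 8 1.32e12
```

Unlike the map version, the non-numeric field f is simply dropped, so the array contains four elements and no undef slots.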



$ head -1 http | perl prog-jkm2.pl
200 2206
$ head -1 http | perl prog-jkm2.pl all
67.195 .114 .38 19 2011 01 20 31 -0400 1 1 1.0 200 2206 5.0
