2

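Although I am more familiar with Java, Python, and functional languages, I have to write Perl. I would like to know whether there is an idiomatic way to parse a simple file such as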

# comment line - ignore

# ignore also empty lines
key1 = value
key2 = value1, value2, value3

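I want a function to which I pass an iterator over the lines of the file and which returns a map from keys to lists of values. To keep it functional and structured, I would like to:

  • use a filter that wraps the given iterator and returns an iterator without empty lines or comment lines
  • define that filter outside the function so that other functions can reuse it
  • use another function that is given a line and returns a tuple of key and value strings
  • use another function that splits the comma-separated values into a list of values.

What is the most modern, idiomatic, cleanest, and still practical way to do this? The different parts of the code should be separately testable and reusable.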

For reference, here is (as a quick hack) how I would do it in Python:

import re
from itertools import ifilter, imap  # Python 2
from string import strip             # Python 2

re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
is_interesting_line = lambda line: (not ("" == line or re_is_comment_line.match(line))
                                    and re_key_values.match(line))

def parse(lines):
    interesting_lines = ifilter(is_interesting_line, imap(strip, lines))
    key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
    splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
    return dict(splitted_values)
4

4 Answers

5

A direct translation of your Python would be

my $re_is_comment_line = qr/^\s*#/;
my $re_key_values      = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter        = qr/\s*,\s*/;
my $is_interesting_line= sub {
  my $line = shift; # lexical $_ ("my $_") was removed in Perl 5.24
  length($line) and $line !~ $re_is_comment_line and $line =~ $re_key_values;
};

sub parse {
  my @lines = @_;
  my @interesting_lines = grep $is_interesting_line->($_), @lines;
  my @key_values = map [/$re_key_values/], @interesting_lines;
  my %splitted_values = map { $_->[0], [split $re_splitter, $_->[1]] } @key_values;
  return %splitted_values;
}

Differences are:

  • ifilter is called grep and can take an expression instead of a block as its first argument. Either form is roughly equivalent to a lambda. The current item is available in the $_ variable. The same applies to map.
  • Perl doesn't emphasize laziness and seldom uses iterators. There are cases where they are required, but usually the whole list is evaluated at once.
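To make the expression and block forms mentioned above concrete, here is a minimal sketch. The sample data and variable names are made up for illustration and do not come from the question.

```perl
use strict;
use warnings;

# Hypothetical sample input, only for illustration.
my @lines = ('# comment', '', 'key1 = value', 'key2 = a, b');

# Expression form: grep EXPR, LIST (note the comma after the expression).
my @non_empty = grep length, @lines;

# Block form: grep BLOCK LIST (no comma after the block).
my @no_comments = grep { !/^\s*#/ } @non_empty;

# map works the same way; $_ is the current element in both forms.
my @upper = map uc, @no_comments;

print "$_\n" for @upper;
```

Both forms evaluate the whole list eagerly, which is the non-lazy behaviour described in the second point.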

The next example makes the following changes:

  • Regexes don't have to be precompiled; Perl is very good at regex optimization.
  • Instead of extracting keys and values with a regex, we use split. It takes an optional third argument that limits the number of resulting fragments.
  • The whole map/filter pipeline can be written as one expression. This doesn't make it more efficient, but it emphasizes the flow of data. Read the map-map-grep chain from the bottom up (or rather right to left; think of APL).
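As a small illustration of the limit argument to split described above, the following sketch shows why it matters when a value itself contains an equals sign. The input line is an invented example.

```perl
use strict;
use warnings;

# Invented sample line whose values contain "=" characters.
my $line = 'url = http://example.com/?a=1, http://example.com/?b=2';

# Without a limit, every "=" is treated as a separator.
my @all_fields = split /\s*=\s*/, $line;

# With a limit of 2, only the first "=" separates the key from the value.
my ($key, $value) = split /\s*=\s*/, $line, 2;

# The value string can then be broken up at the commas.
my @values = split /\s*,\s*/, $value;

print scalar(@all_fields), " fields without a limit\n";
print "$key => [@values]\n";
```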

sub parse {
  my %splitted_values =
    map { $_->[0], [split /\s*,\s*/, $_->[1]] }
    map {[split /\s*=\s*/, $_, 2]}
    grep{ length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
    @_;
  return \%splitted_values; # returning a reference improves efficiency
}

But I think a more elegant solution here is to use a traditional loop:

sub parse {
  my %splitted_values;
  LINE: for (my @lines = @_) { # iterate over a copy; $_ would otherwise alias the caller's data
    next LINE if !length or /^\s*#/;
    s/\A\s*|\s*\z//g; # trim the string (omitted in the previous examples)
    my ($key, $vals) = split /\s*=\s*/, $_, 2;
    defined $vals or next LINE; # check if $vals was assigned
    @{ $splitted_values{$key} } = split /\s*,\s*/, $vals; # automatically creates the array in $splitted_values{$key}
  }
  return \%splitted_values;
}

If we decide to pass a filehandle instead, the loop would be replaced with

my $fh = shift;
LOOP: while (<$fh>) {
  chomp;
  ...;
}

which would use an actual iterator.
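For completeness, here is a sketch of what the filehandle variant could look like when written out in full. The name parse_fh and the in-memory filehandle are assumptions made for this illustration; the loop body follows the traditional loop shown earlier.

```perl
use strict;
use warnings;

# Sketch only: parse_fh is a hypothetical name, not part of the answer above.
sub parse_fh {
  my ($fh) = @_;
  my %values_for;
  LINE: while (my $line = <$fh>) {   # reads one line per iteration
    chomp $line;
    $line =~ s/\A\s+|\s+\z//g;       # trim leading and trailing whitespace
    next LINE if !length $line or $line =~ /^#/;
    my ($key, $vals) = split /\s*=\s*/, $line, 2;
    defined $vals or next LINE;      # no "=" on this line
    $values_for{$key} = [ split /\s*,\s*/, $vals ];
  }
  return \%values_for;
}

# An in-memory filehandle stands in for a real file here.
my $text = "# comment\n\nkey1 = value\nkey2 = value1, value2, value3\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $data = parse_fh($fh);
close $fh;

print "$_ => @{ $data->{$_} }\n" for sort keys %$data;
```

Because the function only reads from whatever handle it is given, it can be tested with a string as above and used unchanged with a handle opened on a real file.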

You could now go on and add function parameters, but do this only if you are optimizing for flexibility and nothing else. I already used a code reference in the first example; code references are invoked with the $code->(@args) syntax.

use Carp; # Error handling for writing APIs
sub parse {
  my $args = shift;
  my $interesting  = $args->{interesting}   or croak qq("interesting" callback required);
  my $kv_splitter  = $args->{kv_splitter}   or croak qq("kv_splitter" callback required);
  my $val_transform= $args->{val_transform} || sub { $_[0] }; # identity by default

  my %splitted_values;
  LINE: for (my @lines = @_) { # iterate over a copy; $_ would otherwise alias the caller's data
    next LINE unless $interesting->($_);
    s/\A\s*|\s*\z//g;
    my ($key, $vals) = $kv_splitter->($_);
    defined $vals or next LINE;
    $splitted_values{$key} = $val_transform->($vals);
  }
  return \%splitted_values;
}

This could then be called like

my $data = parse {
  interesting   => sub { length($_[0]) and not $_[0] =~ /^\s*#/ },
  kv_splitter   => sub { split /\s*=\s*/, $_[0], 2 },
  val_transform => sub { [ split /\s*,\s*/, $_[0] ] }, # returns anonymous arrayref
}, @lines;
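One benefit of passing callbacks is that each piece can be checked on its own, which is what the question asks for. The following is a minimal sketch using the core Test::More module; the variable names and test inputs are chosen here for illustration.

```perl
use strict;
use warnings;
use Test::More tests => 4;

# The three callbacks from the call above, stored in named variables.
my $interesting   = sub { length($_[0]) and not $_[0] =~ /^\s*#/ };
my $kv_splitter   = sub { split /\s*=\s*/, $_[0], 2 };
my $val_transform = sub { [ split /\s*,\s*/, $_[0] ] };

# Each callback is exercised in isolation, without calling parse.
ok(  $interesting->('key = value'), 'key/value line is interesting' );
ok( !$interesting->('# comment'),   'comment line is skipped' );
is_deeply( [ $kv_splitter->('key = a, b') ], [ 'key', 'a, b' ],
           'line is split at the first "="' );
is_deeply( $val_transform->('a, b, c'), [ 'a', 'b', 'c' ],
           'values are split at the commas' );
```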
answered 2013-04-25T18:01:42.377
4

I think the most modern approach is to take advantage of CPAN modules. In your example, Config::Properties might help:

use strict;
use warnings;
use Config::Properties;

my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');
answered 2013-04-25T17:27:37.060
2

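As pointed out in the post linked by @collapsar, Higher-Order Perl is an excellent read for exploring functional techniques in Perl.

Here is an example that follows your bullet points: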

use strict;
use warnings;
use Data::Dumper;

my @filt_rx = ( qr{^\s*\#},
                qr{^[\r\n]+$} );
my $kv_rx = qr{^\s*(\w+)\s*=\s*([^\r\n]*)};
my $spl_rx = qr{\s*,\s*};

my $iterator = sub {
    my ($fh) = @_;
    return sub {
        my $line = readline($fh);
        return $line;
    };
};
my $filter = sub {
    my ($it,@r) = @_;
    return sub {
        my $line;
        do {
            $line = $it->();
        } while (  defined $line
                && grep { $line =~ m/$_/} @r );
        return $line;
    };
};
my $kv = sub {
    my ($line,$rx) = @_;
    return ($line =~ m/$rx/);
};
my $spl = sub {
    my ($values,$rx) = @_;
    return split $rx, $values;
};

my $it = $iterator->( \*DATA );
my $f = $filter->($it,@filt_rx);

my %map;
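while ( defined( my $line = $f->() ) ) {   # a bare truth test would stop early on a final line "0"
    my ($k,$v) = $kv->($line,$kv_rx);
    next unless defined $k;                # skip lines that are not key = value pairs
    $map{$k} = [ $spl->($v,$spl_rx) ];
}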
print Dumper \%map;

__DATA__
# comment line - ignore

# ignore also empty lines
key1 = value
key2 = value1, value2, value3

It produces the following hash for the given input:

$VAR1 = {
          'key2' => [
                      'value1',
                      'value2',
                      'value3'
                    ],
          'key1' => [
                      'value'
                    ]
        };
answered 2013-04-26T07:00:39.650
0

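You may be interested in this SO question as well as this one.

The following code is a self-contained Perl script meant to give you an idea of how to implement this in Perl (only partly in a functional style; if you are not averse to particular coding styles and/or language constructs, I can improve the solution somewhat).

Miguel Prz is right that in most cases you would search CPAN for a solution that matches your requirements.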

my (
      $is_interesting_line
    , $re_is_comment_line
    , $re_key_values
    , $re_splitter
);

$re_is_comment_line = qr(^\s*#);
$re_key_values      = qr(^\s*(\w+)\s*=\s*(.*)$);
$re_splitter        = qr(\s*,\s*);
$is_interesting_line = sub {
        my $line = shift;
        return (
                (!(
                        !defined($line)
                     || ($line eq '')
                ))
            &&  ($line =~ /$re_key_values/)
        );
    };

sub strip {
    my $line = shift;
    # your implementation goes here
    return $line;
}
sub parse {
    my @lines = @_;
    #
    my (
          $dict
        , $interesting_lines
        , $k
        , $v
    );
    #
    @$interesting_lines =
        grep {
                &{$is_interesting_line} ( $_ );
            } ( map { strip($_); } @lines )
    ;

    $dict = {};
    # a plain loop instead of map in void context; behavior is unchanged
    for my $line (@$interesting_lines) {
        if ($line =~ /$re_key_values/) {
            ($k, $v) = ($1, [split(/$re_splitter/, $2)]);
            $$dict{$k} = $v;
        }
    }

    return $dict;
} # parse

#
# sample execution goes here
#    
my $parse =<<EOL;
# comment
what = is, this, you, wonder
it = is, perl
EOL

parse ( split (/[\r\n]+/, $parse) );
answered 2013-04-25T18:01:14.943