This is exactly the sort of thing that unpack is useful for.
#!/usr/bin/env perl
use v5.10.0;
use strict;
use warnings;
while( my $line = <> ){
    chomp $line;
    my @elem = unpack 'A16 A6 A*', $line;
    $elem[1] = sprintf '%06d', $.;
    # $. is the line number for the last used file handle
    say @elem;
}
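To see what that template actually returns, here is a minimal sketch run on a made-up line. The sample text and the helper name split_line are assumptions for illustration only; they are not taken from the real data.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split a line into a 16-character prefix,
# a 6-character counter, and whatever is left.
sub split_line {
    my ($line) = @_;
    return unpack 'A16 A6 A*', $line;
}

# Made-up sample: 14-digit timestamp, "ZZ", 6-digit counter, payload.
my $sample = '20130401123059ZZ000042some payload';
my @elem   = split_line($sample);

print join( '|', @elem ), "\n";    # 20130401123059ZZ|000042|some payload
```

Each A field takes a fixed number of characters, and A* takes the remainder, so only the middle element needs to be overwritten to renumber a line.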
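Actually, looking at those lines, there appears to be date information stored in the first 14 characters.
Assuming that at some point you might want to parse the lines for some reason, you can use the following as an example of how to split up a line with unpack.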
#!/usr/bin/env perl
use v5.10.0; # say()
use strict;
use warnings;
use DateTime;
my @date_elem = qw'
    year month day
    hour minute second
';
my @elem_names = ( @date_elem, qw'
    ZZ
    line_number
    random_data
');
while( my $line = <> ){
    chomp $line;
    my %data;
    @data{ @elem_names } = unpack 'A4 (A2)6 A6 A*', $line;

    # choose either this:
    $data{line_number} = sprintf '%06d', $.;
    say @data{@elem_names};

    # or this:
    $data{line_number} = $.;
    printf '%04d' . ('%02d'x5) . "%2s%06d%s\n", @data{ @elem_names };
    # the choice will affect the contents of %data

    # this just shows the contents of %data
    for( @elem_names ){
        printf qq'%12s: "%s"\n', $_, $data{$_};
    }

    # you can create a DateTime object with the date elements
    my $dt = DateTime->new(
        (map{ $_, $data{$_} } @date_elem),
        time_zone => 'floating',
    );
    say $dt;
    print "\n";
}
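One thing to keep in mind with unpack is that it splits purely by width and never checks what the characters are. The sketch below shows this on a made-up malformed line; the helper name unpack_line and the sample text are assumptions for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @names = qw( year month day hour minute second ZZ line_number random_data );

# Hypothetical helper: same template as above, returned as a hash reference.
sub unpack_line {
    my ($line) = @_;
    my %data;
    @data{@names} = unpack 'A4 (A2)6 A6 A*', $line;
    return \%data;
}

# Made-up malformed line: the "month" field holds letters, not digits.
my $bogus = '2013XX01123059ZZ000042payload';
my $data  = unpack_line($bogus);

# The bad field comes through unchanged; nothing complains.
print "month: $data->{month}\n";    # month: XX
```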
It would be better to use a regex, though, so that you can throw out bogus data.
use v5.14; # /a modifier
...
my $rdate = join '', map{"(\\d{$_})"} 4, (2)x5;
my $rx = qr/$rdate (ZZ) (\d{6}) (.*)/xa; # qr// rather than qr'', so that $rdate is interpolated
while( my $line = <> ){
    chomp $line;
    my %data;
    unless( @data{ @elem_names } = $line =~ $rx ){
        die qq'unable to parse line "$line" ($.)';
    }
...
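Here is a self-contained sketch of the positional-capture approach, run on two made-up lines. Note that the pattern has to be built with qr/.../ and not qr'...', since single-quote delimiters suppress variable interpolation. The helper name parse_line, the sample lines, and the ^ anchor are assumptions added for this sketch.

```perl
#!/usr/bin/env perl
use v5.14;    # for the /a modifier
use strict;
use warnings;

my @elem_names = qw( year month day hour minute second ZZ line_number random_data );

# Builds "(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})" for the date part.
my $rdate = join '', map { "(\\d{$_})" } 4, (2) x 5;

# qr/.../ interpolates $rdate; the ^ anchor is an addition in this sketch.
my $rx = qr/^ $rdate (ZZ) (\d{6}) (.*)/xa;

# Hypothetical helper: returns a hash reference,
# or undef if the line does not match.
sub parse_line {
    my ($line) = @_;
    my %data;
    @data{@elem_names} = $line =~ $rx or return;
    return \%data;
}

for my $line ( '20130401123059ZZ000042payload', '2013XX01123059ZZ000042payload' ) {
    my $data = parse_line($line);
    say defined $data
        ? "ok: line_number=$data->{line_number}"
        : "rejected: $line";
}
```

The first line parses, and the second is rejected because XX is not two digits.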
Better still, use named capture groups, which were added in 5.10.
...
my $rx = qr'
    (?<year> \d{4} ) (?<month> \d{2} ) (?<day> \d{2} )
    (?<hour> \d{2} ) (?<minute> \d{2} ) (?<second> \d{2} )
    ZZ
    (?<line_number> \d{6} )
    (?<random_data> .* )
'xa;
while( my $line = <> ){
    chomp $line;
    unless( $line =~ $rx ){
        die qq'unable to parse line "$line" ($.)';
    }
    my %data = %+;
    # for compatibility with previous examples
    $data{ZZ} = 'ZZ';
...
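To tie the named captures back to renumbering lines, here is a rough sketch that validates a line and rebuilds it with a new counter. The helper name renumber, the combined stamp capture, the ^ anchor, and the sample line are all assumptions for illustration.

```perl
#!/usr/bin/env perl
use v5.14;    # for the /a modifier
use strict;
use warnings;

my $rx = qr/
    ^
    (?<stamp>       \d{14} )    # year through second
    ZZ
    (?<line_number> \d{6}  )
    (?<random_data> .*     )
/xa;

# Hypothetical helper: replace the 6-digit counter with $n,
# or return undef if the line does not have the expected layout.
sub renumber {
    my ( $line, $n ) = @_;
    return unless $line =~ $rx;
    my %data = %+;    # copy right away; the next successful match replaces %+
    return sprintf '%sZZ%06d%s', $data{stamp}, $n, $data{random_data};
}

say renumber( '20130401123059ZZ000042payload', 7 );    # 20130401123059ZZ000007payload
```

In a real loop you would pass $. as the second argument.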