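I have an array named @mytitles that contains many titles, e.g. title1, title2, and so on. I have a file called "Superdataset" that holds information related to each title. However, the information for title1 may be 6 lines long while the information for title2 may be 30 lines (the length is random). Each block of information (for a given titlex) starts with "Reading titlex" and ends with "Done reading titlex".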

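From these lines of information for each title, I need to extract some data. Fortunately, I think, the data I need is always in the two lines immediately before "Done reading titlex".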

So my "Superdataset" looks like:

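Reading title1  
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2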

I need the total expenses and the total earnings. Any suggestions? PS: the array has complicated names, not ones as simple as titlex.

3 Answers

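Unless you can predict what the line before the relevant lines looks like, the flip-flop operator will not help much as an optimization. I think it is easier to use a buffer array and simply match the line that follows the earnings and expenses.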

#!/usr/bin/perl
use strict;
use warnings;

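my @buffer;    # sliding window over the most recent three lines
my ($earnings, $expenses) = (0, 0);

while (my $line = <DATA>) {    # read line by line instead of slurping the input
    shift @buffer if @buffer > 2;
    push @buffer, $line;

    next if $line !~ /^Done reading/;

    # At the end marker the window holds: earnings, expenses, "Done reading ..."
    $earnings += $1 if $buffer[0] =~ /(\d+)$/;
    $expenses += $1 if $buffer[1] =~ /(\d+)$/;
}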
print "Total earnings: $earnings\n";
print "Total expenses: $expenses\n";

__DATA__
Reading title1  
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

Output:

Total earnings: 17000
Total expenses: 10000
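
The question mentions that only the titles in @mytitles matter and that the real names are complicated. A minimal sketch of one way to handle that with the same buffer idea is shown here. The sub name sum_for_titles and the in-memory sample are hypothetical, not part of the original answer, and the sketch assumes the title after "Done reading" is spelled exactly as in the array. The titles are looked up as hash keys, so characters that are special in a regex need no escaping.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: sum earnings and expenses for the wanted titles only.
# $fh     : filehandle positioned at the start of the data
# @titles : titles to include (stand-in for @mytitles)
sub sum_for_titles {
    my ($fh, @titles) = @_;
    my %wanted = map { $_ => 1 } @titles;    # exact-match lookup, no regex
    my @buffer;
    my ($earnings, $expenses) = (0, 0);

    while (my $line = <$fh>) {
        chomp $line;
        shift @buffer if @buffer > 2;
        push @buffer, $line;

        # Capture the title after the end marker, ignoring trailing spaces
        next unless $line =~ /^Done reading (.*?)\s*$/;
        my $title = $1;
        next unless $wanted{$title};
        next if @buffer < 3;    # assumption: a block has at least two data lines

        $earnings += $1 if $buffer[0] =~ /(\d+)\s*$/;
        $expenses += $1 if $buffer[1] =~ /(\d+)\s*$/;
    }
    return ($earnings, $expenses);
}

# Shortened sample data, read through an in-memory filehandle
my $sample = <<'END';
Reading title1
 random info line1
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 my earnings are 11000
 my expenses are 9000
Done reading title2
END

open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
my ($earnings, $expenses) = sum_for_titles($fh, 'title2');
close $fh;

print "Total earnings: $earnings\n";
print "Total expenses: $expenses\n";
```

To run it against the real file, open "Superdataset" instead of the in-memory string and pass @mytitles as the list of titles.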
Answered 2011-12-18T22:11:56.740

Using the range operator, you could do something like this:

#!/usr/bin/env perl
use strict;
use warnings;
my $begin_stanza = qr/^Reading/i;
my $endof_stanza = qr/^Done reading/i;
my ( $title, @lines );
# Start the totals at 0 so they are defined even if no stanza is found
my ( $value, $total_earnings, $total_expenses ) = ( undef, 0, 0 );
while (<DATA>) {
    chomp;
    if ( m{$begin_stanza} .. m{$endof_stanza} ) {
        if ( m{$begin_stanza\s+(.+)} ) {
            $title = $1;
            @lines = ();
            next;
        }
        if ( m{$endof_stanza} ) {
            ($value) = ( $lines[0] =~ m{(\d+)} );
            $total_earnings += $value;
            ($value) = ( $lines[1] =~ m{(\d+)} );
            $total_expenses += $value;
            print join "\n", $title, @lines, "\n";
            next;
        }
        shift @lines if @lines == 2;
        push  @lines, $_;
    }
}
printf "Total Earnings = %7d\n", $total_earnings;
printf "Total Expenses = %7d\n", $total_expenses;
__DATA__
Reading title1
 random info line1
 random info line2
 random info line3
 random info line4
 random info line5
 my earnings are 6000
 my expenses are 1000
Done reading title1
Reading title2
 random info line6
 random info line7
 random info line8
 random info line9
 random info line10
 random info line11
 random info line12
 random info line13
 random info line14
 my earnings are 11000
 my expenses are 9000
Done reading title2

...which produces:

title1
 my earnings are 6000
 my expenses are 1000

title2
 my earnings are 11000
 my expenses are 9000

Total Earnings =   17000
Total Expenses =   10000
Answered 2011-12-17T15:36:56.350

Here is the first step: getting the data into a usable form.

use warnings;
use strict;
use autodie;

my $input_filename = 'example';
open my $input, '<', $input_filename;
my %data;
{
  my $current_title;

  while(<$input>){
    chomp;
    if( /^Reading (.*?)\s*$/ ){ # start of section
      $current_title = $1;
    }elsif( not defined $current_title ){ # outside of any section
      # invalid data
    }elsif( /^Done reading (.*)/ ){ # end of section
      die if $1 ne $current_title;
      $current_title = undef;
    }else{ # add an element of section to array
      push @{ $data{$current_title} }, $_;
    }
  }
}
close $input;

Use the resulting data structure to determine the total earnings and expenses.

my( $earnings, $expenses );
for my $list( values %data ){
  for( @$list ){
    if( /earnings are (\d+)/ ){
      $earnings += $1;
    }elsif( /expenses are (\d+)/ ){
      $expenses += $1;
    }
  }
}

print "earnings $earnings\n";
print "expenses $expenses\n";
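
Because the lines are already grouped by title, the same structure can also give a per-title breakdown. The following is only a sketch under assumptions: the sub name totals_by_title and the %sample_data hash are hypothetical, and the hash is assumed to have the same shape as %data (title => array reference of lines).

```perl
use strict;
use warnings;

# Hypothetical helper: compute earnings and expenses for each title
# from a hash of array references (title => [ lines ... ]).
sub totals_by_title {
    my ($data) = @_;
    my %totals;
    for my $title (keys %$data) {
        my ($earnings, $expenses) = (0, 0);
        for my $line (@{ $data->{$title} }) {
            if    ($line =~ /earnings are (\d+)/) { $earnings += $1 }
            elsif ($line =~ /expenses are (\d+)/) { $expenses += $1 }
        }
        $totals{$title} = { earnings => $earnings, expenses => $expenses };
    }
    return \%totals;
}

# Shortened stand-in for the parsed %data
my %sample_data = (
    title1 => [ ' random info line1', ' my earnings are 6000',  ' my expenses are 1000' ],
    title2 => [ ' random info line6', ' my earnings are 11000', ' my expenses are 9000' ],
);

my $totals = totals_by_title(\%sample_data);
for my $title (sort keys %$totals) {
    printf "%s: earnings %d, expenses %d\n",
        $title, @{ $totals->{$title} }{qw(earnings expenses)};
}
```

Summing the per-title values gives the same grand totals as the loop above.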

Or, instead, print it out in a form that is more useful to a computer.

use YAML 'Dump';
print Dump \%data;
---
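title1:
  - ' random info line1'
  - ' random info line2'
  - ' random info line3'
  - ' random info line4'
  - ' random info line5'
  - ' my earnings are 6000'
  - ' my expenses are 1000'
title2:
  - ' random info line6'
  - ' random info line7'
  - ' random info line8'
  - ' random info line9'
  - ' random info line10'
  - ' random info line11'
  - ' random info line12'
  - ' random info line13'
  - ' random info line14'
  - ' my earnings are 11000'
  - ' my expenses are 9000'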
Answered 2011-12-17T02:32:46.710