perl - Split a file with n subroutine definitions into one file per subroutine

Question

I have a file that contains n subroutine definitions, as shown below (each routine starts with the sub keyword and ends with bar):

sub foo()
contents
bar

sub good ()
contents
bar

sub right ()
contents
bar

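I want to create three files named foo.s, good.s and right.s and write each subroutine into its corresponding file. I tried the following script, but since I am completely new to Perl scripting, I have mismanaged global/local variables. How can I achieve this?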

      my $op = 0;

      my $fnamet = $ARGV[0];
      if(not defined $fnamet) {
              die "Top file name not known\n";
      }
      my $srcfile = "source/$fnamet.s";
      print "$srcfile\n";
      open (SRC, $srcfile) or die "Couldn't find source file\n";

      foreach $line (<SRC>) {
              if ($op == 0) {
                      strtwrite($line, $op);
              } elsif ($op == 1) {
                      stopwrite($line, $op);
              }
      }
      if ($op == 1) {
              print "Some sub is missing bar statement or having\n";
              print "additional sub statement";
              close (DST);
      }
      close (SRC);
  }

  sub strtwrite {
      unless ($_[0] =~ /^\s*sub\s*.*/) {
              print "Searching for a sub start\n";
              $op = 0;
      } else {
              print "A sub start found\n";
              print "$_[0]\n";
              my @temp = (split /[\s,();]+/, $_[0]);
              my $strt = '';
              if($temp[1] =~ 'sub') {
                      print "Comparing with sub pass \n";
                      $strt = $temp[2];
              } else {
                      print "Comparing with sub fail \n";
                      $strt = $temp[1];
              }
              my $fnameb = "dstcodes/$strt.s";
             print "\n\n $fnameb \n\n";
              open (DST, '>$fnameb') or die "Couldn't open sub file\n";
              print DST $_[0];
              $op = 1;
      } 
  }

  sub stopwrite {
      unless ($_[0] =~ /^\s*bar\s*.*/) {
              # Copy till the bar is found
              print "Searching for an bar\n";
              print DST $_[0];
              $op = 1;
      } else {
              # Close the current destination file and start waiting 
              # for next SUB start
              print "A matching BAR found\n";  
              print DST $_[0];
              close (DST);
              $op = 0;
      }
  }

score 2 · Accepted Answer

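This is relatively simple if you use a lexical variable for the output filehandle (which is better practice anyway).

This program simply opens a new output file in $dst whenever it finds a sub line in the input. It then prints the current input line to $dst (once one is open), and everything works out.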

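Some tips:

Always put use strict and use warnings at the start of your program
Declare variables as close as possible to their first point of use, rather than all together at the top of the program
Always use lexical filehandles and the three-argument form of open, and include the value of $! in the die string so that you know why it failed
If you don't put a newline at the end of your die string, Perl will show the file name and line number where the problem occurred
Use while to read a file, not for. The latter needlessly reads the whole file into memory before the loop starts, which becomes a problem for large files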
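A small standalone sketch of three of these tips: a failing three-argument open whose die message includes $!, the difference a trailing newline makes to a die message, and a while loop that reads one line per iteration. The path no/such/dir/input.s is hypothetical and assumed not to exist; an in-memory filehandle stands in for a real file in the last part.

```perl
use strict;
use warnings;

# Hypothetical path, assumed not to exist, to trigger an open failure.
my $file = 'no/such/dir/input.s';

# Three-argument open on a lexical filehandle, with $! in the message.
# No trailing newline, so Perl appends " at SCRIPT line N." to the error.
my $opened = eval {
    open my $fh, '<', $file or die "Unable to open '$file': $!";
    1;
};
my $err = $@;
print "Without newline: $err";

# With a trailing newline, Perl leaves the message unchanged.
eval { die "Top file name not known\n" };
my $err_nl = $@;
print "With newline:    $err_nl";

# while reads one line per iteration; for (<$in>) would build a list of
# every line first. An in-memory handle stands in for a real file here.
my $text = "sub foo()\ncontents\nbar\n";
open my $in, '<', \$text or die "Unable to open in-memory handle: $!";
my $count = 0;
while (my $line = <$in>) {
    $count++;
}
close $in;
print "Read $count lines\n";
```

The exact wording of $! (for example "No such file or directory") depends on the operating system and locale.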

use strict;
use warnings;

die "Top file name not known" unless @ARGV;
my $srcfile = "source/$ARGV[0].s";

open my $src, '<', $srcfile or die "Unable to open '$srcfile': $!";
my $dst;
while (<$src>) {
  if (/^\s*sub\s+(\w+)/) {
    my $file = "dstcodes/$1.s";
    open $dst, '>', $file or die "Unable to open '$file' for output: $!";
  }
  print $dst $_ if $dst;
}

close $dst if $dst;
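To see what this loop produces without creating files, here is a sketch of the same technique on the question's sample input. As an assumption for the sketch, each output "file" is a string in the hash %out, written through an in-memory filehandle instead of dstcodes/NAME.s. It also shows two consequences of the design: lines before the first sub are skipped, and lines between a bar and the next sub (here, the blank line) are written to the previous sub's file.

```perl
use strict;
use warnings;

# The question's sample input, held in a string so the sketch needs no files.
my $input = <<'END';
sub foo()
contents
bar

sub good ()
contents
bar

sub right ()
contents
bar
END

# Assumption for this sketch: each output "file" is a string in %out,
# written through an in-memory filehandle, instead of dstcodes/NAME.s.
my %out;

open my $src, '<', \$input or die "Unable to open input: $!";
my $dst;
while (<$src>) {
    if (/^\s*sub\s+(\w+)/) {
        my $name = $1;
        # Re-opening $dst closes the handle for the previous sub first.
        open $dst, '>', \$out{$name} or die "Unable to open output for '$name': $!";
    }
    print $dst $_ if $dst;   # lines before the first sub are skipped
}
close $dst if $dst;
close $src;

print "== $_.s ==\n$out{$_}" for sort keys %out;
```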

score 1 · Accepted Answer

You can try a while loop over a regex match with the /g modifier:

use strict;
use warnings;

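my ($fnamet) = @ARGV or die "Usage: $0 input-file\n";
open my $fh, "<", $fnamet or die "Unable to open '$fnamet': $!";
# Slurp the whole file: with $/ undefined, <$fh> returns everything at once
my $str = do { local $/; <$fh> };
close $fh or die $!;

# /m lets ^ match at every line start, so "sub" and "bar" only count at the
# beginning of a line (a "bar" inside the contents is not a terminator);
# /s lets .*? cross newlines; /g resumes each search where the last one ended.
while ($str =~ /^ ( [ \t]* sub \s+ (\w+) .*? ^ [ \t]* bar \b [^\n]* \n? )/xgms) {
  my ($cont, $name) = ($1, $2);   # whole sub including its bar line, sub name

  open my $o, ">", "$name.s" or die "Unable to open '$name.s': $!";
  print $o $cont;
  close $o or die $!;
}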
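To show how a scalar-context m//g loop walks through the slurped string, here is a sketch that records each sub name together with pos(), the offset where the next search resumes. It uses a line-anchored pattern (/m), so sub and bar only count at the start of a line, and the bar line is kept in the capture. The input string is made up for the sketch.

```perl
use strict;
use warnings;

my $str = "sub foo()\ncontents\nbar\n\nsub good ()\ncontents\nbar\n";

my @found;   # each entry: [name, captured text, pos after the match]
while ($str =~ /^ ( [ \t]* sub \s+ (\w+) .*? ^ [ \t]* bar \b [^\n]* \n? )/xgms) {
    # In scalar context, //g remembers where it stopped (pos), so each
    # iteration continues the search right after the previous sub.
    push @found, [$2, $1, pos($str)];
}
printf "%-5s ends at offset %d\n", $_->[0], $_->[2] for @found;
```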

score 0 · Accepted Answer

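Well, first here is a quick fix for your problem: we pass a reference to the $op variable to the subs. Perl references are similar to C pointers. We get a reference with the \ operator, and we dereference a ref as $$opref, i.e. the sigil $ doubles as a dereference operator.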

if ($op == 0) {
        strtwrite($line, \$op);
} elsif ($op == 1) {
        stopwrite($line, \$op);
}

Then, inside the subs, we unpack the arguments:

sub stopwrite {
  my ($line, $opref) = @_;

  ...
  $$opref = 1;
}
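A runnable sketch of this reference-passing fix. The subs start_write and stop_write are hypothetical, simplified stand-ins for the question's strtwrite and stopwrite: they only flip the state flag through the reference and do not write any files.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for strtwrite/stopwrite: each receives the
# current line and a reference to the caller's state flag.
sub start_write {
    my ($line, $opref) = @_;
    $$opref = 1 if $line =~ /^\s*sub\s/;   # sub found: switch to copy mode
}

sub stop_write {
    my ($line, $opref) = @_;
    $$opref = 0 if $line =~ /^\s*bar\b/;   # bar found: wait for the next sub
}

my $op = 0;
my @states;
for my $line ("junk\n", "sub foo()\n", "contents\n", "bar\n") {
    if ($op == 0) { start_write($line, \$op) }
    else          { stop_write($line, \$op) }
    push @states, $op;   # record $op as seen by the caller after each line
}
print "@states\n";
```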

(There is a shorter solution that skips the reference and assigns to $_[1] directly, but that is not very readable.)
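The shorter variant works because the elements of @_ are aliases to the caller's arguments. A minimal sketch, with set_flag as a hypothetical sub name; note that if the caller passes a literal such as 0 instead of a variable, the assignment dies with "Modification of a read-only value attempted".

```perl
use strict;
use warnings;

# $_[1] is an alias for the caller's second argument, so assigning to it
# changes the caller's $op without any explicit reference.
sub set_flag {
    my ($line) = @_;
    $_[1] = ($line =~ /^\s*sub\s/) ? 1 : 0;
}

my $op = 0;
set_flag("sub foo()\n", $op);
print "op is now $op\n";
```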

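But you see, I am somewhat against this technique, because it is all action at a distance, and mutable state tends to make simple things quite complicated.

Suppose we have a subroutine extract_subs that skips any junk lines until it finds a sub declaration, extracts that sub into a file, and returns once the sub has been terminated by the bar marker. To do this, extract_subs takes the filehandle as an argument. So our main part looks like this: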

use strict; use warnings;
use autodie;  # automatic error messages for `open`

@ARGV or die "Usage: $0 input-file\n";
my $filename = shift @ARGV;

# using "lexical filehandles" and an explicit open mode "<":
open my $source, "<", $filename;

extract_subs($source) until eof $source;

# $source is closed automatically

Now, what happens inside extract_subs? First, we unpack the arguments into variables, which makes the code easier to read:

sub extract_subs {
  my ($input) = @_;

Next, we discard lines until we see a sub:

  my $output;
  while (my $declaration = <$input>) {
    if ($declaration =~ /^\s* sub \s+ (\w+)/x) {
      my $name = $1;
      open $output, ">", "dstcodes/$name.s";  # no error handling because autodie
      print { $output } $declaration;  # print the whole declaration to this file
      last;  # leave the loop
    }
  }
  # check that the loop didn't abort because $input was exhausted:
  return unless defined $output;

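We opened the file inside that loop so that we could print $declaration to it right there.

Now that the output file is open and we are inside the sub, we print every line to $output until we see the terminating line: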

# implicitly read into $_ default variable
while (<$input>) {
  print { $output } $_;
  return if /^\s*bar\b/;  # exit the whole subroutine, not just the loop
}
# If this code is reached, then $input was exhausted before finding the "bar" terminator
die "A sub was not terminated with a bar statement";
}  # end of sub extract_subs
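Assembled into one runnable program, the pieces might look like the sketch below. It makes assumptions of its own: extract_subs takes a second argument, a hash reference, and each output goes to an in-memory string $out{NAME} instead of dstcodes/NAME.s, with explicit "or die" checks in place of autodie. The sample text, including the leading junk line, is made up for the sketch.

```perl
use strict;
use warnings;

# Extract one sub from $input into $out->{NAME}.
# Assumption: outputs are in-memory strings, not dstcodes/NAME.s files.
sub extract_subs {
    my ($input, $out) = @_;

    # Skip junk lines until a sub declaration is found.
    my $output;
    while (my $declaration = <$input>) {
        if ($declaration =~ /^\s* sub \s+ (\w+)/x) {
            my $name = $1;
            open $output, '>', \$out->{$name} or die "Cannot open output for $name: $!";
            print { $output } $declaration;
            last;
        }
    }
    return unless defined $output;   # input ran out before another sub

    # Copy lines up to and including the bar terminator.
    while (<$input>) {
        print { $output } $_;
        if (/^\s*bar\b/) { close $output; return }
    }
    die "A sub was not terminated with a bar statement";
}

my $text = "junk\nsub foo()\ncontents\nbar\n\n"
         . "sub good ()\ncontents\nbar\n\n"
         . "sub right ()\ncontents\nbar\n";
open my $source, '<', \$text or die "Cannot open input: $!";
my %out;
extract_subs($source, \%out) until eof $source;
close $source;

print "== $_.s ==\n$out{$_}" for sort keys %out;
```

Unlike the single-loop answer, blank lines between a bar and the next sub are discarded here, because they are skipped while looking for the next declaration.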

I think this code is more elegant and easier to understand. Here are some things you should not do:

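Use the archaic form of open with bareword filehandles, or leave out the open mode. Always use the three-argument form: open my $filehandle, "<", $filename or die "Can't open $filename: $!". The "or die" part can be omitted when you use autodie.
Use global variables, such as bareword filehandles. They make code error-prone and harder to understand.
Not use strict and warnings. All those error messages look nasty, but they usually point to real problems that should be fixed rather than ignored.
Use flags to track the parser's state. If you break your code into proper subroutines with good return values, you don't need such communication channels. Remember that a Perl subroutine can return multiple values if needed.
Use =~ as a general string operator. It is for regex matching. If you want to test strings for equality, use the eq operator.
Test for specific truth values. If you only care whether a variable is true or false, use it directly in the condition: if ($foo) { bar() } else { baz() }. This is clearer than requiring specific values, e.g. if ($foo == 1) { bar() } elsif ($foo == 0) { baz() }.
Use the unless ($cond) { A } else { B } construct. It is usually hard to follow. Write either if (not $cond) { A } else { B } or, better, if ($cond) { B } else { A }.
Not read "Modern Perl". Once you are comfortable programming in Perl, you should read this book (also available online) to learn current best practices.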
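Two of these points in code: =~ with a plain string on the right treats that string as a pattern, so it is not an equality test, and a subroutine can return a status and a value together instead of setting a global flag. parse_header is a hypothetical helper made up for this sketch, not part of any answer above.

```perl
use strict;
use warnings;

# =~ with a string on the right compiles it as a pattern: "substance"
# contains "sub", so the match succeeds although the strings differ.
my $word    = 'substance';
my $matches = ($word =~ 'sub') ? 1 : 0;   # 1: pattern match
my $equal   = ($word eq 'sub') ? 1 : 0;   # 0: string equality

# Hypothetical helper: returns (is_sub, name) instead of setting a flag.
sub parse_header {
    my ($line) = @_;
    return $line =~ /^\s*sub\s+(\w+)/ ? (1, $1) : (0, undef);
}

my ($is_sub, $name) = parse_header("sub good ()\n");
if ($is_sub) { print "found sub $name\n" }   # test truth directly, no == 1
print "matches=$matches equal=$equal\n";
```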
