perl - Improving my Perl algorithm for merging PostScript show commands

Question

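The PostScript output of Matlab R2007b is buggy. I've found that text strings are split into many "moveto" and "show" commands in the PostScript output (simprintdiag). This causes problems when typesetting to PDF, because extra whitespace can sometimes be inserted into the labels (so you can't double-click them, and they are not found in searches!).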

To work around this I wrote a Perl script that joins these split "show" commands back together. However, it has a few problems, and I need some help with them.

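Show commands like "(0) s" are not reproduced correctly and turn up in the next block instead.
The input PostScript file is always modified by the script, even when no changes are needed.
There is a hack at the start to get around show commands that continue over several lines.
It isn't very fast, and given that some projects have over 2000 PostScript files, any speed improvements are welcome.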

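The DATA section in my code below has four examples of text strings that are split across mt and s commands. I've included what the final output should look like at the end. The script relies on the fact that our text is written left to right or, in PostScript terms, with a moving X coordinate and a fixed Y coordinate. It can therefore be concluded that consecutive mt commands with the same Y coordinate belong to the same text string.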

Any help gratefully received.

Thanks :)

My Perl script:

use strict;
use warnings;

my $debug=1;

#
## Slurp the input file into a variable
my $ps_in;
while(<DATA>) {
   $ps_in .= $_;     # Take a copy of input file
}


#
## HACK
## The main PS fix algorithm only works with show commands on a single
## line!  Fix the input contents now by joining all show commands that 
## occur over multiple lines.  Examples of this are:
##  272   63 mt 
## (main is an externally linked function of the ACC feature ru\
## nning every ) s
##  991   63 mt
## (100) s
my $buf;
my $no_show_split;
open(my $fh_ps, "<", \$ps_in );
while(<$fh_ps>) {
   if( /^(.*)\\$/ ) {   # Match on all lines ending with backslash \
      $buf .= $1;
   }
   else {
      if( $buf ) {
         $no_show_split .= $buf;
         undef($buf);
      }
      $no_show_split .= $_;
   }
}
close $fh_ps;

#
## Reopen our ps input, now the show splits have been removed
open($fh_ps,"<",\$no_show_split );

my $moveto_line = qr/^\s*\d+\s+(\d+)\s+(mt|moveto)/;  # Example '2831  738 mt'
my $show_line   = qr/^\((.+)\)\s+(s|show)/;           # Example '(chris) s'
my $ycrd;      # Y-axis cords
my $pstxt;     # Text to display
my $mtl;       # Moveto line
my $print_text;
my $fixes=0;
my $ps_condensed;

while(<$fh_ps>) {

    if( $print_text ) {
        $ps_condensed .= "$mtl\n";
        $ps_condensed .= "($pstxt) s\n";
        print "($pstxt) s\n====================\n" if $debug;
        undef($ycrd);
        undef($pstxt);
        $print_text=0;
        ++$fixes;
    }

    if( /$moveto_line/ ) {
        chomp;

        if( !$ycrd ) {
            $mtl=$_;       # Store this line for print later
            $ycrd=$1;      # Match on y-axis value
            redo;          # Redo this iteration so we can read the show line in
        }
        elsif( $1 == $ycrd ) {
            <$fh_ps> =~ /$show_line/;  # Read in the show line
            $pstxt .= $1;              # Built up string we want
            print " $mtl -->$1<--\n" if $debug;
        }
        else {
            $print_text=1; # Dropped out matching on y-cord so force a print
            redo;          # Need to redo this line again
        }
    }
    else {
        if( $pstxt ) {     # Print if we have something in buffer
            $print_text=1;
            redo;
        }
        $ps_condensed .= $_;
    }

} # End While Loop
close $fh_ps;

print $ps_condensed;


__DATA__
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 60 FMSR

11214 11653 mt 
(0) s
4.5 w
156 0 2204 19229 2 MP stroke
156 0 2204 19084 2 MP stroke

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

8913 14971 mt 
(Function) s
9405 14971 mt 
(-) s
9441 14971 mt 
(Call) s
9009 15127 mt 
(Generator) s
6 w


%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

4962 4747 mt 
(trigger) s
5322 4747 mt 
(_) s
5394 4747 mt 
(scheduler) s
5934 4747 mt 
(_) s
6006 4747 mt 
(100) s
6222 4747 mt 
(ms) s
6378 4747 mt 
(_) s
6450 4747 mt 
(task) s
6654 4747 mt 
(_) s
6726 4747 mt 
(06) s
6 w
gr

24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

5454 11947 mt 
(Chris_\
did_this_example_) s
5874 11947 mt 
(to_test) s
5946 11947 mt 
(_out) s
6 w

What the final "condensed" PostScript should look like:

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 60 FMSR

11214 11653 mt 
(0) s
4.5 w
156 0 2204 19229 2 MP stroke
156 0 2204 19084 2 MP stroke

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

8913 14971 mt 
(Function-Call) s
9009 15127 mt 
(Generator) s
6 w


%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

4962 4747 mt 
(trigger_scheduler_100ms_task_06) s
6 w
gr

24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

5454 11947 mt 
(Chris_did_this_example_to_test_out) s
6 w

score 2 · Accepted Answer

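I think the following will work for you.

Notes:

Slurp all the data with the idiom: do { local $/; <DATA> };
Fix the backslashes at the ends of lines with a single regular expression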
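
A further note on the "(0) s" symptom from the question, offered as a likely explanation rather than a confirmed diagnosis: in Perl the string "0" is false, so a test such as `if( $pstxt )` in the question's script skips a buffer that holds exactly "0", and the text is only emitted when the next run is flushed. Testing with `length` (or `defined`) avoids that. A minimal sketch, in which the helper names `flush_if_true` and `flush_if_length` are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only: each returns the show
# command it would emit for a buffered text run, or '' if it emits nothing.
sub flush_if_true   { my ($txt) = @_; return $txt ? "($txt) s" : '' }
sub flush_if_length { my ($txt) = @_; return length($txt) ? "($txt) s" : '' }

# Compare both tests on an ordinary label and on the label "0".
for my $txt ('Call', '0') {
    printf "%-4s truth test: '%s'  length test: '%s'\n",
        $txt, flush_if_true($txt), flush_if_length($txt);
}
```

The truth test emits nothing for "0", while the length test still emits "(0) s". The same caveat would apply to `if( !$ycrd )` for a label whose Y coordinate is 0.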

use strict;
use warnings;

my $data = do { local $/; <DATA> };
$data =~ s,\\\n,,g;

my $out = "";
my $s = "";    
my $y;

for my $line (split("\n", $data)) {
  if (defined($y) && $line =~ m/^\((.*)\)\s+s\s*$/) {
    $s .= $1;
    next;
  } elsif ($line =~ m/^(\d+)\s+(\d+)\s+mt\s*$/) {
    if (defined($y) && $y == $2) {
      next;
    } else {
      $y = $2;
    }
  } else {
    $y = undef;
  }
  if (length($s)) {
    $out .= "($s) s\n";
    $s = "";
  }
  $out .= "$line\n";
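}

# Flush a text run that is still pending if the input ends on a show line
$out .= "($s) s\n" if length($s);

print $out;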
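
For the other two points in the question (files being rewritten even when nothing changes, and running over thousands of files), the same loop could be wrapped in a function and applied per file. The following is only a sketch under assumptions: the names `condense_show` and `fix_file` are invented here, it recognises only the `mt` and `s` spellings, and it has not been benchmarked against the original script.

```perl
use strict;
use warnings;

# Merge runs of "X Y mt" / "(text) s" pairs that share a Y coordinate.
# Returns the condensed text and the number of show commands merged away.
sub condense_show {
    my ($ps) = @_;
    $ps =~ s/\\\r?\n//g;            # join strings continued with a trailing backslash

    my @out;
    my $y;                          # Y coordinate of the current text run, if any
    my ($run, $shows, $merged) = ('', 0, 0);

    for my $line (split /\n/, $ps, -1) {    # limit -1 keeps trailing empty lines
        if (defined $y && $line =~ /^\((.*)\)\s+s\s*$/) {
            $run .= $1;             # collect the text, emit it later
            $shows++;
            next;
        }
        my $same_run = 0;
        if ($line =~ /^\s*\d+\s+(\d+)\s+mt\s*$/) {
            $same_run = defined $y && $shows && $y == $1;
            $y = $1 unless $same_run;
        }
        else {
            undef $y;
        }
        next if $same_run;          # drop a moveto that continues the current run
        if ($shows) {               # the run has ended: emit one combined show
            push @out, "($run) s";
            $merged += $shows - 1;
            ($run, $shows) = ('', 0);
        }
        push @out, $line;
    }
    if ($shows) {                   # input ended without a final newline
        push @out, "($run) s";
        $merged += $shows - 1;
    }
    return (join("\n", @out), $merged);
}

# Rewrite $path only when at least one show command was merged.
sub fix_file {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my $ps = do { local $/; <$in> };
    close $in;

    my ($fixed, $merged) = condense_show($ps);
    return 0 unless $merged;        # nothing to merge: leave the file untouched

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} $fixed;
    close $out or die "Cannot close $path: $!";
    return $merged;
}

fix_file($_) for @ARGV;
```

Saved under a name of your choice, it would be run with the PostScript files as arguments. Counting merges instead of comparing the text means a file that contains only a backslash-continued string, with nothing to merge, is also left alone.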

score 1 · Accepted Answer

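I don't see a general way to do this, but a series of special cases seems to work. The weakness here is that adding more and more special cases is not a model that scales well. However, if this is the complete list of problems, then this should work.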

#!/usr/bin/perl -Tw

use strict;
use warnings;

my %regex_for = (
    a => qr{
        \( ( \w+ ) \)     \s s  \s+  # (Function) s
        \d+ \s+ \d+       \s mt \s+  # 9405 14971 mt
        \( ( [-_]|ms ) \) \s s  \s+  # (-) s
        \d+ \s+ \d+       \s mt \s+  # 9441 14971 mt
        \( ( \w+ ) \)     \s s  \s+  # (Call) s
    }xmsi,
    b => qr{
        \( ( \w+ ) \\ \s* ( \w+ ) \)  # (Chris_\
    }xms,    #  did_this_example_)
    c => qr{
        \( ( \w+ _ ) \) \s s  \s+  # (Chris_did_this_example_) s
        \d+ \s+ \d+     \s mt \s+  # 5874 11947 mt
        \( ( \w+ ) \)   \s s  \s+  # (to_test) s
    }xms,
    d => qr{
        \( ( \w+ ) \)   \s s  \s+  # (to_test) s
        \d+ \s+ \d+     \s mt \s+  # 5946 11947 mt
        \( ( _ \w+ ) \) \s s  \s+  # (_out) s
    }xms,
);

my $ps = do { local $/; <DATA> };

REGSUB:
{
    my $a = $ps =~ s{ $regex_for{a} }{($1$2$3) s\n}xmsg;
    my $b = $ps =~ s{ $regex_for{b} }{($1$2)}xmsg;
    my $c = $ps =~ s{ $regex_for{c} }{($1$2) s\n}xmsg;
    my $d = $ps =~ s{ $regex_for{d} }{($1$2) s\n}xmsg;

    redo REGSUB
        if $a || $b || $c || $d;
}

print $ps;

__DATA__
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 60 FMSR

11214 11653 mt
(0) s
4.5 w
156 0 2204 19229 2 MP stroke
156 0 2204 19084 2 MP stroke

%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

8913 14971 mt
(Function) s
9405 14971 mt
(-) s
9441 14971 mt
(Call) s
9009 15127 mt
(Generator) s
6 w


%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

4962 4747 mt
(trigger) s
5322 4747 mt
(_) s
5394 4747 mt
(scheduler) s
5934 4747 mt
(_) s
6006 4747 mt
(100) s
6222 4747 mt
(ms) s
6378 4747 mt
(_) s
6450 4747 mt
(task) s
6654 4747 mt
(_) s
6726 4747 mt
(06) s
6 w
gr

24 10 10 24 0 4 -10 24 -24 10 5806 11736 14 MP stroke
%%IncludeResource: font Helvetica
/Helvetica /WindowsLatin1Encoding 120 FMSR

5454 11947 mt
(Chris_\
did_this_example_) s
5874 11947 mt
(to_test) s
5946 11947 mt
(_out) s
6 w
