perl - How do I use a variable in a substitution in Perl?

Question

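I have several text files that were once tables in a database, which has since been disassembled. I'm trying to reassemble them, which will be easy once I get them into a usable form. The first file, "keys.text", is just a list of labels, inconsistently formatted. Like: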

Sa 1 #
Sa 2
U 328 #*

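It's always letters, [space], numbers, [space], and sometimes symbols. The text files that match these keys have the same keys, each followed by a line of text, also separated (delimited) by a space.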

Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...

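What I'm trying to do in the code below is match each key from "keys.text" with the same key in the .txt files and put a tab between the key and the text. I'm sure I'm overlooking something very basic, but the result I'm getting looks identical to the source .txt file.

Thanks in advance for any leads or assistance!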

#!/usr/bin/perl

use strict;
use warnings;
use diagnostics;
open(IN1, "keys.text");

my $key;

# Read each line one at a time
while ($key = <IN1>) {

# For each txt file in the current directory
foreach my $file (<*.txt>) {
  open(IN, $file) or die("Cannot open TXT file for reading: $!");
  open(OUT, ">temp.txt") or die("Cannot open output file: $!");

  # Add temp modified file into directory 
  my $newFilename = "modified\/keyed_" . $file;
  my $line;

  # Read each line one at a time
  while ($line = <IN>) {

     $line =~ s/"\$key"/"\$key" . "\/t"/;
     print(OUT "$line");

  }
  rename("temp.txt", "$newFilename");
 }   
}

EDIT: To clarify, the results should also preserve the symbols in the keys, if there are any. So they would look like:

Sa 1 #      Random line of text follows.
Sa 2        This text is just as random.
U 328 #*    Continuing text...

score 1 · Accepted Answer

The regex seems quoted rather oddly to me. Wouldn't

$line =~ s/$key/$key\t/;

work better?

Also, IIRC, <IN1> will leave the newline on the end of your $key. chomp $key to get rid of that.

And don't put parentheses around your print args, esp when you're writing to a file handle. It looks wrong, whether it is or not, and distracts people from the real problems.
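
A minimal sketch of the advice above, assuming each key is meant as a literal string rather than a pattern. The helper name add_tab and the sample data are hypothetical. The \Q...\E quoting is an addition to the one-liner in this answer: keys such as "U 328 #*" contain regex metacharacters, and the plain s/$key/$key\t/ would not treat them literally.

```perl
use strict;
use warnings;

# Hypothetical helper: put a tab after $key when $line starts with it.
sub add_tab {
    my ($key, $line) = @_;
    chomp $key;    # drop the newline that <IN1> leaves on the key
    # \Q...\E quotes regex metacharacters in the key (#, *, ...).
    # The space that originally followed the key stays after the tab.
    $line =~ s/^\Q$key\E/$key\t/;
    return $line;
}

# Keys as they would come from the keys file, newline included
my @keys = ("Sa 1 #\n", "Sa 2\n", "U 328 #*\n");
my $line = "U 328 #* Continuing text...\n";
$line = add_tab($_, $line) for @keys;
print $line;
```

Only the key that actually starts the line changes it; the other two keys leave the line untouched.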

score 0 · Accepted Answer

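Using split rather than s/// makes the problem simple. In the code below, read_keys extracts the keys from keys.text and records them in a hash.

Then for all files named on the command line, available in the special Perl array @ARGV, we inspect each line to see whether it begins with a key. If not, we leave it alone; otherwise we insert a TAB between the key and the text.

Note that we edit the files in place thanks to Perl's handy -i option:

-i[extension]

specifies that files processed by the <> construct are to be edited in-place. It does this by renaming the input file, opening the output file by the original name, and selecting that output file as the default for print statements. The extension, if supplied, is used to modify the name of the old file to make a backup copy …

The line split " ", $_, 3 separates the current line into exactly three fields. This is necessary to protect the whitespace that is likely to be present in the text portion of the line.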

#! /usr/bin/perl -i.bak

use warnings;
use strict;

sub usage { "Usage: $0 text-file\n" }

sub read_keys {
  my $path = "keys.text";
  open my $fh, "<", $path
    or die "$0: open $path: $!";

  my %key;
  while (<$fh>) {
    my($text,$num) = split;
    ++$key{$text}{$num} if defined $text && defined $num;
  }

  wantarray ? %key : \%key;
}

die usage unless @ARGV;
my %key = read_keys;

while (<>) {
  my($text,$num,$line) = split " ", $_, 3;
  $_ = "$text $num\t$line" if defined $text &&
                              defined $num &&
                              $key{$text}{$num};
  print;
}

Sample run:

$ ./add-tab input

$ diff -u input.bak input
--- input.bak 2010-07-20 20:47:38.688916978 -0500
+++ input 2010-07-20 21:00:21.119531937 -0500
@@ -1,3 +1,3 @@
-Sa 1 # Random line of text follows.
-Sa 2 This text is just as random.
-U 328 #* Continuing text...
+Sa 1	# Random line of text follows.
+Sa 2	This text is just as random.
+U 328	#* Continuing text...
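
To see why the limit of 3 matters, here is a small sketch that runs split with and without it on one of the sample lines. The variable names are illustrative only and are not part of the script above.

```perl
use strict;
use warnings;

my $row = "Sa 1 # Random line of text follows.\n";

# With a limit of 3, everything after the second field stays in one piece,
# including its internal spaces and the trailing newline.
my ($text, $num, $rest) = split " ", $row, 3;

# Without a limit, the text portion is broken up at every run of whitespace.
my @all = split " ", $row;

print "fields without limit: ", scalar(@all), "\n";
print "$text $num\t$rest";
```

Note that for a key that carries a symbol, the symbol ends up in the third field, so this approach places the tab before the symbol rather than after it.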

score 0 · Accepted Answer

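This looks like the perfect place for Perl's map function! Read the entire text file into an array, then apply map across the whole array. The only other thing you might want to do is use the quotemeta function to escape any regular-expression metacharacters in the keys.

Using map is very efficient. I also read the keys into an array so that the keys file doesn't have to be opened and closed over and over in the loop. It's an O(keys × lines) algorithm, but if your keys file isn't that big, it shouldn't be too bad.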

#! /usr/bin/env perl

use strict;
use warnings;

open (KEYS, "keys.text")
    or die "Cannot open 'keys.text' for reading\n";
# Skip blank lines; an empty key would match at the start of every line
my @keys = grep { /\S/ } <KEYS>;
close (KEYS);
chomp @keys;

foreach my $file (glob("*.txt")) {
    open (TEXT, "$file")
        or die "Cannot open '$file' for reading\n";
    my @textArray = <TEXT>;
    close (TEXT);

    foreach my $key (@keys) {
        # \Q...\E applies quotemeta so symbols like # and * match literally
        map($_ =~ s/^\Q$key\E/$key\t/, @textArray);
    }
    open (NEW_TEXT, ">$file.new") or
        die qq(Can't open file "$file.new" for writing\n);

    # The lines still end in newlines, so print them as they are
    print NEW_TEXT @textArray;
    close (NEW_TEXT);
}

score 0 · Accepted Answer

Fun answers:

$line =~ s/(?<=$key)/\t/;

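Where (?<=XXXX) is a zero-width positive lookbehind for XXXX. That means it matches just after XXXX, without XXXX being part of the match that gets substituted.

And:

$line =~ s/$key/$key . "\t"/e;

The /e flag at the end means to eval the second half of the s/// as Perl code before filling it in.

Important: I don't recommend either of these; they obfuscate the program. But they're interesting. :-)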
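
As a rough check that the two forms behave the same, the sketch below applies both to one of the sample lines. The variable names are made up for the example, and \Q...\E is added (it is not in the snippets above) so that the # and * in the key are matched literally.

```perl
use strict;
use warnings;

my $key = "U 328 #*";

# Lookbehind: match the empty position right after the key and put a tab there.
(my $with_lookbehind = "U 328 #* Continuing text...") =~ s/(?<=\Q$key\E)/\t/;

# /e: the replacement side is evaluated as Perl code, so . concatenates.
(my $with_e = "U 328 #* Continuing text...") =~ s/\Q$key\E/$key . "\t"/e;

print $with_lookbehind eq $with_e ? "same result\n" : "different result\n";
```

The lookbehind variant consumes nothing, so the key itself never appears on the replacement side; the /e variant rebuilds the key explicitly.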

score 0 · Accepted Answer

If Perl is not a must, you can use this awk one-liner

$ cat keys.txt
Sa 1 #
Sa 2
U 328 #*

$ cat mytext.txt
Sa 1 # Random line of text follows.
Sa 2 This text is just as random.
U 328 #* Continuing text...

$ awk 'FNR==NR{ k[$1 SEP $2];next }($1 SEP $2 in k) {$2=$2"\t"}1 ' keys.txt mytext.txt
Sa 1     # Random line of text follows.
Sa 2     This text is just as random.
U 328    #* Continuing text...

score 0 · Accepted Answer

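How about doing two separate slurps, one for each file? For the first file, you open the keys and build a preliminary hash. For the second file, all you need to do is add the text to the hash.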

use strict;
use warnings;

my $keys_file = "path to keys.txt";
my $content_file = "path to content.txt";
my $output_file = "path to output.txt";

my %hash = ();

# letters, number, then optional symbols
my $keys_regex = '^([a-zA-Z]+)\s*(\d+)\s*([^\da-zA-Z\s]*)';

open my $keys_fh, '<', $keys_file or die "could not open $keys_file: $!";
while(<$keys_fh>){
    my $line = $_;
    if ($line =~ /$keys_regex/){
        # the letters alone are not unique ("Sa 1", "Sa 2"), so index by both
        my $key = "$1 $2";
        $hash{$key}{'symbol'} = $3;
    }
}
close $keys_fh;

open my $content_fh, '<', $content_file or die "could not open $content_file: $!";
while(<$content_fh>){
    my $line = $_;
    if ($line =~ /^([a-zA-Z]+)\s*(\d+)/){
        my $key = "$1 $2";
        next unless exists $hash{$key};
        my $symbol = $hash{$key}{'symbol'};
        # strip the letters, number and symbols from the line to leave the text
        $line =~ s/^[a-zA-Z]+\s*\d+//;
        $line =~ s/^\s*\Q$symbol\E//;
        $line =~ s/^\s+//;
        chomp $line;
        $hash{$key}{'text'} = $line;
    }
}
close $content_fh;

open my $out_fh, '>', $output_file or die "could not open $output_file: $!";
for my $key (sort keys %hash){
    my $symbol = $hash{$key}{'symbol'};
    my $label = length $symbol ? "$key $symbol" : $key;
    my $text = defined $hash{$key}{'text'} ? $hash{$key}{'text'} : '';
    print $out_fh $label . "\t" . $text . "\n";
}
close $out_fh;

I haven't had a chance to test it, and the all-regex solution seems a bit clumsy, but it might give you an idea of something else to try.
