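I have two files, one containing text and the other containing key/value pairs for a hash. I want to replace every occurrence of a key with its hash value. The code below does this; I would like to know whether there is a better approach than the foreach loop I am using.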

Thanks, everyone.

Edit: I know it is a bit odd to use

s/\n//;
s/\r//;

instead of chomp, but this works for files with mixed line endings (edited on both Windows and Linux), whereas chomp (I think) does not.
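As a minimal sketch of the line-ending point: chomp only removes a trailing copy of `$/` (by default "\n"), so a "\r" from a Windows-edited file can survive it. The sample lines below are made up for illustration, and the single substitution is one possible alternative to the two-step `s/\n//; s/\r//;`, not necessarily what the asker should use.

```perl
use strict;
use warnings;

# Hypothetical sample lines ending in LF, CRLF, and a bare CR.
my @lines = ("strict\t\$tr|ct\n", "warnings\tw\@rn|ng5\r\n", "here\th3r3\r");

for my $line (@lines) {
    # Strip any trailing run of CR/LF characters in a single step.
    $line =~ s/[\r\n]+\z//;
}

# Each line now ends at its last visible character.
print "[$_]\n" for @lines;
```

Anchoring at `\z` keeps the substitution from touching anything except the end of the string.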

The file with the key/value pairs (hash.tsv):

strict  $tr|ct
warnings    w@rn|ng5
here    h3r3

The file with the text (doc.txt):

Do you like use warnings and strict?
I do not like use warnings and strict.
Do you like them here or there?
I do not like them here or there?
I do not like them anywhere.
I do not like use warnings and strict.
I will not obey your good coding practice edict. 

The Perl script:

#!/usr/bin/perl

use strict;
use warnings;

# Load the key/value pairs from the tab-separated file.
open(my $fh_hash, "<", "hash.tsv") or die "could not open hash.tsv: $!";
my %hash;
while (<$fh_hash>)
{
    # Strip LF and CR separately so mixed line endings are handled.
    s/\n//;
    s/\r//;
    my @tmp_hash = split(/\t/);
    $hash{ $tmp_hash[0] } = $tmp_hash[1];
}
close($fh_hash);

# Replace every key with its value, one input line at a time.
open(my $fh_in, "<", "doc.txt") or die "could not open doc.txt: $!";
open(my $fh_out, ">", "doc.out") or die "could not open doc.out: $!";
while (<$fh_in>)
{
    foreach my $key ( keys %hash )
    {
        s/$key/$hash{$key}/g;
    }
    print $fh_out $_;
}
close($fh_in);
close($fh_out);
2 Answers

You can read the whole file into a single variable and, for each key/value pair, replace all occurrences at once.

Something like:

use strict;
use warnings;

use YAML;
use File::Slurp;
my $href = YAML::LoadFile("hash.yaml");
my $text = read_file("text.txt");

foreach (keys %$href) {
    $text =~ s/$_/$href->{$_}/g;
}
open (my $fh_out, ">", "doc.out") or die "could not open file $!";
print $fh_out $text;
close $fh_out;

This produces:

Do you like use w@rn|ng5 and $tr|ct?
I do not like use w@rn|ng5 and $tr|ct.
Do you like them h3r3 or th3r3?
I do not like them h3r3 or th3r3?
I do not like them anywh3r3.
I do not like use w@rn|ng5 and $tr|ct.
I will not obey your good coding practice edict. 

To shorten the code, I used YAML and replaced your input file with:

strict: $tr|ct
warnings: w@rn|ng5
here: h3r3

and used File::Slurp to read the whole file into a variable. Of course, you can "slurp" the file without File::Slurp, for example:

my $text;
{
    local($/); #or undef $/;
    open(my $fh, "<", $file ) or die "problem $!\n";
    $text = <$fh>;
    close $fh;
}
answered 2012-07-19T19:17:43.613

One problem with

for my $key (keys %hash) {
    s/$key/$hash{$key}/g;
}

is that it does not correctly handle

foo => bar
bar => foo

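Instead of swapping them, you end up with all "foo" or all "bar", and you cannot even control which. Building one pattern from all the keys and substituting in a single pass avoids this: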

# Do once, not once per line
my $pat = join '|', map quotemeta, keys %hash;

s/($pat)/$hash{$1}/g;
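
A minimal runnable sketch of the single-pass approach on the foo/bar case, with a made-up input string. Because the substitution scans the text once and never rescans the replacement text, each word is swapped exactly once.

```perl
use strict;
use warnings;

my %hash = (foo => 'bar', bar => 'foo');

# One alternation of all keys, each escaped so it matches literally.
my $pat = join '|', map quotemeta, keys %hash;

# Hypothetical input, chosen only for illustration.
my $text = "foo bar foo";

# Copy the text, then replace every key in a single pass.
(my $swapped = $text) =~ s/($pat)/$hash{$1}/g;

print "$swapped\n";   # prints "bar foo bar"
```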

You might also want to handle

foo  => bar
food => baz

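by matching the longest key first, rather than possibly ending up with "bard" (which is what you get when "foo" is replaced inside "food").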

# Do once, not once per line
my $pat =
   join '|',
    map quotemeta,
     sort { length($b) <=> length($a) }
      keys %hash;

s/($pat)/$hash{$1}/g;
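
A minimal sketch of the longest-first ordering on the foo/food case, again with a made-up input string. Perl tries the alternatives from left to right, so putting longer keys first lets "food" win over "foo" wherever both could match.

```perl
use strict;
use warnings;

my %hash = (foo => 'bar', food => 'baz');

# Longest keys first, so "food" is tried before "foo".
my $pat =
    join '|',
    map quotemeta,
    sort { length($b) <=> length($a) }
    keys %hash;

# Hypothetical input, chosen only for illustration.
my $text = "foo food";

(my $result = $text) =~ s/($pat)/$hash{$1}/g;

print "$result\n";   # prints "bar baz"
```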
answered 2012-07-19T21:06:58.650