string - Concatenating strings

Question

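I have a big problem with my algorithm design because I am working with large text files. I have a text file that contains sequences of words, one sequence per line. For example: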

my friend
hello friends
world

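The second file is very large (gigabytes) and contains sentences. The goal of the program is to go through the word sequences in the first file one by one, look for each of them in the second file, and join the words with the "+" symbol.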

For example, the input "hello my world friend" becomes "hello+my+friend+world".
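To make the intended operation concrete, here is a minimal sketch for a single line and a single sequence. It follows the reading in which only the words of a matched sequence are joined, which is what the script further down does. The helper name `join_sequence` and the sample strings are illustrative assumptions, not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: join the words of one sequence with "+" wherever
# the sequence occurs as whole words in $line.
sub join_sequence {
    my ($line, $sequence) = @_;
    my @words = split ' ', $sequence;
    return $line if @words < 2;    # a single word has nothing to join
    # Allow any run of whitespace between the words of the sequence.
    my $pattern = join '\s+', map { quotemeta } @words;
    my $joined  = join '+', @words;
    # Lookarounds require whitespace or the line boundary on both sides.
    $line =~ s/(?<!\S)$pattern(?!\S)/$joined/g;
    return $line;
}

print join_sequence("hello my friend world", "my friend"), "\n";
# prints "hello my+friend world"
```

Single-word entries are returned unchanged here, since there is nothing to join inside them.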

Any ideas, please? I would like to write this in Perl, since it works well with text.

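I have already written this script in Perl, but it is too slow because it reads the file many times. :( Here is a sample of the Perl program. It works, but it is too slow: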

use strict;
use warnings;
use utf8;
use feature qw(:5.10);

my ($in, $dico) = @ARGV;
die "Bad infile $in" if !-r $in;
die "Bad dicofile $dico" if !-r $dico;

# load dico
my @dico;
open(FICHIERNOUVELLES, ">resultat7.txt");
open my $DICO, "<", $dico or die "Can't open $dico for reading: $!\n";
# For all lines in the Dico
foreach my $line (<$DICO>) {
    chomp($line);
    # extract words
    if (my @word = split /\s+/, $line) {
        my $re = q{(^\s*|\s+)(}.(join q(\s+), map quotemeta, @word).q{)(\s+|\s*$)};
        push @dico, qr/$re/;
    }
}

open my $IN, "<", $in or die "Can't open $in for reading: $!\n";
my @word;

foreach my $line (<$IN>) {
    foreach my $dico (@dico) {
        while (my (undef, $sequence) = $line =~ /$dico/) {
            $sequence =~ s/\s+/+/g;
            $line =~ s/$dico/$1$sequence$3/;
        }
    }
    print FICHIERNOUVELLES "$line";
}
close(FICHIERNOUVELLES);

score 2 · Accepted Answer

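A solution that avoids reading the second file more than once is to first read the set of word sequences from file1 and store them in a data structure. The large file can then be streamed line by line in a single pass.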

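use strict;
use warnings;
use File::Slurp;

# Assumption: the two file names are passed on the command line.
my ($filename1, $filename2) = @ARGV;

# Read the word sequences once and strip their trailing newlines.
chomp(my @lines = read_file($filename1));
# Map each sequence to its joined form, e.g. "my friend" => "my+friend".
my %replacements = map { my $c = $_; $c =~ s/ /+/g; ( $_ => $c ) } grep { length } @lines;

open(my $file2, "<", $filename2) or die "Can't open $filename2: $!";
# Stream the large file line by line so it is read only once.
while (<$file2>) {
    foreach my $replacement (keys %replacements) {
        # \Q...\E keeps metacharacters in a sequence from acting as regex syntax.
        s/\Q$replacement\E/$replacements{$replacement}/g;
    }
    print $_;
}
close($file2);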
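
The loop above still applies one substitution per sequence to every line. A possible refinement, not part of the original answer, is to combine all sequences into one alternation so that each line is scanned by a single pattern. The sketch below assumes whole-word matching is wanted and that longer sequences should win over shorter ones. The helper names `build_matcher` and `join_sequences` and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build one regex matching any multi-word sequence.
sub build_matcher {
    my @sequences = @_;
    my @alternatives;
    # Longest first, so a longer sequence wins over one of its prefixes.
    for my $seq (sort { length($b) <=> length($a) } @sequences) {
        my @words = split ' ', $seq;
        next if @words < 2;    # single words have nothing to join
        push @alternatives, join '\s+', map { quotemeta } @words;
    }
    return unless @alternatives;
    my $alternation = join '|', @alternatives;
    return qr/(?<!\S)($alternation)(?!\S)/;
}

# Hypothetical helper: join the words of every match in one pass over $line.
sub join_sequences {
    my ($line, $matcher) = @_;
    return $line unless defined $matcher;
    $line =~ s{$matcher}{ (my $joined = $1) =~ s/\s+/+/g; $joined }ge;
    return $line;
}

my $matcher = build_matcher("my friend", "hello friends", "world");
print join_sequences("well hello friends and my friend", $matcher), "\n";
# prints "well hello+friends and my+friend"
```

In the streaming loop, each line read from the large file would be passed through `join_sequences` once, with `$matcher` built a single time before the loop starts. Whether this is actually faster depends on the number of sequences and on the Perl version, so it is worth measuring on a sample of the real data.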
