perl - How do I match the word order between two documents in Perl?

Question

I'm having trouble writing a Perl program that matches the words in two documents. Suppose there are two files, A and B.

I want to delete the words in document A that are not in document B.

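Example 1:

A: I eat pizza

B: She goes to the market to eat pizza

Result: eat pizza

Example 2:

A: eat pizza

B: pizza eat

Result: pizza (word order matters, so "eat" is deleted.)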

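I'm using Perl in my system, and the sentences in each document are not very long, so I don't think I'll use SQL.

The program is a subroutine of an automated essay scoring system for Indonesian (Bahasa).

Thanks, and sorry if my question is a bit confusing. I'm really new to "this world" :)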

score 1 · Accepted Answer

OK, I can't test this right now, so I can't guarantee it is 100% correct or even that it compiles, but it should give you enough guidance:

Solution 1: (word order doesn't matter)

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp qw(read_file write_file);

# Build a set (hash) of every word in B.
# read_file() dies on error by default; adding "|| die" would force scalar
# context and slurp the whole file into a single element.
my @B_lines = read_file("B");
my %B_words = ();
foreach my $line (@B_lines) {
    $B_words{$_} = 1 for split ' ', $line;   # split ' ' also ignores leading whitespace
}

# Keep, line by line, only the words of A that also occur somewhere in B.
my @A_lines = read_file("A");
my @new_lines = ();
foreach my $line (@A_lines) {
    my @B_words_only = grep { $B_words{$_} } split ' ', $line;
    push @new_lines, join(" ", @B_words_only) . "\n";
}
write_file("A_new", @new_lines);    # write_file() also dies on error by default

This should create a new file "A_new" containing only those words of A that also appear in B.
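File::Slurp is a CPAN module, not part of core Perl. If it is not installed, the same idea as Solution 1 can be written with core Perl only. This is a sketch, not the original answer's code: the sub names `keep_words_found_in_b` and `read_text` are hypothetical, and the demo runs on in-memory strings so the logic is easy to try.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: keep, line by line, only the words of $a_text that
# occur anywhere in $b_text (word order ignored, as in Solution 1).
sub keep_words_found_in_b {
    my ($a_text, $b_text) = @_;
    my %in_b = map { $_ => 1 } split ' ', $b_text;   # set of B's words
    my @out;
    for my $line (split /\n/, $a_text) {
        push @out, join(' ', grep { $in_b{$_} } split ' ', $line) . "\n";
    }
    return join '', @out;
}

# Hypothetical helper: read a whole file using core Perl only (no File::Slurp).
sub read_text {
    my ($path) = @_;
    open my $fh, '<', $path or die "Error reading $path: $!";
    local $/;              # slurp mode: read the whole file at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Demo on the strings from Example 1.
print keep_words_found_in_b("I eat pizza\n", "She goes to the market to eat pizza\n");
```

For the real files, something like `print keep_words_found_in_b(read_text("A"), read_text("B"));` with the output redirected to A_new should give the same result as Solution 1.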

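This has one small bug: it replaces any run of whitespace in file A with a single space, so

    word1        word2              word3

becomes

    word1 word2 word3

It can be fixed, but doing so is really annoying, so I didn't bother; it is only worth doing if you absolutely need the whitespace preserved exactly.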
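One way to keep the original spacing, shown only as a sketch: instead of splitting and re-joining, delete each unwanted word in place with a substitution. `filter_line_keep_spacing` is a hypothetical name. It removes a word together with the spaces or tabs after it, so surviving words keep their original spacing; if the last word on a line is removed, a trailing space can be left behind.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove every word of $line that is not in %$keep,
# together with the spaces/tabs that follow it. Leading whitespace, the
# spacing after kept words, and the newline are left untouched.
sub filter_line_keep_spacing {
    my ($line, $keep) = @_;
    $line =~ s/(\S+)([ \t]*)/$keep->{$1} ? "$1$2" : ''/ge;
    return $line;
}

my %B_words = map { $_ => 1 } qw(She goes to the market to eat pizza);
print filter_line_keep_spacing("I   eat        pizza\n", \%B_words);
```

In Solution 1 this would replace the `grep`/`join` step: build `%B_words` as before, then push `filter_line_keep_spacing($line, \%B_words)` for each line of A.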

Solution 2: (word order matters, but the words from file A are printed with no attempt to preserve whitespace)

#!/usr/bin/perl

use strict;
use warnings;
use File::Slurp qw(read_file);

# split ' ' splits on any whitespace and ignores leading whitespace.
my @A_words = split ' ', scalar read_file("A");
my @B_words = split ' ', scalar read_file("B");

# For each word of A, look for it in B after the previous match.
# Found words are kept in order; words not found are skipped (not fatal).
my $B_counter = 0;
my @kept;
foreach my $word (@A_words) {
    my $j = $B_counter;
    ++$j while $j < @B_words && $B_words[$j] ne $word;
    next if $j == @B_words;    # not in the rest of B: drop this word
    push @kept, $word;
    $B_counter = $j + 1;       # the next A word must match later in B
}
print join(" ", @kept), "\n";
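The greedy scan in Solution 2 keeps the first in-order match it finds, so on Example 2 from the question it keeps "eat" rather than the expected "pizza". A more principled reading of "keep the words of A that appear in B in the same order" is the longest common subsequence (LCS) of the two word lists. This is a swapped-in technique, not part of the original answer; below is a plain dynamic-programming sketch, and the sub name `lcs_words` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical helper: one longest common subsequence of two word lists.
sub lcs_words {
    my ($x, $y) = @_;
    my ($n, $m) = (scalar @$x, scalar @$y);
    # $len[$i][$j] = LCS length of the suffixes @$x[$i..] and @$y[$j..]
    my @len = map { [ (0) x ($m + 1) ] } 0 .. $n;
    for my $i (reverse 0 .. $n - 1) {
        for my $j (reverse 0 .. $m - 1) {
            $len[$i][$j] = $x->[$i] eq $y->[$j]
                ? $len[$i + 1][$j + 1] + 1
                : max($len[$i + 1][$j], $len[$i][$j + 1]);
        }
    }
    # Walk the table from the start to read off one LCS. On ties the word
    # of A is skipped first; this is a choice, not the only valid one.
    my @out;
    my ($i, $j) = (0, 0);
    while ($i < $n && $j < $m) {
        if ($x->[$i] eq $y->[$j]) {
            push @out, $x->[$i];
            ++$i;
            ++$j;
        }
        elsif ($len[$i + 1][$j] >= $len[$i][$j + 1]) { ++$i }
        else                                          { ++$j }
    }
    return @out;
}

print join(' ', lcs_words([qw(eat pizza)], [qw(pizza eat)])), "\n";
print join(' ', lcs_words([qw(I eat pizza)], [qw(She goes to the market to eat pizza)])), "\n";
```

With this tie-breaking the two demo lines print "pizza" and "eat pizza", matching both examples in the question. For the real files, split them into `@A_words` and `@B_words` as in Solution 2 and call `lcs_words(\@A_words, \@B_words)`. The table has (words in A + 1) × (words in B + 1) entries, which is fine for short sentences. The CPAN module Algorithm::Diff also provides an `LCS` function, though its tie-breaking may differ.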

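Solution 3 (why do we need Perl again? :))

You can easily do this in the shell without Perl (or via a system() call or backticks from a parent Perl script). Note that comm compares whole lines and expects both inputs to be sorted, so this only works if A and B each contain one word per line in sorted order; the result is the set of shared words, not an order-preserving match: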

comm -12 A B | tr "\012" " "

To call it from Perl:

my $new_text = `comm -12 A B | tr "\012" " " `;

But see my last comment for why this might be considered "bad Perl", at least if you do it in a loop over many files and care about performance.
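If the concern is starting a shell for every pair of files, roughly what `comm -12` reports (the entries common to both sorted inputs) can be computed inside Perl with a hash. A minimal sketch, assuming both files have already been split into word lists; `common_words` is a hypothetical name. Unlike comm, it reports each common word once, and its order follows Perl's default string sort, which can differ from comm's locale-dependent collation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the distinct words present in both lists, sorted
# (similar to `comm -12` on sorted, one-word-per-line input).
sub common_words {
    my ($x, $y) = @_;
    my %in_y = map { $_ => 1 } @$y;
    my %seen;
    return sort { $a cmp $b } grep { $in_y{$_} && !$seen{$_}++ } @$x;
}

print join(' ', common_words([qw(I eat pizza)], [qw(She goes to the market to eat pizza)])), "\n";
```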
