regex - Writing an output file that is a superset of 3 files, with no duplicate lines

Question

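I have three text files, 1.txt, 2.txt and 3.txt, which are the output of a perl script. All 3 files have some lines in common. Please help me write a perl script whose output is another text file that is a superset of 1.txt, 2.txt and 3.txt and contains no repeated lines.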

score 3 · Accepted Answer

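The simplest approach is to use a hash to keep track of the lines you have already seen. For very large files, however, this can use too much memory, because every distinct line is kept as a hash key.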

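use strict;
use warnings;
use autodie 'open';    # die with a clear message if any open fails

open my $out, '>', 'superset.txt';

my %seen;    # lines already written to the output
for my $filename ('1.txt', '2.txt', '3.txt') {
    open my $in, '<', $filename;
    while ( my $line = <$in> ) {
        # Print the line only the first time it is seen
        print $out $line unless $seen{$line}++;
    }
}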
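
If memory is the concern and the lines are long, one possible variation is to key the hash on a fixed-size digest of each line instead of the line itself. The following is only a sketch under that assumption, not part of the original answer: write_superset is a hypothetical helper name, and Digest::MD5 is used simply because it ships with Perl. It saves memory only when lines are longer than the 16-byte digest on average, and a digest collision, however unlikely, would drop a distinct line.

```perl
use strict;
use warnings;
use autodie qw(open close);
use Digest::MD5 qw(md5);

# Hypothetical helper: copy each distinct line of the input files to
# $out_path once, in first-seen order.
sub write_superset {
    my ($out_path, @inputs) = @_;

    open my $out, '>', $out_path;

    my %seen;    # keys are 16-byte MD5 digests, not whole lines
    for my $filename (@inputs) {
        open my $in, '<', $filename;
        while ( my $line = <$in> ) {
            # Print the line only the first time its digest is seen
            print {$out} $line unless $seen{ md5($line) }++;
        }
        close $in;
    }

    close $out;
    return;
}

# Run only when an output path and at least one input file are given
write_superset(@ARGV) if @ARGV >= 2;
```

Saved as, say, superset.pl (the file name is an assumption), it could be run as: perl superset.pl superset.txt 1.txt 2.txt 3.txt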

score 1 · Accepted Answer

A Perl one-liner that prints the unique lines from the files:

perl -ne '$s{$_}++ or print' *txt > out.txt

answered 2013-11-12T06:39:00.817
