perl - Compare file lines for a match anywhere in a second file

Question

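This is frustrating. I have 2 text files with just one phone number per line. I need to read the first line from file1 and search file2 for a match. If there is no match, write the line's value to an output file. I have been trying this, but I know it's wrong.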

$file1 = 'pokus1.txt';
$file2 = 'pokus2.txt';

open (F1, $file1) || die ("Could not open $file1!");
open (F2, $file2) || die ("Could not open $file2!");
open (OUTFILE, '>>output\output_x1.txt');
@f1data = <F1>;
@f2data = <F2>;

while (@f1data){
    @grp = grep {/$f1data/} @f2data;

    print OUTFILE "$grp";
}
close (F1);
close (F2);
close (OUTFILE);

I hope someone can help. Thanks, Brent

score 2 · Accepted Answer

Bash:

Not present:

grep -vf file1 file2 > file3

Shared:

grep -f file1 file2 > file4
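One caveat worth checking: `grep -f` treats each line of the pattern file as a regular expression and matches it anywhere in a line, so a short number can match inside a longer one (grep's `-F` and `-x` options switch to fixed-string, whole-line matching). The sketch below illustrates the difference in Perl. The sample numbers and the helper names `regex_missing` and `exact_missing` are made up for illustration and are not part of the answer.

```perl
use strict;
use warnings;

# Keep lines that match none of the patterns, treating each pattern as a
# regex that may match anywhere in the line (like plain `grep -vf`).
sub regex_missing {
    my ($pats, $lines) = @_;
    return grep { my $l = $_; !grep { $l =~ /$_/ } @$pats } @$lines;
}

# Keep lines that are not exactly equal to any pattern (hash lookup).
sub exact_missing {
    my ($pats, $lines) = @_;
    my %seen = map { $_ => 1 } @$pats;
    return grep { !exists $seen{$_} } @$lines;
}

# Hypothetical sample data.
my @pattern_file  = ('555');
my @searched_file = ('5551234', '555', '4440000');

print join(',', regex_missing(\@pattern_file, \@searched_file)), "\n";  # 4440000
print join(',', exact_missing(\@pattern_file, \@searched_file)), "\n";  # 5551234,4440000
```

With substring matching, 5551234 is treated as a match for 555 and dropped. With exact matching it is reported as missing.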

score 1 · Accepted Answer

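Whenever you get one of these "is a piece of data in one group also in another group" type of questions (and they come up quite a bit), you should think in terms of hashes.

A hash is a keyed lookup. Imagine you create a hash keyed on, say... I don't know... the phone numbers taken from file #1. When you read a line from file #2, you can easily see whether it is in file #1 by simply looking in the hash. Fast and efficient.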

use strict;   #ALWAYS ALWAYS ALWAYS
use warnings; #ALWAYS ALWAYS ALWAYS

use autodie;  #Will end the program if files you try to open don't exist

# Constants are a great way of storing data that is ...uh... constant
use constant {
    FILE_1    =>  "a1.txt",
    FILE_2    =>  "a2.txt",
};

my %phone_hash;

open my $phone_num1_fh, "<", FILE_1;

#Let's build our phone number hash
while ( my $phone_num = <$phone_num1_fh> ) {
    chomp $phone_num;
    $phone_hash{ $phone_num } = 1;   #Doesn't really matter, but best not a zero value
}
close $phone_num1_fh;

#Now that we have our phone hash, let's see if it's in file #2
open my $phone_num2_fh, "<", FILE_2;
while ( my $phone_num = <$phone_num2_fh> ) {
    chomp $phone_num;
    if ( exists $phone_hash{ $phone_num } ) {
        print "$phone_num is in file #1 and file #2\n";
    }
    else {
        print "$phone_num is only in file #2\n";
    }
}
close $phone_num2_fh;

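See how well that works. The only problem is that there may be phone numbers in file #1 that are not in file #2. You can solve this by simply creating a second hash for all the phone numbers in file #2.

Let's do it again with two hashes: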

my %phone_hash1;
my %phone_hash2;

open my $phone_num1_fh, "<", FILE_1;

while ( my $phone_num = <$phone_num1_fh> ) {
    chomp $phone_num;
    $phone_hash1{ $phone_num } = 1;
}
close $phone_num1_fh;

open my $phone_num2_fh, "<", FILE_2;

while ( my $phone_num = <$phone_num2_fh> ) {
    chomp $phone_num;
    $phone_hash2{ $phone_num } = 1;
}
close $phone_num2_fh;

Now, we'll use keys to list the keys and go through them. I'll create an %in_common hash for phone numbers that are in both hashes:

my %in_common;

for my $phone ( keys %phone_hash1 ) {
    if ( $phone_hash2{$phone} ) { 
       $in_common{$phone} = 1;    #Phone numbers in common between the two lists
    }
}

Now, I have three hashes: %phone_hash1, %phone_hash2, and %in_common.

for my $phone ( sort keys %phone_hash1 ) {
    if ( not $in_common{$phone} ) {
         print "Phone number $phone is only in the first file\n";
    }
}

for my $phone ( sort keys %phone_hash2 ) {
    if ( not $in_common{$phone} ) {
        print "Phone number $phone is only in " . FILE_2 . "\n";
    }
}

for my $phone ( sort keys %in_common ) {
    print "Phone number $phone is in both files\n";
}

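Note that in this example I did not use exists to check whether a key is in the hash. That is, I simply wrote if ( $phone_hash2{$phone} ) instead of if ( exists $phone_hash2{$phone} ). The exists form checks whether the key is present in the hash, even if its value is an empty string, numerically zero, or undefined.

The form without exists is true only as long as the value isn't zero, an empty string, or undefined. Since I purposely set the hash values to 1, I can use this form. It's a good habit to use exists, because there will be situations where a valid value could be an empty string or zero. However, some people prefer the way the code reads without exists when possible.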
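To make the difference concrete, here is a small sketch of the two tests side by side. The hash contents and the helper name `describe_key` are invented for illustration. One key deliberately holds a false value (0), which is where the two tests disagree.

```perl
use strict;
use warnings;

# Reports how each test treats one key. (Helper name is illustrative only.)
sub describe_key {
    my ($hash, $key) = @_;
    my $present = exists( $hash->{$key} ) ? 'yes' : 'no';  # is the key there at all?
    my $truthy  = $hash->{$key}           ? 'yes' : 'no';  # is its value true?
    return "$key: exists=$present true=$truthy";
}

# Hypothetical data: the second key is present but holds 0.
my %count = ( '555-0100' => 1, '555-0199' => 0 );

for my $key ( '555-0100', '555-0199', '555-0123' ) {
    my $line = describe_key( \%count, $key );
    print "$line\n";
}
# 555-0100: exists=yes true=yes
# 555-0199: exists=yes true=no
# 555-0123: exists=no true=no
```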

score 1 · Accepted Answer

An idiomatic solution: process one file, saving its data as hash keys, then process the other file, checking whether each key exists:

#!/usr/bin/env perl

use warnings;
use strict;

my (%phone);

open my $fh1, '<', shift or die;
open my $fh2, '<', shift or die;
##open my $ofh, '>>', shift or die;

while ( <$fh2> ) { 
    chomp;
    $phone{ $_ } = 1;
}

while ( <$fh1> ) { 
    chomp;
    next if exists $phone{ $_ };
    ##printf $ofh qq|%s\n|, $_;
    printf qq|%s\n|, $_;
}

exit 0;

Run it like this:

perl script.pl file1 file2 > outfile
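The question asked for the unmatched lines to go to an output file directly rather than through shell redirection. A minimal sketch of that variant follows, assuming the output path is passed as a third argument. The sub name `write_missing` is hypothetical, and it opens the output with '>' (truncate), whereas the question's code used '>>' (append).

```perl
use strict;
use warnings;

# Writes every line of $file1 that does not appear in $file2 to $out_file.
# Returns the number of lines written. (Sub name is illustrative only.)
sub write_missing {
    my ($file1, $file2, $out_file) = @_;

    # Load file2 into a hash so each lookup is a single key test.
    open my $fh2, '<', $file2 or die "Could not open $file2: $!";
    my %phone;
    while ( my $line = <$fh2> ) {
        chomp $line;
        $phone{$line} = 1;
    }
    close $fh2;

    open my $fh1, '<', $file1    or die "Could not open $file1: $!";
    open my $ofh, '>', $out_file or die "Could not open $out_file: $!";
    my $written = 0;
    while ( my $line = <$fh1> ) {
        chomp $line;
        next if exists $phone{$line};   # present in file2, so skip it
        print {$ofh} "$line\n";
        $written++;
    }
    close $fh1;
    close $ofh or die "Could not close $out_file: $!";
    return $written;
}

# Only run when three file names are supplied on the command line.
if ( @ARGV == 3 ) {
    write_missing(@ARGV);
}
```

It would be run as `perl script.pl file1 file2 outfile`.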
