perl - Retrieving the values that match the same ID using Perl

Question

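This is a simple problem, but I can't find any working solution. I have 2 files. The first file holds all the IDs I'm interested in, for example "tomato" and "cucumber", but it also holds IDs I'm not interested in, which have no value in the second file. The second file has the following data structure: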

tomato    red

tomato    round

tomato    sweet

cucumber    green

cucumber    bitter

cucumber    watery

I need to get a file that contains every ID together with all of its matching values from the second file, everything tab-separated, like this:

tomato    red    round    sweet

cucumber    green    bitter    watery

What I have done so far is create a hash out of the IDs in the first file:

 while (<FILE>) {  
     chomp;  
     @records = split "\t", $_; 
     {%hash = map { $records[0] => 1 } @records};
 }

And this is for the second file:

  while (<FILE2>) {
      chomp;
      @records2 = split "\t", $_; 
      $key, $value = $records2[0], $records2[1];
      $data{$key} = join("\t", $value);
  }

 close FILE;

 foreach my $key ( keys %data )
 {
     print OUT "$key\t$data{$key}\n"
     if exists $hash{$key} 
 }

I would really appreciate a simple solution for combining all the values that match the same ID! :)

score 1 · Accepted Answer

For the first file:

while (<FILE>) {  
    chomp;  
    @records = split "\t", $_; 
    $hash{$records[0]} = 1;
}

And for the second:

while (<FILE2>) {
    chomp;
    @records2 = split "\t", $_;
    ($key,$value) = @records2;
    $data{$key} = [] unless exists $data{$key};
    push @{$data{$key}}, $value;
}
close FILE;

foreach my $key ( keys %data ) {
    print OUT $key."\t".join("\t", @{$data{$key}})."\n" if exists $hash{$key};
}
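
For reference, here is a self-contained sketch of the same hash-of-arrays idea that runs without the two files. The sample strings, the `combine` name, and the choice to print IDs in the order of the first file are assumptions made for illustration; they are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the two files (not from the original post).
my $ids_text  = "tomato\ncucumber\npotato\n";
my $data_text = join "\n",
    "tomato\tred", "tomato\tround", "tomato\tsweet",
    "cucumber\tgreen", "cucumber\tbitter", "cucumber\twatery",
    "carrot\torange", "";

# Build one output line per wanted ID that has values.
# IDs come out in the order they appear in the first input.
sub combine {
    my ($ids_text, $data_text) = @_;

    # Pass 1: remember the wanted IDs and the order they were listed in.
    my (%wanted, @order);
    open my $ids_fh, '<', \$ids_text or die "Cannot open ID data: $!";
    while (my $line = <$ids_fh>) {
        chomp $line;
        my ($id) = split /\t/, $line;
        next unless defined $id && length $id;
        push @order, $id unless $wanted{$id}++;
    }
    close $ids_fh;

    # Pass 2: collect every value of a wanted ID into an array reference.
    my %data;
    open my $data_fh, '<', \$data_text or die "Cannot open value data: $!";
    while (my $line = <$data_fh>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;
        next unless defined $value && $wanted{$key};
        push @{ $data{$key} }, $value;   # autovivifies the array reference
    }
    close $data_fh;

    # Keep only IDs that actually received values.
    return map { join("\t", $_, @{ $data{$_} }) } grep { $data{$_} } @order;
}

print "$_\n" for combine($ids_text, $data_text);
```

With this sample data, "potato" is skipped because it has no values and "carrot" is skipped because it is not in the ID list. To use real files, the in-memory `open` calls can be replaced with ordinary file names.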

score 0 · Accepted Answer

This seems to do what is needed

use strict;
use warnings;

my %data;

open my $fh, '<', 'file1.txt' or die $!;
while (<$fh>) {
  $data{$1} = {} if /^([^\t\n]+)/;  # first field only; exclude the newline so a bare ID line still matches
}

open $fh, '<', 'file2.txt' or die $!;
while (<$fh>) {
  $data{$1}{$2}++ if /^(.+?)\t(.+?)$/ and exists $data{$1};
}

while ( my ($key, $values) = each %data) {
  print join("\t", $key, keys %$values), "\n";
}

Output

tomato  sweet round red
cucumber  green watery  bitter
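
The values in the output above come back in a different order than in the input because they are stored as hash keys, and Perl does not guarantee any particular order for hash keys. Storing them as keys also collapses repeated values. If the input order matters, one possible variation (a sketch, not part of the answer above) keeps an array per ID and uses a separate hash only to detect duplicates. The `@rows` sample data and the `ordered_unique` name are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input rows (ID, value), including one duplicate pair.
my @rows = (
    [ tomato   => 'red'   ], [ tomato   => 'round'  ], [ tomato => 'red' ],
    [ tomato   => 'sweet' ], [ cucumber => 'green'  ],
    [ cucumber => 'bitter' ],
);

# Collect values per ID in first-seen order, skipping repeated (ID, value) pairs.
sub ordered_unique {
    my @rows = @_;
    my (%values, %seen, @order);
    for my $row (@rows) {
        my ($key, $value) = @$row;
        push @order, $key unless exists $values{$key};               # remember ID order
        push @{ $values{$key} }, $value unless $seen{$key}{$value}++; # skip duplicates
    }
    return map { join("\t", $_, @{ $values{$_} }) } @order;
}

print "$_\n" for ordered_unique(@rows);
```

The trade-off is one extra hash in memory in exchange for stable, input-ordered output.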

score -1 · Accepted Answer

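It's easier if you read in the data mapping first.

Also, if you are using Perl, you should consider from the start taking advantage of one of its main strengths: the CPAN libraries. For example, reading in a file is as simple as read_file() from File::Slurp, instead of having to open/close the files yourself and then run a while(<>) loop.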

use File::Slurp;
my %data;

my @data_lines = File::Slurp::read_file($filename2);
chomp(@data_lines);
foreach my $line (@data_lines) { # Improved version from CyberDem0n's answer
    my ($key, $value) = split("\t", $line);
    $data{$key} ||= []; # Make sure it's an array reference if first time
    push @{ $data{$key} }, $value;
}

my @id_lines = File::Slurp::read_file($filename1);
chomp(@id_lines);
foreach my $id (@id_lines) {
    next unless $data{$id}; # Skip IDs that have no values in the second file
    print join("\t", ( $id, @{ $data{$id} } ) )."\n";
}

A slightly more hacky but shorter version adds the ID to the list of values in the data hash from the start:

my @data_lines = File::Slurp::read_file($filename2);
chomp(@data_lines);
foreach my $line (@data_lines) { # Improved version from CyberDem0n's answer
    my ($key, $value) = split("\t", $line);
    $data{$key} ||= [ $key ]; # Add the ID as the first element, for printing
    push @{ $data{$key} }, $value;
}

my @id_lines = File::Slurp::read_file($filename1);
chomp(@id_lines);
foreach my $id (@id_lines) {
    next unless $data{$id}; # Skip IDs that have no values in the second file
    print join("\t", @{ $data{$id} } ) ."\n"; # ID already in %data!
}
