regex - Perl: reading a hash of arrays from files

Question

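I am trying to read several files that all share the same format, and I want to compute some statistics based on a regular expression.

That is, I want to count identical items inside the []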

NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])
C_013481 NC_013479 ([T(trnP ,rrnS trnF trnV rrnL nad1 trnI ,)])
NC_013485 NC_003159 ([T(trnC ,trnY ,)])
NC_013554 NC_013254 ([T(trnR ,trnN ,)])
NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])

The problem is that I am not getting the right values. Here is my code:

use strict;
use warnings;

my %data;
@FILES = glob("../mitos-crex/*.out");
foreach my $file (@FILES) {
    local $/ = undef;
    open my $fh, '<', $file;
    $data{$file} = <$fh>;
}

my @t;
my $c = 0;
foreach my $line (keys %data) {
    foreach my $l ($data{$line}) {
         print $l."\n";
        ($t[$c]) = $l =~ m/(\[.*\])/;

        $c++;
    }
}

#the problem is here the counter is not giving the right value

print $c;
my %counts;
$counts{$_}++ for @t;

Thanks in advance.

score 3 · Accepted Answer

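First of all, always use strict and use warnings. This measure is vital in all programming, because it quickly reveals simple problems that you might otherwise overlook or waste time debugging. It is especially important, and simple courtesy, when you are asking others for help with your program.

You seem to be confusing slurping a whole file into a single string with reading it line by line. The way you have written it, each element $data{$file} is a single scalar value holding all of that file's data, so the loop foreach my $l ($data{$line}) { ... } executes only once, and the match therefore finds only the first [...] string in the file.

Ordinarily I would say that you shouldn't read all of the file data this way, because there is probably a better streaming solution to the problem. But I don't know what else you want to do with the captured data, so my solution follows your own design.

I think you need to put the data into an anonymous array instead of a scalar, and then iterate over it in the loop. You must leave $/ defined so that the file is read by lines, and store the lines with [ <$fh> ]. Then you can iterate with foreach my $line (@{ $data{$file} }) { ... }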

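use strict;
use warnings;

my %data;

my @files = glob("../mitos-crex/*.out");

# Store each file as an anonymous array of its lines
foreach my $file (@files) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    $data{$file} = [ <$fh> ];
}

my $c = 0;
my @t;
foreach my $file (keys %data) {
    foreach my $line (@{ $data{$file} }) {
        # Capture the bracketed item on this line (undef if there is none)
        ($t[$c]) = $line =~ /(\[.*\])/;
        $c++;
    }
}

print "$c\n";
my %counts;
# Skip lines that had no match so that undef is never used as a hash key
$counts{$_}++ for grep { defined } @t;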
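
For comparison, the streaming alternative mentioned earlier in this answer might look roughly like the sketch below. It is not code from the original post: the helper name count_bracketed is made up for illustration, and it assumes that all you need in the end is a tally of each bracketed item, so nothing is kept in memory except the counts.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): tally each bracketed
# item read from an open filehandle, one line at a time.
sub count_bracketed {
    my ($fh, $counts) = @_;
    while (my $line = <$fh>) {
        # Capture from the first '[' to the last ']' on this line;
        # skip lines that contain no bracketed item.
        next unless $line =~ /(\[.*\])/;
        $counts->{$1}++;
    }
    return $counts;
}

my %counts;
foreach my $file (glob("../mitos-crex/*.out")) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    count_bracketed($fh, \%counts);
    close $fh;
}

# Print each distinct item with the number of times it was seen.
printf "%d\t%s\n", $counts{$_}, $_ for sort keys %counts;
```

Because the helper takes a filehandle rather than a file name, it can also be tried on an in-memory string opened with open my $fh, '<', \$string.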

score 0 · Accepted Answer

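The counter is giving the right value. Your problem is that you are slurping the files (reading each one all at once) but storing only the first value found: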

($t[$c]) = $data{$line} =~ m/(\[.*\])/;  # only finds first value in file

Either loop over each file properly and apply the regex above to every line, or do the following:

push @t, ($data{$line} =~ m/(\[.*\])/g);
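
To see the difference the /g modifier makes on a slurped string, here is a small sketch. The sample string is made up from the records in the question and stands in for the contents of one file; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Made-up sample standing in for one slurped file (three records).
my $slurped = join "\n",
    'NC_013618 NC_013633 ([T(nad6 trnE ,cob trnT ,)])',
    'NC_013485 NC_003159 ([T(trnC ,trnY ,)])',
    'NC_013607 NC_013618 ([T(nad6 trnE ,cob trnT ,)])',
    '';

# Without /g: the list assignment receives only the first capture.
my ($first) = $slurped =~ m/(\[.*\])/;

# With /g in list context: one capture per matching line, because
# '.' does not match a newline unless the /s modifier is used.
my @all = $slurped =~ m/(\[.*\])/g;

my %counts;
$counts{$_}++ for @all;

print scalar(@all), " matches\n";
print "$counts{$_}\t$_\n" for sort keys %counts;
```

Note that this relies on each record sitting on its own line. If /s were added, the greedy .* would run from the first [ in the file to the last ] and return a single match.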

You should always use

use strict;
use warnings;

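and resolve the errors and warnings that result. Not doing so is a bad idea: it only hides the problems in your code instead of solving them.

Also, you should be aware that this statement: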

foreach $l ($data{$line}) {

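iterates only once, because each "line" here is an entire file and $data{$line} is nothing more than a single scalar value. Also, you are iterating with $l as the alias, yet you still use $data{$line} inside the loop, which makes the loop completely redundant.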
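
A quick way to convince yourself of this is the sketch below, which uses a made-up three-line string in place of a slurped file. foreach over a single scalar makes one pass however many newlines the string contains; splitting the string into lines first gives one pass per line.

```perl
use strict;
use warnings;

# Made-up stand-in for the contents of one slurped file.
my $contents = "line one\nline two\nline three\n";

# A list holding a single scalar has one element, so the loop body runs once.
my $scalar_passes = 0;
$scalar_passes++ foreach ($contents);

# split /^/ breaks the string at the start of each line (newlines are kept).
my @lines = split /^/, $contents;
my $line_passes = 0;
$line_passes++ foreach (@lines);

print "passes over the scalar: $scalar_passes\n";
print "passes over the lines:  $line_passes\n";
```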
