perl - How do I read one line at a time from each file in an array or hash of filehandles and merge specific columns?

Question

I'll start by describing the files I'm working with:

./groupA
    ./groupA/fileA.txt
    ./groupA/fileB.txt
    ./groupA/fileC.txt
    ./groupA/fileD.txt

./groupB
    ./groupB/fileA.txt
    ./groupB/fileB.txt
    ./groupB/fileC.txt

etc.

Here is what I want to do:

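1. For each groupI, I have a hash or array of filehandles pointing to very large tab-delimited text files fileJ, each several hundred MB in size.
2. I want to loop through the filehandles, reading one tab-delimited line at a time. I cannot read the lines of all the files into memory.
3. Once I have looped through the filehandles, I want to split each line, grab a specific column of data from each split array (for example, the fifth field), and merge the data into a single line of output.
4. Repeat step 2 (grab one line from each filehandle) until EOF.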

I would then end up with groupA/mergedOutput.mtx, groupB/mergedOutput.mtx, and so on.
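For illustration, steps 2 to 4 can be sketched as one subroutine that works on an array of filehandles instead of a hash. This is a minimal sketch, not the asker's code: the name `merge_column` is hypothetical, column index 4 stands for the fifth field, the merge stops at the first EOF (extra lines in longer files are ignored), and in-memory filehandles stand in for the real files so the example runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read one line at a time from every handle in
# @$inputs, keep column $col (0-based) of each line, and write the
# collected values as one tab-delimited line to $out. Only one line per
# input is held in memory. Stops as soon as any input reaches EOF.
sub merge_column {
    my ($inputs, $out, $col) = @_;
    return 0 unless @$inputs;    # no inputs: nothing to merge
    my $rows = 0;
    ROW: while (1) {
        my @fields;
        for my $fh (@$inputs) {
            my $line = readline $fh;
            last ROW unless defined $line;    # first EOF ends the merge
            chomp $line;
            push @fields, (split /\t/, $line)[$col];
        }
        print {$out} join("\t", @fields), "\n";
        $rows++;
    }
    return $rows;
}

# Usage: in-memory strings stand in for groupA/fileA.txt and groupA/fileB.txt.
my $fileA = "a1\ta2\ta3\ta4\tA5\nb1\tb2\tb3\tb4\tB5\n";
my $fileB = "x1\tx2\tx3\tx4\tX5\ny1\ty2\ty3\ty4\tY5\n";
open my $inA, '<', \$fileA or die "could not open fileA: $!\n";
open my $inB, '<', \$fileB or die "could not open fileB: $!\n";

my $merged = '';
open my $out, '>', \$merged or die "could not open output: $!\n";
my $rows = merge_column([$inA, $inB], $out, 4);
close $out;

print $merged;    # prints "A5\tX5" and "B5\tY5" on two lines
```

With real files, the same subroutine would receive handles opened with `open my $fh, '<', $path` and an output handle opened on groupX/mergedOutput.mtx.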

The problem is that I don't know how to do steps 2 and 3 correctly.

Here is my code so far:

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my @groups = qw(groupA groupB groupC);
my ($mergedOutputFn, %handles);

foreach my $group (@groups) {
    $mergedOutputFn = "$group/mergedOutput.mtx";

    # Step 1:
    # Make hash table of file handles

    foreach my $inputFn (glob("$group/*.txt")) {
        open my $handle, '<', $inputFn or die "could not open $inputFn: $!\n";
        $handles{$inputFn} = $handle;
    }

    # Steps 2 and 3:
    # Grab a line from each file handle
    # Repeat until EOF

    while(1) {
        my @mergedOutputLineElements = ();
        foreach (sort keys %handles) {
            my $handle = $handles{$_};
            my $line = <$handle>;
            chomp($line);
            my @lineElements = split("\t", $line);
            push (@mergedOutputLineElements, $lineElements[4]);
            last if (! defined $line); # jump out of while loop
        }
        print Dumper join("\t", @mergedOutputLineElements);
    }

    # Step 4:
    # Close handles

    foreach (sort keys %handles) {
        close $handles{$_};
    } 
}

One problem seems to be that the following code doesn't work:

foreach (sort keys %handles) {
    my $handle = $handles{$_};
    my $line = <$handle>;
    ...
}

If I try to print out the value of $line, I get a GLOB value:

print Dumper $line;
...
GLOB(0x1d769f80)

Am I handling $line incorrectly, or is there a simpler way to do this in Perl?

Thanks for any advice.

Edit

Here is the fixed code:

#!/usr/bin/perl

use strict;
use warnings;

my @groups = qw(groupA groupB groupC);

foreach my $group (@groups) {
    my $mergedOutputFn = "$group/mergedOutput.mtx";
    open my $merge, '>', $mergedOutputFn
        or die "could not open handle to $mergedOutputFn: $!\n";

    # Step 1:
    # Make a fresh hash table of file handles for this group

    my %handles;
    foreach my $inputFn (glob("$group/*.txt")) {
        open my $handle, '<', $inputFn or die "could not open $inputFn: $!\n";
        $handles{$inputFn} = $handle;
    }
    next unless %handles;    # no input files: the loop below would never end

    # Steps 2 and 3:
    # Grab a line from each file handle and keep its fifth field
    # Repeat until one of the files reaches EOF

    LINE: while (1) {
        my @mergedOutputLineElements = ();
        foreach (sort keys %handles) {
            my $handle = $handles{$_};
            my $line = readline $handle;
            last LINE if (! defined $line); # jump out of the while loop
            chomp($line);
            my @lineElements = split("\t", $line);
            push (@mergedOutputLineElements, $lineElements[4]);
        }
        print {$merge} join("\t", @mergedOutputLineElements), "\n";
    }

    # Step 4:
    # Close handles

    foreach (sort keys %handles) {
        close $handles{$_};
    }

    close $merge;
}

Thanks for the tips!
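One of the changes in the fixed code is the `LINE:` label on the `while (1)` loop. In Perl, an unlabeled `last` exits only the innermost enclosing loop, so in the first version `last if (! defined $line)` left the `foreach` but never ended the `while (1)`. (The fixed code also tests `defined` before `chomp`, which avoids an uninitialized-value warning at EOF.) The toy script below, with made-up loop bounds, counts how often the outer loop body completes under each form of `last`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy loops: the outer loop runs 3 times over an inner list of 3 items.
my @inner = (1, 2, 3);

# Unlabeled last: leaves only the inner foreach.
my $plain = 0;
for my $i (1 .. 3) {
    foreach my $j (@inner) {
        last if $j == 2;
    }
    $plain++;    # still reached on every outer iteration
}

# Labeled last: leaves the loop named OUTER, as "last LINE" does.
my $labeled = 0;
OUTER: for my $i (1 .. 3) {
    foreach my $j (@inner) {
        last OUTER if $j == 2;
    }
    $labeled++;    # never reached
}

print "unlabeled: $plain, labeled: $labeled\n";    # unlabeled: 3, labeled: 0
```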

score 2 · Accepted Answer

You can read from a filehandle stored in a hash like this:

foreach (sort keys %handles) {
    my $line = readline $handles{$_};
    ...
}
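Why the angle-bracket form misbehaves here: according to perlop, `<...>` performs a read only when it contains a bareword filehandle or a simple scalar variable such as `$handle`. Anything else, including `<$handles{$_}>`, is treated as a `glob()` pattern, and the hash element interpolates as its string form, which matches the `GLOB(0x...)` value reported in the question. The small script below shows the three forms side by side; it uses an in-memory file and a made-up key `fileA` in place of the real inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# An in-memory "file" stands in for one of the large tab-delimited inputs.
my $data = "a\tb\tc\td\te\nf\tg\th\ti\tj\n";
open my $fh, '<', \$data or die "could not open in-memory file: $!\n";
my %handles = (fileA => $fh);

# <$handles{fileA}> is parsed as glob(), not readline(): the hash element
# is interpolated as a string like "GLOB(0x...)" and used as a pattern.
# Nothing is read from the file.
my $wrong = <$handles{fileA}>;

# readline() accepts any expression that returns a filehandle.
my $line = readline $handles{fileA};
chomp $line;
my $fifth = (split /\t/, $line)[4];

# Copying the handle into a simple scalar makes <...> a read again.
my $handle = $handles{fileA};
my $next = <$handle>;
chomp $next;
my $fifthNext = (split /\t/, $next)[4];

print "$wrong\n$fifth\n$fifthNext\n";    # the GLOB string, then e, then j
close $fh;
```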
