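I've written a Perl script that wraps another tool (overlapFeatures) so that I can convert my file formats on the fly. The files I'm working with are tab-delimited tables, typically around 2 million lines long. On its own, overlapFeatures handles them without any trouble.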
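However, I think that piping so many lines through at once is causing the pipe to lock up. I know I need to use threads somehow, so that I can write to and read from the child process at the same time. But I really don't understand how to use threads properly in Perl (or in any other language, for that matter). As far as I understand, I could use threads or even IPC::Run to solve my problem.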
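Writing to and reading from the child at the same time does not strictly require threads. A minimal sketch using only core Perl forks a separate writer process after open2, so one process fills the child's stdin while the other drains its stdout. It assumes `cat` as a hypothetical stand-in for overlapFeatures and generates synthetic tab-separated lines instead of reading `$infile`. Each process must close the pipe end it does not use: if the reader keeps its own copy of the write handle open, the filter never sees end-of-file and the read loop never finishes.

```perl
use strict;
use warnings;
use IPC::Open2;

# Hypothetical stand-in for overlapFeatures: 'cat' just copies stdin to
# stdout, which is enough to exercise the two-way pipe.
my @cmd = ('cat');

# open2 croaks (dies) on failure, so no "or die" is needed.
my $pid = open2(my $from_child, my $to_child, @cmd);

# Fork a writer so that writing and reading happen concurrently.
my $writer = fork();
die "fork failed: $!\n" unless defined $writer;

if ($writer == 0) {
    # Writer process: it never reads, so drop its copy of the read end.
    close($from_child);
    for my $i (1 .. 100_000) {
        # Synthetic input row; reordered the same way as in the question.
        my @cols = ("chr$i", $i, $i + 1, '+', "r$i", "f$i");
        print {$to_child} join("\t", @cols[0, 2, 3, 1, 5, 4]), "\n";
    }
    # Closing the last copy of the write end delivers EOF to the filter.
    close($to_child) or die "close failed: $!\n";
    exit 0;
}

# Reader (parent): close its copy of the write end, otherwise the filter
# never sees EOF and the loop below would block forever at the end.
close($to_child);

my @out;
while (defined(my $line = <$from_child>)) {
    chomp $line;
    my @cols = split /\t/, $line;
    # Convert to comma-separated output (first three columns only here).
    push @out, join(",", @cols[0, 1, 2]);
}
close($from_child);

# Reap both the writer and the filter process.
waitpid($writer, 0);
waitpid($pid, 0);
print scalar(@out), " lines converted\n";
```

With 100,000 lines (a few megabytes), the data far exceeds a typical pipe buffer, which is the situation where a single-process "write everything, then read everything" loop stalls.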
The original script, which ends up deadlocking, looks like this:
use strict;
use warnings;
use IPC::Open2;
my $infile = shift;
my $featurefile = shift;
my $command = 'overlapFeatures';
my @args = (qw(-a stdin -b), $featurefile);
my ($input, $output);
my $pid = open2($output, $input, $command, @args)
    or die "Failed with error $!\n";
open(my $infh, '<', $infile) or die "Can't open $infile\n";
while (<$infh>) {
    # Do some format conversion...
    chomp;
    my @cols = split /\t/;
    # print a modified line to the tool
    print $input join("\t", @cols[0,2,3,1,5,4]), "\n";
}
close($input);
while (<$output>) {
    # format conversion for output
    chomp;
    my @cols = split /\t/;
    print join(",", @cols[0,1,2,5,3,8]), "\n";
}
close($output);
Based on "How to filter a lot of data with IPC::Open2?", I tried rewriting the script to use threads, like this:
use strict;
use warnings;
use IPC::Open2;
use threads;
my $infile = shift;
my $featurefile = shift;
my $command = 'overlapFeatures';
my @args = (qw(-a stdin -b), $featurefile);
my ($input, $output);
my $pid = open2($output, $input, $command, @args)
    or die "Failed with error $!\n";
my $thread = async {
    print join(",", qw(seqid start end strand read feature name)), "\n";
    for (;;) {
        my $line = <$output>;    # should block here and wait for output?
        last if !defined $line;  # end of stream reached?
        print STDERR "Got line $line\n";
        # format conversion for output
        chomp $line;
        my @cols = split /\t/, $line;
        # print the converted line to STDOUT
        print join(",", @cols[0,1,2,5,3,8]), "\n";
    }
    close($output);
};
{
    open(my $infh, '<', $infile) or die "Can't open $infile\n";
    while (<$infh>) {
        # Do some format conversion...
        chomp;
        my @cols = split /\t/;
        # print a modified line to the tool
        print $input join("\t", @cols[0,2,3,1,5,4]), "\n";
    }
    close($input);
}
$thread->join();
waitpid($pid, 0);
However, the script still gets stuck in exactly the same way, and so do I. I also can't figure out how to use IPC::Run in this situation.
What am I doing wrong? Am I misunderstanding threads?
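Edit: After spending more time debugging the script (and with help from amon), I found that I am able to read lines from $output. However, the script never finishes: it appears to hang after all of the output has been received. I think this is now my only remaining problem.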