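I am having a problem with a multithreaded Perl application that I am trying to run (Perl 5.10.1 on Red Hat 7.4). The problem is reproduced with the following setup, which is similar to the original Perl application:
There are 2 files: main.pl and module1.pm.
main.pl:
- Top-level script that runs a set of tasks, invoked as "main.pl <NumberOfTestsToRun> <MaxWorkersToUse>"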
#!/apps/perl/5.10.1/bin/perl
# main.pl
use strict;
use warnings;
use v5.10.1;
use threads;
use threads::shared;
use module1;
use lib "<path to Thread::Queue module>";
use Thread::Queue;
my $glNumTests = shift(); # Number of tests(jobs) to run
my $glMaxThreads = shift(); # What is the max allowed number of workers (threads) for queue mode
my $q = Thread::Queue->new(); # A new empty queue
# Worker thread
sub worker
{
    # Thread will loop until no more work
    while (defined(my $item = $q->dequeue())) {
        # Do work on $item
        sleep(5); # dummy pre-work
        $item->dummy(); # Actual task that needs to be run by each thread
    }
}
sub start_threads
{
    my $loNumThreads = shift();
    $loNumThreads = $loNumThreads > $glMaxThreads ? $glMaxThreads : $loNumThreads;
    print "Creating $loNumThreads threads...";
    for ( 1..$loNumThreads )
    {
        threads->create(\&worker);
    }
    print "done\n";
}
sub initialize
{
    my @lotests;
    my $loNumTests = shift();
    for my $i (1..$loNumTests)
    {
        push(@lotests,"Test_".$i);
    }
    return \@lotests;
}
sub launchjobs
{
    my @lotests = @{shift()};
    my $tests = {};
    # Create objects
    foreach my $lotest (@lotests)
    {
        $tests->{$lotest}->{"obj"} = module1->new($lotest);
    }
    # Start the threads before adding work to the queue
    start_threads(scalar(@lotests));
    # Adding work to queue
    foreach my $lotest (@lotests)
    {
        $q->enqueue($tests->{$lotest}->{"obj"});
    }
    # No more work to be added
    $q->end();
    # Wait for threads to finish
    foreach my $thr ( threads -> list() )
    {
        $thr->join();
    }
}
launchjobs(initialize($glNumTests));
module1.pm:
- Module that runs another command (echo in this case) using the IPC::Run module
# module1.pm
package module1;
use lib "<InstallationPath>/IPC-Run-20200505.0/lib";
use IPC::Run qw( run );
sub new
{
    my $class = shift();
    my $test = shift();
    my $self = {};
    $self->{"testName"} = $test;
    system("\\mkdir -p test_output/$self->{testName}");
    # Create new file track.log for this object
    open(my $OFH,">","test_output/$self->{testName}/track.log") || die "Cannot open track.log for writing in new\n";
    close($OFH);
    bless($self,$class);
    return $self;
}
sub logTracker
{
    # Writes out the message to track.log file
    my $self = shift();
    my $message = shift();
    open(my $OFH,">>","test_output/$self->{testName}/track.log") || die "Cannot open track.log for writing\n";
    print $OFH $message;
    close($OFH);
}
sub dummy
{
    my $self = shift();
    print "running $self->{testName}\n";
    my $loCmd = "echo"; # Command to be run
    my $loArgs = "This is test $self->{testName}"; # Arguments to the above command
    $self->logTracker("Calling run\n");
    run [$loCmd,$loArgs],'>&',"test_output/$self->{testName}/output";
    $self->logTracker("run completed\n");
}
1;
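The output directory structure is <pwd>/test_output/Test_<TestNumber>/, and each Test_<TestNumber> directory contains two files: track.log and output.
The problem I am facing is that sometimes some threads get stuck in the IPC::Run run call (the track.log file does not contain the "run completed" line).
For example, when I last ran "main.pl 1000 128" (1000 tests using 128 workers), a number of threads did not finish and main.pl kept running (waiting). When I ran strace on it, I got the following output: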
strace: Process 41187 attached with 8 threads
[pid 42343] read(22, <unfinished ...>
[pid 42292] read(20, <unfinished ...>
[pid 42291] read(25, <unfinished ...>
[pid 42282] read(24, <unfinished ...>
[pid 42234] read(43, <unfinished ...>
[pid 42212] read(18, <unfinished ...>
[pid 41187] futex(0x7ff2597939d0, FUTEX_WAIT, 42212, NULL <unfinished ...>
[pid 42338] read(16,
The process tree for the above run looks like this (once all the other threads have finished):
-perl,41187 main.pl 1000 128
  |-perl,42614 main.pl 1000 128
  |-perl,42615 main.pl 1000 128
  |-perl,42616 main.pl 1000 128
  |-perl,42617 main.pl 1000 128
  |-perl,42618 main.pl 1000 128
  |-perl,42620 main.pl 1000 128
  |-perl,42621 main.pl 1000 128
  |-{perl},42212
  |-{perl},42234
  |-{perl},42282
  |-{perl},42291
  |-{perl},42292
  |-{perl},42338
  `-{perl},42343
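I cannot understand why this is happening (it may have something to do with Thread::Queue, pipes, etc.). It would be great if someone could help me with this. Ultimately, I want to be able to run all the tests without any futex wait/hang issues. I have tried:
- Using IPC::Run's timeout option to check whether control returns, but no luck there.
- Using open3() instead of IPC::Run: the problem persists and occurs more frequently than with IPC::Run.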
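For reference, here is a minimal sketch of what an open3()-based replacement for the run call might look like. It is illustrative only and is not the exact code from the application: the helper name run_with_open3 is hypothetical, and it captures the child's output in memory instead of writing it to the output file. It uses only the core IPC::Open3 module.

```perl
use strict;
use warnings;
use IPC::Open3;

# Hypothetical helper (not part of the original application): run a command
# through open3() and return its exit status and combined stdout/stderr.
sub run_with_open3 {
    my @cmd = @_;

    # Passing undef as the third handle makes the child's stderr share
    # the same pipe as its stdout.
    my $pid = open3(my $child_in, my $child_out, undef, @cmd);
    close($child_in);    # the child is given no input

    # Read everything the child writes, until EOF on the pipe.
    my $output = do { local $/; <$child_out> };
    close($child_out);

    waitpid($pid, 0);    # reap the child so $? holds its status
    return ($? >> 8, defined $output ? $output : '');
}

my ($status, $output) = run_with_open3('echo', 'This is test Test_1');
print "exit=$status output=$output";
```

In dummy(), a call like this would stand in for the run line, with the captured output then written to the test's output file. As noted above, this kind of variant hung more often than the IPC::Run version under Perl 5.10.1.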
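Notes:
- The problem does not occur when I use Perl 5.16.3. However, I need Perl 5.10.1 for the original application to work, so it would be great if someone could help me understand how to fix this :)
- In module1::dummy(), if $loCmd is changed to "sleep" and $loArgs to (say) "10", the problem does not occur (which gives the impression that it may be related to pipe/IO buffers).
- The number of stuck threads varies from run to run, and sometimes all threads finish without any problem (possibly a race condition).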