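I'm running into a problem with a multi-threaded Perl application (Perl 5.10.1 on Red Hat 7.4). I've reproduced the problem with the following setup, which mirrors the original Perl application: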

There are 2 files: main.pl and module1.pm

main.pl

  • Top-level script that runs a set of tasks, invoked as "main.pl <NumberOfTestsToRun> <MaxWorkersToUse>"
#!/apps/perl/5.10.1/bin/perl
# main.pl

use strict;
use warnings;
use v5.10.1;
use threads;
use threads::shared;

use module1;

use lib "<path to Thread::Queue module>"; 
use Thread::Queue;

my $glNumTests = shift();       # Number of tests(jobs) to run
my $glMaxThreads = shift();     # What is the max allowed number of workers (threads) for queue mode

my $q = Thread::Queue->new();    # A new empty queue

# Worker thread
sub worker
{
    # Thread will loop until no more work
    while (defined(my $item = $q->dequeue())) {
        # Do work on $item
        sleep(5); # dummy pre-work
        $item->dummy(); # Actual task that needs to be run by each thread
    }
}

sub start_threads
{
    my $loNumThreads = shift();
    $loNumThreads = $loNumThreads > $glMaxThreads ? $glMaxThreads : $loNumThreads;
    print "Creating $loNumThreads threads...";
    for ( 1..$loNumThreads )
    {
        threads->create(\&worker);
    }
    print "done\n";
}

sub initialize
{
    my @lotests;
    my $loNumTests = shift();
    for my $i (1..$loNumTests)
    {
        push(@lotests,"Test_".$i);
    }
    return \@lotests;
}

sub launchjobs
{
    my @lotests = @{shift()};
    my $tests = {};
    
    # Create objects
    foreach my $lotest (@lotests)
    {
        $tests->{$lotest}->{"obj"} = module1->new($lotest);
    }
    
    # Start the threads before adding work to the queue
    start_threads(scalar(@lotests));
    
    # Adding work to queue
    foreach my $lotest (@lotests)
    {
        $q->enqueue($tests->{$lotest}->{"obj"});
    }
    # No more work to be added
    $q->end();

    # Wait for threads to finish
    foreach my $thr ( threads -> list() )
    {
        $thr->join();
    }   
}

launchjobs(initialize($glNumTests));

module1.pm

  • Module that runs another command (echo, in this case) using the IPC::Run module
# module1.pm
package module1;

use lib "<InstallationPath>/IPC-Run-20200505.0/lib";
use IPC::Run qw( run ); 

sub new
{
    my $class = shift();
    my $test = shift();
    my $self = {};
    $self->{"testName"} = $test;
    system("\\mkdir -p test_output/$self->{testName}");
    # Create new file track.log for this object
    open(my $OFH,">","test_output/$self->{testName}/track.log") || die "Cannot open track.log for writing in new\n";
    close($OFH);
    bless($self,$class);
    return $self;
}

sub logTracker
{
    # Writes out the message to track.log file 
    my $self = shift();
    my $message = shift();  
    open(my $OFH,">>","test_output/$self->{testName}/track.log") || die"Cannot open track.log for writing\n";
    print $OFH $message;
    close($OFH);    
}

sub dummy
{
    my $self = shift();
    print "running $self->{testName}\n";

    my $loCmd = "echo"; # Command to be run
    my $loArgs = "This is test $self->{testName}"; # Arguments to the above command
    
    $self->logTracker("Calling run\n");
    run [$loCmd,$loArgs],'>&',"test_output/$self->{testName}/output";
    $self->logTracker("run completed\n");
}
1;

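The output directory structure is <pwd>/test_output/Test_<TestNumber>/, and each Test_<TestNumber> directory contains two files: track.log and output.

The problem is that some threads occasionally get stuck in the IPC::Run run call (their track.log file does not contain the "run completed" line).

For example, the last time I ran "main.pl 1000 128" (1000 tests with 128 workers), several threads never finished and main.pl kept running (waiting). Attaching strace gave the following output: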

strace: Process 41187 attached with 8 threads
[pid 42343] read(22,  <unfinished ...>
[pid 42292] read(20,  <unfinished ...>
[pid 42291] read(25,  <unfinished ...>
[pid 42282] read(24,  <unfinished ...>
[pid 42234] read(43,  <unfinished ...>
[pid 42212] read(18,  <unfinished ...>
[pid 41187] futex(0x7ff2597939d0, FUTEX_WAIT, 42212, NULL <unfinished ...>
[pid 42338] read(16,

The process tree for the above run, once all the other threads had finished, looks like this:

-perl,41187 main.pl 1000 128
    |-perl,42614 main.pl 1000 128
    |-perl,42615 main.pl 1000 128
    |-perl,42616 main.pl 1000 128
    |-perl,42617 main.pl 1000 128
    |-perl,42618 main.pl 1000 128
    |-perl,42620 main.pl 1000 128
    |-perl,42621 main.pl 1000 128
    |-{perl},42212
    |-{perl},42234
    |-{perl},42282
    |-{perl},42291
    |-{perl},42292
    |-{perl},42338
    `-{perl},42343

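I can't figure out why this happens (it may be related to Thread::Queue, pipes, etc.). I'd appreciate any help in resolving this. Ultimately, I want to be able to run all the tests without any futex wait/hang issues. I have tried:

  • Using IPC::Run's timeout option to check whether control returns, but no luck there.
  • Using open3() instead of IPC::Run: the problem persists and occurs more frequently than with IPC::Run.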
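
A minimal sketch of what such a timeout attempt can look like, using the timeout() helper exported by IPC::Run. The 10-second limit, the $out_file path, and the eval wrapper are illustrative assumptions and are not taken from the original application:

use strict;
use warnings;
use IPC::Run qw( run timeout );

my @cmd      = ('echo', 'This is test Test_1'); # same style of command as in module1::dummy()
my $out_file = 'output.txt';                    # hypothetical output path

my $ok;
# run() throws an exception when the timer expires, so catch it with eval
my $finished = eval {
    $ok = run \@cmd, '>&', $out_file, timeout(10); # 10 seconds is an assumed limit
    1;
};

if (!$finished) {
    warn "run did not complete: $@";
}
elsif (!$ok) {
    warn "command exited with a non-zero status\n";
}

With this pattern, an expired timer shows up as the "run did not complete" warning instead of the call blocking indefinitely.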

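Notes:

  • The problem does not occur when I use Perl 5.16.3. However, I need Perl 5.10.1 for the original application to work, so it would be great if someone could help me understand how to fix this with 5.10.1 :)
  • In module1::dummy(), if $loCmd is changed to "sleep" and $loArgs to (say) "10", the problem does not occur (which gives the impression that it may be related to pipes/IO buffers).
  • The number of stuck threads varies from run to run, and sometimes all threads finish without any problem (so there may be a race condition).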