multithreading - How do I queue perl subroutines to a thread queue instead of data?

Question

Background:
In reading how to multithread my perl script, I read (from http://perldoc.perl.org/threads.html#BUGS-AND-LIMITATIONS)

On most systems, frequent and continual creation and destruction of threads can lead to ever-increasing growth in the memory footprint of the Perl interpreter. While it is simple to just launch threads and then ->join() or ->detach() them, for long-lived applications, it is better to maintain a pool of threads, and to reuse them for the work needed, using queues to notify threads of pending work.

My script will be long-lived; it's a PKI LDAP directory monitoring daemon that will always be running. The enterprise monitoring solution will generate an alarm if it stops running for any reason. My script will check that I can reach another PKI LDAP directory, as well as validate revocation lists on both.

Problem: Everything I can find on Google shows passing variables (e.g. scalars) to the thread queue rather than the subroutine itself. I think I'm just not understanding how to implement a thread queue properly, compared to how you implement a thread without queues.

Question 1: How can I "maintain a pool of threads" to keep the Perl interpreter from slowly eating up more and more memory?
Question 2: (Unrelated but while I have this code posted) Is there a safe amount of sleep at the end of the main program so that I don't start a thread more than once in a minute? 60 seems obvious but could that ever cause it to run more than once if the loop is fast, or perhaps miss a minute because of processing time or something?

Thanks in advance!

#!/usr/bin/perl

use feature ":5.10";
use warnings;
use strict;
use threads;
use Proc::Daemon;
#

### Global Variables
use constant false => 0;
use constant true  => 1;
my $app = $0;
my $continue = true;
$SIG{TERM} = sub { $continue = false };

# Directory Server Agent (DSA) info
my @ListOfDSAs = (
    { name => "Myself (inbound)",
      host => "ldap.myco.ca",
      base => "ou=mydir,o=myco,c=ca",
    },
    { name => "Company 2",
      host => "ldap.comp2.ca",
      base => "ou=their-dir,o=comp2,c=ca",
    }
);    
#

### Subroutines

sub checkConnections
{   # runs every 5 minutes
    my (@DSAs, $logfile) = @_;
    # Code to ldapsearch
    threads->detach();
}

sub validateRevocationLists
{   # runs every hour on minute xx:55
    my (@DSAs, $logfile) = @_;
    # Code to validate CRLs haven't expired, etc
    threads->detach();
}

#

### Main program
Proc::Daemon::Init;

while ($continue)
{
    my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime(time);

    # Question 1: Queues??

    if ($min % 5 == 0 || $min == 0)
        { threads->create(&checkConnections, @ListOfDSAs, "/var/connect.log"); }

    if ($min % 55 == 0)
        { threads->create(&validateRevocationLists, @ListOfDSAs, "/var/RLs.log"); }

    sleep 60; # Question 2: Safer/better way to prevent multiple threads being started for same check in one matching minute?
}

# TERM RECEIVED
exit 0;
__END__

score 3 · Accepted Answer

use threads;
use Thread::Queue 3.01 qw( );

my $check_conn_q      = Thread::Queue->new();
my $validate_revoke_q = Thread::Queue->new();

my @threads;
push @threads, async {
   while (my $job = $check_conn_q->dequeue()) {
      check_conn(@$job);
   }
};
push @threads, async {
   while (my $job = $validate_revoke_q->dequeue()) {
      validate_revoke(@$job);
   }
};

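my $last_M = -1;   # minute in which jobs were last scheduled
while ($continue) {
   my $M = (localtime)[1];   # current minute, 0..59

   # The loop wakes twice a minute, so enqueue at most once per minute.
   if ($M != $last_M) {
      $last_M = $M;

      $check_conn_q->enqueue([ @ListOfDSAs, "/var/connect.log" ])
         if $M % 5 == 0;

      $validate_revoke_q->enqueue([ @ListOfDSAs, "/var/RLs.log" ])
         if $M == 55;
   }

   sleep 30;
}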

$check_conn_q->end();
$validate_revoke_q->end();
$_->join for @threads;
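
To queue "subroutines" rather than data, one option is to queue the subroutine's name together with its arguments and let the worker look the name up in a dispatch table. As far as I know, Thread::Queue cannot carry code references themselves, because queued items are shared between threads. The following is only a sketch: the two subs are hypothetical stand-ins that return a string instead of doing real LDAP work, and the log file is passed first so that the list of hosts does not swallow it.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;

# Hypothetical stand-ins for the real checks.
sub check_conn {
    my ($logfile, @dsas) = @_;   # scalar first, then the list
    return "check_conn: " . scalar(@dsas) . " DSAs -> $logfile";
}

sub validate_revoke {
    my ($logfile, @dsas) = @_;
    return "validate_revoke: " . scalar(@dsas) . " DSAs -> $logfile";
}

# Dispatch table: job name => code reference.
my %dispatch = (
    check_conn      => \&check_conn,
    validate_revoke => \&validate_revoke,
);

my $jobs    = Thread::Queue->new();
my $results = Thread::Queue->new();

# One long-lived worker; it receives [ name, args... ] and calls the sub.
my $worker = threads->create(sub {
    while (defined(my $job = $jobs->dequeue())) {
        my ($name, @args) = @$job;
        my $code = $dispatch{$name} or next;   # skip unknown job names
        $results->enqueue($code->(@args));
    }
});

$jobs->enqueue([ 'check_conn',      '/var/connect.log', 'ldap.myco.ca', 'ldap.comp2.ca' ]);
$jobs->enqueue([ 'validate_revoke', '/var/RLs.log',     'ldap.myco.ca', 'ldap.comp2.ca' ]);
$jobs->end();       # no more work; dequeue() returns undef once drained
$worker->join();

# Collect whatever the worker reported.
my @out;
while (defined(my $r = $results->dequeue_nb())) {
    push @out, $r;
}
print "$_\n" for @out;
```

With a single shared queue like this, one worker runs the jobs in the order they were queued; the two-queue layout above instead lets the two kinds of check run side by side.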

I'm not sure parallelism is needed here. If it isn't, you could simply use

use List::Util qw( min );

sub sleep_until {
   my ($until) = @_;
   my $time = time;
   return if $time >= $until;
   sleep($until - $time);
}

my $next_check_conn = my $next_validate_revoke = time;
while ($continue) {
   sleep_until min $next_check_conn, $next_validate_revoke;
   last if !$continue;

   my $time = time;
   if ($time >= $next_check_conn) {
      check_conn(@ListOfDSAs, "/var/connect.log");
      $next_check_conn = time + 5*60;
   }

   if ($time >= $next_validate_revoke) {
      validate_revoke(@ListOfDSAs, "/var/RLs.log");
      $next_validate_revoke = time + 60*60;
   }
}
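
Note that rescheduling with time + 5*60 counts from whenever the task finished, so the run times drift away from the wall clock. If the checks should stay on minute boundaries (every fifth minute, and xx:55 for the revocation lists, as in the question), the next run time can be computed from the epoch instead. This is a minimal sketch; next_aligned is a hypothetical helper, and it assumes the local UTC offset is a whole number of hours (for the five-minute case any real-world offset should work).

```perl
use strict;
use warnings;

# Return the first epoch time strictly after $now that falls $offset
# seconds into a $period-second cycle.
sub next_aligned {
    my ($now, $period, $offset) = @_;
    my $next = $now - ($now % $period) + $offset;
    $next += $period while $next <= $now;
    return $next;
}

# Example: schedule relative to the current time.
my $now                  = time;
my $next_check_conn      = next_aligned($now, 5 * 60, 0);         # xx:00, xx:05, ...
my $next_validate_revoke = next_aligned($now, 60 * 60, 55 * 60);  # xx:55

printf "next connection check in %d s, next CRL check in %d s\n",
    $next_check_conn - $now, $next_validate_revoke - $now;
```

In the loop above, the two assignments after each task would then become next_aligned(time, 5*60, 0) and next_aligned(time, 60*60, 55*60).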

score 1 · Accepted Answer

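I would suggest running just one check at a time, since there doesn't appear to be a compelling reason to use threads here, and you don't want to add unnecessary complexity to a program that will be running all the time.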

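If you do want to learn how to use a thread pool, there are some examples included with the threads module. There is also a Thread::Pool module that may be useful.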
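
For reference, a thread pool can be sketched with only threads and Thread::Queue: a fixed number of workers is started once, and all of them pull items from the same queue. This is a rough illustration, not the Thread::Pool API; the pool size and the probe_host sub are assumptions made up for the example.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue 3.01;

my $POOL_SIZE = 3;   # assumed value; tune for the real workload

my $work    = Thread::Queue->new();
my $results = Thread::Queue->new();

# Hypothetical task: a real version would run the LDAP search here.
sub probe_host {
    my ($host) = @_;
    return "$host ok";
}

# Start the workers once; they live until the queue is ended.
my @pool;
for (1 .. $POOL_SIZE) {
    push @pool, threads->create(sub {
        while (defined(my $host = $work->dequeue())) {
            $results->enqueue(probe_host($host));
        }
    });
}

# Hand out work; whichever worker is idle picks up the next item.
$work->enqueue('ldap.myco.ca', 'ldap.comp2.ca');

# Shut down: no more work will arrive, so the workers drain and exit.
$work->end();
$_->join() for @pool;

my @done;
while (defined(my $r = $results->dequeue_nb())) {
    push @done, $r;
}
@done = sort @done;   # completion order is not deterministic
print "$_\n" for @done;
```

Because the threads are created once and reused, the memory growth described in the threads documentation for repeated create/detach cycles should not apply.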

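As for making sure you don't repeat a check within the same minute, you are correct that sleeping for 60 seconds is inadequate. No matter what value you choose to sleep for, there will be edge cases in which it fails: either it will be slightly shorter than a minute, and you will occasionally run two checks in the same minute, or it will be slightly longer than a minute, and you will occasionally miss a check altogether.

Instead, use a variable to remember when the task was last done. You can then use a shorter sleep time without worrying about running multiple checks per minute.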

my $last_task_time = -1;
while ($continue)
{
    my $min = (localtime(time))[1];

    if ($last_task_time != $min && 
          ($min % 5 == 0 || $min > ($last_task_time+5)%60))
    { 
        #Check connections here.

        if ($min == 55 || ($last_task_time < 55 && $min > 55))
        { 
           #Validate revocation lists here.
        }

        $last_task_time = $min;
    }
    else
    {
        sleep 55; #Ensures there is at least one check per minute.
    }
}

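Update: I fixed the code so that it recovers if the last task ran too long. This is fine if a task occasionally takes a long time. However, if your tasks frequently take longer than five minutes, you need a different solution (threads would probably make sense in that case).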
