multithreading - Perl thread performance with shared variables

Question

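I'm working on a project implemented in Perl and thought it would be a good idea to use threads to distribute the work, because the tasks can be done independently of each other and only read from shared data in memory. However, the performance is nowhere near what I expected. After some investigation I can only conclude that threads in Perl basically suck, but I keep wondering why performance goes down the drain as soon as I introduce a single shared variable.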

For instance, this little program shares nothing and consumes 75% of the CPU (as expected):

use threads;

sub fib {
  my ( $n ) = @_;
  if ( $n < 2 ) {
     return $n;
  } else {
     return fib( $n - 1 ) + fib( $n - 2 );
  }
}

my $thr1 = threads->create( 'fib', 35 );
my $thr2 = threads->create( 'fib', 35 );
my $thr3 = threads->create( 'fib', 35 );

$thr1->join;
$thr2->join;
$thr3->join;

As soon as I introduce a shared variable $a, CPU usage drops to somewhere between 40% and 50%:

use threads;
use threads::shared;

my $a : shared;
$a = 1000;

sub fib {
  my ( $n ) = @_;
  if ( $n < 2 ) {
    return $n;
  } else {
    return $a + fib( $n - 1 ) + fib( $n - 2 ); # <-- $a was added here
  }
}

my $thr1 = threads->create( 'fib', 35 );
my $thr2 = threads->create( 'fib', 35 );
my $thr3 = threads->create( 'fib', 35 );

$thr1->join;
$thr2->join;
$thr3->join;

So $a is only ever read and no explicit locking takes place, yet performance drops. I'm curious why this happens.

At the moment I'm using Perl 5.10.1 under Cygwin on Windows XP. Unfortunately I can't test this on a non-Windows machine with a (hopefully) more recent Perl.

score 3 · Accepted Answer

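Your code is a tight loop around a synchronized structure. Optimize it by having each thread copy the shared variable (just once per thread) into an unshared variable.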
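A minimal sketch of that suggestion, assuming the shared value does not change while the workers run. The names `$offset`, `$add`, `$local` and `worker` are illustrative and not from the question, and the recursion depth is reduced to 25 so the example finishes quickly. Each thread reads the shared variable exactly once and passes the private copy down the recursion as an ordinary argument.

```perl
use strict;
use warnings;
use threads;
use threads::shared;

# Shared value read by every worker (called $offset here rather than $a,
# which is also Perl's special sort variable).
my $offset : shared = 1000;

# Same recursion as in the question, but the added value arrives as a
# plain argument, so the hot path never touches the shared variable.
sub fib {
    my ( $n, $add ) = @_;
    return $n if $n < 2;
    return $add + fib( $n - 1, $add ) + fib( $n - 2, $add );
}

# Thread entry point: read the shared variable once, then work on the copy.
sub worker {
    my ($n) = @_;
    my $local = $offset;    # the only access to shared data in this thread
    return fib( $n, $local );
}

my @threads = map { threads->create( \&worker, 25 ) } 1 .. 3;
my @results = map { $_->join } @threads;
print "$_\n" for @results;
```

If the shared value can change during the run, a one-time copy is no longer equivalent to the original code, and each thread would have to refresh its copy at points where a stale value is acceptable.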

score 0 · Answer

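In Perl it is possible to construct a shared object containing lots of data without worrying about extra copies. Spawning workers has no performance impact, because the shared data resides in a separate thread or process, depending on whether threads are used.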

use MCE::Hobo;    # use threads okay or parallel module of your choice
use MCE::Shared;

# The module option constructs the object under the shared-manager.
# There's no trace of data inside the main process. The construction
# returns a shared reference containing an id and class name.

my $data = MCE::Shared->share( { module => 'My::Data' } );
my $b;

sub fib {
  my ( $n ) = @_;
  if ( $n < 2 ) {
    return $n;
  } else {
    return $b + fib( $n - 1 ) + fib( $n - 2 );
  }
}

my @thrs;

push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(1000), fib(35) } );
push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(2000), fib(35) } );
push @thrs, MCE::Hobo->create( sub { $b = $data->get_keys(3000), fib(35) } );

$_->join() for @thrs;

exit;

# Populate $self with data. When shared, the data resides under the
# shared-manager thread (via threads->create) or process (via fork).

package My::Data;

sub new {
  my $class = shift;
  my %self;

  %self = map { $_ => $_ } 1000 .. 5000;

  bless \%self, $class;
}

# Add any getter methods to suit the application. Supporting multiple
# keys helps reduce the number of trips via IPC. Serialization is
# handled automatically if a getter method returns a hash ref.
# MCE::Shared will use Sereal::{Encoder,Decoder} if available (faster).

sub get_keys {
  my $self = shift;
  if ( wantarray ) {
    return map { $_ => $self->{$_} } @_;
  } else {
    return $self->{$_[0]};
  }
}

1;
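To see what the getter returns in each calling context, the class can be exercised on its own, without MCE::Shared. This is only a local sketch on an unshared instance, so it says nothing about IPC cost. The package repeats My::Data from the code above, and the variables `$one` and `%many` are illustrative.

```perl
use strict;
use warnings;

package My::Data;

sub new {
    my $class = shift;
    my %self  = map { $_ => $_ } 1000 .. 5000;
    return bless \%self, $class;
}

# Scalar context: value of the first key only.
# List context: key/value pairs for every requested key.
sub get_keys {
    my $self = shift;
    if (wantarray) {
        return map { $_ => $self->{$_} } @_;
    }
    else {
        return $self->{ $_[0] };
    }
}

package main;

my $data = My::Data->new;

my $one  = $data->get_keys(1000);                  # scalar context
my %many = $data->get_keys( 1000, 2000, 3000 );    # list context

print "scalar: $one\n";
print "list: $_ => $many{$_}\n" for sort { $a <=> $b } keys %many;
```

Once the object is shared, asking for several keys in one list-context call is what keeps the number of trips to the shared manager down.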
