perl - Cached Schwartzian transform

Question

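I'm working through "Intermediate Perl", which is pretty cool. I just finished the section on the Schwartzian transform, and once it had sunk in I started wondering why the transform doesn't use a cache. In a list with many repeated values, the transform recomputes the sort key for every element, so I thought: why not use a hash to cache the results? Here is some code: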

# a place to keep our results
my %cache;

# the transformation we are interested in
sub foo {
  # expensive operations
}

# some data
my @unsorted_list = ....;

# sorting with the help of the cache
my @sorted_list = sort {
  ($cache{$a} //= &foo($a)) <=> ($cache{$b} //= &foo($b))
} @unsorted_list;

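Am I missing something? Why isn't the cached version of the Schwartzian transform listed in the books, and why isn't it more widely known in general? At first glance I would expect the cached version to be more efficient.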

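Edit: daxim pointed out in the comments that this is known as the Orcish maneuver. So I'm not going crazy, although I don't quite understand the name.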
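For readers who want to try the idea, here is a minimal runnable sketch of the snippet above. The body of foo() (string length), the sample data, and the $calls counter are assumptions made up for illustration; they are not from the original post. Note that `//=` needs Perl 5.10 or later, and that a key function returning undef would be recomputed every time, because `//=` only caches defined values.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the expensive foo(); it counts its own calls.
my $calls = 0;
sub foo {
    my ($x) = @_;
    $calls++;
    return length $x;    # pretend this is costly
}

# Illustrative data with several duplicates.
my @unsorted_list = qw(pear fig banana fig kiwi pear fig);

# The cache is filled lazily from inside the comparison block.
my %cache;
my @sorted_list = sort {
    ($cache{$a} //= foo($a)) <=> ($cache{$b} //= foo($b))
} @unsorted_list;

printf "%d elements, %d distinct values, %d calls to foo\n",
    scalar @unsorted_list, scalar keys %cache, $calls;
print "@sorted_list\n";
```

With this data foo() runs once per distinct value rather than once per comparison, which is the saving the question is asking about.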

score 5 · Accepted Answer

(Lots of other comments redacted)

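To the extent that an array lookup is more efficient than a hash lookup (i.e., $a->[1] is faster than $cache{$a}), the canonical form could be more efficient than your code, even with a lot of duplicates.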
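One way to check that premise on your own machine is a micro-benchmark of the two access paths. This is only a sketch: the variable names and the sample value are made up, and the relative numbers will depend on the Perl version and hardware, so no expected output is given.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data: one [original, key] pair as built by the canonical
# transform, and one cache entry as built by the hash-based version.
my $pair  = [ 'some value', 42 ];
my %cache = ( 'some value' => 42 );
my $key   = 'some value';

# Run each lookup for at least one CPU second and print a comparison table.
cmpthese(-1, {
    array_lookup => sub { my $v = $pair->[1] },
    hash_lookup  => sub { my $v = $cache{$key} },
});
```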

Benchmark results:

Here is my benchmark code:

# when does an additional layer of caching improve the performance of 
# the Schwartzian transform?

# methods:
#   1. canonical Schwartzian transform
#   2. cached transform
#   3. canonical with memoized function

# inputs:
#   1. few duplicates (rand)
#   2. many duplicates (int(rand))

# functions:
#   1. fast
#   2. slow

use Benchmark;
use Math::BigInt;
use strict qw(vars subs);
use warnings;
no warnings 'uninitialized';

# fast_foo: a cheap operation,  slow_foo: an expensive operation
sub fast_foo { my $x = shift; exp($x) }
sub slow_foo { my $x = shift; my $y = Math::BigInt->new(int(exp($x))); $y->bfac() }

# XXX_memo_foo: put caching optimization inside call to 'foo'
my %fast_memo = ();
sub fast_memo_foo {
  my $x = shift;
  if (exists($fast_memo{$x})) {
    return $fast_memo{$x};
  } else {
    return $fast_memo{$x} = fast_foo($x);
  }
}

my %slow_memo = ();
sub slow_memo_foo {
  my $x = shift;
  if (exists($slow_memo{$x})) {
    return $slow_memo{$x};
  } else {
    return $slow_memo{$x} = slow_foo($x);
  }
}

my @functions = qw(fast_foo slow_foo fast_memo_foo slow_memo_foo);
my @input1 = map { 5 * rand } 1 .. 1000;         # 1000 random floats with few duplicates
my @input2 = map { int } @input1;                # 1000 random ints with many duplicates

sub canonical_ST {
  my $func = shift @_;
  my @sorted = map { $_->[0] }
    sort { $a->[1] <=> $b->[1] }
    map { [$_, $func->($_)] } @_;
  return;
}

sub cached_ST {
  my $func = shift @_;
  my %cache = ();
  my @sorted = sort {
    ($cache{$a} //= $func->($a)) <=> ($cache{$b} //= $func->($b))
  } @_;
  return;
}

foreach my $input ('few duplicates','many duplicates') {
  my @input = $input eq 'few duplicates' ? @input1 : @input2;
  foreach my $func (@functions) {

    print "\nInput: $input\nFunction: $func\n-----------------\n";
    Benchmark::cmpthese($func =~ /slow/ ? 30 : 1000,
             {
              'Canonical' => sub { canonical_ST($func, @input) },
              'Cached'    => sub { cached_ST($func, @input) }
             });
  }
}

And the results (Strawberry Perl 5.12):

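Input: few duplicates
Function: fast_foo
-----------------
           Rate Canonical    Cached
Canonical 160/s        --      -18%
Cached    196/s       22%        --

Input: few duplicates
Function: slow_foo
-----------------
            Rate Canonical    Cached
Canonical 7.41/s        --       -0%
Cached    7.41/s        0%        --

Input: few duplicates
Function: fast_memo_foo
-----------------
           Rate Canonical    Cached
Canonical 153/s        --      -25%
Cached    204/s       33%        --

Input: few duplicates
Function: slow_memo_foo
-----------------
            Rate    Cached Canonical
Cached    20.2/s        --       -7%
Canonical 21.8/s        8%        --

Input: many duplicates
Function: fast_foo
-----------------
           Rate Canonical    Cached
Canonical 179/s        --      -50%
Cached    359/s      101%        --

Input: many duplicates
Function: slow_foo
-----------------
            Rate Canonical    Cached
Canonical 11.8/s        --      -62%
Cached    31.0/s      161%        --

Input: many duplicates
Function: fast_memo_foo
-----------------
           Rate Canonical    Cached
Canonical 179/s        --      -50%
Cached    360/s      101%        --

Input: many duplicates
Function: slow_memo_foo
-----------------
            Rate Canonical    Cached
Canonical 28.2/s        --       -9%
Cached    31.0/s       10%        --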

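These results surprised me somewhat: the canonical Schwartzian transform has only a slight advantage under the most favorable conditions (an expensive function call, few duplicates or no memoization) and is at a considerable disadvantage in the other cases. The OP's caching scheme inside the sort block even outperforms memoization outside of sort. I wasn't expecting this when I ran the benchmarks, but I think the OP is on to something.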

score 2 · Accepted Answer

Caching the Schwartzian transform would be useful when you are calling foo() in multiple transforms:

@sorted1 = map { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [$_, foo($_)] }
           @unsorted1;
@sorted2 = map { $_->[0] }
           sort { $a->[1] cmp $b->[1] }
           map  { [$_, foo($_)] }
           @unsorted2;

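If @unsorted1 and @unsorted2 have largely the same values, then you'll be calling foo() on the same values twice. If this function is computationally expensive, you probably want to cache the results.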

The easiest way to do this is to use the Memoize module:

use Memoize;
memoize('foo');

If you add these two lines, you don't have to worry about setting up a cache for foo() yourself; Memoize handles it for you.
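As a rough illustration of the two-transform case, the sketch below memoizes a stand-in foo() and counts how often the real function body runs. The body of foo() (lowercasing), the sample lists, and the $calls counter are assumptions for demonstration only.

```perl
use strict;
use warnings;
use Memoize;

# Hypothetical stand-in for an expensive key function; counts real invocations.
my $calls = 0;
sub foo {
    my ($x) = @_;
    $calls++;
    return lc $x;
}
memoize('foo');    # from here on, repeated arguments are served from Memoize's cache

# Two illustrative lists that share most of their values.
my @unsorted1 = qw(Delta alpha Charlie bravo);
my @unsorted2 = qw(bravo Echo alpha Delta);

my @sorted1 = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [$_, foo($_)] }
              @unsorted1;
my @sorted2 = map  { $_->[0] }
              sort { $a->[1] cmp $b->[1] }
              map  { [$_, foo($_)] }
              @unsorted2;

print "@sorted1\n@sorted2\n";
printf "foo ran %d times for %d elements\n",
    $calls, scalar(@unsorted1) + scalar(@unsorted2);
```

The second transform only pays for the values it has not seen before, while both map sort map pipelines stay in their standard form.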

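Edit: I just noticed that your sort doesn't do a Schwartzian transform. The whole point behind the ST is that you run the expensive function only once for each member of the list, which is why you do the whole map sort map construct. While you could probably do some hand-rolled caching like you did, it would be non-standard Perl (in the sense that someone would expect to see an ST and then have to sit there and figure out what your code is doing) and could quickly become a maintenance nightmare.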

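But yes, if your list has repeated values, using a cache (either hand-rolled or with Memoize) may result in a faster Schwartzian transform. I say "may" because there could be instances where doing the hash lookup is actually more expensive than calling foo() (the Memoize docs use sub foo { my $x = shift; return $x * $x } as an example of one of these instances).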
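A hand-rolled variant that keeps the standard map sort map shape could put the cache in the first map stage. This is a sketch under assumptions: the squaring foo() is borrowed from the Memoize example mentioned above purely as a stand-in, and the data and $calls counter are invented for illustration.

```perl
use strict;
use warnings;

# Stand-in key function; counts how often it actually runs.
my $calls = 0;
sub foo {
    my ($x) = @_;
    $calls++;
    return $x * $x;
}

# Illustrative data with duplicates.
my @unsorted = (3, -1, 2, 3, -1, 3, 2);

# Schwartzian transform whose decorate step consults a hash first,
# so foo() runs once per distinct value instead of once per element.
my %cache;
my @sorted = map  { $_->[0] }
             sort { $a->[1] <=> $b->[1] }
             map  { [ $_, $cache{$_} //= foo($_) ] }
             @unsorted;

print "@sorted\n";
printf "%d elements, %d calls to foo\n", scalar @unsorted, $calls;
```

The sort block still does only array lookups, so the hash cost is paid once per element rather than once per comparison. Whether that beats either plain version for a cheap foo() like this one is something to measure, not assume.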
