performance - 如何将此范围编码 C++ 片段转换为高性能 Haskell？

Question

我知道足够的 Haskell 来翻译下面的代码，但我不太了解如何让它表现良好：

typedef unsigned long precision;
typedef unsigned char uc;

const int kSpaceForByte = sizeof(precision) * 8 - 8;
const int kHalfPrec = sizeof(precision) * 8 / 2;

const precision kTop = ((precision)1) << kSpaceForByte;
const precision kBot = ((precision)1) << kHalfPrec;


//This must be called before encoding starts
void RangeCoder::StartEncode(){
  _low = 0;
  _range = (precision) -1;
}

/*
  RangeCoder does not concern itself with models of the data.
  To encode each symbol, you pass the parameters *cumFreq*, which gives
  the cumulative frequency of the possible symbols ordered before this symbol,
  *freq*, which gives the frequency of this symbol. And *totFreq*, which gives
  the total frequency of all symbols.
  This means that you can have different frequency distributions / models for
  each encoded symbol, as long as you can restore the same distribution at
  this point, when restoring.
*/
void RangeCoder::Encode(precision cumFreq, precision freq, precision totFreq){
  assert(cumFreq + freq <= totFreq && freq && totFreq <= kBot);
  _low += cumFreq * (_range /= totFreq);
  _range *= freq;
  while ((_low ^ _low + _range) < kTop or
         _range < kBot and ((_range= -_low & kBot - 1), 1)){
    //the "a or b and (r=..,1)" idiom is a way to assign r only if a is false.
    OutByte(_low >> kSpaceForByte); //output one byte.
    _range <<= sizeof(uc) * 8;
    _low <<= sizeof(uc) * 8;
  }
}

我知道，我知道“写几个版本并使用标准来看看什么有效”。我不知道我的选择是什么，或者避免愚蠢的错误。

到目前为止，这是我的想法。一种方法是使用 State monad 和/或 lens。另一个是将循环和状态转换为显式递归。我在某处读到显式递归往往在 ghc 上表现不佳。我认为使用 ByteString Builder 将是输出每个字节的好方法。假设我在 64 位平台上运行，我应该使用未装箱的 Word64 参数吗？如果我将精度降低到 32 位，压缩质量不会显着降低。GHC 会对此进行更好的优化吗？

由于这不是 1-1 映射，带有 StateP 的管道将导致非常简洁的代码，我会一次请求一个参数，然后让 while 循环一个字节一个字节地响应。不幸的是，当我对其进行基准测试时，似乎管道开销（不出所料）相当大。由于每个符号都可以导致很多字节的输出，所以感觉有点像带State的concatMap。也许这将是惯用的解决方案？不过，连接字节列表对我来说听起来不是很快。ByteString 有一个 concatMap。也许这是正确的方法？编辑：不，不是。它需要一个 ByteString 作为输入。

我打算在完成后在 Hackage 上发布该软件包，因此您可以提供的任何建议（或实际代码！）都会使社区受益:)。我计划将此压缩用作编写内存效率非常高的压缩映射的基础。

score 1 · Accepted Answer

我在某处读到显式递归往往在 ghc 上表现不佳。

不，GHC 为递归生成缓慢的机器代码，无法减少（或 GHC“不想”减少）。如果可以展开递归（我在您的代码段中没有看到任何基本问题），它将被转换为与 C 或 C++ 中的 while-loop 几乎相同的机器代码。

假设我在 64 位平台上运行，我应该使用未装箱的 Word64 参数吗？如果我将精度降低到 32 位，压缩质量不会显着降低。GHC 会对此进行更好的优化吗？

你的意思是Word#？让 GHC 来处理它，使用盒装类型。我从来没有遇到过只有使用未装箱的类型才能获得一些利润的情况。在 64 位平台上使用 32 位类型无济于事。

优化 GHC 性能的一个一般规则是尽可能避免使用数据结构。如果您可以通过函数参数或闭包传递数据片段，请利用这个机会。

performance - 如何将此范围编码 C++ 片段转换为高性能 Haskell？

1 回答 1

Related

Reference