perl - Fast string checksum function in Perl generating values in the range 0..2^32-1

Question

I'm looking for a Perl string checksum function with the following properties:

Input: a Unicode string of arbitrary length ($string)
Output: an unsigned integer ($hash) such that 0 <= $hash <= 2^32-1, inclusive (0 to 4294967295, matching the size of a 4-byte MySQL unsigned int)

Pseudo-code:

sub checksum {
    my $string = shift;
    my $hash;
    # ... checksum logic goes here ...
    die unless ($hash >= 0);
    die unless ($hash <= 4_294_967_295);
    return $hash;
}

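Ideally, the checksum function should be fast and should produce values that are reasonably uniformly distributed over the target space (0 .. 2^32-1) to avoid collisions. In this application random collisions are not fatal at all, but obviously I would like to avoid them as much as possible.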

Given these requirements, what is the best way to solve this?

score 14 · Accepted Answer

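Any hash function will do: just truncate its output to 4 bytes and convert that to a number. A good hash function's output is uniformly distributed, and that holds no matter where you truncate the digest, so the first 4 bytes are as evenly spread as the full digest.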

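I suggest Digest::MD5, because it is the fastest hash implementation that ships with standard Perl. As Pim mentioned, String::CRC is also implemented in C and should be faster still.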

Here is how to compute the hash and convert it to an integer:

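use Digest::MD5 qw(md5);
use Encode qw(encode_utf8);
# md5() works on bytes, so encode wide (Unicode) characters to UTF-8 first
my $str = substr( md5( encode_utf8("String-to-hash") ), 0, 4 );
print unpack('N', $str);  # Unsigned 32-bit integer; 'N' (big-endian) gives the same value on every platform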
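A minimal sketch of the question's `checksum` sub built on this answer. It assumes the input may contain characters above 0xFF (`md5()` dies with "Wide character in subroutine entry" on such strings, so the string is encoded to UTF-8 first) and reads the first 4 digest bytes with `'N'` (big-endian) so the value does not depend on the machine's byte order, as it would with the native-order `'L'`:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5);
use Encode qw(encode_utf8);

sub checksum {
    my $string = shift;
    # md5() operates on bytes: encode characters to UTF-8 first
    my $digest = md5( encode_utf8($string) );
    # 'N' reads the first 4 of the 16 digest bytes as an unsigned
    # 32-bit big-endian integer; the remaining bytes are ignored
    my $hash = unpack( 'N', $digest );
    die unless ( $hash >= 0 );
    die unless ( $hash <= 4_294_967_295 );
    return $hash;
}

print checksum("String-to-hash"), "\n";
print checksum("\x{263A} smiley"), "\n";
```

Because MySQL compares the stored integers, fixing the byte order matters if checksums are computed on more than one machine.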

score 5 · Answer

From perldoc -f unpack:

        For example, the following computes the same number as the
        System V sum program:

            $checksum = do {
                local $/;  # slurp!
                unpack("%32W*",<>) % 65535;
            };
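The same unpack checksum can be applied to a string already in memory. Without the `% 65535` step, the `%32` prefix alone keeps the result within 0..2^32-1, but the value is only a sum of character code points, so any reordering of the same characters collides. The sketch below (the sub name `sum32` is hypothetical) shows both points:

```perl
use strict;
use warnings;

# Hypothetical helper: sum of the string's character code points,
# keeping only the low 32 bits (the %32 prefix in the unpack template)
sub sum32 {
    my $string = shift;
    return unpack( '%32W*', $string );
}

print sum32("abc"), "\n";    # prints 294 (97 + 98 + 99)
print sum32("cba"), "\n";    # prints 294: same characters, same sum
```

This is very fast, but the poor spread is why a real hash is preferable when collisions should be rare.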

score 4 · Answer


Not sure how fast it is, but you could try String::CRC.
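String::CRC is a CPAN module. If installing it is not an option, a C-implemented CRC-32 is also available through Compress::Zlib, which ships with Perl; note this swaps in a different module from the one suggested here. CRC-32 already produces an unsigned 32-bit value, so no truncation is needed. The sub name `crc_checksum` is hypothetical, and as with MD5 the string is encoded to UTF-8 first:

```perl
use strict;
use warnings;
use Compress::Zlib;           # exports crc32() by default
use Encode qw(encode_utf8);

# Hypothetical helper: CRC-32 of the UTF-8 bytes of a string
sub crc_checksum {
    my $string = shift;
    # crc32() operates on bytes: encode characters to UTF-8 first
    return crc32( encode_utf8($string) );
}

# "123456789" is the standard CRC-32 check input (check value 0xCBF43926)
print crc_checksum("123456789"), "\n";
```

CRC-32 is designed for error detection rather than as a general-purpose hash, but its output covers the full 0..2^32-1 range.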

answered 2009-12-22T13:04:03.303
