4

我正在用 Dancer 编写一个非常小的 URL 缩短器。它使用 REST 插件将发布的 URL 存储在数据库中,其中包含六个字符串,用户使用该字符串来访问缩短的 URL。

现在我有点不确定我的随机字符串生成方法。

sub generate_random_string{
    my $length_of_randomstring = shift; # the length of 
                                        # the random string to generate

    my @chars=('a'..'z','A'..'Z','0'..'9','_');
    my $random_string;
    for(1..$length_of_randomstring){
        # rand @chars will generate a random 
        # number between 0 and scalar @chars
        $random_string.=$chars[rand @chars];
    }

    # Start over if the string is already in the Database
    generate_random_string(6) if database->quick_select('urls', { shortcut => $random_string });

    return $random_string;
}

如果生成的字符串已经在数据库中,这会生成一个六个字符的字符串并递归调用该函数。我知道有 63^6 个可能的字符串,但如果数据库收集更多条目,这将需要一些时间。也许它会变成一个几乎无限的递归,我想阻止它。

有没有办法生成唯一的随机字符串,防止递归?

提前致谢

4

4 回答 4

5

我们真的不需要对你的函数会有多少次迭代(或递归)犹豫不决。我相信在每次调用时,预期的迭代次数是几何分布的(即第一次成功之前的试验次数由几何分布控制),其平均值为 1/p,其中 p 是成功找到未使用字符串的概率。我相信 p 只是 1 - n/63^6,其中 n 是当前存储的字符串的数量。因此,我认为您需要在数据库中存储 300 亿个字符串 (~63^6/2),然后您的函数每次调用平均递归超过 2 次 (p = .5)。

此外,几何分布的方差为 1-p/p^2,因此即使有 300 亿个条目,一个标准差也只是 sqrt(2)。因此,我预计大约 99% 的时间循环将花费少于 2 + 2*sqrt(2) 次或 5 次迭代。换句话说,我不会太担心它。

于 2012-12-03T22:25:49.263 回答
4

从学术立场来看,这似乎是一个有趣的项目。但是,如果您在时钟上并且只需要随机且不同的字符串,我会使用 Data::GUID 模块。

use strict;
use warnings;
use Data::GUID qw( guid_string );

my $guid = guid_string();
于 2012-12-03T22:45:17.420 回答
2

Getting rid of recursion is easy; turn your recursive call into a do-while loop. For instance, split your function into two; the "main" one and a helper. The "main" one simply calls the helper and queries the database to ensure it's unique. Assuming generate_random_string2 is the helper, here's a skeleton:

do {
   $string = generate_random_string2(6);
} while (database->quick_select(...));

As for limiting the number of iterations before getting a valid string, what about just saving the last generated string and always building your new string as a function of that?

For example, when you start off, you have no strings, so let's just say your string is 'a'. Then the next time you build a string, you get the last built string ('a') and apply a transformation on it, for instance incrementing the last character. This gives you 'b'. and so on. Eventually you get to the highest character you care for (say 'z') at which point you append an 'a' to get 'za', and repeat.

Now there is no database, just one persistent value that you use to generate the next value. Of course if you want truly random strings, you will have to make the algorithm more sophisticated, but the basic principle is the same:

  1. Your current value is a function of the last stored value.
  2. When you generate a new value, you store it.
  3. Ensure your generation will produce a unique value (one that did not occur before).
于 2012-12-03T16:43:19.117 回答
1

我还有一个基于使用 MySQL 的想法。

create table string (
    string_id int(10) not null auto_increment,
    string varchar(6) not null default '',
    primary key(string_id)
);

insert into string set string='';

update string
    set string = lpad( hex( last_insert_id() ), 6, uuid() )
    where string_id = last_insert_id();

select string from string
    where string_id = last_insert_id();

这为您提供了一个增量十六进制值,该值用非零垃圾填充。

于 2012-12-03T23:43:52.647 回答