c# - 从加权列表中选择一个随机项目

Question

我正在尝试编写一个程序来从美国人口普查姓氏列表中选择一个随机名称。列表格式为

Name           Weight Cumulative line
-----          -----  -----      -
SMITH          1.006  1.006      1
JOHNSON        0.810  1.816      2
WILLIAMS       0.699  2.515      3
JONES          0.621  3.136      4
BROWN          0.621  3.757      5
DAVIS          0.480  4.237      6

假设我将数据加载到类似的结构中

Class Name
{
    public string Name {get; set;}
    public decimal Weight {get; set;}
    public decimal Cumulative {get; set;}
}

什么数据结构最适合保存名称列表，以及从列表中选择随机名称但名称分布与现实世界相同的最佳方法是什么。

如果数据结构有所不同，我只会使用前 10,000 行。

我曾尝试查看有关加权随机性的其他一些问题，但我在将理论转化为代码时遇到了一些麻烦。我对数学理论知之甚少，所以我不知道这是否是“有或没有替换”随机选择，我希望同一个名字能够多次出现，这意味着。

score 9 · Accepted Answer

处理此问题的“最简单”方法是将其保存在列表中。

然后你可以使用：

Name GetRandomName(Random random, List<Name> names)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    return names.Last(name => name.Culmitive <= value);
}

如果速度是一个问题，您可以存储一个单独的数组，仅包含Culmitive值。有了这个，你可以Array.BinarySearch用来快速找到合适的索引：

Name GetRandomName(Random random, List<Name> names, double[] culmitiveValues)
{
    double value = random.NextDouble() * names[names.Count-1].Culmitive;
    int index = Array.BinarySearch(culmitiveValues, value);
    if (index >= 0)
        index = ~index;

    return names[index];
}

另一种可能是最有效的选择是使用类似于C5 Generic Collection Library的树类之一的东西。然后，您可以使用RangeFrom来查找适当的名称。这样做的好处是不需要单独的集合

score 3 · Accepted Answer

我为随机选择的加权项目创建了一个 C# 库。

它实现了树选择和 walker 别名方法算法，为所有用例提供最佳性能。
它经过单元测试和优化。
它具有 LINQ 支持。
它是免费和开源的，在 MIT 许可下获得许可。

一些示例代码：

IWeightedRandomizer<string> randomizer = new DynamicWeightedRandomizer<string>();
randomizer["Joe"] = 1;
randomizer["Ryan"] = 2;
randomizer["Jason"] = 2;

string name1 = randomizer.RandomWithReplacement();
//name1 has a 20% chance of being "Joe", 40% of "Ryan", 40% of "Jason"

string name2 = randomizer.RandomWithRemoval();
//Same as above, except whichever one was chosen has been removed from the list.

score 0 · Accepted Answer

我会说一个数组（如果您愿意，可以使用向量）最好保存它们。至于加权平均，求和，在零和和之间取一个随机数，取累积值较小的姓氏。（例如这里，<1.006 = smith，1.006-1.816 = johnson，等等。

PS这是累积的。

score 0 · Accepted Answer

只是为了好玩，绝不是最佳选择

List<Name> Names = //Load your structure into this

List<String> NameBank = new List<String>();
foreach(Name name in Names)
   for(int i = 0; i <= (int)(name.Weight*1000); i++)
     NameBank.Add(name.Name)

然后：

String output = NameBank[rand(NameBank.Count)];

c# - 从加权列表中选择一个随机项目

4 回答 4

Related

Reference