haskell - Generating a random sequence with a more even distribution

Question

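I have read this question and I think the algorithm there is not optimal. For example, 'f 20 100' returns something like [85,14,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]; as a result, I often get a long tail of zeros.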

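Well, I thought this was an interesting task and decided to write my own implementation :)

I decided to split the numbers by a random proportion: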

g 1 sum = return [sum]
g n sum = do
    prop <- randomRIO(0.0, 1.0)
    k1 <- g (round prop * n) (round( prop * sum))
    k2 <- g (n - (round prop * n)) (sum - (round prop * sum))
    return k1 ++ k2

But my code doesn't work:

   Couldn't match expected type `IO [a0]' with actual type `[a1]'
    In the expression: return k1 ++ k2
    In the expression:
      do { prop <- randomRIO (0.0, 1.0);
           k1 <- g (round prop * n) (round (prop * sum));
           k2 <- g (n - (round prop * n)) (sum - (round prop * sum));
             return k1 ++ k2 }
    In an equation for `g':
        g n sum
          = do { prop <- randomRIO (0.0, 1.0);
                 k1 <- g (round prop * n) (round (prop * sum));
                 k2 <- g (n - (round prop * n)) (sum - (round prop * sum));
                 .... }

As far as I can see, I can't concatenate IO lists. How can I fix this?

score 6 · Accepted Answer

The type error you are asking about is caused by the fact that you should have written

return (k1 ++ k2)

instead of

return k1 ++ k2

Note that return is just a function in Haskell, and function application binds more tightly than any infix operator, so Haskell reads your code as if you had written

(return k1) ++ k2

Note, however, that your code has other problems as well.

score 2 · Accepted Answer

First, let's do some imports for later:

import Control.Applicative
import Control.Monad
import System.Random
import Data.List hiding (partition)

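Fixing the code

Always remember that function application has higher precedence than infix operators: return k1 ++ k2 means (return k1) ++ k2, and round prop * n means (round prop) * n. You can use $ to separate a function from the expression you apply it to, because f $ x = f x and $ has very low precedence. For example, you can write return $ k1 ++ k2.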
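To make the two precedence points concrete, here is a small sketch. The function names are made up for illustration and are not part of the question's code; the comments assume prop lies between 0 and 1, as it does in the question.

```haskell
-- Illustrative names only; none of these appear in the original code.

-- Both forms apply return to the whole concatenation.
joinWithParens :: [Int] -> [Int] -> IO [Int]
joinWithParens k1 k2 = return (k1 ++ k2)

joinWithDollar :: [Int] -> [Int] -> IO [Int]
joinWithDollar k1 k2 = return $ k1 ++ k2

-- round prop * n parses as (round prop) * n, so a proportion between
-- 0 and 1 is rounded to 0 or 1 before the multiplication happens.
roundThenMultiply :: Double -> Int -> Int
roundThenMultiply prop n = round prop * n

-- Multiplying first needs fromIntegral to turn n into a Double.
multiplyThenRound :: Double -> Int -> Int
multiplyThenRound prop n = round (prop * fromIntegral n)
```

With these definitions, roundThenMultiply 0.25 20 is 0, while multiplyThenRound 0.25 20 is 5.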

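You were also mixing Ints and Doubles. (round prop * n) rounds the proportion before multiplying, but you want to multiply first, so you need to apply fromIntegral to n. I made a separate function for that: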

(.*) :: Double -> Int -> Int
d .* i = floor $ d * fromIntegral i

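So now, instead of (round prop * n), you can use (prop .* n). It cleans up the code a little, and it means that if it is wrong, we can fix it in one function instead of all over the place.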
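As a quick sanity check of the operator, the sketch below repeats its definition so that it stands alone and adds a hypothetical helper, splitTotal (my name, not from the question), that computes the two totals handed to the recursive calls.

```haskell
-- The operator from the answer, repeated so this snippet stands alone.
(.*) :: Double -> Int -> Int
d .* i = floor $ d * fromIntegral i

-- Hypothetical helper: the two totals the recursive calls would receive.
-- Whatever floor drops on the left ends up on the right, so the two
-- pieces always add back up to total.
splitTotal :: Double -> Int -> (Int, Int)
splitTotal prop total = (left, total - left)
  where
    left = prop .* total
```

For example, splitTotal 0.25 10 is (2,8): 2.5 is floored to 2 and the remaining 8 goes to the other side.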

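I have supplied a type signature to make the error messages more informative, and I have also added a second base case. Without it the function did not terminate, because the rounding sometimes caused it to ask for a list of length 0.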

partition1 :: Int -> Int -> IO [Int]
partition1 0 total = return []
partition1 1 total = return [total]
partition1 n total = do
    prop <- randomRIO(0.0, 1.0)
    k1 <- partition1 (prop .* n) (prop .* total)
    k2 <- partition1 (n - (prop .* n)) (total - (prop .* total))
    return $ k1 ++ k2

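I also took the liberty of giving it a more descriptive name.

Getting the right total

Unfortunately, although this compiles, there is a glitch, as Will Ness pointed out in the comments: it often gives you numbers that sum to less than the total. It turns out that this is because you end up calling partition1 0 n for non-zero n, asking for a list of length 0 that sums to something non-zero. Oops.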
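To see where the total leaks away, it helps to fix prop by hand instead of drawing it at random. The sketch below uses a hypothetical helper, recursiveCalls (not part of the answer's code), that only reports the arguments partition1 would pass to its two recursive calls.

```haskell
-- The operator from the answer, repeated so this snippet stands alone.
(.*) :: Double -> Int -> Int
d .* i = floor $ d * fromIntegral i

-- Hypothetical helper: for a fixed prop, the (length, total) pairs that
-- partition1 n total would pass to its left and right recursive calls.
recursiveCalls :: Double -> Int -> Int -> ((Int, Int), (Int, Int))
recursiveCalls prop n total =
  ( (prop .* n, prop .* total)
  , (n - prop .* n, total - prop .* total)
  )
```

recursiveCalls 0.125 4 80 is ((0,10),(4,70)). The left call asks for 0 numbers that sum to 10, the first base case answers with [], and those 10 are gone.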

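The idea behind your algorithm is to split both the list and the total at random, but to keep the two in the same proportion, so that the distribution does not become lopsided (the problem in the original question).

Let's use that idea, but prevent it from asking for a length of zero: we need prop to be neither 0 nor 1.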

partition2 :: Int -> Int -> IO [Int]
partition2 0 total = return []
partition2 1 total = return [total]
partition2 n total = do
    new_n <- randomRIO(1,n-1)
    let prop = fromIntegral new_n / fromIntegral n
    k1 <- partition2 new_n (prop .* total)
    k2 <- partition2 (n - new_n) (total - (prop .* total))
    return $ k1 ++ k2

Now it never gives us the wrong total. Hooray!

Random is not the same as fair

But oops: partition2 18 10000 gives us

[555,555,555,555,556,556,555,556,556,556,555,555,556,556,555,556,556,556]

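The problem is that fair is not the same as random. This algorithm is very fair, but not very random. Let's make it choose the proportion separately from the length: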

partition3 :: Int -> Int -> IO [Int]
partition3 0 total = return []
partition3 1 total = return [total]
partition3 n total = do
    new_n   <- randomRIO(1,n-1)
    new_total <- randomRIO(0,total)  -- it's fine to have zeros.
    k1 <- partition3 new_n new_total
    k2 <- partition3 (n - new_n) (total - new_total)
    return $ k1 ++ k2

That looks better: partition3 15 20000 gave me

[1134,123,317,725,1031,3897,8089,2111,164,911,25,0,126,938,409]

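Random isn't fair, but isn't biased either

This is clearly much better, but the binary partitioning we are doing inherently introduces bias.

You can test a lot of runs by looking at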

check :: (Int -> Int -> IO [Int]) -> Int -> Int -> Int -> IO ()
check f n total times = mapM_ print =<< map average.transpose.map (righttotal total) <$> replicateM times (f n total)
   where average xs = fromIntegral (sum xs)/fromIntegral total

righttotal tot xs | sum xs == tot = xs
                  | otherwise = error $ "wrong total: " ++ show (sum xs)

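This time I ran check partition3 11 10000 1000.

Without going into reams of test data and analysis, interesting though it was: there is a disproportionate number of 0s when n is not a factor of total, and the distribution is not even, it is cup-shaped. The algorithm ends up cramming the data in at the ends.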

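A way out

Rather than choosing the amounts in the sublists bit by bit, let's generate all the positions where the subtotals will end in one go. Of course one of them has to be the total, and we had better sort them after generating them.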

stopgaps :: Int -> Int -> IO [Int]
stopgaps parts total = sort.(total:) <$> replicateM (parts-1) (randomRIO (0,total))

Here I have used replicateM :: Int -> m a -> m [a] to generate parts-1 random numbers in the right range.

I would like to plug an unsung hero:

mapAccumL :: (acc -> x -> (acc, y)) -> acc -> [x] -> (acc, [y])

which is used for accumulating along a list while generating a new list.
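If mapAccumL is new to you, a running total is perhaps the simplest example of accumulating along a list while producing a new one. The function runningTotals below is my own illustration, not part of the answer's code.

```haskell
import Data.List (mapAccumL)

-- Illustrative only: the accumulator carries the sum so far, and each
-- output element is the sum up to and including the current input.
runningTotals :: [Int] -> (Int, [Int])
runningTotals = mapAccumL step 0
  where
    step acc x = (acc + x, acc + x)
```

runningTotals [3,4,3] is (10,[3,7,10]): the final accumulator comes back alongside the new list. Turning sorted positions back into gaps is the same pattern run in the opposite direction.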

gapsToLengths :: [Int] -> (Int,[Int])
gapsToLengths = mapAccumL between 0
   where between previous new = (new,new - previous)

partition4 :: Int -> Int -> IO [Int]
partition4 parts total = snd.gapsToLengths <$> stopgaps parts total
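To follow the pipeline by hand, here is a pure sketch in which the caller supplies the parts-1 cut points instead of drawing them with randomRIO. The name lengthsFromCuts is hypothetical; it just combines the sorting step of stopgaps with gapsToLengths.

```haskell
import Data.List (mapAccumL, sort)

-- Hypothetical pure counterpart of partition4: the cut points are passed
-- in rather than generated at random. The total is added as the final
-- position, the positions are sorted, and consecutive differences become
-- the lengths.
lengthsFromCuts :: Int -> [Int] -> [Int]
lengthsFromCuts total cuts = snd (mapAccumL between 0 (sort (total : cuts)))
  where
    between previous new = (new, new - previous)
```

lengthsFromCuts 10 [7,3,3] sorts the positions to [3,3,7,10] and returns [3,0,4,3]: four parts that sum to 10, with a zero where two cut points coincide.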

Does it work?

Some test runs of partition4 11 10000, pretty-printed:

[ 786,   20,  607,  677, 1244, 1137,  990,   50, 1716,  813, 1960]
[ 406,  110, 2556,  126, 1289,  567,  348, 1230,  171,  613, 2584]
[ 368, 1794,  136, 1266,  583,   93, 1514,   66, 1594, 1685,  901]
[ 657, 1296, 1754,  411,  691, 1865,  531,  270, 1941,  286,  298]
[2905,  313,  842,  796,  698, 1104,   82, 1475,   22,  619, 1144]
[1411,  966,  530,  129,   81,  561, 1779, 1179,  301,  607, 2456]
[1143,  409,  903,   27,  855,  354,  887, 1898, 1880,  301, 1343]
[ 260,  643,   96,  323,  142,   74,  401,  977, 3685, 2690,  709]
[1350,  979,  377,  765,  137, 1295,  615,  592, 2099, 1088,  703]
[2411,  958,  330, 1433, 1355,  680, 1075,   41,  988,   81,  648]

That looks random. Let's check that there is no bias:

check partition4 11 10000 1000
92.6425
93.4513
92.3544
90.8508
88.0297
91.7731
88.7939
86.5268
86.3502
95.2499
93.9774

At last!

score 0 · Accepted Answer

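This is part of a module I use to make QuickCheck easier to work with. The interesting parts of the code were written by the inimitable Brent Yorgey and use the binomial number system described in the blog post linked in my comment above. The pickDistribution function is some sample glue code for generating a list of non-negative numbers with a particular weight (you can use resize to choose a specific weight).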

{-# LANGUAGE MultiParamTypeClasses #-}
module QuickCheckUtils where

import Control.Monad.Reader
import Test.QuickCheck
import Test.QuickCheck.Gen

instance MonadReader Int Gen where
    ask = MkGen (\r n -> n)
    local f (MkGen g) = MkGen (\r -> g r . f)

-- pickDistribution n chooses uniformly at random from all lists of length n of
-- non-negative numbers that sum to the current weight
pickDistribution :: Int -> Gen [Int]
pickDistribution n = do
    m <- ask
    let j = fromIntegral (m+n-1)
        k = fromIntegral (n-1)
    i <- choose (1, binom j k)
    return . map fromIntegral . combToComposition $ toComb j k (i-1)

-- code from Brent {{{
-- Comb n cs represents a choice cs of distinct numbers between 0 and
-- (n-1) inclusive.
data Comb = Comb Integer [Integer] deriving Show
type Comp = [Integer]

-- Convert a choice of (n-1) out of (m+n-1) things into a composition
-- of m, that is, an ordered list of natural numbers with sum m.
combToComposition :: Comb -> Comp
combToComposition (Comb n cs) = map pred $ zipWith (-) cs' (tail cs')
    where cs' = [n] ++ cs ++ [-1]

-- Convert a number into "base binomial", i.e. generate the
-- ith combination in lexicographical order.  See TAOCP 7.2.1.3, Theorem L.
toComb :: Integer -- ^ Total number of things
       -> Integer -- ^ Number to select
       -> Integer -- ^ Index into the lexicographic ordering of combinations
       -> Comb    -- ^ Corresponding combination
toComb n k i = Comb n (toComb' k i (n-1) (binom (n-1) k))

binom _ 0 = 1
binom 0 _ = 0
binom n k = binom (n-1) (k-1) * n `div` k

toComb' 0 _ _ _ = []
toComb' k i j jCk
    | jCk > i   =     toComb' k     i         (j-1) (jCk * (j-k) `div` j)
    | otherwise = j : toComb' (k-1) (i - jCk) (j-1) (jCk *     k `div` j)
-- }}}
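
The reason pickDistribution draws an index between 1 and binom (m+n-1) (n-1) is that this binomial coefficient counts the lists of n non-negative numbers that sum to m. The brute-force sketch below checks that count for small inputs; compositions and binomial are my own illustrative names and are far too slow for real use.

```haskell
-- Illustrative brute force: every list of n non-negative Ints summing to m.
-- Assumes n >= 0 and m >= 0.
compositions :: Int -> Int -> [[Int]]
compositions 0 0 = [[]]
compositions 0 _ = []
compositions n m =
  [x : rest | x <- [0 .. m], rest <- compositions (n - 1) (m - x)]

-- Plain binomial coefficient, written independently of the module above.
-- Assumes 0 <= k <= n.
binomial :: Integer -> Integer -> Integer
binomial n k = product [n - k + 1 .. n] `div` product [1 .. k]
```

For n = 3 and m = 4, length (compositions 3 4) is 15, which matches binomial 6 2.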
