f# - F# parallelization issue when calculating perfect numbers?

Question

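I am trying to optimize a small program that calculates perfect numbers from a given exponent.

The program runs (almost) perfectly, but when I open Task Manager, it still runs on a single thread. That means I must be doing something wrong, but my knowledge of F# is still at the "beginner" stage.

I will try to state the question as clearly as I can; if I fail to do so, please let me know.

A perfect number is a number for which the sum of all its divisors (excluding the number itself) equals the number itself (e.g. 6 is perfect, since its divisors 1, 2 and 3 sum to 6).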

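I use prime numbers to speed up the calculation; that is, I am not interested in storing the (huge) lists of all divisors. To do so, I use the formula that Euclid proved to be correct: 2^(num - 1) * (2^num - 1), where the latter factor is a Mersenne prime. I used a very fast algorithm from stackoverflow (by @Juliet) to determine whether a given number is prime.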
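As a quick sanity check of the definition and the formula, here is a minimal sketch. The names `isPerfectNaive` and `perfectFromExponent` are made up for this illustration and are not part of the program in question; the naive check is only practical for small numbers.

```fsharp
// Naive check straight from the definition: sum every proper divisor.
// Only practical for small n; used here just to validate the formula.
let isPerfectNaive (n: int) =
    n > 1 && (Seq.sum (seq { for d in 1 .. n / 2 do if n % d = 0 then yield d }) = n)

// Euclid's formula: 2^(p-1) * (2^p - 1) is perfect whenever 2^p - 1 is prime.
let perfectFromExponent (p: int) : bigint =
    (1I <<< (p - 1)) * ((1I <<< p) - 1I)

// Exponents 2, 3, 5, 7 give the Mersenne primes 3, 7, 31, 127,
// so the formula yields the perfect numbers 6, 28, 496 and 8128.
let firstFour = [ 2; 3; 5; 7 ] |> List.map perfectFromExponent
let allPerfect = firstFour |> List.forall (fun x -> isPerfectNaive (int x))
printfn "%A, all perfect by the naive check: %b" firstFour allPerfect
```

The formula only needs one primality test per exponent, which is why no list of divisors has to be stored.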

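While reading several articles on the internet (I have not bought a good book yet, shame on me), I found out that sequences perform better than lists. That is why I started by creating a function that generates a sequence of perfect numbers: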

    let perfectNumbersTwo (n : int) =
        seq { for i in 1..n do
                if (PowShift i) - 1I |> isPrime
                then yield PowShift (i-1) * ((PowShift i)-1I)
            }

The helper function PowShift is implemented as follows:

    let inline PowShift (exp:int32) = 1I <<< exp ;;

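I use the bit-shift operator because the base of all the power calculations is 2, so this seemed an easy way to do it. Of course, I am still grateful for the contributions to the question I asked earlier: F# Power issues which accepts both arguments to be bigints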
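To illustrate why the shift is a valid shortcut, the sketch below compares it with `System.Numerics.BigInteger.Pow`, which computes the same power for a general base. The names `viaShift` and `viaPow` are only for illustration.

```fsharp
let inline PowShift (exp: int32) = 1I <<< exp

// Shifting 1 left by exp bits is the same as raising 2 to the power exp.
let viaShift = PowShift 61
let viaPow = System.Numerics.BigInteger.Pow(2I, 61)
printfn "shift and Pow agree: %b" (viaShift = viaPow)
```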

The function created by Juliet (borrowed here) is as follows:

let isPrime ( n : bigint) = 
    let maxFactor = bigint(sqrt(float n))
    let rec loop testPrime tog =
        if testPrime > maxFactor then true
        elif n % testPrime = 0I then false
        else loop (testPrime + tog) (6I - tog)
    if n = 2I || n = 3I || n = 5I then true
    elif n <= 1I || n % 2I = 0I || n % 3I = 0I || n % 5I = 0I then false
    else loop 7I 4I;;

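Using this code, without parallelism, it takes about 9 minutes on my laptop to find the 9th perfect number (which consists of 37 digits and is found with an exponent value of 61). Since my laptop has a CPU with two cores and the CPU only runs at 50 percent (full load on one core), I thought I could speed up the calculation by computing the results in parallel.

So I changed my perfect number function as follows: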

//Now the function again, but async for parallel computing
let perfectNumbersAsync ( n : int) =
    async {
        try
            for x in 1.. n do
                if PowShift x - 1I |> isPrime then
                    let result = PowShift (x-1) * ((PowShift x)-1I)
                    printfn "Found %A as a perfect number" result
        with
            | ex -> printfn "Error%s" (ex.Message);
    }

To call this function, I use a small helper function to run it:

 let runPerfects n =
    [n]
        |> Seq.map perfectNumbersAsync
        |> Async.Parallel
        |> Async.RunSynchronously
        |> ignore

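The result of the async computation is ignored, since I display it inside the perfectNumbersAsync function.

The code above compiles and runs, but it still uses only one core (although it runs 10 seconds faster when calculating the 9th perfect number). I am afraid it has something to do with the helper functions PowShift and isPrime, but I am not certain. Do I have to put the code of these helper functions inside the async block of perfectNumbersAsync? That would not improve readability...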
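One thing worth noting about runPerfects: `[n]` is a list with a single element, so `Seq.map perfectNumbersAsync` produces only one async workflow, and `Async.Parallel` has nothing to distribute. The sketch below is one possible way to get one workflow per exponent instead. It is an assumption about what was intended, not a tested fix for the original program, and `perfectForExponent` and `runPerfectsPerExponent` are hypothetical names.

```fsharp
let inline PowShift (exp: int32) = 1I <<< exp

// Trial-division primality test, same idea as the borrowed isPrime.
let isPrime (n: bigint) =
    let maxFactor = bigint (sqrt (float n))
    let rec loop testPrime tog =
        if testPrime > maxFactor then true
        elif n % testPrime = 0I then false
        else loop (testPrime + tog) (6I - tog)
    if n = 2I || n = 3I || n = 5I then true
    elif n <= 1I || n % 2I = 0I || n % 3I = 0I || n % 5I = 0I then false
    else loop 7I 4I

// Hypothetical helper: one async workflow per exponent.
let perfectForExponent (x: int) =
    async {
        let m = PowShift x - 1I
        return (if isPrime m then Some (PowShift (x - 1) * m) else None)
    }

// Hypothetical replacement for runPerfects: n workflows instead of one.
// Async.Parallel returns the results in the order of the inputs.
let runPerfectsPerExponent (n: int) =
    [ 1 .. n ]
    |> List.map perfectForExponent
    |> Async.Parallel
    |> Async.RunSynchronously
    |> Array.choose id

runPerfectsPerExponent 20
|> Array.iter (printfn "Found %A as a perfect number")
```

Even with one workflow per exponent, the work is unevenly distributed: the trial division for the largest exponent dominates the run time, so the overall gain from this change alone may be small.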

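The more I play with F#, the more I learn to appreciate this language, but in cases like this one I sometimes need some experts :).

Thanks in advance for reading this; I only hope I made myself reasonably clear...

Robert.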

score 3 · Accepted Answer

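A quick comment on speed and parallelism,

Your isPrime is O(sqrt(n)), and each successive n is about 2x as large as the last one, so it takes roughly 1.5x as long to compute (sqrt(2) ≈ 1.41), which means that calculating the last numbers takes much longer.

I have done some hacking with primes myself, and some things I have found useful are: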

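1. For large N (you are testing 20-digit numbers), the density of primes is actually quite low, so you will be doing a lot of divisions by composite numbers. A better approach is to precompute a table of primes (using a sieve) up to some maximum limit (probably determined by the amount of memory). Note that you are most likely to find factors among the small numbers. Once your table runs out of memory, you can test the rest of the numbers with your existing function, with a larger starting point.
2. Another approach is to use multiple threads in the checking. For example, you currently check x, x+4, x+6... as factors. By being slightly more clever, you can do the numbers congruent to 1 mod 3 in one thread and the numbers congruent to 2 mod 3 in another thread.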

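No. 2 is the simplest, but No. 1 is more effective, and it offers the potential to do control flow with OutOfMemoryExceptions, which is always interesting.

EDIT: So I implemented both of these ideas. It finds 2305843008139952128 almost instantly, and finding 2658455991569831744654692615953842176 takes 7 minutes on my computer (quad-core AMD 3200). Most of the time is spent checking whether 2^61 - 1 is prime, so a better algorithm for checking primality would probably help. The code is here: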

let swatch = new System.Diagnostics.Stopwatch()
swatch.Start()
let inline PowShift (exp:int32) = 1I <<< exp ;;
let limit = 10000000 //go to a limit, makes table gen slow, but should pay off
printfn "making table"
//returns an array of all the primes up to limit
let table =
    let table = Array.create limit true //use bools in the table to save on memory
    let tlimit = int (sqrt (float limit)) //max test no for table, ints should be fine
    table.[1] <- false //special case
    [2..tlimit] 
    |> List.iter (fun t -> 
        if table.[t]  then //simple optimisation
            let mutable v = t*2
            while v < limit do
                table.[v] <- false
                v <- v + t)
    let out = Array.create (50847534) 0I //wolfram alpha provides pi(1 billion) - want to minimize memory
    let mutable idx = 0
    for x in [1..(limit-1)] do
        if table.[x] then
            out.[idx] <- bigint x
            idx <- idx + 1
    out |> Array.filter (fun t -> t <> 0I) //wolfram no is for 1 billion as limit, we use a smaller number
printfn "table made"

let rec isploop testprime incr max n=
    if testprime > max then true
    else if n % testprime = 0I then false
    else isploop (testprime + incr) incr max n

let isPrime ( n : bigint) = 
    //first test the table
    let maxFactor = bigint(sqrt(float n))
    match table |> Array.tryFind (fun t -> n % t = 0I && t <= maxFactor) with
    |Some(t) -> 
        false
    |None -> //now slow test
        //I have 4 cores so
        let bases = [|limit;limit+1;limit+3;limit+4|] //uses the fact that 10^x congruent to 1 mod 3
        //for 2 cores, drop last 2 terms above and change 6I to 3I
        match bases |> Array.map (fun t -> async {return isploop (bigint t) 6I maxFactor n}) |> Async.Parallel |> Async.RunSynchronously |> Array.tryFind (fun t -> t = false) with
        |Some(t) -> false
        |None -> true


let pcount = ref 0
let perfectNumbersTwo (n : int) =  
    seq { for i in 2..n do 
           if (isPrime (bigint i)) then
                if (PowShift i) - 1I |> isPrime then
                    pcount := !pcount + 1
                    if !pcount = 9 then
                        swatch.Stop()
                        printfn "total time %f seconds, %i:%i m:s"  (swatch.Elapsed.TotalSeconds) (swatch.Elapsed.Minutes) (swatch.Elapsed.Seconds)
                    yield PowShift (i-1) * ((PowShift i)-1I)
        } 


perfectNumbersTwo 62 |> Seq.iter (printfn "PERFECT: %A") //62 gives 9th number

printfn "done"
System.Console.Read() |> ignore

score 3 · Accepted Answer

@Jeffrey Sax's comment is definitely interesting, so I took some time to do a small experiment. The Lucas-Lehmer test is written as follows:

let lucasLehmer p =
    let m = (PowShift p) - 1I
    let rec loop i acc =
        if i = p-2 then acc
        else loop (i+1) ((acc*acc - 2I)%m)
    (loop 0 4I) = 0I

With the Lucas-Lehmer test, I can get the first few perfect numbers very quickly:

let mersenne (i: int) =     
    if i = 2 || (isPrime (bigint i) && lucasLehmer i) then
        let p = PowShift i
        Some ((p/2I) * (p-1I))
    else None

let runPerfects n =
    seq [1..n]
        |> Seq.choose mersenne
        |> Seq.toArray

let m1 = runPerfects 2048;; // Real: 00:00:07.839, CPU: 00:00:07.878, GC gen0: 112, gen1: 2, gen2: 1

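The Lucas-Lehmer test helps to reduce the time spent checking primality. Instead of testing the divisibility of 2^p-1, which takes O(sqrt(2^p-1)), we use a primality test that is at most O(p^3). With n = 2048, I am able to find the first 15 Mersenne numbers in 7.83 seconds. The 15th Mersenne number is the one with i = 1279, and it consists of 770 digits.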
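To make the test concrete, the sketch below prints the intermediate residues for a small exponent. `lucasLehmerTrace` is a hypothetical helper written for this illustration. It follows the same recurrence as `lucasLehmer` (start at 4, then square, subtract 2 and reduce modulo 2^p - 1, repeated p - 2 times) but keeps every intermediate value.

```fsharp
let inline PowShift (exp: int32) = 1I <<< exp

// Hypothetical helper: returns the residues s_0 .. s_(p-2) of the
// Lucas-Lehmer sequence s_0 = 4, s_(k+1) = (s_k^2 - 2) mod (2^p - 1).
let lucasLehmerTrace (p: int) =
    let m = PowShift p - 1I
    List.scan (fun acc _ -> (acc * acc - 2I) % m) 4I [ 1 .. p - 2 ]

let trace7 = lucasLehmerTrace 7    // 2^7 - 1 = 127, which is prime
let trace11 = lucasLehmerTrace 11  // 2^11 - 1 = 2047 = 23 * 89, composite

// For p = 7 the residues are 4, 14, 67, 42, 111, 0: the final 0 means 127 is prime.
printfn "p = 7:  %A" trace7
// For p = 11 the final residue is 1736, not 0, so 2047 is not prime.
printfn "p = 11: last residue = %A" (List.last trace11)
```

Only p - 2 modular squarings are needed, compared with trial division up to sqrt(2^p - 1), which is where the reduction in running time comes from.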

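I also tried to parallelize runPerfects using the PSeq module in the F# PowerPack. PSeq does not preserve the order of the original sequence, so to be fair I have sorted the output sequence. Since the primality test is fairly balanced among the indices, the result is quite encouraging: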

#r "FSharp.Powerpack.Parallel.Seq.dll"    
open Microsoft.FSharp.Collections

let runPerfectsPar n =
    seq [1..n]
        |> PSeq.choose mersenne
        |> PSeq.sort (* align with sequential version *)
        |> PSeq.toArray 

let m2 = runPerfectsPar 2048;; // Real: 00:00:02.288, CPU: 00:00:07.987, GC gen0: 115, gen1: 1, gen2: 0

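With the same input, the parallel version took 2.28 seconds, which is equivalent to a 3.4x speedup on my quad-core machine. I believe the result could be improved further if you use the Parallel.For construct and partition the input range sensibly.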
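A minimal sketch of what the Parallel.For variant might look like is given below. It has not been benchmarked, and it relies on the default partitioning of `System.Threading.Tasks.Parallel.For` rather than a hand-tuned split of the range. `isSmallPrime` is a hypothetical stand-in for the table-based `isPrime`, used only so the block is self-contained, and `runPerfectsParallelFor` is likewise an invented name.

```fsharp
open System.Threading.Tasks

let inline PowShift (exp: int32) = 1I <<< exp

// Hypothetical stand-in primality test for the (small) exponent itself.
let isSmallPrime (n: int) =
    n >= 2 && Seq.forall (fun d -> n % d <> 0) (seq { 2 .. int (sqrt (float n)) })

// Lucas-Lehmer test, same recurrence as the lucasLehmer function above.
let lucasLehmer p =
    let m = PowShift p - 1I
    let rec loop i acc =
        if i = p - 2 then acc
        else loop (i + 1) ((acc * acc - 2I) % m)
    loop 0 4I = 0I

let mersenne (i: int) =
    if i = 2 || (isSmallPrime i && lucasLehmer i) then
        let p = PowShift i
        Some ((p / 2I) * (p - 1I))
    else None

// Each index writes only to its own slot, so no locking is needed and the
// results stay in exponent order without a separate sort.
let runPerfectsParallelFor (n: int) =
    let results : bigint option [] = Array.create (n + 1) None
    Parallel.For(1, n + 1, (fun (i: int) -> results.[i] <- mersenne i)) |> ignore
    results |> Array.choose id

let m3 = runPerfectsParallelFor 31
printfn "found %d perfect numbers, largest %A" m3.Length (Array.last m3)
```

Because the cost of the test grows with the exponent, contiguous chunks of the range are not equally expensive; whether a custom partitioning actually beats the default is something that would have to be measured.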
