multithreading - 如何使用 Rayon 并行计算 PI

Question

我的代码可以在没有 Rayon 的情况下使用 Ramanujan 公式计算 Pi，我想为并行线程实现 Rayon，因为这是我的项目。

我知道我需要使用这个

use rayon::prelude::*;
fn sum_of_squares(input: &[f64]) ->f64 {
    for i in total.iter() // <-- just change that!
         .map(|&i| i * i)
         .sum()
}

但我仍然不明白该怎么做。

这是我的代码

use rayon::prelude::*;

pub fn factorial(n: f64) -> f64 {
    if n == 0.0 {
        return 1.00;
    } else {
        let result: f64 = factorial(n - 1.0) * n;

        return result;
    }
}

pub fn estimate_pi() -> f64 {
    let mut total: f64 = 0.0;
    let mut k: f64 = 0.0;
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    loop {
        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        total += term;

        if term.abs() < 1e-15 {
            break;
        }

        k += 1.0;
    }

    return 1.0 / total;
}

fn main() {
    println!("{}", estimate_pi());
}

操场

score 1 · Accepted Answer

第一步是通过使每个迭代独立来使您的算法可并行化。我做的第一件事是添加一个调试语句来打印的最终值k：

        k += 1.0;
    }

    dbg!(k);

    return 1.0 / total;

打印出来的，所以我可以用它来为每次迭代k = 2创建一系列独立的值：k

(0..=iterations) // [0, 1, 2] for iterations = 2

我们将遍历该范围内的元素，而不是使用您拥有的 epsilon 检查：

pub fn estimate_pi(iterations: usize) -> f64 {
    let mut total: f64 = 0.0;
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    for i in 0..=iterations {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        total += term;
    }

    return 1.0 / total;
}

// call estimate_pi(2)

Total 只是所有迭代的总和，因此我们可以将其从循环转换为 map-reduce 操作。对于范围内的每个数字，我们计算term. 然后，我们使用fold(reduce) 来计算总和。

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    let sum: f64 = (0..=iterations).into_iter().map(|i| {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        term
    }).fold(0.0, |a, b| a + b);

    return 1.0 / sum;
}

现在我们可以使用 rayon 的方法将其转换为并行操作。替换into_iter()为into_par_iter()和：fold(0.0, |a, b| a + b)_reduce(|| 0.0, |a, b| a + b)

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor1: f64 = 2.0;
    let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;

    // map is now a parallel map, and reduce is a parallel reduce
    let sum: f64 = (0..=iterations).into_par_iter().map(|i| {
        let k: f64 = i as f64;

        let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let numm: f64 = 396.0;
        let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
        let term: f64 = factor * num / den;

        term
    }).reduce(|| 0.0, |a, b| a + b);

    return 1.0 / sum;
}

现在稍微清理一下代码以使其更惯用：

在适当的地方删除显式输入
使用隐式返回
对 sqrt(2) 使用常量
更有意义的变量名
嵌入396表达式

use std::f64::consts::*;

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        let numerator = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
        let denominator = factorial(k).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

作为最后一步，我们也可以进行factorial并行处理：

// now have to call this with a `usize`
pub fn factorial(n: usize) -> f64 {
    let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
    out as f64
}

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        // notice we now pass the `i: usize` in here
        let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
        let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

最终代码

use rayon::prelude::*;
use std::f64::consts::*;

pub fn factorial(n: usize) -> f64 {
    let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
    out as f64
}

pub fn estimate_pi(iterations: usize) -> f64 {
    let factor = (SQRT_2 * 2.0) / 9801.0;

    let sum = (0..=iterations).into_par_iter().map(|i| {
        let k = i as f64;

        let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
        let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);

        factor * numerator / denominator
    }).reduce(|| 0.0, |a, b| a + b);

    1.0 / sum
}

fn main() {
    // our algorithm results in the same value as the constant
    println!("pi_a: {:.60}", estimate_pi(2));
    println!("pi_c: {:.60}", PI);
}

输出

pi_a: 3.141592653589793115997963468544185161590576171875000000000000
pi_c: 3.141592653589793115997963468544185161590576171875000000000000

操场

建议

您应该使用不同数量的并行度对不同版本进行基准测试，以查看性能或多或少。可能是人造丝并行迭代会导致性能降低，因为您的总迭代次数很少。

您也可以考虑使用查找表来查找阶乘，因为n <= k * 4 <= 8：

pub fn factorial(n: usize) -> f64 {
    const TABLE: [f64; 9] = [
        1.0,     // 0!
        1.0,     // 1!
        2.0,     // 2!
        6.0,     // 3!
        24.0,    // 4!
        120.0,   // 5!
        720.0,   // 6!
        5040.0,  // 7!
        40320.0, // 8!
    ];

    TABLE[n]
}

操场

当然，启用内联也会有所帮助。

multithreading - 如何使用 Rayon 并行计算 PI

1 回答 1

最终代码

建议

Related

Reference