第一步是通过使每个迭代独立来使您的算法可并行化。我做的第一件事是添加一个调试语句来打印的最终值k
:
k += 1.0;
}
dbg!(k);
return 1.0 / total;
打印出来的,所以我可以用它来为每次迭代k = 2
创建一系列独立的值:k
(0..=iterations) // [0, 1, 2] for iterations = 2
我们将遍历该范围内的元素,而不是使用您拥有的 epsilon 检查:
pub fn estimate_pi(iterations: usize) -> f64 {
let mut total: f64 = 0.0;
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
for i in 0..=iterations {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
total += term;
}
return 1.0 / total;
}
// call estimate_pi(2)
Total 只是所有迭代的总和,因此我们可以将其从循环转换为 map-reduce 操作。对于范围内的每个数字,我们计算term
. 然后,我们使用fold
(reduce) 来计算总和。
pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
let sum: f64 = (0..=iterations).into_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).fold(0.0, |a, b| a + b);
return 1.0 / sum;
}
现在我们可以使用 rayon 的方法将其转换为并行操作。替换into_iter()
为into_par_iter()
和:fold(0.0, |a, b| a + b)
_reduce(|| 0.0, |a, b| a + b)
pub fn estimate_pi(iterations: usize) -> f64 {
let factor1: f64 = 2.0;
let factor: f64 = (factor1.sqrt() * 2.0) / 9801.0;
// map is now a parallel map, and reduce is a parallel reduce
let sum: f64 = (0..=iterations).into_par_iter().map(|i| {
let k: f64 = i as f64;
let num: f64 = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let numm: f64 = 396.0;
let den: f64 = factorial(k).powf(4.0) * numm.powf(4.0 * k);
let term: f64 = factor * num / den;
term
}).reduce(|| 0.0, |a, b| a + b);
return 1.0 / sum;
}
现在稍微清理一下代码以使其更惯用:
- 在适当的地方删除显式输入
- 使用隐式返回
- 对 sqrt(2) 使用常量
- 更有意义的变量名
- 嵌入
396
表达式
use std::f64::consts::*;
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4.0 * k) * (1103.0 + (26390.0 * k));
let denominator = factorial(k).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
作为最后一步,我们也可以进行factorial
并行处理:
// now have to call this with a `usize`
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
// notice we now pass the `i: usize` in here
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
最终代码
use rayon::prelude::*;
use std::f64::consts::*;
pub fn factorial(n: usize) -> f64 {
let out = (1..=n).into_par_iter().reduce(|| 1, |a, b| a * b);
out as f64
}
pub fn estimate_pi(iterations: usize) -> f64 {
let factor = (SQRT_2 * 2.0) / 9801.0;
let sum = (0..=iterations).into_par_iter().map(|i| {
let k = i as f64;
let numerator = factorial(4 * i) * (1103.0 + (26390.0 * k));
let denominator = factorial(i).powf(4.0) * (396_f64).powf(4.0 * k);
factor * numerator / denominator
}).reduce(|| 0.0, |a, b| a + b);
1.0 / sum
}
fn main() {
// our algorithm results in the same value as the constant
println!("pi_a: {:.60}", estimate_pi(2));
println!("pi_c: {:.60}", PI);
}
输出
pi_a: 3.141592653589793115997963468544185161590576171875000000000000
pi_c: 3.141592653589793115997963468544185161590576171875000000000000
操场
建议
您应该使用不同数量的并行度对不同版本进行基准测试,以查看性能或多或少。可能是人造丝并行迭代会导致性能降低,因为您的总迭代次数很少。
您也可以考虑使用查找表来查找阶乘,因为n <= k * 4 <= 8
:
pub fn factorial(n: usize) -> f64 {
const TABLE: [f64; 9] = [
1.0, // 0!
1.0, // 1!
2.0, // 2!
6.0, // 3!
24.0, // 4!
120.0, // 5!
720.0, // 6!
5040.0, // 7!
40320.0, // 8!
];
TABLE[n]
}
操场
当然,启用内联也会有所帮助。