我需要从Vec
生产中相当大的 a 中获取前 N 个项目。目前我这样做是这样低效的:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
v.sort_unstable();
v = v[0..3].to_vec();
在 C++ 中,我会使用,但在Rust 文档std::partial_sort
中找不到等效项。
我只是忽略了它,还是它不存在(还)?
我需要从Vec
生产中相当大的 a 中获取前 N 个项目。目前我这样做是这样低效的:
let mut v = vec![6, 4, 3, 7, 2, 1, 5];
v.sort_unstable();
v = v[0..3].to_vec();
在 C++ 中,我会使用,但在Rust 文档std::partial_sort
中找不到等效项。
我只是忽略了它,还是它不存在(还)?
标准库不包含此功能,但看起来lazysort
crate 正是您所需要的:
那么惰性排序有什么意义呢?根据链接的博客文章,当您不需要或不打算需要每个值时,它们很有用;例如,您可能只需要较大集合中的前 1,000 个有序值。
#![feature(test)]
extern crate lazysort;
extern crate rand;
extern crate test;
use std::cmp::Ordering;
trait SortLazy<T> {
fn sort_lazy<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering;
unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering;
}
impl<T> SortLazy<T> for [T] {
fn sort_lazy<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
if !data.is_empty() && *accu < n {
let mut pivot = 1;
let mut lower = 0;
let mut upper = data.len();
while pivot < upper {
match cmp(&data[pivot], &data[lower]) {
Ordering::Less => {
data.swap(pivot, lower);
lower += 1;
pivot += 1;
}
Ordering::Greater => {
upper -= 1;
data.swap(pivot, upper);
}
Ordering::Equal => pivot += 1,
}
}
sort_lazy(&mut data[..lower], accu, cmp, n);
sort_lazy(&mut data[upper..], accu, cmp, n);
} else {
*accu += 1;
}
}
sort_lazy(self, &mut 0, &cmp, n);
}
unsafe fn sort_lazy_fast<F>(&mut self, cmp: F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
fn sort_lazy<F, T>(data: &mut [T], accu: &mut usize, cmp: &F, n: usize)
where
F: Fn(&T, &T) -> Ordering,
{
if !data.is_empty() && *accu < n {
unsafe {
use std::mem::swap;
let mut pivot = 1;
let mut lower = 0;
let mut upper = data.len();
while pivot < upper {
match cmp(data.get_unchecked(pivot), data.get_unchecked(lower)) {
Ordering::Less => {
swap(
&mut *(data.get_unchecked_mut(pivot) as *mut T),
&mut *(data.get_unchecked_mut(lower) as *mut T),
);
lower += 1;
pivot += 1;
}
Ordering::Greater => {
upper -= 1;
swap(
&mut *(data.get_unchecked_mut(pivot) as *mut T),
&mut *(data.get_unchecked_mut(upper) as *mut T),
);
}
Ordering::Equal => pivot += 1,
}
}
sort_lazy(&mut data[..lower], accu, cmp, n);
sort_lazy(&mut data[upper..], accu, cmp, n);
}
} else {
*accu += 1;
}
}
sort_lazy(self, &mut 0, &cmp, n);
}
}
#[cfg(test)]
mod tests {
use test::Bencher;
use lazysort::Sorted;
use std::collections::BinaryHeap;
use SortLazy;
use rand::{thread_rng, Rng};
const SIZE_VEC: usize = 100_000;
const N: usize = 42;
#[bench]
fn sort(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
v.sort_unstable();
})
}
#[bench]
fn lazysort(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
let _: Vec<_> = v.iter().sorted().take(N).collect();
})
}
#[bench]
fn lazysort_in_place(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
v.sort_lazy(i32::cmp, N);
})
}
#[bench]
fn lazysort_in_place_fast(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let mut v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
unsafe { v.sort_lazy_fast(i32::cmp, N) };
})
}
#[bench]
fn binaryheap(b: &mut Bencher) {
b.iter(|| {
let mut rng = thread_rng();
let v: Vec<i32> = std::iter::repeat_with(|| rng.gen())
.take(SIZE_VEC)
.collect();
let mut iter = v.iter();
let mut heap: BinaryHeap<_> = iter.by_ref().take(N).collect();
for i in iter {
heap.push(i);
heap.pop();
}
let _ = heap.into_sorted_vec();
})
}
}
running 5 tests
test tests::binaryheap ... bench: 3,283,938 ns/iter (+/- 413,805)
test tests::lazysort ... bench: 1,669,229 ns/iter (+/- 505,528)
test tests::lazysort_in_place ... bench: 1,781,007 ns/iter (+/- 443,472)
test tests::lazysort_in_place_fast ... bench: 1,652,103 ns/iter (+/- 691,847)
test tests::sort ... bench: 5,600,513 ns/iter (+/- 711,927)
test result: ok. 0 passed; 0 failed; 0 ignored; 5 measured; 0 filtered out
这段代码让我们看到它lazysort
比使用BinaryHeap
. 我们还可以看到,当增加时,BinaryHeap
解决方案会变得更糟。N
问题lazysort
在于它创建了第二个Vec<_>
. “更好”的解决方案是就地实现部分排序。我提供了这样一个实现的例子。
请记住,所有这些解决方案都会带来开销。N
大约什么时候SIZE_VEC / 3
,经典sort
胜出。
您可以提交一个 RFC/issue 来询问有关将此功能添加到标准库的问题。