performance - OOP is much slower than structural programming. Why, and how can it be fixed?

Question

As the title says, I've found that OOP is slower than structural programming (spaghetti code).

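I wrote a simulated annealing program using OOP, then removed one class and wrote its logic structurally in the main form. Suddenly it became much faster. In the OOP version, I was calling the class I removed on every iteration.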

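I also checked this with tabu search and got the same result. Can anyone tell me why this happens and how I can fix it in other OOP programs? Are there any tricks, for example caching my classes or something like that?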

(The programs are written in C#.)

score 1 · Accepted Answer

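If you have a high-frequency loop, and inside that loop you create new objects and don't call other functions very much, then yes, you will see that if you can avoid those news, for example by reusing objects, you can save a large fraction of the time.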

Between new, constructors, destructors, and garbage collection, very little code can waste a whole lot of time. Use them sparingly.
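To make the reuse idea concrete, here is a minimal sketch. The question is about C#, but it is written in Rust to match the other code in this thread, so it shows only the allocation cost and not garbage collection. The function names, the `candidate` buffer, and the perturbation step are all made up for illustration, and both functions assume `current` is non-empty.

```rust
// Hypothetical hot loop: each iteration builds a candidate solution by
// perturbing the current one. Names and the perturbation are illustrative.

/// Allocates a fresh Vec on every iteration.
fn run_allocating(current: &[u32], iterations: usize) -> u64 {
    let mut checksum = 0u64;
    for i in 0..iterations {
        let mut candidate = current.to_vec(); // new heap allocation each time
        let idx = i % candidate.len();
        candidate[idx] = candidate[idx].wrapping_add(1);
        checksum += candidate[idx] as u64;
    }
    checksum
}

/// Reuses one scratch buffer that is allocated before the loop.
fn run_reusing(current: &[u32], iterations: usize) -> u64 {
    let mut checksum = 0u64;
    let mut candidate = vec![0u32; current.len()]; // allocated once
    for i in 0..iterations {
        candidate.copy_from_slice(current); // overwrite in place, no allocation
        let idx = i % candidate.len();
        candidate[idx] = candidate[idx].wrapping_add(1);
        checksum += candidate[idx] as u64;
    }
    checksum
}

fn main() {
    let current: Vec<u32> = (0..64).collect();
    let a = run_allocating(&current, 1_000_000);
    let b = run_reusing(&current, 1_000_000);
    // Both variants compute the same result; only the allocation pattern differs.
    assert_eq!(a, b);
    println!("{} {}", a, b);
}
```

How much this saves depends on the allocator, the object size, and what the optimizer does with the loop, so measure before and after rather than assuming a speedup.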

score 1 · Accepted Answer

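Memory access is often overlooked. The way OO tends to lay out data in memory is not conducive to efficient memory access in loops in practice. Consider the following pseudocode: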

adult_clients = 0
for client in list_of_all_clients:
  if client.age >= AGE_OF_MAJORITY:
    adult_clients++

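As it happens, this way of accessing memory is quite inefficient on modern architectures, because they like to access large contiguous runs of memory. Here we only care about client.age, for every client we have, and those ages will not be laid out in contiguous memory.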

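Focusing on objects that have fields results in data being laid out in memory such that fields holding the same type of information are not contiguous. Performance-intensive code tends to involve loops that repeatedly look at data with the same conceptual meaning. Laying such data out in contiguous memory helps performance.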

Consider these two examples in Rust:

// struct that contains an id, and an optional value of whether the id is divisible by three
struct Foo {
    id         : u32,
    divbythree : Option<bool>,
}

fn main () {
    // create a pretty big vector of these structs with increasing ids, and divbythree initialized as None
    let mut vec_of_foos : Vec<Foo> = (0..100000000).map(|i| Foo{ id : i, divbythree : None }).collect();
    
    // loop over all these structs, determine if the id is divisible by three
    // and set divbythree accordingly
    let mut divbythrees = 0;
    for foo in vec_of_foos.iter_mut() {
        if foo.id % 3 == 0 {
            foo.divbythree = Some(true);
            divbythrees += 1;
        } else {
            foo.divbythree = Some(false);
        }
    }
    // print the number of times it was divisible by three
    println!("{}", divbythrees);
}

On my system, the real time for this with rustc -O is 0m0.436s. Now consider this example:

fn main () {
    // this time we create two vectors rather than a vector of structs
    let vec_of_ids             : Vec<u32>          = (0..100000000).collect();
    let mut vec_of_divbythrees : Vec<Option<bool>> = vec![None; vec_of_ids.len()];
    
    // but we basically do the same thing
    let mut divbythrees = 0;
    for i in 0..vec_of_ids.len(){
        if vec_of_ids[i] % 3 == 0 {
            vec_of_divbythrees[i] = Some(true);
            divbythrees += 1;
        } else {
            vec_of_divbythrees[i] = Some(false);
        }
    }
    println!("{}", divbythrees);
}

At the same optimization level, this runs in 0m0.254s, close to half the time.

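Despite having to allocate two vectors instead of one, storing similar values in contiguous memory almost halves the execution time. That said, the OO approach obviously makes for nicer and more maintainable code.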
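One way to soften that trade-off, sketched here as an assumption rather than something I benchmarked, is to keep the two-vector layout but hide it behind a single type with methods. The `Foos` type and its method names are hypothetical; the loop does the same work as the second example above.

```rust
/// Hypothetical container that stores each field of many "foos" in its own
/// contiguous vector, while still exposing an object-like interface.
struct Foos {
    ids: Vec<u32>,
    divbythree: Vec<Option<bool>>,
}

impl Foos {
    /// Builds `count` entries with ids 0..count and divbythree set to None.
    fn with_sequential_ids(count: u32) -> Self {
        Foos {
            ids: (0..count).collect(),
            divbythree: vec![None; count as usize],
        }
    }

    /// Fills in `divbythree` for every id and returns how many ids are
    /// divisible by three.
    fn mark_divisible_by_three(&mut self) -> usize {
        let mut count = 0;
        for (id, flag) in self.ids.iter().zip(self.divbythree.iter_mut()) {
            let divisible = id % 3 == 0;
            *flag = Some(divisible);
            if divisible {
                count += 1;
            }
        }
        count
    }
}

fn main() {
    let mut foos = Foos::with_sequential_ids(10);
    // ids 0, 3, 6 and 9 are divisible by three
    println!("{}", foos.mark_divisible_by_three()); // prints 4
}
```

Callers only see `Foos` and its methods, so the memory layout can change later without touching the rest of the program.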

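P.S.: it occurs to me that I should probably explain why this matters so much, given that the code in both cases still indexes memory one field at a time rather than, say, putting a big swath of it on the stack. The reason is CPU caches: when the program asks for memory at a certain address, the CPU actually fetches and caches a sizable chunk of memory around that address, and if memory next to it is requested again soon afterwards, it can be served from the cache rather than from actual physical working memory. Compilers will of course also be able to vectorize the underlying code more efficiently as a consequence.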
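As a rough illustration of the cache argument, the sketch below estimates how many elements of each type fit in one cache line. The 64-byte line size is an assumption (common, but hardware-specific), and `elements_per_cache_line` is a helper invented for this example. `Foo` is the struct from the first example; its size is typically 8 bytes because of alignment padding, but Rust does not guarantee the exact layout, so the program prints the sizes instead of hard-coding them.

```rust
use std::mem::size_of;

// Assumed cache-line size in bytes; 64 is common on current x86-64 and
// many ARM cores, but the real value depends on the hardware.
const ASSUMED_CACHE_LINE_BYTES: usize = 64;

// Same struct as in the first example above.
#[allow(dead_code)]
struct Foo {
    id: u32,
    divbythree: Option<bool>,
}

/// Hypothetical helper: how many values of type T fit in one assumed cache
/// line. Not meaningful for zero-sized types (it would divide by zero).
fn elements_per_cache_line<T>() -> usize {
    ASSUMED_CACHE_LINE_BYTES / size_of::<T>()
}

fn main() {
    println!("size_of::<Foo>() = {} bytes", size_of::<Foo>());
    println!("size_of::<u32>() = {} bytes", size_of::<u32>());
    println!("Foo values per line: {}", elements_per_cache_line::<Foo>());
    println!("u32 ids per line:    {}", elements_per_cache_line::<u32>());
}
```

With separate vectors, every line fetched while scanning the ids contains nothing but ids, so fewer lines have to be loaded for the same number of comparisons.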
