c++ - 测量函数在其整个生命周期内在堆和堆栈上分配的总空间

Question

我有一个用 C++ 编写的递归函数，它使用new. 如何测量函数在其整个生命周期内在堆和堆栈上分配的空间总量？

这是一个如何测量堆栈的示例（它不是我的代码）。

unsigned int maxStackLocation ; 
unsigned int minStackLocation ; 
main () 
{ 
    //get the address of a local variable before calling Quicksort 
    //the stack grows down, so this is the max stack location 
    int localVariable;
    void *currPtrLocation = (void*)(&localVariable); 
    maxStackLocation = *(unsigned int *)(&currPtrLocation); 
    //we get the value for the minimum stack location in quicksort itself 
    //call quicksort 
    Quick (A, num); 
    space = maxStackLocation - minStackLocation;
} 
//some redundant function whose stack usage will be measured
void Quick (unsigned int A[], int num) 
{ 
    if (num <= 1) 
    { 
        //check the stack usage 
        //figure out where we are on the stack by looking at the byte  
        // address of the local variable 
        //we do this by making a pointer to a local variable, and then  
        //casting it to a integer 
        void *currPtrLocation = (void*)(&num); 
        unsigned int currStackLocation = *(unsigned int*)(&currPtrLocation); 
        if (currStackLocation < minStackLocation) 
            minStackLocation = currStackLocation; 
        return; 
    }
}

编辑
Borgleader 指出我最初的问题“测量最大空间函数在其整个生命周期内分配在堆和堆栈上”是不正确的。我已将“最大”更改为“总计”。

score 7 · Accepted Answer

Valgrind 实际上可以为您进行非常精确的测量。您所需要做的就是编写尽可能简单的示例来调用您的函数。

例如，一个程序只main()使用 for 循环打印其参数（传递给函数）并std::cout产生以下输出：

zaufi@gentop /work/tests $ valgrind --tool=drd --show-stack-usage=yes ./stack-usage-test-1
==26999== drd, a thread error detector
==26999== Copyright (C) 2006-2012, and GNU GPL'd, by Bart Van Assche.
==26999== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==26999== Command: ./stack-usage-test-1
==26999==
./stack-usage-test-1
==26999== thread 1 finished and used 11944 bytes out of 8388608 on its stack. Margin: 8376664 bytes.
==26999==
==26999== For counts of detected and suppressed errors, rerun with: -v
==26999== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

正如人们所看到的，唯一的线程将在堆栈上消耗近 12K。并且绝对大部分空间在. main()为了进行更好的测量，有必要在单独的线程中运行目标函数。像这样：

#include <iostream>
#include <thread>

int main(int argc, char* argv[])
{
    auto thr = std::thread([](){std::cout << __PRETTY_FUNCTION__ << std::endl;});
    thr.join();
    return 0;
}

此代码将产生以下输出：

==27029== thread 2 finished and used 1840 bytes out of 8384512 on its stack. Margin: 8382672 bytes.
==27029== thread 1 finished and used 11992 bytes out of 8388608 on its stack. Margin: 8376616 bytes.

那肯定更好。所以测量一个什么都不做的函数，你有一个最小的堆栈使用量（在最后一个例子中它大约是 1840 字节）。因此，如果您要在单独的线程中调用目标函数，则必须从结果中减去 1840 个字节（甚至更少）...

您可以使用以下简单算法自行完成几乎相同的操作：

从堆中分配 8M 缓冲区（linux/pthreads 的默认堆栈大小，但实际上您可以分配任何其他（合理）大小）
用一些图案填充它
派生一个新线程，并将堆栈分配给刚刚分配和填充的区域（使用pthread_attr_setstack()（或朋友））
只要您可以在该线程中调用目标函数并退出
在主线程中，pthread_join()成功后，分析您的缓冲区以找到一个区域，您之前分配的模式没有保留

（甚至）在这种情况下，您最好对一个什么都不做的线程进行第一次测量——只是为了获得如上所述的最小使用量。

score 3 · Accepted Answer

您测量堆栈使用情况的示例非常准确，并且会给您（非常接近）正确的答案。我也有一个关于确定堆使用情况的建议，但首先......

在他的书“更有效的 C++”第 27 项中，Scott Meyers（顺便说一句，我最喜欢的 C++ 作家）描述了为什么在一般意义上和在可移植或半可移植 C++ 的范围内不可能明确区分对象是否已被分配在堆栈、堆上或静态分配。但他的解释是从对象本身的角度来看分配的对象，而不是从请求这种分配的代码的角度来看分配。从请求分配的代码的角度来看，您可以根据您请求分配的方式确定一个对象（无论其类型如何）是在堆上分配还是在堆栈上分配。（边栏：当然这取决于您的系统实际使用运行时堆栈，有些人很快指出 C++ 标准不能保证，但是这些标准势利小人都没有指出任何不使用运行时堆栈的符合标准的 C++ 编译器的实际示例。回到我们的讨论......）例如：

int Fred;  // Fred is allocated on the stack
int *Barney = new int; // Barney is allocated on the heap

Fred 超出范围时会破坏；当你处理完 Barney 后，你必须明确删除他。这同样适用于用户定义的类型，包括 STL 类型：

std::string Fred; // Fred is allocated on the stack (see disclaimer*)
std::string *Barney = new std::string(); // Barney is allocated on the heap

// *Disclaimer:  May not be totally true on some small embedded processors that
//               implement the runtime stack in a special small address space
//               separate from “normal” RAM.

...但是：虽然称为 Fred 的 std::string 对象本身是在堆栈上分配的，但我们当然知道它的数据不能在堆栈上分配，因为它的数据是可变长度的，并且必须能够无限扩展为我们给它添加字符。所以只有 std::string 的管理部分（即 std::string 对象本身）可以被堆栈分配。

回到 Scott Meyer 的第 27 条，它适用于本次讨论：

myClass *someFunction() {
    int someInputParameter = 42;
    myClass Fred(someInputParameter); // Fred is allocated on the stack
    myClass *Barney = new myClass(someInputParameter); // Barney is allocated on the heap
    return Barney;
}

根据 Scott Meyer 的第 27 条，Fred 和 Barney 都不能自己知道它们是堆栈分配还是堆分配，或者就此而言，静态分配（即“全局”或“文件范围”或“某些其他类”）。但是在 someFunction() 中，我们可以知道，因为我们知道我们是如何请求分配的。

现在，回到你的问题：

正如我所说，您展示的用于测量堆栈使用情况的示例非常准确。您在 main() 中肯定知道“localVariable”是在堆栈上分配的。而且您在 Quick() 中肯定知道“currPtrLocation”在堆栈上。如果你的堆栈变小了，你肯定知道因为 main() 调用 Quick()，Quick() 中的 'currPtrLocation' 的地址将低于 main() 中的 'localVariable' 的地址，所以你的算术有效. 它不会给出完全正确的答案，因为 Quick() 中可能有其他变量分配在堆栈位置但低于“currPtrLocation”，但您的测量结果将非常接近；当然和你应该关心的一样近。无论错误量是多少，无论调用堆栈有多深，它都是一个固定错误，并且如果您递归任何显着次数，误差占整体的百分比可以忽略不计。最后一件事：如果您希望解决方案即使对于堆栈增长的系统也可移植，您可以检测“localVariable”和“currPtrLocation”地址之间差异的方向并计算差异（即交换参数减法）相应地。

关于堆使用：

你的系统可能有一些钩子可以用来测量堆使用统计，但如果你不想深入研究，你可以按照 Synxis 的建议做一些事情，但是这个答案存在一些严重的问题。首先，Synxis 的建议忽略了所有的堆栈开销，包括返回地址、溢出的寄存器等，这些都不是微不足道的。其次，如果您使用问题中显示的堆栈测量技术，则无论如何都无需费力地维护堆栈分配对象的分配总和。第三，假设您只对堆分配对象的分配求和（您可以使用 sizeof() 运算符获得其大小），您不能只对它们求和。当您删除它们时，您必须减去分配的大小，以便您的“totalSize”保持所有的总大小目前分配的对象，并且您希望单独跟踪 totalSize 的高水位线。这样你就可以知道你的算法在任何时候使用的最大内存量。如果不这样做，即使您只有 4G 的 RAM，当您的算法报告它使用了 758 GB 之类的东西时，您可能会感到困惑。最后，无论您是在堆栈还是堆上分配给定对象本身，它都可能在堆上进行自己的内部分配。（如果当您调用该对象的方法时它在堆栈上进行临时分配，除非您也将堆栈跟踪代码放入该方法中，否则您的堆栈总数不会像我已经描述的那样完全正确，但错误会继续或多或少保持不变，具体取决于方法是否执行可能改变其调用行为的条件。）无论如何，关于您在堆栈或堆上分配的这个“给定对象”；如果它分配自己的内部堆内存，除非您将堆分配跟踪添加到内部对象，否则您将错过分配给这些内部对象的堆内存量。所以也许你最好看看你的系统可能提供的堆统计钩子。

现在，假设您想自己跟踪堆分配......您可能会探索的另一个选择是编写一个基类，您的算法可以从中派生出它创建的所有对象。您的基类可以覆盖 operator new & operator delete，这样基类本身就可以使用基类中的静态类变量来执行所有的 totalSize 跟踪（添加“new”并减去“delete”）。

score 1 · Accepted Answer

拥有确切的总大小几乎是不可能的，因为它取决于您的平台、调用约定、优化模式等……但是您可以进行估计。我不计算参数，因为并不总是由被调用函数处理（取决于调用约定）。

没有 C++ 或 C 功能可以做到这一点。
我不知道有任何调试器会指示这个数字。我从来没有使用过 Valgrind 和 co，所以我不能告诉你他们是否可以告诉你这个数字（我怀疑他们可以）
所以，最后一个选择：你自己数......

unsigned int totalSize = 0;

int partition(unsigned int A[], int num, int left, int right, int pivot)
{
    /// UpdateTotalSize
    totalSize += sizeof(unsigned int);
    /// /UpdateTotalSize
    unsigned int value = A[pivot];
    A[pivot] = A[right];
    A[right] = value;

    /// UpdateTotalSize
    totalSize += sizeof(int);
    /// /UpdateTotalSize
    int store = left;

    for(int i = left; i < right; i++)
        if(A[i] < value)
        {
            // This variable does not count in the total size:
            // allocated, then deallocated, and so on...
            unsigned int tmp = A[i];
            A[i] = A[store];
            A[store] = tmp;
            store++;
        }

    /// UpdateTotalSize
    totalSize += sizeof(int);
    /// /UpdateTotalSize
    int tmp = A[store];
    A[store] = A[right];
    A[right] = tmp;
    return store;
}

void quick(unsigned int A[], int num, int left = -1, int right = -1)
{
    if(num <= 1)
        return;

    if(left <= -1)
        left = 0;
    if(right <= -1)
        right = right = num - 1;

    if(left >= right)
        return;

    /// UpdateTotalSize
    totalSize += sizeof(int);
    /// /UpdateTotalSize
    int pivot = (left + right) / 2;

    /// UpdateTotalSize
    totalSize += sizeof(int);
    /// /UpdateTotalSize
    int pivotNew = partition(A, num, left, right, pivot);

    quick(A, num, left, pivotNew - 1);
    quick(A, num, pivotNew + 1, right);
}

请注意，这只是对总大小的估计。此外，对于这个例子，它可以在精神上完成。

最后但同样重要的是：当优化开启时，这个计数是不可靠的，所以只在没有优化的情况下使用它！

c++ - 测量函数在其整个生命周期内在堆和堆栈上分配的总空间

3 回答 3

Related

Reference