c - C: strategies for debugging an obscure memory leak?

Question

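I'm working on a project in C and I'm trying to figure out how to debug an obscure bug that crashes my program. The program is fairly large, and my attempts to isolate the problem by building smaller versions of the code have not worked. So I'm trying to come up with a way to debug and pinpoint memory leaks.

I came up with the following plan: I know the problem comes from running a certain function, and that this function calls itself recursively. So I figured I could take snapshots of my program's memory allocations. Since I don't know jack about what is happening under the hood (I know a little, but not enough to be useful in this situation), this is what I have: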

#include <stdlib.h>  /* malloc */

typedef struct record_mem {
    int num_allocs;
    int num_frees;
    int size_space;
    int num_structure_1;
    ...
    int num_structure_N;
    int num_records;
    struct record_mem *next;
} RECORD;
extern RECORD *top;
void pushmem(RECORD **top)
{
    RECORD *nnew = (RECORD *)malloc(sizeof(RECORD));
    if (!nnew)
        return;  /* allocation failed: leave the list unchanged */
    nnew->num_allocs=1;
    nnew->num_frees=0;
    nnew->size_space=sizeof(RECORD);
    nnew->num_structure_1=0;
    ...
    nnew->num_structure_N=0;
    nnew->num_records=1;
    nnew->next=0;
    if(*top)
    {
        nnew->num_allocs+=(*top)->num_allocs;
        nnew->num_frees=(*top)->num_frees;
        nnew->size_space+=(*top)->size_space;
        nnew->num_structure_1=(*top)->num_structure_1;
        ...
        nnew->num_structure_N=(*top)->num_structure_N;
        nnew->num_records+=(*top)->num_records;
        nnew->next=*top;
    }
    *top=nnew;
}

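My idea is to print out the contents of the memory records I have saved just before my program crashes (thanks to GDB, I know where it crashes).

Then, throughout the program (for every data structure in my program I have a push function similar to the one above), I can simply add a function that tallies the allocations per data structure plus the total stack (heap?) memory allocated (the part I can keep track of). Whenever I feel the need to record a snapshot of the running program, I would simply create more memory_record structures. The problem is that this memory balance sheet will not help if I cannot somehow record how much memory is actually in use.

But how would I do that? Also, how would I account for dangling pointers and leaks? I'm on OS X, and I'm currently looking up how to record the stack pointer and other such things.

Edit: since you asked, here is valgrind's output. (closure() is the function called from main that returns the bad pointer: it should return the head of a doubly linked list. traversehashmap() is a function called from closure() that I use to compute extra nodes and append them to the linked list; it calls itself recursively because it needs to jump between nodes.)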

jason-danckss-macbook:project Jason$ valgrind --leak-check=full --tool=memcheck ./testc
Will attempt to compute closure of AB:
Result: testcl: 0x10000d0b0
==7682== Invalid read of size 8
==7682==    at 0x100001D4E: printrelation2 (relation.h:490)
==7682==    by 0x100003CFE: main (test-computation.c:47)
==7682==  Address 0x10000cee8 is 8 bytes inside a block of size 24 free'd
==7682==    at 0xD828: free (vg_replace_malloc.c:450)
==7682==    by 0x100001232: destroyrelation2 (relation.h:161)
==7682==    by 0x100003407: destroyallhashmap (computation.h:333)
==7682==    by 0x1000039E1: closure (computation.h:539)
==7682==    by 0x100003CBE: main (test-computation.c:38)
==7682== 
==7682== 
==7682== HEAP SUMMARY:
==7682==     in use at exit: 5,360 bytes in 48 blocks
==7682==   total heap usage: 99 allocs, 51 frees, 6,640 bytes allocated
==7682== 
==7682== 48 (24 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 33 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100001104: getnewrelation (relation.h:134)
==7682==    by 0x100001848: copyrelation (relation.h:343)
==7682==    by 0x100003991: closure (computation.h:531)
==7682==    by 0x100003CBE: main (test-computation.c:38)
==7682== 
==7682== 1,128 (24 direct, 1,104 indirect) bytes in 1 blocks are definitely lost in loss record 36 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100002315: getnewholder (dependency.h:129)
==7682==    by 0x100003B17: main (test-computation.c:14)
==7682== 
==7682== LEAK SUMMARY:
==7682==    definitely lost: 48 bytes in 2 blocks
==7682==    indirectly lost: 1,128 bytes in 44 blocks
==7682==      possibly lost: 0 bytes in 0 blocks
==7682==    still reachable: 4,096 bytes in 1 blocks
==7682==         suppressed: 88 bytes in 1 blocks
==7682== Reachable blocks (those to which a pointer was found) are not shown.
==7682== To see them, rerun with: --leak-check=full --show-reachable=yes
==7682== 
==7682== For counts of detected and suppressed errors, rerun with: -v
==7682== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)

score 5 · Accepted Answer

Have you tried valgrind (and its memcheck tool)?

$ valgrind --tool=memcheck --leak-check=full ./yourprogram

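(It is best to compile your program with -g.)

Edit: sorry, I had not read that you did not want to use Valgrind, but as dureuill pointed out in a comment on your post, it is very useful and worth learning.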

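One more piece of information: a memory leak is caused by a missing free after some malloc or realloc (you can see a simple example in C here). You can also use grep (with -n to get line numbers and -r to search recursively) to list every memory allocation line in your program, and try to match each of them with a call to free. This can be tedious, though, and I really believe using Valgrind will be faster.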
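As an illustration of a missing free, here is a minimal sketch. The names duplicate and leak_demo are hypothetical and do not come from the project in the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of s (the caller must free it),
   or NULL if the allocation fails. */
static char *duplicate(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Returns the length of the copied string, or -1 if allocation failed. */
static int leak_demo(void)
{
    char *name = duplicate("closure");
    if (name == NULL)
        return -1;
    int len = (int)strlen(name);
    /* Delete the next line and the block is leaked: its only pointer,
       name, goes out of scope when the function returns. */
    free(name);
    return len;
}
```

With the free removed, memcheck should report the block allocated inside duplicate as lost, with a call stack that ends at the malloc call.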

score 4 · Accepted Answer

From your valgrind output:

This is probably what is causing your problem:

==7682== Invalid read of size 8
==7682==    at 0x100001D4E: printrelation2 (relation.h:490)
==7682==    by 0x100003CFE: main (test-computation.c:47)
==7682==  Address 0x10000cee8 is 8 bytes inside a block of size 24 free'd
==7682==    at 0xD828: free (vg_replace_malloc.c:450)
==7682==    by 0x100001232: destroyrelation2 (relation.h:161)
==7682==    by 0x100003407: destroyallhashmap (computation.h:333)
==7682==    by 0x1000039E1: closure (computation.h:539)
==7682==    by 0x100003CBE: main (test-computation.c:38)

Let's dig in:

==7682== Invalid read of size 8
==7682==    at 0x100001D4E: printrelation2 (relation.h:490)
==7682==    by 0x100003CFE: main (test-computation.c:47)

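This is the summary of your error. In printrelation2, at line 490 of relation.h, you read 8 bytes from a memory location that is not allocated (or that was allocated earlier and has since been freed).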

==7682==  Address 0x10000cee8 is 8 bytes inside a block of size 24 free'd

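The address you read starts 8 bytes into a block of size 24. That probably means an 8-byte field at offset 8 of a 24-byte structure (look for such a structure), and the block holding it was freed earlier.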
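To find which field sits at a given offset, you can print the layout of the candidate structures. The sketch below uses a hypothetical struct node, since the real structures in relation.h are not shown in the post; the offsets in the comments assume a typical 64-bit platform with 8-byte pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical node layout, for illustration only. */
struct node {
    void        *payload;  /* offset 0 */
    struct node *next;     /* offset 8 with 8-byte pointers */
    struct node *prev;     /* offset 16 with 8-byte pointers */
};

/* Prints the size of the structure and the offset of each field. */
static void print_layout(void)
{
    printf("sizeof(struct node) = %zu\n", sizeof(struct node));
    printf("offsetof(payload)   = %zu\n", offsetof(struct node, payload));
    printf("offsetof(next)      = %zu\n", offsetof(struct node, next));
    printf("offsetof(prev)      = %zu\n", offsetof(struct node, prev));
}
```

If one of your structures is 24 bytes long, the field whose offset is 8 is the one being read at relation.h:490.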

==7682==    at 0xD828: free (vg_replace_malloc.c:450)
==7682==    by 0x100001232: destroyrelation2 (relation.h:161)
==7682==    by 0x100003407: destroyallhashmap (computation.h:333)
==7682==    by 0x1000039E1: closure (computation.h:539)
==7682==    by 0x100003CBE: main (test-computation.c:38)

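This is the call stack that freed the address your program was reading when it crashed. It starts with free, which is normal, since you presumably use the free function to release memory. The file and line shown for that frame (vg_replace_malloc.c:450) belong to valgrind's replacement for the allocator, not to your code, so they are not very relevant. What is relevant is that this free was called from destroyrelation2, at line 161 of relation.h: this is the problematic free. destroyrelation2 was itself called by destroyallhashmap, which was called by closure, which was called by main at line 38 of test-computation.c. You need to find out which mistake in your memory handling makes printrelation2 reuse a pointer that was already freed during the closure call at line 38 of main.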
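One common way to end up with this kind of trace is a function that returns a pointer into a temporary structure and then destroys that structure before returning. The sketch below shows the pattern and one possible fix (copy the result before cleaning up). All names are hypothetical, and this is only a guess at the shape of the bug, not the actual code of closure.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relation nodes in the post. */
struct item {
    int value;
    struct item *next;
};

/* Allocates one node; returns NULL on failure. */
static struct item *item_new(int value, struct item *next)
{
    struct item *it = malloc(sizeof *it);
    if (it != NULL) {
        it->value = value;
        it->next = next;
    }
    return it;
}

/* Frees every node of the list. */
static void item_destroy_all(struct item *head)
{
    while (head != NULL) {
        struct item *next = head->next;
        free(head);
        head = next;
    }
}

/* Deep-copies a list so the result shares no node with the source. */
static struct item *item_copy_all(const struct item *src)
{
    struct item *head = NULL, **tail = &head;
    for (; src != NULL; src = src->next) {
        *tail = item_new(src->value, NULL);
        if (*tail == NULL) {        /* allocation failed: undo and give up */
            item_destroy_all(head);
            return NULL;
        }
        tail = &(*tail)->next;
    }
    return head;
}

/* Builds a temporary list and returns a result that survives the cleanup. */
static struct item *compute(void)
{
    struct item *scratch = NULL;
    for (int v = 2; v >= 1; v--) {  /* builds the list 1 -> 2 */
        struct item *it = item_new(v, scratch);
        if (it == NULL)
            break;
        scratch = it;
    }
    /* BUG pattern: "result = scratch;" would hand back nodes that are
       freed two lines below, so the caller would read freed memory. */
    struct item *result = item_copy_all(scratch);
    item_destroy_all(scratch);
    return result;                  /* the caller owns this list */
}
```

The design choice here is explicit ownership: compute owns scratch and frees it, while the caller owns the returned copy and must call item_destroy_all on it.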

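The memory leaks reported after that are real, but they are unlikely to be the cause of your crash.

Is valgrind's output clearer now?

Note 1: this leak report may change once you have fixed the segfault, but as it stands, here is how I read it: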

==7682== 48 (24 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 33 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100001104: getnewrelation (relation.h:134)
==7682==    by 0x100001848: copyrelation (relation.h:343)
==7682==    by 0x100003991: closure (computation.h:531)
==7682==    by 0x100003CBE: main (test-computation.c:38)
==7682== 
==7682== 1,128 (24 direct, 1,104 indirect) bytes in 1 blocks are definitely lost in loss record 36 of 37
==7682==    at 0xC283: malloc (vg_replace_malloc.c:274)
==7682==    by 0x100002315: getnewholder (dependency.h:129)
==7682==    by 0x100003B17: main (test-computation.c:14)
==7682== 
==7682== LEAK SUMMARY:
==7682==    definitely lost: 48 bytes in 2 blocks
==7682==    indirectly lost: 1,128 bytes in 44 blocks
==7682==      possibly lost: 0 bytes in 0 blocks
==7682==    still reachable: 4,096 bytes in 1 blocks
==7682==         suppressed: 88 bytes in 1 blocks

Let's start with the summary:

==7682== LEAK SUMMARY:
==7682==    definitely lost: 48 bytes in 2 blocks
==7682==    indirectly lost: 1,128 bytes in 44 blocks
==7682==      possibly lost: 0 bytes in 0 blocks
==7682==    still reachable: 4,096 bytes in 1 blocks
==7682==         suppressed: 88 bytes in 1 blocks

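You have two allocated blocks of memory that can no longer be reached through any pointer ("definitely lost"). This means that somewhere in your program you malloc'ed them and then completely forgot about them. Those are the bad memory leaks. You need to review your logic so that these blocks are handled, or free them earlier in the program's lifetime. I'm not sure about "indirectly lost"; I would say that you have no direct handle on those blocks, and that the only pointers to them live inside a structure that is itself lost. These leaks can be mitigated by freeing the pointers held in the structure before freeing the structure itself. I can't say anything about "possibly lost", as I have never run into it with valgrind. "Still reachable" is the benign kind of leak: at the point where the program stopped under valgrind, the block had not been freed, but you still have a pointer to it, so you can easily add a call that frees that pointer and resolve the leak.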
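The categories can be reproduced with a small structure that owns other allocations. This is a sketch with hypothetical names (holder, node, cache), loosely modeled on the getnewholder record in the report; the comments mark where each category would show up if the cleanup were skipped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder that owns a singly linked chain of nodes. */
struct node { struct node *next; };
struct holder { struct node *first; int count; };

/* File-scope pointer: a block it still points to at exit is "still reachable". */
static struct holder *cache;

/* Frees the nodes first, then the holder itself. Accepts NULL. */
static void holder_destroy(struct holder *h)
{
    if (h == NULL)
        return;
    while (h->first != NULL) {
        struct node *next = h->first->next;
        free(h->first);
        h->first = next;
    }
    free(h);
}

/* Allocates a holder with n nodes; returns NULL on failure. */
static struct holder *holder_new(int n)
{
    struct holder *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->first = NULL;
    h->count = 0;
    for (int i = 0; i < n; i++) {
        struct node *nd = malloc(sizeof *nd);
        if (nd == NULL) {
            holder_destroy(h);
            return NULL;
        }
        nd->next = h->first;
        h->first = nd;
        h->count++;
    }
    return h;
}

/* Returns the number of nodes created, or -1 on allocation failure. */
static int categories_demo(void)
{
    struct holder *h = holder_new(3);
    if (h == NULL)
        return -1;
    int count = h->count;
    /* Writing "h = NULL;" here instead of destroying the holder would make
       the holder "definitely lost" and its three nodes "indirectly lost". */
    holder_destroy(h);

    cache = holder_new(1);
    /* Exiting now without freeing cache would be reported as
       "still reachable", because the file-scope pointer still finds it. */
    holder_destroy(cache);
    cache = NULL;
    return count;
}
```

Freeing the nodes before the holder is what keeps the "indirect" bytes out of the report; freeing only the holder would leave its nodes without any pointer to them.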

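The two call stacks show you the mallocs responsible for the leaks, not counting the "still reachable" ones (to see those, you have to add the options --leak-check=full --show-reachable=yes to your valgrind invocation).

Note 2: avoid function names like destroyallhashmap (hard to read) or destroyrelation2 (numbered). Prefer destroy_all_hashmap or, less common in C, destroyAllHashmap, and avoid numbering your functions. Likewise, avoid variable names like nnew and use semantically meaningful names instead.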

score 4 · Accepted Answer

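Since I see Valgrind in every suggestion, I will recommend some other, more general advice that has proven useful over time.

Narrow down the code to find the bug

First of all, it is hard to use any tool on, or to trace through, a large system. Try to narrow down the problem.

For example, switch off modules (comment out pieces of code and see whether you can still reproduce the problem consistently). Unless it is a truly nasty random memory corruption, a few rounds of trial and error should let you rule out most of the code.

Remove dynamic memory, or at least comment out the memory releases

Try commenting out the free calls (if your scenario lets you do so without running out of system memory). That way you can at least eliminate or narrow down the problems related to deallocation. Better still, try running the whole system on statically allocated memory. I know this may not be very practical, but once you have a limited scope that crashes consistently, you may be able to allocate a sufficiently large chunk of static memory in place of the dynamic memory you need. You could create an array of nodes and hand them out to your pointers.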
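A minimal sketch of that idea follows. The names (pool, node_alloc, node_free, pool_reset) and the pool size are assumptions chosen for illustration; they would replace the malloc and free calls for one node type while debugging.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 1024  /* assumption: large enough for the failing test case */

/* Hypothetical node type standing in for the project's own structures. */
struct node {
    int value;
    struct node *next;
};

static struct node pool[POOL_SIZE];  /* statically allocated storage */
static size_t pool_used;             /* number of slots handed out so far */

/* Hands out the next unused slot; returns NULL when the pool is exhausted. */
static struct node *node_alloc(int value)
{
    if (pool_used >= POOL_SIZE)
        return NULL;
    struct node *n = &pool[pool_used++];
    n->value = value;
    n->next = NULL;
    return n;
}

/* Deliberately a no-op: slots are never reused, so a stale pointer
   still refers to valid, unchanged memory. */
static void node_free(struct node *n)
{
    (void)n;
}

/* Makes every slot available again, for example between test runs. */
static void pool_reset(void)
{
    pool_used = 0;
}
```

If the crash disappears once the nodes come from such a pool, the problem is very likely tied to object lifetime (a premature or double free) rather than to the logic that links the nodes.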

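Check the call stack and watches at the crash location

I assume you have already examined the call stack at the time of the crash and verified the pointers that are available locally. This should be the most direct approach, to be tried before any of the above.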
