0

我有一个函数 find_nodes() 里面有一个循环:

for (htmlNodePtr current_node=root_node
    ; current_node!=NULL
    ; current_node=current_node->next) 
{
    if (xmlHasProp(current_node,(xmlChar *)"href")) {
        if (xmlHasProp(current_node,(xmlChar *)attribute)) {
            if (strcmp(value,
                (char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
                    found_nodes[numb_found]=current_node;
                    numb_found++;
            }
        }
    }

    find_nodes(found_nodes,numb_found,
               current_node->children,mode,attribute,value);

}

我在这个作业中遇到了分段错误:

found_nodes[numb_found]=current_node;

我检查了 numb_found 值,几次迭代都可以,之后它而不是几个+1,它等于 -1207604106

是什么原因造成的?

4

9 回答 9

3

您以某种方式超出了数组边界并查看随机数据。

好的,看看这个,我们没有足够的信息,但我观察到这似乎是对 DOM 树的递归搜索。您numb_found作为参数传递,因此当您在递归调用中分配给它时,该值不会在上面更新。最终你会遇到麻烦。

于 2009-05-01T23:01:16.733 回答
2

在您声明但未初始化的 get_urls 函数中

char **url_list;

然后你使用它

if (tree_is_true(l_list)) {
    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
    numb_found++;
}

-1207604106 是 0xB8056C76 - 非常适合指针;-)

于 2009-05-02T15:17:13.473 回答
1

你踩到了内存。编译你的代码-g并使用它运行它,它valgrindvalgrind告诉你错误的确切位置。

于 2009-05-01T23:11:14.247 回答
0

从给定的代码中猜测,要么你有一些事情正在踩踏堆栈,要么 numb_found 变得非常大并且溢出。输入一些真实代码(如上述所有代码的类型信息),我们将能够告诉您更多信息。

我怀疑 found_nodes 是本地堆栈上的一个固定大小的数组,并且您也过度运行它。

于 2009-05-01T23:09:16.590 回答
0

也许我误解了您的代码,但是由于您正在传递numb_found而不是&numb_found,因此每次从递归返回时,您都将重写找到的节点,这对我来说似乎是一个错误。

于 2009-05-01T23:10:03.000 回答
0

这是整个功能:

void find_nodes(htmlNodePtr *found_nodes, int &numb_found, htmlNodePtr root_node, SearchMode mode, const char *attribute, const char *value) {

    htmlNodePtr tmp_ptr;

    switch (mode) {

        case S_HREF:
            for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
                if (xmlHasProp(current_node,(xmlChar *)"href")) {
                    if (xmlHasProp(current_node,(xmlChar *)attribute)) {
                        if (strcmp(value,(char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
                            found_nodes[numb_found]=current_node;
                            numb_found++;
                        }
                    }
                }

                find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);

            }
            break;
        case S_KEYWORD:
            for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
                if (xmlHasProp(current_node,(xmlChar *)"href")) {
                    if (strcmp(value,(char *)xmlNodeGetContent(current_node))==0) {
                        found_nodes[numb_found]=current_node;
                        numb_found++;
                    }
                }

                find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
            }
            break;
        case S_TAG:
            for (htmlNodePtr current_node=root_node; current_node!=NULL; current_node=current_node->next) {
                if (xmlHasProp(current_node,(xmlChar *)attribute)) {
                    if (strcmp(value,(char *)xmlGetProp(current_node,(xmlChar *)attribute))==0) {
                        tmp_ptr=inner_href_seek(current_node);
                        if (tmp_ptr==NULL) {
                            find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
                            continue;
                        }
                        else {
                            found_nodes[numb_found]=tmp_ptr;
                            numb_found++;
                        }
                    }
                }

                find_nodes(found_nodes,numb_found,current_node->children,mode,attribute,value);
            }
            break;
    }
}

该阵列是固定的,但它比需要的要大得多。我是否以正确的方式传递了 numb_found ?

=== 编辑 ===

char** get_urls(string url, ParseTreeNode *tree_root, int &numb_found) {

    numb_found=0;
    char **url_list;
    htmlDocPtr doc;
    htmlNodePtr root_node;
    string site_content;

    if (get_page(url,site_content)<0) {
        url_list=NULL;
        return url_list;
    }

    // get a DOM
    doc=htmlReadMemory(site_content.data(),site_content.size(),url.data(),NULL,0);

    // and the root
    root_node=xmlDocGetRootElement(doc);

    if (tree_root==NULL) {
        url_list=NULL;
        return url_list;
    }

    LeafList *l_list;
    l_list= new LeafList();

    l_list->numb_leafs=0;

    get_leaf_list(l_list,tree_root);

    htmlNodePtr matching_nodes[256];
    int numb_matching_nodes;

    htmlNodePtr tmp_nodes[64];
    int numb_tmp;

    SearchMode tmp_rule;

    for (int i=0;i<l_list->numb_leafs;i++) {
        if (l_list->leaf_buff[i]->data->rule!=TAG) continue;
        else {
            numb_matching_nodes=0;
            find_nodes(matching_nodes,numb_matching_nodes,root_node,S_TAG,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

            if (numb_matching_nodes==0) continue;
            else l_list->leaf_buff[i]->state=true;

            for (int j=0;j<numb_matching_nodes;j++) {
                for (int k=0;k<l_list->numb_leafs;k++) {
                    if (k==i) continue;
                    else {
                        switch(l_list->leaf_buff[k]->data->rule) {
                            case HREF:
                                tmp_rule=S_HREF;
                                break;
                            case TAG:
                                tmp_rule=S_TAG;
                                break;
                            case KEYWORD:
                                tmp_rule=S_KEYWORD;
                                break;
                        }

                        find_nodes(tmp_nodes,numb_tmp,matching_nodes[j],tmp_rule,l_list->leaf_buff[k]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

                        if (numb_tmp>0) l_list->leaf_buff[k]->state=true;
                        else l_list->leaf_buff[k]->state=false;
                    }
                }

                if (tree_is_true(l_list)) {
                    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
                    numb_found++;
                }
            }
        }
    }

    for (int i=0;i<l_list->numb_leafs;i++) {
        if (l_list->leaf_buff[i]->data->rule!=HREF) continue;
        else {
            numb_matching_nodes=0;
            find_nodes(matching_nodes,numb_matching_nodes,root_node,S_HREF,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

            if (numb_matching_nodes==0) continue;
            else l_list->leaf_buff[i]->state=true;

            for (int j=0;j<numb_matching_nodes;j++) {
                for (int k=0;k<l_list->numb_leafs;k++) {
                    if ((k==i)||(l_list->leaf_buff[k]->data->rule==TAG)) continue;
                    else {
                        switch(l_list->leaf_buff[k]->data->rule) {
                            case HREF:
                                tmp_rule=S_HREF;
                                break;
                            case KEYWORD:
                                tmp_rule=S_KEYWORD;
                                break;
                        }

                        find_nodes(tmp_nodes,numb_tmp,matching_nodes[j],tmp_rule,l_list->leaf_buff[k]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

                        if (numb_tmp>0) l_list->leaf_buff[k]->state=true;
                        else l_list->leaf_buff[k]->state=false;
                    }
                }

                if (tree_is_true(l_list)) {
                    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[j],(xmlChar *)"href");
                    numb_found++;
                }
            }
        }
    }

    for (int i=0;i<l_list->numb_leafs;i++) {
        if (l_list->leaf_buff[i]->data->rule!=KEYWORD) continue;
        else {
            numb_matching_nodes=0;
            find_nodes(matching_nodes,numb_matching_nodes,root_node,S_KEYWORD,l_list->leaf_buff[i]->data->attribute.data(),l_list->leaf_buff[i]->data->value.data());

            if (numb_matching_nodes==0) continue;
            else {
                for (int i=0;i<numb_matching_nodes;i++) {
                    url_list[numb_found]=(char *)xmlGetProp(matching_nodes[i],(xmlChar *)"href");
                    numb_found++;
                }
            }
        }
    }

    return url_list;
}
于 2009-05-01T23:19:32.463 回答
0

我注意到无效值 (-1207604106)numb_found是 0xB8056C76,这有点像指针值。这可以通过超出数组来解释,尽管您说它没有被超出......

我建议您验证该数组确实“比需要的大得多”。cerr在将节点添加到数组的行上添加跟踪(使用);至少,让痕迹打印出numb_found每次通过的值。崩溃前您获得的最大值是多少?这实际上与数组大小相比如何?

于 2009-05-01T23:36:53.763 回答
0

这就是 valgrind 返回的内容

==7464==
==7464== Use of uninitialised value of size 4
==7464==    at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464==    by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464==    by 0x804907B: main (tester.cpp:39)
==7464==
==7464== Invalid write of size 4
==7464==    at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464==    by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464==    by 0x804907B: main (tester.cpp:39)
==7464==  Address 0xcef11ec0 is not stack'd, malloc'd or (recently) free'd
==7464==
==7464== Process terminating with default action of signal 11 (SIGSEGV)
==7464==  Access not within mapped region at address 0xCEF11EC0
==7464==    at 0x80494EF: find_nodes(_xmlNode**, int&, _xmlNode*, SearchMode, char const*, char const*) (search_engine.cpp:90)
==7464==    by 0x8049CF2: get_urls(std::string, ParseTreeNode*, int&) (search_engine.cpp:237)
==7464==    by 0x804907B: main (tester.cpp:39)
==7464==
==7464== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 18 from 2)
==7464== malloc/free: in use at exit: 73,605 bytes in 3,081 blocks.
==7464== malloc/free: 3,666 allocs, 585 frees, 172,154 bytes allocated.
==7464== For counts of detected errors, rerun with: -v
==7464== searching for pointers to 3,081 not-freed blocks.
==7464== checked 329,952 bytes.
==7464==
==7464== LEAK SUMMARY:
==7464==    definitely lost: 186 bytes in 30 blocks.
==7464==      possibly lost: 8,324 bytes in 6 blocks.
==7464==    still reachable: 65,095 bytes in 3,045 blocks.
==7464==         suppressed: 0 bytes in 0 blocks.
==7464== Rerun with --leak-check=full to see details of leaked memory.

我得到的最大值是 24,数组有 1024 个元素

于 2009-05-01T23:42:24.560 回答
0

说真的,你不应该问我们这个问题。您可以更快地执行以下任一操作:

  • 通过调试器运行它,单步执行代码并numb_found在每条指令之后保持监视;或者
  • 在没有调试器的情况下,numb_found在每个语句之后放置一个 printf/cout (具有唯一的 ID: printf("A:%d\n,numb_found);)。

这将是查看问题原因的最快方法。

无论如何,您的最新答案/评论表明您正在传递一个未初始化的值,find_nodes()并且考虑到它们中的大多数都是指针,这也会导致您写入无效内存。

我无法从 valgrind 输出中分辨出哪个参数未初始化,因此在函数顶部放置一个 printf/cout 以打印出所有指针(而不是指针的内容)。这应该允许您查看哪个参数是错误的或已损坏。

于 2009-05-02T01:12:32.503 回答