0

我的代码中有一个标准的 DFS 实现,它在每次调用时使用动态分配的堆栈。

我经常调用这个函数。通常仅在小型运行(200-1000)节点上,但有时会有一个具有一百万个或更多节点的大型连接组件。

分析器显示大量计算时间浪费在分配堆栈上。我想尝试重用现有内存(例如调用堆栈)。但是,该函数必须保持线程安全。

有没有一种有效的方法来动态使用调用堆栈而不使函数递归?

到目前为止,我最好的想法是使用一个额外的参数使函数递归,该参数使每次后续调用的自动堆栈大小加倍。

伪C:

void dfs(size_t stack_length, void * graph, graphnode_t start_node) {
    graphnode_t stack[stack_length];
    size_t stack_size = 0;

    for (all nodes) {
        // do something useful
        if (stack_size < stack_length) {
            stack[stack_size++] = new_node;
        } else {
            dfs(stack_length * 2, graph, new_node);
        }
    }

}
4

1 回答 1

0

It sounds like you're describing that your algorithm would work fine with just a single graphnode_t array for the system (though you're calling it a stack, I don't think that really applies here), and the only real problem is you're not certain how large it should be when you begin.

If that is the case, I would suggest first that you do not make this (potentially huge) array a local variable, because that will cause problems with your actual program stack. Instead let it be a static pointer that points to dynamically sized memory which you periodically expand if needed.

ensure_size(graphnode_t **not_a_stack_ptr, unsigned long *length_ptr)
{
    if (!*not_a_stack_ptr)
    {
        *not_a_stack_ptr = malloc(sizeof(graphnode_t) * MINIMUM_ENTRY_COUNT);
        *length_ptr = MINIMUM_ENTRY_COUNT;
    }
    else if (size needs to double)
    {
        *length_ptr *= 2;
        *not_a_stack_ptr = realloc(*not_a_stack_ptr, sizeof(graphnode_t) * (*length_ptr));
    }
}


struct thread_arguments {
    void * graph;
    graphnode_t start_node;
}

dfs_thread(void *void_thread_args)
{
    struct thread_arguments *thread_args = void_thread_args;
    graphnode_t *not_a_stack = NULL;
    unsigned long not_a_stack_length = 0;

    for (all nodes)
    {
        ensure_size(&not_a_stack, &not_a_stack_length);
        stack[stack_size++] = new_node;
    }

    if (not_a_stack) free(not_a_stack);
}

Note: your pseudo-code suggests that the maximum size could be determined based on the number of nodes you have. You would get the most performance gain by using this to perform just a single full-sized malloc up front.

于 2013-11-12T11:49:43.093 回答