c++ - How to efficiently store a very large graph with fast indexing?

Question

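I am working with a graph of 875713 nodes and 5105039 edges. Using vector<bitset<875713>> vec(875713) or array<bitset<875713>, 875713> throws a segfault at me. I need to compute all-pairs shortest paths with path recovery. What alternative data structures do I have?

I found this SO thread, but it doesn't answer my query.

Edit

I tried this after reading the suggestions, and it seems to work. Thanks, everyone, for helping me.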

vector<vector<uint>> neighboursOf; // An edge between i and j exists if
                                   // neighboursOf[i] contains j
neighboursOf.resize(nodeCount);

// Loop on getline itself so the last line is not processed twice at EOF
while (getline(input, line))
{
    // Skip blank lines and comments in the input file
    if (line.empty() || line[0] == '#')
        continue;

    uint fromNodeId = 0;
    uint toNodeId = 0;

    // Each line is of the format "<fromNodeId> [TAB] <toNodeId>";
    // %u matches the unsigned type, and malformed lines are skipped
    if (sscanf(line.c_str(), "%u\t%u", &fromNodeId, &toNodeId) != 2)
        continue;

    // Store the edge, ignoring ids that would index out of range
    if (fromNodeId < nodeCount && toNodeId < nodeCount)
        neighboursOf[fromNodeId].push_back(toNodeId);
}

score 3 · Accepted Answer

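Your graph is sparse, i.e. |E| << |V|^2, so you should probably use a sparse matrix to represent your adjacency matrix, or equivalently store for each node a list of its neighbors (which results in a jagged array), like this: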

vector<vector<int> > V (number_of_nodes);
// For each cell of V, which is a vector itself, push only the indices of adjacent nodes.
V[0].push_back(2);   // Node number 2 is a neighbor of node number 0
...
V[number_of_nodes-1].push_back(...);

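This way your expected memory requirement is O(|E| + |V|) instead of O(|V|^2), which in your case should be around 50 MB, rather than the roughly 96 GB a full 875713 x 875713 bit matrix would need.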
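To make those numbers concrete, here is a rough back-of-the-envelope sketch. The function names are made up for illustration, and the 24-byte vector header and 4-byte int are assumptions that hold on typical 64-bit platforms but are not guaranteed. Allocator overhead and spare vector capacity are ignored, so the real adjacency-list footprint will be somewhat higher.

```cpp
#include <cstdint>

// Bytes needed by a dense V x V bit matrix (one bit per possible edge),
// rounded up to a whole byte.
constexpr std::uint64_t denseBitMatrixBytes(std::uint64_t nodes) {
    return (nodes * nodes + 7) / 8;
}

// Rough bytes for a vector<vector<int>>: one vector header per node plus
// one int per stored edge. headerBytes and intBytes are platform-dependent
// assumptions supplied by the caller.
constexpr std::uint64_t adjacencyListBytes(std::uint64_t nodes,
                                           std::uint64_t edges,
                                           std::uint64_t headerBytes,
                                           std::uint64_t intBytes) {
    return nodes * headerBytes + edges * intBytes;
}
```

With 875713 nodes and 5105039 edges, denseBitMatrixBytes gives about 95.9 GB, while adjacencyListBytes with a 24-byte header and 4-byte int gives about 41 MB before allocator overhead.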

This will also lead to a faster Dijkstra (or any other shortest-path algorithm), since you only need to consider the neighbors of a node at each step.
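As a sketch of what that looks like with path recovery: assuming the edges are unweighted (the input format in the question only lists node pairs), a breadth-first search from one source already yields shortest paths, so Dijkstra is not needed in that case. The names Graph, kNoParent, bfsParents and recoverPath below are invented for this example, and every neighbour id is assumed to be smaller than the number of nodes. For all pairs you would run it once per source and consume each result before moving on, since keeping a parent array for every source is O(|V|^2) memory again.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <queue>
#include <vector>

using Graph = std::vector<std::vector<std::uint32_t>>;

// Sentinel meaning "no predecessor" (unreached node, or the source itself).
constexpr std::uint32_t kNoParent = UINT32_MAX;

// Breadth-first search from `source`. parent[v] is the node that precedes v
// on one shortest path from source, or kNoParent if there is none.
std::vector<std::uint32_t> bfsParents(const Graph& adj, std::uint32_t source) {
    assert(source < adj.size());
    std::vector<std::uint32_t> parent(adj.size(), kNoParent);
    std::vector<bool> seen(adj.size(), false);
    std::queue<std::uint32_t> frontier;
    seen[source] = true;
    frontier.push(source);
    while (!frontier.empty()) {
        const std::uint32_t u = frontier.front();
        frontier.pop();
        for (std::uint32_t v : adj[u]) {  // only u's neighbours are touched
            if (!seen[v]) {
                seen[v] = true;
                parent[v] = u;
                frontier.push(v);
            }
        }
    }
    return parent;
}

// Rebuilds the path source -> target by walking parents backwards.
// Returns an empty vector if target is not reachable from source.
std::vector<std::uint32_t> recoverPath(const std::vector<std::uint32_t>& parent,
                                       std::uint32_t source,
                                       std::uint32_t target) {
    std::vector<std::uint32_t> path;
    for (std::uint32_t v = target; v != kNoParent; v = parent[v]) {
        path.push_back(v);
    }
    std::reverse(path.begin(), path.end());
    if (path.empty() || path.front() != source) {
        path.clear();
    }
    return path;
}
```

Each search costs O(|V| + |E|) time and O(|V|) extra memory. If the edges do carry weights, the same parent-array idea carries over to Dijkstra with a priority queue in place of the plain queue.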

score 2 · Accepted Answer

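You can store the edge lists of all nodes in a single array. If the number of edges per node is variable, you can terminate each list with a null edge. This avoids the space overhead of many small lists (or similar data structures). The result could look like this: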

enum {
    MAX_NODES = 875713,
    MAX_EDGES = 5105039,
};

int nodes[MAX_NODES+1];         // contains index into array edges[].
                                // index zero is reserved as null node
                                // to terminate lists.

int edges[MAX_EDGES+MAX_NODES]; // contains null terminated lists of edges.
                                // each edge occupies a single entry in the
                                // array. each list ends with a null node.
                                // there are MAX_EDGES entries and MAX_NODES
                                // lists.

[...]

/* find edges for node */
int node, edge, edge_index;
for (edge_index=nodes[node]; edges[edge_index]; edge_index++) {
    edge = edges[edge_index];
    /* do something with edge... */
}

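Minimizing space overhead matters because you have a very large number of small data structures. The overhead per node list is just a single integer, which is much less than the overhead of, e.g., an STL vector. In addition, the lists are laid out contiguously in memory, which means that no space is wasted between any two lists. That is not the case with variable-sized vectors.

Reading all edges of any given node will be very fast, because the edges of any node are stored contiguously in memory.

The drawback of this data arrangement is that when you initialize the arrays and construct the edge lists, you need to have all edges of a node at hand. This is not a problem if the edges are sorted by node, but it does not work well if the edges arrive in random order.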
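One possible way around the ordering problem, if the whole edge list fits in memory, is a two-pass counting build. Note this is a variation on the layout above and not the same thing: it stores a start offset per node with one extra end entry (often called compressed sparse row) instead of null terminators, so node id 0 does not have to be reserved. FlatGraph and buildFlatGraph are names made up for this sketch, and it assumes the edge count fits in 32 bits.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

struct FlatGraph {
    // offsets has nodeCount + 1 entries; the edges of node u are
    // targets[offsets[u]] up to (not including) targets[offsets[u + 1]].
    std::vector<std::uint32_t> offsets;
    // All edge targets, grouped by source node.
    std::vector<std::uint32_t> targets;
};

FlatGraph buildFlatGraph(
    std::uint32_t nodeCount,
    const std::vector<std::pair<std::uint32_t, std::uint32_t>>& edges) {
    FlatGraph g;
    g.offsets.assign(static_cast<std::size_t>(nodeCount) + 1, 0);
    g.targets.resize(edges.size());

    // Pass 1: count outgoing edges per node, stored one slot to the right.
    for (const auto& e : edges) {
        assert(e.first < nodeCount && e.second < nodeCount);
        ++g.offsets[e.first + 1];
    }
    // A prefix sum turns the counts into start positions.
    for (std::uint32_t u = 0; u < nodeCount; ++u) {
        g.offsets[u + 1] += g.offsets[u];
    }
    // Pass 2: drop each edge into the next free slot of its source node.
    std::vector<std::uint32_t> next(g.offsets.begin(), g.offsets.end() - 1);
    for (const auto& e : edges) {
        g.targets[next[e.first]++] = e.second;
    }
    return g;
}
```

Iterating the edges of node u is then a loop over i from g.offsets[u] to g.offsets[u + 1]. The cost is two passes over the edge list plus a temporary array of one integer per node during construction.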

score 1 · Accepted Answer

If we declare a node as follows:

struct Node {
    int node_id;
    vector<int> edges; // all the edges that start from this node
};

then all the nodes can be represented as follows:

vector<Node> nodes;
