3

I have a huge directed graph with about a million nodes and more than ten million edges. The edges are not weighted. The graph is a small-world like graph. In fact I see that every node is (on average) connected to another node over three intermediate nodes.

Given this graph can you think of a fast algorithm that returns all paths (without cycles) between a start and a destination node, but only up to a given maximum number N of intermediate nodes (and in my case N most of the time will be between 0 and 3)?

4

2 回答 2

3

If your graph was undirected, you would certainly want to do a bidirectional breadth first search. For length 2 paths, enumerate edges from the start node and the end node and see where they intersect. For the length 3 paths, go 2 deep from the end point with smaller degree, and one deep on the node with greater degree.

Since your graph is directed, you might want to also keep reverse edges so you can do the same trick.

于 2011-05-24T15:12:35.263 回答
0

Perhaps breath-first from both directions at once? Take neighbours of A, and neighbours of B. if you haven't found a link yet, add A to "neighbours of a" and B to "neighbours of B", then find any link between the two sets.

To extend it a bit further than three links, the "neighbours of A/B" lists need to contain a bit more. You will not be able to do it in-memory - you'll need a scratch table with

whatever TRANSACTION_ID; (or use an ORACLE 1-per-session temp table)
boolean MY_BFS_WAS_ROOTED_AT_A;
int NODE_ID;
int previous_node_id;

(you don't need to track depth if you check for loops in your insert statement)

you have found a path when there exists any

select from pathfinder a, pathfinder b
where a.taxn_id = foo and b.tnx_id=foo
and a.MY_BFS_WAS_ROOTED_AT_A = false
and b.MY_BFS_WAS_ROOTED_AT_A = true
and a.node_id = b.node_id

Don't forget to clean out the table when you are done! Doing it all as one transaction and rolling it back might be the easiest way.

于 2012-05-20T05:25:49.613 回答