I have a data structure holding a graph like the one in the following picture:
In this tree, a node can have any number of unique children from the levels below it. In tree in the picture represents a set of paths. Where every path should begin with a node from Level 1, and ends with a node of "*" mark. So the paths of the tree in the picture are:
A then C then G
A then C then G then J
A then D then G
A then D then G the J
A then D then K, and so on...
Actually my original tree is huge (around 2 Million sequences) and the maximum number of nodes per level is 61 (of 11 levels). So it causes many memory consumption problems in my application (a computer vision application for SAMSUNG).
My target is to have an iterative algorithm that represents these paths in a more compact string format. So I think we the problem is divided into three steps as follows. I have built the tree data structure (step 2), but still can not derive an iterative algorithm that gets the output string/sequence in step 3 from the tree.
1- Input String:
(A C G) | (A C G J) | (A D G) | (A D G J ) | (A D K) | ....
,
Where "|" represents alternatives.
2- Building Tree Data Structure of These Paths.
3- Required Output String:
(A (C G [J]) | (D (G [J]) | K)) | (B ....)
.
Where where "|" represents alternatives and "[ ]" encloses options. The target output string should be optimized like there are not more common factors that can be taken to more simplify it.