New to Neo4j but can see so many possibilities in graph databases, in particular IT data workflow and system impact. But unsure of the correct design for maximum efficiency.
Consider a system that takes in files, processes them, stores them in database and makes data available in various reports. However, depending on the file, the data may be in one report, but not the other.
System Architecture and Reality
An important use case is to be able to report the impact on downstream reports if upstream files are missing or components that process those files fail.
I have come up with 4 designs, 3 of which seem to work, but unsure which is best.
Would appreciate any help or advice on this.
Code used:
---------------------------------------------------------------------------
-- Design Experiments
---------------------------------------------------------------------------
// 1. Combination of the Workflows with shared nodes where they interact
with same Process or DataStore
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[r*]->(rp:Report) RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[r*]->(rp:Report) RETURN rp
// 2. Same node relationship design as #1, but assign a workflow property
to each node and relationship as a property array
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1", workflow: ["workflow1","workflow2"]})
CREATE (p2:Provider {name: "Provider 2", workflow: ["workflow3"]})
CREATE (f1:File {name: "File 1", workflow: ["workflow1"]})
CREATE (f2:File {name: "File 2", workflow: ["workflow2"]})
CREATE (f3:File {name: "File 3", workflow: ["workflow3"]})
CREATE (pp:PreProcess {name: "PreProcess", workflow: ["workflow1"]})
CREATE (p:Process {name: "Process", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (d:DataStore {name: "DataStore", workflow: ["workflow1","workflow2","workflow3"]})
CREATE (rA:Report {name: "Report A", workflow: ["workflow1","workflow3"]})
CREATE (rB:Report {name: "Report B", workflow: ["workflow2"]})
CREATE (p1)-[:PROVIDES{workflow: ["workflow1"]}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: ["workflow2"]}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: ["workflow3"]}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: ["workflow1"]}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: ["workflow3"]}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: ["workflow1","workflow2","workflow3"]}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow1","workflow3"]}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: ["workflow2"]}]->(rB)
// Show individual workflows
MATCH (p) WHERE filter(x in p.workflow WHERE x = "workflow1") RETURN p
MATCH (p) WHERE filter(x in p.workflow WHERE x = "workflow2") RETURN p
MATCH (p) WHERE filter(x in p.workflow WHERE x = "workflow3") RETURN p
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"}) WITH a.workflow AS workflows
MATCH (r:Report) WHERE filter(x in r.workflow WHERE x in workflows)
RETURN r
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"}) WITH a.workflow AS workflows
MATCH (r:Report) WHERE filter(x in r.workflow WHERE x in workflows)
RETURN r
// 3. Same node relationship design as #1, but create a relationship
with a workflow property for each workflow, resulting in multiple
relatinships between nodes.
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p1)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p2)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp)
CREATE (pp)-[:DELIVERS_TO{workflow: "workflow1"}]->(p)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(p)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(p)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow1"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow2"}]->(d)
CREATE (p)-[:DELIVERS_TO{workflow: "workflow3"}]->(d)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow3"}]->(rA)
CREATE (d)-[:DELIVERS_TO{workflow: "workflow2"}]->(rB)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j]->(n)-[r*]->(g)-[t]->(rp:Report) WHERE j.workflow=t.workflow RETURN rp
// 4. Distinct set of nodes and relationships for each workflow, but all
with same node type so can still be matched
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 1"})
CREATE (p3:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp1:PreProcess {name: "PreProcess"})
CREATE (pc1:Process {name: "Process"})
CREATE (pc2:Process {name: "Process"})
CREATE (pc3:Process {name: "Process"})
CREATE (d1:DataStore {name: "DataStore"})
CREATE (d2:DataStore {name: "DataStore"})
CREATE (d3:DataStore {name: "DataStore"})
CREATE (rA1:Report {name: "Report A"})
CREATE (rB2:Report {name: "Report B"})
CREATE (rA3:Report {name: "Report A"})
CREATE (p1)-[:PROVIDES{workflow: "workflow1"}]->(f1)
CREATE (p2)-[:PROVIDES{workflow: "workflow2"}]->(f2)
CREATE (p3)-[:PROVIDES{workflow: "workflow3"}]->(f3)
CREATE (f1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pp1)
CREATE (pp1)-[:DELIVERS_TO{workflow: "workflow1"}]->(pc1)
CREATE (f2)-[:DELIVERS_TO{workflow: "workflow2"}]->(pc2)
CREATE (f3)-[:DELIVERS_TO{workflow: "workflow3"}]->(pc3)
CREATE (pc1)-[:DELIVERS_TO{workflow: "workflow1"}]->(d1)
CREATE (pc2)-[:DELIVERS_TO{workflow: "workflow2"}]->(d2)
CREATE (pc3)-[:DELIVERS_TO{workflow: "workflow3"}]->(d3)
CREATE (d1)-[:DELIVERS_TO{workflow: "workflow1"}]->(rA1)
CREATE (d2)-[:DELIVERS_TO{workflow: "workflow3"}]->(rB2)
CREATE (d3)-[:DELIVERS_TO{workflow: "workflow2"}]->(rA3)
// Show impacted reports if Provider 1 is down
MATCH (a:Provider {name:"Provider 1"})-[j*]->(rp:Report) RETURN rp
// Show impacted reports if Provider 2 is down
MATCH (a:Provider {name:"Provider 2"})-[j*]->(rp:Report) RETURN rp
Following recommendation, have expanded Design 1 to include a direct link between File and Report.
// 1a. Combination of the Workflows with shared nodes where they interact
with same Process or DataStore.
---------------------------------------------------------------------------
MATCH (n) OPTIONAL MATCH (n)-[r]-() DELETE n, r
CREATE (p1:Provider {name: "Provider 1"})
CREATE (p2:Provider {name: "Provider 2"})
CREATE (f1:File {name: "File 1"})
CREATE (f2:File {name: "File 2"})
CREATE (f3:File {name: "File 3"})
CREATE (pp:PreProcess {name: "PreProcess"})
CREATE (p:Process {name: "Process"})
CREATE (d:DataStore {name: "DataStore"})
CREATE (rA:Report {name: "Report A"})
CREATE (rB:Report {name: "Report B"})
CREATE (p1)-[:PROVIDES{}]->(f1)
CREATE (p1)-[:PROVIDES{}]->(f2)
CREATE (p2)-[:PROVIDES{}]->(f3)
CREATE (f1)-[:DELIVERS_TO{}]->(pp)
CREATE (pp)-[:DELIVERS_TO{}]->(p)
CREATE (f2)-[:DELIVERS_TO{}]->(p)
CREATE (f3)-[:DELIVERS_TO{}]->(p)
CREATE (p)-[:DELIVERS_TO{}]->(d)
CREATE (d)-[:DELIVERS_TO{}]->(rA)
CREATE (d)-[:DELIVERS_TO{}]->(rB)
CREATE (f1)-[:USED_BY{}]->(rA)
CREATE (f2)-[:USED_BY{}]->(rB)
CREATE (f3)-[:USED_BY{}]->(rA)
// Show impacted reports (and path) if Provider 1 is down
MATCH path = (:Provider{name:'Provider 1'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report
// Show impacted reports (and path) if Provider 2 is down
MATCH path = (:Provider{name:'Provider 2'})-[:PROVIDES|USED_BY*]->(r:Report)
RETURN path, r.name AS report