I want to take a Maven dependency tree as input and parse it to determine the groupId, artifactId, and version of each dependency, along with the same information for its children, their children, and so on. I'm not sure whether it makes the most sense to parse the mvn dependency tree and store the information as a nested dictionary before preparing the data for Neo4j.
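To make the target concrete, here is a minimal sketch of the nested shape I have in mind for one dependency and one of its children. The helper name `parse_coordinate` and the key names (`groupId`, `artifactId`, `version`, `children`) are my own placeholders. It assumes every coordinate has the five-part `groupId:artifactId:packaging:version:scope` form seen in my tree; a coordinate with a classifier would have six parts and is not handled here.

```python
def parse_coordinate(coordinate):
    """Split 'groupId:artifactId:packaging:version:scope' into a dict.

    Assumes exactly five colon-separated parts (no classifier).
    """
    group_id, artifact_id, _packaging, version, _scope = coordinate.split(":")
    return {
        "groupId": group_id,
        "artifactId": artifact_id,
        "version": version,
        "children": [],  # nested dependencies would go here
    }


# Build one parent with one child by hand to show the intended nesting.
antlr4 = parse_coordinate("org.antlr:antlr4:jar:4.7.1:compile")
antlr4["children"].append(parse_coordinate("com.ibm.icu:icu4j:jar:58.2:compile"))
print(antlr4)
```

Keeping `children` as a list of the same kind of dict means the structure can nest to any depth, which seems like it would map onto nodes and relationships later, but I may be missing a better representation.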
I'm also unsure of the best way to parse the entire mvn dependency tree. The code below is the furthest I've gotten: it strips the unnecessary prefix from the front of each line and labels each entry as a child or a parent.
tree=
[INFO] +- org.antlr:antlr4:jar:4.7.1:compile
[INFO] | +- org.antlr:antlr4-runtime:jar:4.7.1:compile
[INFO] | +- org.antlr:antlr-runtime:jar:3.5.2:compile
[INFO] | \- com.ibm.icu:icu4j:jar:58.2:compile
[INFO] +- commons-io:commons-io:jar:1.3.2:compile
[INFO] +- brs:dxprog-lang:jar:3.3-SNAPSHOT:compile
[INFO] | +- brs:libutil:jar:2.51:compile
[INFO] | | +- commons-collections:commons-collections:jar:3.2.2:compile
[INFO] | | +- org.apache.commons:commons-collections4:jar:4.1:compile
[INFO] | | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.9.0:compile
[INFO] | | | \- com.fasterxml.jackson.core:jackson-core:jar:2.9.5:compile
.
.
.
with open("tree", "r") as fileObj:
    for line in fileObj:
        for word in line.split():
            # Strip the leading "[INFO]" marker.
            if "[INFO]" in line.split():
                line = line.replace(line.split()[0], "")
                print(line)
            # Label every "|" as a child level.
            if "|" in line.split():
                line = line.replace(line.split()[0], "child")
                print(line)
            # With no "|" left, treat a "+-" entry as a parent.
            if "+-" in line.split() and "|" not in line.split():
                line = line.replace(line.split()[0], "")
                line = line.replace(line.split()[0], "parent")
                print(line, '\n\n')
Output:
| | \- com.google.protobuf:protobuf-java:jar:3.5.1:compile
child child \- com.google.protobuf:protobuf-java:jar:3.5.1:compile
| +- com.h2database:h2:jar:1.4.195:compile
child +- com.h2database:h2:jar:1.4.195:compile
parent com.h2database:h2:jar:1.4.195:compile
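One direction I've been considering, as a rough sketch rather than a working solution, is to work out each line's depth from its indentation and keep a stack of the most recent node at each depth. This assumes the raw file keeps Maven's usual layout of three characters per nesting level (`|` plus two spaces, or three spaces) followed by `+- ` or `\- `, and that every coordinate has exactly five colon-separated parts. The names `LINE_RE` and `parse_tree` and the dictionary keys are my own placeholders.

```python
import re

# Assumed layout: "[INFO] " + 3 characters per nesting level + "+- " or "\- " + coordinate.
LINE_RE = re.compile(r"^\[INFO\] ((?:[| ]  )*)[+\\]- (\S+)$")


def parse_tree(lines):
    """Return a list of top-level dependency dicts with nested 'children'."""
    roots = []
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in lines:
        match = LINE_RE.match(line.rstrip())
        if match is None:
            continue  # skip anything that is not a dependency line
        depth = len(match.group(1)) // 3
        parts = match.group(2).split(":")
        if len(parts) != 5:
            continue  # coordinates with a classifier are not handled in this sketch
        group_id, artifact_id, _packaging, version, _scope = parts
        node = {"groupId": group_id, "artifactId": artifact_id,
                "version": version, "children": []}
        del stack[depth:]  # forget nodes at this depth or deeper
        if stack:
            stack[-1]["children"].append(node)  # attach to the current parent
        else:
            roots.append(node)
        stack.append(node)
    return roots


sample = [
    r"[INFO] +- org.antlr:antlr4:jar:4.7.1:compile",
    r"[INFO] |  +- org.antlr:antlr4-runtime:jar:4.7.1:compile",
    r"[INFO] |  \- com.ibm.icu:icu4j:jar:58.2:compile",
    r"[INFO] +- commons-io:commons-io:jar:1.3.2:compile",
]
tree = parse_tree(sample)
print([dep["artifactId"] for dep in tree])
print([child["artifactId"] for child in tree[0]["children"]])
```

Using the indentation width rather than counting `|` characters is deliberate: as far as I can tell, the `|` disappears under the last child of a parent, so a count of bars would understate the depth there.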
I would appreciate any insight on the best way to parse and return the data in an organized way, given that I'm relatively unfamiliar with Python's capabilities. Thank you in advance!