We're building a data warehouse in BigQuery where we generate a large amount of data marts using standard sql statements. These can be quite large and complex. To track data lineage across a chain of dependencies, we'd like to automatically parse the SQL statements and get all the output columns, matched up with the input table.column(s).
Simple example:
SELECT t1.a, t2.b, t1.a + t2.b AS c
FROM table1 t1 JOIN table2 t2 ON t1.a = t2.a
Should end up giving us:
Input Output
table1.a a
table2.b b
table1.a c
table1.b c
We've tried using this: https://www.npmjs.com/package/node-sql-parser, but it comes up short in some of our complex scenarios.
Is there any library available in any language which supports parsing a SQL statement and returning the AST for the full standard SQL grammar?