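I'm trying to use sqlparse to write a procedure that lists the tables present in a SQL statement, focusing for now on the FROM clause of a query. I'm also trying to identify nested queries (subqueries) in the FROM clause and run the procedure again to identify the tables inside each nested query.
Using this example query: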
import sqlparse
from sqlparse.sql import IdentifierList, Identifier, Function, Where, Parenthesis, TokenList
from sqlparse.tokens import Keyword, DML, Punctuation
sql_2 = """select * from luv_main.test_table left join (select * from luv_all.fake_Table where (a = b)) x where a = 4 order by A, B, C"""
#parse the query; the first element is the statement used below
parsed = sqlparse.parse(sql_2)[0]
The code below works:
full_tables = []
tables = []
from_seen = False
for item in parsed.tokens:
    #stop the process if the Where statement is reached
    if isinstance(item, Where):
        from_seen = False
    if from_seen:
        #multiple tables with Join statements in between, or one table. Doesn't consider subqueries
        if isinstance(item, Identifier):
            #checks to see if there is a parenthesis, meaning a subquery
            if 'SELECT' in item.value.upper():
                subquery = item.value
            #returns the db name
            tables.append(item.get_parent_name())
            #returns the table name
            tables.append(item.get_real_name())
            #returns the alias
            tables.append(item.get_alias())
            full_tables.append(tables)
            tables = []
        # if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
        if isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                #returns the db name
                tables.append(identifier.get_parent_name())
                #returns the table name
                tables.append(identifier.get_real_name())
                #returns the alias
                tables.append(identifier.get_alias())
                full_tables.append(tables)
                tables = []
    else:
        if item.ttype is Keyword and item.value.upper() == 'FROM':
            from_seen = True
print(full_tables)
print(len(full_tables))
This starts from the query and identifies a subquery by searching for the word select. I then have this:
#process of removing outer-most parentheses and identifying aliases that sit outside that window
#new subquery string ready to parse
res_sub = ""
#capture the alias
alias = ""
#record the number of parentheses as they open and close
paren_cnt = 0
for char in subquery:
    #if ( and there's already been a ( , include it
    if char == '(' and paren_cnt > 0:
        res_sub += char
    #if (, add to the count
    if char == '(':
        paren_cnt += 1
    # if ) and there's at least 2 (, include it
    if char == ')' and paren_cnt > 1:
        res_sub += char
    # if ), subtract from the count
    if char == ')':
        paren_cnt -= 1
    # capture the script
    if char != '(' and char != ')' and paren_cnt >0:
        res_sub += char
    # capture the alias
    if char != '(' and char != ')' and char != ' ' and paren_cnt == 0:
        alias += char
subparsed = sqlparse.parse(res_sub)[0]
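This strips the outermost parentheses so that the result can be parsed as a new SQL statement. It all works: if I manually run the parsed statement through the earlier code block, it behaves as expected.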
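For reference, the parenthesis-stripping step can also be sketched as a small standalone function in plain Python. This is only an illustration of the same depth-counting idea: `strip_outer_parens` is a hypothetical name, and unlike the loop above it skips all whitespace outside the parentheses (not just spaces) when collecting the alias.

```python
def strip_outer_parens(text):
    """Hypothetical helper: split '(subquery) alias' into (subquery, alias)."""
    inner = []   # characters that sit inside the outermost parentheses
    alias = []   # non-whitespace characters that sit outside them
    depth = 0    # number of currently open parentheses
    for char in text:
        if char == '(':
            # keep the bracket only if it is nested inside the outermost pair
            if depth > 0:
                inner.append(char)
            depth += 1
        elif char == ')':
            depth -= 1
            # keep the bracket only if it does not close the outermost pair
            if depth > 0:
                inner.append(char)
        elif depth > 0:
            inner.append(char)
        elif not char.isspace():
            alias.append(char)
    return ''.join(inner), ''.join(alias)


sample = "(select * from luv_all.fake_Table where (a = b)) x"
inner_sql, alias_name = strip_outer_parens(sample)
print(repr(inner_sql))
print(repr(alias_name))
```

Returning both values makes it easy to inspect the stripped text before handing it to the parser. Note that input with no parentheses at all produces an empty subquery string, with everything ending up in the alias.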
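I then tried to split this into separate functions:
- one that parses the query and calls:
- a function that scans the FROM clause and returns the tables, but, if it identifies a subquery, calls:
- a function that strips the outermost parentheses from the script and then calls the first function to send it back through the process.
But when it runs, sqlparse.parse(res_sub)[0] fails with:
tuple index out of range
It shouldn't be a tuple; it should be a str, which is then parsed into a sqlparse.sql.Statement.
I don't understand why it behaves differently just because I put it into a series of functions. The function code is below: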
def parse(sql):
    parsed = sqlparse.parse(sql)[0]
    #call function to assess the FROM statement of the query
    assess_from_clause(parsed)

def assess_from_clause(parsed):
    full_tables = []
    tables = []
    from_seen = False
    for item in parsed.tokens:
        #stop the process if the Where statement is reached
        if isinstance(item, Where):
            from_seen = False
        #checks to see if there is a parenthesis, meaning a subquery
        if 'SELECT' in item.value.upper():
            subquery = item.value
            subquery_parsing(subquery)
        if from_seen:
            #multiple tables with Join statements in between, or one table. Doesn't consider subqueries
            if isinstance(item, Identifier):
                #returns the db name
                tables.append(item.get_parent_name())
                #returns the table name
                tables.append(item.get_real_name())
                #returns the alias
                tables.append(item.get_alias())
                full_tables.append(tables)
                tables = []
            # if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
            if isinstance(item, IdentifierList):
                for identifier in item.get_identifiers():
                    #returns the db name
                    tables.append(identifier.get_parent_name())
                    #returns the table name
                    tables.append(identifier.get_real_name())
                    #returns the alias
                    tables.append(identifier.get_alias())
                    full_tables.append(tables)
                    tables = []
        else:
            if item.ttype is Keyword and item.value.upper() == 'FROM':
                from_seen = True
    print(full_tables)

def subquery_parsing(subquery):
    #new subquery string ready to parse
    res_sub = ''
    #capture the alias
    alias = ''
    #record the number of parentheses as they open and close
    paren_cnt = 0
    for char in subquery:
        #if ( and there's already been a ( , include it
        if char == '(' and paren_cnt > 0:
            res_sub += char
        #if (, add to the count
        if char == '(':
            paren_cnt += 1
        # if ) and there's at least 2 (, include it
        if char == ')' and paren_cnt > 1:
            res_sub += char
        # if ), subtract from the count
        if char == ')':
            paren_cnt -= 1
        # capture the script
        if char != '(' and char != ')' and paren_cnt >0:
            res_sub += char
        # capture the alias
        if char != '(' and char != ')' and char != ' ' and paren_cnt == 0:
            alias += char
    parse(res_sub)
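One thing that may be worth checking, sketched in plain Python with no sqlparse calls. It rests on an assumption: that sqlparse.parse returns a tuple of statements, so an empty input string would give an empty tuple, which would explain why the error mentions a tuple even though a str is passed in. `text_inside_parens` and `first_item` are hypothetical helpers; the first is a condensed version of the stripping loop in subquery_parsing.

```python
def text_inside_parens(text):
    """Hypothetical helper: keep only what sits inside the outermost parentheses."""
    kept = []
    depth = 0
    for char in text:
        if char == ')':
            depth -= 1
        if depth > 0:
            kept.append(char)
        if char == '(':
            depth += 1
    return ''.join(kept)


def first_item(items):
    """Return items[0], or a description of the IndexError if there is none."""
    try:
        return items[0]
    except IndexError as exc:
        return "IndexError: " + str(exc)


# The subquery test is a substring check, so the bare keyword passes it too.
looks_like_subquery = 'SELECT' in "select".upper()

# With no parentheses in the text, nothing is kept.
stripped = text_inside_parens("select")

# Assumption: stands in for what sqlparse.parse would return for an empty string.
no_statements = ()
error = first_item(no_statements)

print(looks_like_subquery)
print(repr(stripped))
print(error)
```

If this is what is happening, the recursion is being triggered by a token that is not a subquery at all (for example the leading select keyword), and printing repr(res_sub) just before the parse(res_sub) call should confirm it.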
I should stress that I'm not proficient in Python, and I'm learning a lot as I go!
Thanks