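I'm trying to use sqlparse to write a procedure that lists the tables present in a SQL statement, focusing for now on the query's FROM clause. I'm also trying to identify nested queries (subqueries) in the FROM clause and run the procedure again to identify the tables inside each nested query.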

Using this example query:

import sqlparse
from sqlparse.sql import IdentifierList, Identifier, Function, Where, Parenthesis, TokenList
from sqlparse.tokens import Keyword, DML, Punctuation

sql_2 = """select * from luv_main.test_table left join (select * from luv_all.fake_Table where (a = b)) x  where a = 4 order by A, B, C"""

#parse the query; the code below iterates over this statement's tokens
parsed = sqlparse.parse(sql_2)[0]

Below is the code, which works:

full_tables = []
tables = []

from_seen = False
for item in parsed.tokens:
    
    #stop the process if the Where statement is reached
    if isinstance(item, Where):
            from_seen = False
    
    if from_seen:
     
        #multiple tables with Join statements in between, or one table. Doesn't consider subqueries
        if isinstance(item, Identifier):
            
            #checks to see if there is a parenthesis, meaning a subquery 
            if 'SELECT' in item.value.upper():
                subquery = item.value
                
            #returns the db name 
            tables.append(item.get_parent_name())
            
            #returns the table name
            tables.append(item.get_real_name())

            #returns the alias
            tables.append(item.get_alias())
            
            full_tables.append(tables)
            tables = []
        
        
        # if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
        if isinstance(item, IdentifierList):
            for identifier in item.get_identifiers():
                #returns the db name
                tables.append(identifier.get_parent_name())
                
                #returns the table name
                tables.append(identifier.get_real_name())
                
                #returns the alias
                tables.append(identifier.get_alias())
                
                full_tables.append(tables)
                tables = []
                 
    else:
        if item.ttype is Keyword and item.value.upper() == 'FROM':
            from_seen = True
       
print(full_tables)
print(len(full_tables))

This starts from the query and identifies a subquery by searching for the word select. Then I have this:

#process of removing outer-most parentheses and identifying aliases that sit outside that window

#new subquery string ready to parse
res_sub = ""

#capture the alias
alias = ""

#record the number of parentheses as they open and close
paren_cnt = 0


for char in subquery:
    
    #if ( and there's already been a ( , include it
    if char == '(' and paren_cnt > 0:
        res_sub += char
    
    #if (, add to the count
    if char == '(':
        paren_cnt += 1
   
    # if ) and there's at least 2 (, include it
    if char == ')' and paren_cnt > 1:
        res_sub += char
          
    # if ), subtract from the count        
    if char == ')':
        paren_cnt -= 1
    
    # capture the script
    if char != '(' and char != ')' and paren_cnt >0:
        res_sub += char
    
    # capture the alias
    if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
        alias += char
        
subparsed = sqlparse.parse(res_sub)[0]

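This removes the outermost parentheses so that the result can be parsed as a new SQL statement. This all works: if I manually run the parsed statement through the previous code block, it behaves as expected.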

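Then I tried to put this into separate functions:

  • The first parses the query and calls:
  • a function that scans the FROM clause and returns the tables; if it identifies a subquery, it calls:
  • a function that removes the outermost parentheses from the script, then calls the first function to send it back through the process.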

But when it runs, sqlparse.parse(res_sub)[0] fails with "tuple index out of range". It shouldn't be a tuple; it should be a str, which is then parsed into a sqlparse.sql.Statement.

I don't understand why it behaves differently just because I've put it into a series of functions. The function code is below:

def parse(sql):
    
    parsed = sqlparse.parse(sql)[0]
        
    #call function to assess the FROM statement of the query
    assess_from_clause(parsed)

def assess_from_clause(parsed):
    
    full_tables = []
    tables = []
    
    from_seen = False
    for item in parsed.tokens:
        #stop the process if the Where statement is reached
        if isinstance(item, Where):
            from_seen = False
        
        #checks to see if there is a parenthesis, meaning a subquery 
        if 'SELECT' in item.value.upper():
            subquery = item.value
            subquery_parsing(subquery)
        
        if from_seen:
            #multiple tables with Join statements in between, or one table. Doesn't consider subqueries
            if isinstance(item, Identifier):
                
                #returns the db name 
                tables.append(item.get_parent_name())
            
                #returns the table name
                tables.append(item.get_real_name())

                #returns the alias
                tables.append(item.get_alias())
            
                full_tables.append(tables)
                tables = []
        
        
            # if multiple tables separated by comma's will be an identifier list. Doesn't consider subqueries
            if isinstance(item, IdentifierList):
                for identifier in item.get_identifiers():
                    #returns the db name
                    tables.append(identifier.get_parent_name())
                
                    #returns the table name
                    tables.append(identifier.get_real_name())
                
                    #returns the alias
                    tables.append(identifier.get_alias())
                
                    full_tables.append(tables)
                    tables = []
                 
        else:
            if item.ttype is Keyword and item.value.upper() == 'FROM':
                from_seen = True
       
    print(full_tables)

def subquery_parsing(subquery):
    
    #new subquery string ready to parse
    res_sub = ''

    #capture the alias
    alias = ''

    #record the number of parentheses as they open and close
    paren_cnt = 0


    for char in subquery:
        #if ( and there's already been a ( , include it
        if char == '(' and paren_cnt > 0:
            res_sub += char
    
        #if (, add to the count
        if char == '(':
            paren_cnt += 1
   
        # if ) and there's at least 2 (, include it
        if char == ')' and paren_cnt > 1:
            res_sub += char
          
        # if ), subtract from the count        
        if char == ')':
            paren_cnt -= 1
    
        # capture the script
        if char != '(' and char != ')' and paren_cnt >0:
            res_sub += char
    
        # capture the alias
        if char != '(' and char != ')'  and char != ' ' and paren_cnt == 0:
            alias += char
    
    parse(res_sub)

I should stress that I'm not proficient in Python, and I'm learning a lot as I go!

Thanks

2 Answers

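I believe I've solved it now: the check that triggers the third function was firing too early and was not parsing the subquery part of the code.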
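
A minimal sketch of why firing too early produces that error, assuming the check matched the statement's very first token (the bare keyword select). The helper name strip_outer_parens is hypothetical; it mirrors the counting rules of subquery_parsing but returns its results so they can be inspected. As far as I know, sqlparse.parse returns a tuple of statements, which is empty for an empty string, so indexing it with [0] raises "tuple index out of range".

```python
def strip_outer_parens(text):
    """Mirror the counting rules of subquery_parsing; return (inner_sql, alias)."""
    inner, alias, depth = [], [], 0
    for char in text:
        if char == '(':
            if depth > 0:          # nested parenthesis: keep it
                inner.append(char)
            depth += 1
        elif char == ')':
            if depth > 1:          # closes a nested parenthesis: keep it
                inner.append(char)
            depth -= 1
        elif depth > 0:            # text inside the outermost parentheses
            inner.append(char)
        elif depth == 0 and char != ' ':   # text outside: treated as the alias
            alias.append(char)
    return ''.join(inner), ''.join(alias)


# The first token of the statement is the bare keyword, with no parentheses.
print(strip_outer_parens('select'))
# A real subquery token from the FROM clause.
print(strip_outer_parens('(select * from luv_all.fake_Table where (a = b)) x'))
```

For the bare keyword, the inner SQL comes back as an empty string and the whole word ends up in the alias, so the recursive call to parse receives '' and has no statement to index.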

I changed this:

def assess_from_clause(parsed):
    
    full_tables = []
    tables = []
    
    from_seen = False
    for item in parsed.tokens:
        #stop the process if the Where statement is reached
        if isinstance(item, Where):
            from_seen = False
        
        #checks to see if there is a parenthesis, meaning a subquery 
        if 'SELECT' in item.value.upper():
            subquery = item.value
            subquery_parsing(subquery)
        
        if from_seen:

to this:

def assess_from_clause(parsed):
    
    full_tables = []
    tables = []
    
    from_seen = False
    for item in parsed.tokens:
        #stop the process if the Where statement is reached
        if isinstance(item, Where):
            from_seen = False
        
        if from_seen:
        
            #checks to see if there is a parenthesis, meaning a subquery 
            if 'SELECT' in item.value.upper():
                subquery = item.value
                subquery_parsing(subquery)
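
Moving the check under from_seen avoids the leading select keyword, but a plain substring test could still match a table whose name merely contains "select". A small sketch of a stricter guard, assuming a subquery token's text always starts with an opening parenthesis; the helper name looks_like_subquery is hypothetical and not part of sqlparse:

```python
def looks_like_subquery(value):
    """Return True only for text that opens with '(' and contains SELECT."""
    stripped = value.lstrip()
    return stripped.startswith('(') and 'SELECT' in stripped.upper()


# Token values as they might appear while walking the FROM clause.
for value in ['select', 'selected_items', '(select * from luv_all.fake_Table) x']:
    print(value, looks_like_subquery(value))
```

Under that assumption, the condition inside the loop could become `if looks_like_subquery(item.value):`, so that subquery_parsing is never handed text without parentheses.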

Sorry, this is very much a trial-and-error learning process for me at the moment. Thanks to Barmar for the comment.

answered 2021-05-25T07:41:36.287

This is trivial with my library, SQLGlot:

import sqlglot
import sqlglot.expressions as exp

sql = """select * from luv_main.test_table left join (select * from luv_all.fake_Table where (a = b)) x  where a = 4 order by A, B, C"""

# find_all(exp.Table) yields every table reference, including those inside subqueries
for table in sqlglot.parse_one(sql).find_all(exp.Table):
    print(table.text("this"))

fake_Table
test_table
answered 2021-11-17T05:20:57.260