I am trying to parse some SQL statements. Here is a sample:

select
    ms.member_sk a,
    dd.date_sk b,
    st.subscription_type,
    (SELECT foo FROM zoo) e
from dim_member_subscription_all p,
     dim_subs_type
where a in (select moo from t10)

For now I am only interested in extracting the table names, so I would like to see [zoo, dim_member_subscription_all, dim_subs_type] & [t10]

I have put together a small script, based on Paul McGuire's examples:

#!/usr/bin/env python
import sys
import pprint
from pyparsing import *


pp = pprint.PrettyPrinter(indent=4)
semicolon = Combine(Literal(';') + lineEnd)
comma = Literal(',')
lparen = Literal('(')
rparen = Literal(')')

update_kw, volatile_kw, create_kw, table_kw, as_kw, from_kw, \
where_kw, join_kw, left_kw, right_kw, cross_kw, outer_kw, \
on_kw , insert_kw , into_kw= \
    map(lambda x: Keyword(x, caseless=True), \
        ['UPDATE', 'VOLATILE', 'CREATE', 'TABLE', 'AS', 'FROM',
         'WHERE', 'JOIN' , 'LEFT', 'RIGHT' , \
         'CROSS', 'OUTER', 'ON', 'INSERT', 'INTO'])

select_kw = Keyword('SELECT', caseless=True) | Keyword('SEL' , caseless=True)

reserved_words = (update_kw | volatile_kw | create_kw | table_kw | as_kw |
                  select_kw | from_kw | where_kw | join_kw |
                  left_kw | right_kw | cross_kw | on_kw | insert_kw |
                  into_kw)

ident = ~reserved_words + Word(alphas, alphanums + '_')

table = Combine(Optional(ident + Literal('.')) + ident)
column = Combine(Optional(ident + Literal('.')) + (ident | Literal('*')))

column_alias = Optional(Optional(as_kw).suppress() + ident)
table_alias = Optional(Optional(as_kw).suppress() + ident).suppress()

select_stmt = Forward()
nested_table = lparen.suppress() + select_stmt + rparen.suppress() + table_alias
table_list = delimitedList((nested_table | table) + table_alias)
column_list = delimitedList((nested_table | column) + column_alias)

txt = """
select
       ms.member_sk a,
       dd.date_sk b,
       st.subscription_type,
       (SELECT foo FROM zoo) e
from dim_member_subscription_all p,
     dim_subs_type
where a in (select moo from t10)
"""

select_stmt << select_kw.suppress() + column_list + from_kw.suppress() +  \
               table_list.setResultsName('tables', listAllMatches=True)

print txt

for token in select_stmt.searchString(txt):
    pp.pprint(token.asDict())

I get the following nested output. Can anyone help me understand what I am doing wrong?

{   'tables': ([(['zoo'], {}), (['dim_member_subscription_all', 'dim_subs_type'], {})], {})}
{   'tables': ([(['t10'], {})], {})}

1 Answer

searchString returns a list of all matching ParseResults. You can see the tables value for each one with:

for token in select_stmt.searchString(txt):
    print token.tables

giving:

[['zoo'], ['dim_member_subscription_all', 'dim_subs_type']]
[['t10']]

So searchString found two SELECT statements.
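
If what you want is one flat list of table names per statement, the nested groups can be collapsed after parsing. The following is only a minimal sketch: plain Python lists stand in for the two tables values printed above, and the helper name flatten_tables is made up for illustration. With real ParseResults you would presumably convert first, for example with token.tables.asList().

```python
from itertools import chain

# Plain lists standing in for the two `token.tables` values shown above.
# Assumption: the real values are pyparsing ParseResults, which can be
# turned into ordinary lists with their asList() method.
tables_per_statement = [
    [['zoo'], ['dim_member_subscription_all', 'dim_subs_type']],
    [['t10']],
]


def flatten_tables(groups):
    """Collapse one statement's nested table groups into a flat list."""
    # Hypothetical helper, not part of pyparsing.
    return list(chain.from_iterable(groups))


flat = [flatten_tables(groups) for groups in tables_per_statement]
for names in flat:
    print(names)
```

Each inner group appears to correspond to one match of table_list (the nested SELECT and the outer FROM clause), so flattening one level should be enough for this grammar.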

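Recent versions of pyparsing support combining this list into a single consolidated result using the Python builtin sum. Accessing the tables value of this consolidated result looks like this: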

print sum(select_stmt.searchString(txt)).tables

[['zoo'], ['dim_member_subscription_all', 'dim_subs_type'], ['t10']]

I think the parser is doing everything you want; you just need to work out how to process the returned results.

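For further debugging, you should start using the dump method on ParseResults to see what you are getting. It prints the nested list of returned tokens, followed by a hierarchical tree of all named results. For your example: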

for token in select_stmt.searchString(txt):
    print token.dump()
    print

prints:

['ms.member_sk', 'a', 'dd.date_sk', 'b', 'st.subscription_type', 'foo', 'zoo', 'dim_member_subscription_all', 'dim_subs_type']
- tables: [['zoo'], ['dim_member_subscription_all', 'dim_subs_type']]

['moo', 't10']
- tables: [['t10']]
answered 2013-06-02T03:56:12.233