4

我正在编写一个语法来解析 HTSQL 语法,并且一直在思考如何处理/段和除法运算符的字符重用。所描述的语法不是很正式,所以我一直在关注 Python 实现的确切输出,粗略一看,它似乎是一个手写的解析器,而不是使用解析器生成器 - 供参考我的解析器生成器当前使用的CL-YACCCL-LEX. (FWIW 完整的东西在这里,虽然可能有点过时了。)

我正在努力解决的一个歧义是由于"/1"被解析为'(:COLLECT (:INTEGER "1"))',但"/1/2"被解析为'(:COLLECT (:OPERATOR / (:INTEGER "1") (:INTEGER "2")))',即一个是段分隔符,另一个是除法;"/1//2"又是'(:COLLECT (:OPERATOR / (:INTEGER "1") (:COLLECT (:INTEGER "2"))))'

因此,问题是,我如何在不求助于手动解析器的情况下在语法规范中处理这个问题?切换到不同的解析器生成器类(而不是 LALR(1))会有帮助吗?

到目前为止,我已经尝试了语法的不同变体,但是整个语法的优先级是固定的这一事实也干扰了斜杠的两种解释。我尝试的另一种方法是在词法分析器中消除歧义,即以不同的方式处理第一个斜杠(在每个“组”中)并返回不同的符号,例如DIV- 但是我找不到一个好的规则并且以某种方式怀疑它是否存在纯粹是看词汇结构。

最后,我很想通过完全偏离给定的解析器来解决这个问题,以使我的生活更轻松。从某种意义上说,具有可预测的语法更容易理解,您是否认为这更可取?

图表 1,检查解析树的 Python 脚本:

import htsql


application = htsql.HTSQL("sqlite:///htsql_demo.sqlite")


global y
y = None


def p(string):
    global y
    with application:
        y = htsql.core.syn.parse.parse(string)
        return y


def l(name):
    result = []
    for c in name:
        if c.isupper() and result:
            result.append("-")
        result.append(c)
    return "".join(result)


def keyword(name):
    return ":{}".format(name.upper())


def n(expression):
    name = expression.__class__.__name__
    name = name[:name.find("Syntax")]
    return keyword(l(name))


def t(expression):
    arguments = [n(expression)]
    d = expression.__dict__
    if "identifier" in d:
        arguments.append(t(expression.identifier))
    if "text" in d:
        arguments.append("\"{}\"".format(expression.text))
    if "symbol" in d:
        if not isinstance(expression, (htsql.core.syn.syntax.ProjectSyntax, htsql.core.syn.syntax.FilterSyntax, htsql.core.syn.syntax.CollectSyntax, htsql.core.syn.syntax.DetachSyntax)):
            arguments.append(expression.symbol)
    if "arm" in d:
        arguments.append(t(expression.arm))
    if "larm" in d:
        arguments.append(t(expression.larm))
    if "rarm" in d:
        arguments.append(t(expression.rarm))
    if "arms" in d:
        arguments.extend(t(x) for x in expression.arms)
    if "rarms" in d:
        arguments.extend(t(x) for x in expression.rarms)
    return "({})".format(" ".join(arguments))


# t(p("/school"))
# '(:COLLECT (:IDENTIFIER "school"))

# t(p("/'school'"))
# '(:COLLECT (:STRING "school"))

图表 2,我当前的解析器,它没有正确处理这个问题:

(defpackage #:cl-htsql
  (:use #:cl #:alexandria #:cl-lex #:yacc)
  (:import-from #:arnesi #:with-collector))

(eval-when (:compile-toplevel :load-toplevel :execute)
  (defun maybe-intern (name &optional (package NIL package-p))
    "If NAME is a SYMBOL, return it, otherwise INTERN it."
    (cond
      ((symbolp name)
       name)
      (package-p
       (intern name package))
      (T
       (intern name))))

  (defmacro define-lexer (name &body patterns)
    "Shortcut for DEFINE-STRING-LEXER."
    `(define-string-lexer ,name
       ,@(mapcar
          (lambda (pattern)
            (etypecase pattern
              ((or symbol string)
               (let ((symbol (maybe-intern pattern))
                     (pattern (string pattern)))
                 `(,pattern (return (values ',symbol ',symbol)))))
              (list
               (destructuring-bind (pattern &optional symbol value) pattern
                 (let* ((symbol (or symbol (intern pattern)))
                        (value (or value symbol)))
                   (etypecase symbol
                     (list
                      `(,pattern ,symbol))
                     (symbol
                      `(,pattern (return (values ',symbol ',value))))))))))
          patterns))))

;; parser are results are to be treated immutable
(define-lexer string-lexer
  /
  ("\\|" \|)
  ("\\&" &)
  <=
  >=
  ==
  =
  !==
  !=
  !~
  !
  ~
  <
  >
  @
  ("\\?" ?)
  ("\\." \.)
  ("\\(" \()
  ("\\)" \))
  ("\\+" +)
  -
  ("\\*" *)
  \:
  ("-?0|[1-9][0-9]*(\\.[0-9]*)?([eE][+-]?[0-9]+)?"
   (return (cond
             ((find #\e $@)
              (values 'float $@))
             ((find #\. $@)
              (values 'decimal $@))
             (T
              (values 'integer $@)))))
  ("([^\"\\.\\?~\'=<>\\(\\)@\\|\\&/:])+" (return (values 'name $@)))
  ("\'([^\\\']|\\.)*?\'" (return (values 'string (string-trim "\'" $@))))
  ("\"([^\\\"]|\\.)*?\"" (return (values 'string (string-trim "\"" $@)))))

(define-parser *expression-parser*
  (:muffle-conflicts (44 0))
  (:start-symbol query)
  (:terminals (|\|| #+(or)div & ! |.| ? / = != !== !~ ~ < > == <= >= \( \) + - * @ name integer decimal float string))
  (:precedence ((:left @) (:left ~) (:left |.|) (:left + -) (:left * div) (:left = != == !== ~ !~ < <= > >=) (:left !) (:left &) (:left |\||) (:left ?) (:left /)))

  (query
   segment)

  (segment
   (/ segment (lambda (x y) (declare (ignore x)) (if (eq y :skip) '(:skip) `(:collect ,y))))
   skip
   group)

  (skip
   ((constantly :skip)))

  (group
   (\( segment \) (lambda (x y z) (declare (ignore x z)) `(:group ,y)))
   sieve)

  (sieve
   (segment ? segment (lambda (x y z) (declare (ignore y)) `(:filter ,x ,z)))
   or)

  (or
   (segment |\|| segment (lambda (x y z) `(:operator ,y ,x ,z)))
   and)

  (and
   (segment & segment (lambda (x y z) `(:operator ,y ,x ,z)))
   not)

  (not
   (! segment (lambda (x y) `(:prefix ,x ,y)))
   comparison)

  (comparison
   (segment = segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment != segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment ~ segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment !~ segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment == segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment !== segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment < segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment <= segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment > segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment >= segment (lambda (x y z) `(:operator ,y ,x ,z)))
   addition)

  (addition
   (segment + segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment - segment (lambda (x y z) `(:operator ,y ,x ,z)))
   multiplication)

  (multiplication
   (segment * segment (lambda (x y z) `(:operator ,y ,x ,z)))
   (segment / segment (lambda (x y z) (declare (ignore y)) `(:operator / ,x ,z)))
   composition)

  (composition
   (segment |.| segment (lambda (x y z) (declare (ignore y)) `(:compose ,x ,z)))
   attach)

  (attach
   (segment @ segment (lambda (x y z) (declare (ignore y)) `(:attach ,x ,z)))
   detach)

  (detach
   (@ segment (lambda (x y) (declare (ignore x)) `(:detach ,y)))
   term)

  (term
   (name (lambda (x) `(:identifier ,x)))
   (string (lambda (x) `(:string ,x)))
   (number (lambda (x) `(:integer ,x)))
   (integer (lambda (x) `(:integer ,x)))
   (decimal (lambda (x) `(:decimal ,x)))
   (float (lambda (x) `(:float ,x)))))

(defun make-lexer-for-source (source)
  "Make a lexer for the SOURCE, either a STRING or a STREAM."
  (etypecase source
    (string (string-lexer source))
    (stream
     (flet ((ignore (c)
              (declare (ignore c))))
       (stream-lexer #'read-line #'string-lexer #'ignore #'ignore)))))

(defun lex-source (source)
  "Debug helper to lex a SOURCE into a list of tokens."
  (let ((lexer (make-lexer-for-source source)))
    (loop
      for (x y) = (multiple-value-list (funcall lexer))
      while x
      collect (list x y))))

(define-condition htsql-parse-error (simple-error) ())

(defun translate-yacc-error (error)
  (make-condition
   'htsql-parse-error
   :format-control "Couldn't parse HTSQL query: ~A."
   :format-arguments (list error)))

(defun parse-htsql-query (source)
  "Parse SOURCE into a syntax tree.  The SOURCE may be either a STRING or
a STREAM."
  (handler-case
      (parse-with-lexer
       (make-lexer-for-source source)
       *expression-parser*)
    (yacc-parse-error (error)
      (error (translate-yacc-error error)))))

;; > (parse-htsql-query "/1/")
;; (:OPERATOR / (:COLLECT (:INTEGER "1")) :SKIP)
;; > (parse-htsql-query "/1/2")
;; (:OPERATOR / (:COLLECT (:INTEGER "1")) (:INTEGER "2"))
4

1 回答 1

3

如果您查看运算符列表,您会发现还有另一种情况,即同一符号用作二元运算符和一元运算符,但优先级不同:一元减号。这在表达式语言中是很正常的,而 yacc 和大多数 yacc 派生词提供了一个解决方案:为产生式显式优先声明。在 yacc 中,你可以写

%token UNARY_MINUS UNARY_SLASH

%left UNARY_SLASH
%left '-' '+'
%left '/' '*'
%left UNARY_MINUS
%%
expr: '/' expr %prec UNARY_SLASH
    | expr '*' expr
    | expr '/' expr
    | expr '+' expr
    | expr '-' expr
    | '-' expr %prec UNARY_MINUS

我不知道 CL-YACC 是否提供等价物;我在文档中没有找到任何东西。

您没有义务使用优先声明,我什至不相信它们是一个好主意。以明确的方式说出您的意思只是稍微复杂一点:

term: ID | NUMBER | ... | '(' expr0 ')'
expr0: '/' expr0
     | expr1
expr1: expr1 '+' expr2
     | expr1 '-' expr2
     | expr2
expr2: expr2 '/' expr3
     | expr2 '*' expr3
     | expr3
expr3: '-' expr3
     | term

以上是明确的,根本不需要优先声明。


(非正式的)HTSQL 语法说“段”运算符的语法是/ T而不是/ expr,但我没有看到任何迹象表明 aT可能是什么。在我看来,T是一个术语,或者可能是一个 lexpr ( T := expr),它1 / 2不太可能成为候选对象。但是,正如你所说,这是非正式的。

于 2016-05-22T14:59:29.297 回答