0

I am trying to generate a javascript parser for the ebnf grammar described in this Microsoft article. The ebnf specified in the article does not work when I use it as its written, so I have tried to simplify it to make it work with the REx parser generator.

The goal is in Javascript to be able to parse and evaluate expressions like these to True or False:

  • AttributeA > 2 - The value of AttributeA is greater than 2
  • HasCategory(Assembly) - The node has Category Assembly
  • Assembly - The node has Category Assembly
  • HasValue(AttributeA) - The attribute AttributeA has a value. Its not undefined.
  • AttributeA < AttributeB - The value of attribute AttributeA is less than the value of attribute Attribute B
  • IsReference - The value of the attribute IsReference is True
  • AttributeA + 2 > 5 and AttributeB - 5 != 7
  • AttributeA * 1.25 >= 500

I am using the REx parser generator online here: https://bottlecaps.de/rex/. If you have suggestions for other parser generators that produce JavaScript I would appreciate some links to where I can find them.

The issue I'm struggling with is the definition of the MethodCall. I have tried a lot of different definitions but all fail. When I remove the MethodCall and MethodArgs definition the REx parser generator produces a parser.

So I would appreciate any help to crack this problem a lot.

Below is the grammar as far as I have been able to get.

Expression
    ::= BinaryExpression | MethodCall | "(" Expression ")" | Number
BinaryExpression
    ::= RelationalExpression ( ( '=' | '!=' ) RelationalExpression )*
RelationalExpression
    ::= AdditiveExpression ( ( '<' | '>' | '<=' | '>=' | 'and' | 'or' ) AdditiveExpression )*
AdditiveExpression
    ::= MultiplicativeExpression ( ( '+' | '-' ) MultiplicativeExpression )*
MultiplicativeExpression
    ::= UnaryExpression ( ( '*' | '/' | '%' ) UnaryExpression )*
UnaryExpression
    ::= "!" Identifier | "+" Identifier | "-" Identifier | Identifier

MethodCall
    ::= Identifier "(" MethodArgs* ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs

Identifier
    ::= Letter ( Letter | Digit | "_" )*
Number
    ::= Digit ('.' Digit) | (Digit)*

<?TOKENS?>

Letter
    ::= [A-Za-z]
Digit
    ::= [0-9]

Here are some of the different versions for the MethodCall definition I have tried but all of them fail.

MethodCall
    ::= Identifier "(" MethodArgs? ")"
MethodArgs
    ::= Expression ("," MethodArgs)*

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= ( Expression ("," MethodArgs)* )?

MethodCall
    ::= MethodName "(" MethodArgs? ")"
MethodName
    ::= Identifier
MethodArgs
    ::= Expression ("," MethodArgs)*

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= Expression ("," MethodArgs)* | ""

MethodCall
    ::= Identifier MethodArgs
MethodArgs
    ::= "(" (Expression ("," MethodArgs)* | "")  ")"

MethodCall
    ::= Identifier "(" MethodArgs ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs | ""

MethodCall
    ::= Identifier "(" Expression ")"

I have tried to get inspiration from a number of other languages to see how they do it, but with no luck yet so far:

Update

Just wanted to let you know how this turned out. I struggled to get the EBNF grammar to do what I needed it to do, so I took a look at Nearley like @rici suggested and converted my grammar into the Nearley grammar syntax and got the tooling to work. It was just a much better choice for my project. The documentation is very good, the tooling is also great and the error messages is very helpful. So a huge thanks to @rici for suggesting Nearley.

Below is the grammar I have implemented. I have tested with the following inputs:

'2 + 4', '2 + 4 - 6', '(2 + 4)', '!true', '!(true)', 'hasCategory(test)', 'hasCategory(test,test2)', 'hasCategory( test , test2 )', 'hasCategory(test,test2, test3)', 'IsReference', 'IsReference()', '2 * 4', '(2 / 4)', '2 * 4 + 2', '(2 / 4) + 2', '2 > 4', '2 >= 2', '2 = 4', '2 == 2', '2 != 4', '2 !== 2', '(2 * 4 + 2) > 4', '(2 * 4 + 2) > (4 + 10)', 'true', 'true or false', 'true || false', 'true and false', 'true && false', '(true or false)', '!(true or false)', '2 != 1+1', '2 != (1+1)', '2 != (1+2)', '(2 > 2 or (2 != 1+1))',

@builtin "whitespace.ne" # `_` means arbitrary amount of whitespace
@builtin "number.ne"     # `int`, `decimal`, and `percentage` number primitives
@builtin "postprocessors.ne"

@{%
function methodCall(nameId, argsId = -1) {
  return function(data) {
      return {
          type: 'methodCall',
          name: data[nameId],
          args: argsId == -1 ? [] : data[argsId]
      };
    }
}

function value() {
  return function(data) {
      return {
          type: 'value',
          value: data[0]
      };
    }
}
%}
expression -> 
    methodCall {% id %}
  | relationalExpression {% value() %}
  | booleanExpression  {% value() %}
  | _ identifier _  {% methodCall(1) %}

booleanExpression ->
      parentheses {% id %}
    | parentheses _ "and"i _ parentheses {% d => d[0] && d[4] %}
    | parentheses _ "&&" _ parentheses {% d => d[0] && d[4] %}
    | parentheses _ "or"i _ parentheses {% d => d[0] || d[4] %}
    | parentheses _ "||" _ parentheses {% d => d[0] || d[4] %}

parentheses ->
    _ "(" relationalExpression ")" _ {% nth(2) %}
  |  _ "(" booleanExpression ")" _ {% nth(2) %}
  | unaryExpression {% id %}

relationalExpression -> 
      _ additiveExpression _ {% nth(1) %}
    | relationalExpression _ "=" _ additiveExpression {% d => d[0] == d[4] %}
    | relationalExpression _ "==" _ additiveExpression {% d => d[0] == d[4] %}
    | relationalExpression _ "!=" _ additiveExpression {% d => d[0] != d[4] %}
    | relationalExpression _ "!==" _ additiveExpression {% d => d[0] != d[4] %}
    | relationalExpression _ "<" _ additiveExpression {% d => d[0] < d[4] %}
    | relationalExpression _ ">" _ additiveExpression {% d => d[0] > d[4] %}
    | relationalExpression _ "<=" _ additiveExpression {% d => d[0] <= d[4] %}
    | relationalExpression _ ">=" _ additiveExpression {% d => d[0] >= d[4] %}

additiveExpression -> 
    _ multiplicativeExpression _ {% nth(1) %}
  | additiveExpression _ "+" _ multiplicativeExpression {% d => d[0] + d[4] %}
  | additiveExpression _ "-" _ multiplicativeExpression {% d => d[0] - d[4] %}

multiplicativeExpression ->
    _ parentheses _  {% nth(1) %}
  | parentheses _ "*" _ parentheses {% d => d[0] * d[4] %}
  | parentheses _ "/" _ parentheses {% d => d[0] / d[4] %}
  | parentheses _ "%" _ parentheses {% d => d[0] % d[4] %}

unaryExpression ->  
    _ "!" _ expression _ {% d => !d[3] %}
  | _ decimal _ {% nth(1) %}
  | _ unsigned_int _ {% nth(1) %}
  | _ boolean _ {% nth(1) %}
  | _ identifier _ {% nth(1) %}

methodCall -> 
      identifier "(" methodArgs ")" {% methodCall(0, 2) %}
    | identifier "(" _ ")" {% methodCall(0) %}

methodArgs ->
    _ identifier _  {% d => [d[1]] %}
  | _ identifier _ "," _ methodArgs _ {% d => [d[1]].concat(d[5]) %}

boolean ->
    "true"i {% () => true %} 
  | "false"i {% () => false %}

identifier -> 
  [A-Za-z0-9_]:+ {% (data, l, reject) => {
    var ident = data[0].join('');
    if (ident.toLowerCase() === 'true' || ident.toLowerCase() === 'false') {
      return reject;
    } else {
      return ident;
    }
  }
   %}
4

1 回答 1

1

你的语法有一些问题,但大部分都很好。

  • 'and' 和 'or' 与Identifier. Identifier因此,从其规则中减去这些字符串文字。
  • Number缺少括号。它应该是Number ::= Digit ( ('.' Digit) | (Digit)* )
  • 您缺少 EOF 规则。我知道的几乎每个解析器生成器都需要一个底部/EOF 规则来强制消耗整个输入。我添加了规则“输入”。
  • 确保单击“配置”框,然后单击“回溯”。您的语法模棱两可,这很好,但需要您告诉解析器生成器来处理它。

解析器生成器的“EBNF”语法略有不同,这是 REx 所采用的。REx 添加一个<?TOKENS?>字符串来表示解析器和词法分析器规则之间的边界。微软说语法是“BNF”,但这并不是因为它使用了 Kleene operator <Identifier> ::= [^. ]*,一个 EBNF 结构。它还歪曲了散文的定义<Literal><Number>散文。

我没有测试生成的解析器,但它似乎是一个简单的递归下降实现。转换页面中列出了我熟悉且流行的解析器生成器。(我正在为所有这些以及更多内容编写转换器。)

尝试这个:

Input ::= Expression EOF

Expression
    ::= BinaryExpression | MethodCall | "(" Expression ")" | Number

BinaryExpression
    ::= RelationalExpression ( ( '=' | '!=' ) RelationalExpression )*
RelationalExpression
    ::= AdditiveExpression ( ( '<' | '>' | '<=' | '>=' | 'and' | 'or' ) AdditiveExpression )*
AdditiveExpression
    ::= MultiplicativeExpression ( ( '+' | '-' ) MultiplicativeExpression )*
MultiplicativeExpression
    ::= UnaryExpression ( ( '*' | '/' | '%' ) UnaryExpression )*
UnaryExpression
    ::= "!" Identifier | "+" Identifier | "-" Identifier | Identifier

MethodCall
    ::= Identifier "(" MethodArgs* ")"
MethodArgs
    ::= Expression | Expression "," MethodArgs

<?TOKENS?>

Identifier
    ::= ( ( Letter ( Letter | Digit | "_" )* ) - ( 'and' | 'or' ) )
Number ::= Digit ( ('.' Digit) | (Digit)* )

Letter
    ::= [A-Za-z]
Digit
    ::= [0-9]

EOF      ::= $
于 2021-08-20T11:56:19.660 回答