I'm having a bit of a problem with Ragel, mostly due to still trying to grasp how the whole thing works.

I'm trying to make a simple parser for a language similar to SQL (but less flexible), where you have functions (all uppercase), identifiers (all lowercase) and where you could nest functions within functions.

Here's what I have so far:

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

typedef struct Parser {
  int current_line;
  int nesting;

  /* Ragel FSM */
  int cs;
  const char *ts;
  const char *te;
  int act;
} Parser;

%%{
  machine gql;
  access parser->;

  Function    = [A-Z][A-Z_]+ ;

  Identifier  = [a-z][a-z_]+ ;
  Integer     = [0-9]+ ;

  Parameter = ( Identifier | Integer )+ ;

  WhiteSpace  = [ \t\r\n] ;

  action function_call {
    parser->nesting++;
    printf("FUNCTION CALL\n");
  }

  action function_finish {
    parser->nesting--;
    printf("FUNCTION FINISH!\n");
  }

  action function_add_identifier {
    printf("FUNCTION ADD IDENTIFIER\n");
  }

  FunctionCall =
    Function @function_call WhiteSpace* "("
      Parameter %function_add_identifier
      ( WhiteSpace* ',' WhiteSpace* Parameter %function_add_identifier )* WhiteSpace*
    %function_finish ')' ;

  main := FunctionCall ;
}%%


%% write data;

void Parser_Init(Parser *parser) {
  parser->current_line  = 1;
  parser->nesting       = 0;
  %% write init;
}

void Parser_Execute(Parser *parser, const char *buffer, size_t len) {
  if(len == 0) return;

  const char *p, *pe, *eof;
  p   = buffer;
  pe  = buffer+len;
  eof = pe;

  %% write exec;
}

int main(int argc, char *argv[]) {
  Parser *parser = malloc(sizeof(Parser));
  Parser_Init(parser);

  printf("Parsing:\n%s\n\n\n", argv[1]);

  Parser_Execute(parser, argv[1], sizeof(argv[1]));

  printf("Parsed %d lines\n", parser->current_line);
  return 0;
}

It is calling the function_call action once per character, not picking up the Parameters, and I can't think how to make functions work inside functions.

Any tips on what I'm doing wrong here?

1 Answer

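The standard approach is to write a lexer (in Ragel or GNU Flex) that tokenizes your language's input. The tokens are then consumed by a parser (not written in Ragel) that can handle recursive structures such as nested functions. A parser generator like GNU Bison can produce that parser for you.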

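Note that Ragel includes (as an advanced feature) directives for managing a stack, which let you parse recursive structures. Once you use them, though, you leave the realm of regular languages that a Ragel specification otherwise describes. So you could write a parser that handles nested functions entirely in Ragel. But a properly layered architecture (first layer: lexer, second layer: parser, ...) simplifies the task: the parts are easier to debug, test, and maintain.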

Answered 2014-05-28T20:29:15.827