ocaml - sedlex and ocamllex do not give the same position information to $startpos and $endpos

Question

Previously, I had a parser and a lexer built with menhir and ocamllex. In the parser, I use $startpos and $endpos:

e_expression:
| e_expression PLUS e_expression
  { pffo "Expression:\n%a" Loc.print ($startpos, $endpos);
    pffo "PLUS symbol:\n%a" Loc.print ($startpos($2), $endpos($2));
    EE_loc_function_EEL ([($startpos, $endpos); ($startpos($2), $endpos($2))], Function.PLUS, [$1; $3]) }

where Loc.print is defined as:

let print (chan: out_channel) (pos1, pos2: t) : unit =
  let line = pos1.pos_lnum in
  let char1 = pos1.pos_cnum - pos1.pos_bol in
  let char2 = pos2.pos_cnum - pos1.pos_bol in (* intentionally [pos1.pos_bol] *)
  if !Params.print_lexing then pffo "line %d, characters %d-%d:\n" line char1 char2
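
To see where the numbers in the traces in this question come from, here is a minimal stdlib-only sketch that applies the same arithmetic as Loc.print to hand-built Lexing.position values for the input abc\n+d. The helpers pos and columns are hypothetical names, and the positions are written by hand rather than produced by a lexer.

```ocaml
(* Hypothetical helper: build a Lexing.position by hand. *)
let pos ~lnum ~bol ~cnum =
  { Lexing.pos_fname = ""; pos_lnum = lnum; pos_bol = bol; pos_cnum = cnum }

(* Same arithmetic as Loc.print: both columns are relative to p1's pos_bol. *)
let columns ((p1 : Lexing.position), (p2 : Lexing.position)) =
  (p1.Lexing.pos_cnum - p1.Lexing.pos_bol, p2.Lexing.pos_cnum - p1.Lexing.pos_bol)

(* Offsets in "abc\n+d": a=0 b=1 c=2 '\n'=3 '+'=4 d=5, end of input=6. *)
let plus_no_newline_tracking = (pos ~lnum:1 ~bol:0 ~cnum:4, pos ~lnum:1 ~bol:0 ~cnum:5)
let plus_with_newline_tracking = (pos ~lnum:2 ~bol:4 ~cnum:4, pos ~lnum:2 ~bol:4 ~cnum:5)

(* Whole expression: starts at 'a' on line 1, ends after 'd' on line 2. *)
let expression = (pos ~lnum:1 ~bol:0 ~cnum:0, pos ~lnum:2 ~bol:4 ~cnum:6)

let () =
  let show name (a, b) = Printf.printf "%s: characters %d-%d\n" name a b in
  show "PLUS (pos_bol not updated)" (columns plus_no_newline_tracking);
  show "PLUS (pos_bol updated)" (columns plus_with_newline_tracking);
  show "expression" (columns expression)
```

Because char2 is measured from pos1.pos_bol, the multi-line expression span comes out as 0-6 whether or not pos_bol was moved at the newline; only a span that starts after the newline, such as PLUS, changes (4-5 versus 0-1).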

In the lexer, I have the following function to print location information:

let debug rule = fun lexbuf ->
  let result = rule lexbuf in
  print_endline (string_of_token result);
  let pos = lexbuf.lex_curr_p in
  pffo "%s:%d:%d\n" pos.pos_fname pos.pos_lnum (pos.pos_cnum - pos.pos_bol + 1);
  result
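
The ocamllex trace stays on line 1, and its columns keep counting from offset 0 across the '\n'. With the standard Lexing module, the generated lexer advances pos_cnum, but pos_lnum and pos_bol only change when a rule's action calls Lexing.new_line, so this trace is consistent with a newline rule that does not call it (the lexer rules are not shown, so that part is an inference). The stdlib-only sketch below fakes having consumed "abc\n" by setting lex_curr_p by hand, which a real lexer would not do, and then shows what Lexing.new_line changes. line_and_bol is a hypothetical helper.

```ocaml
let lexbuf = Lexing.from_string "abc\n+d"

(* Hypothetical helper: read the current line number and beginning-of-line offset. *)
let line_and_bol lb =
  let p = lb.Lexing.lex_curr_p in
  (p.Lexing.pos_lnum, p.Lexing.pos_bol)

(* Artificial setup: pretend the lexer has just consumed "abc\n" (offsets 0..3). *)
let () =
  let p = lexbuf.Lexing.lex_curr_p in
  lexbuf.Lexing.lex_curr_p <- { p with Lexing.pos_cnum = 4 }

(* Without new_line, the line is still 1 and pos_bol is still 0. *)
let before = line_and_bol lexbuf

(* new_line bumps pos_lnum and sets pos_bol to the current pos_cnum. *)
let () = Lexing.new_line lexbuf
let after = line_and_bol lexbuf

let () =
  Printf.printf "before new_line: line %d, bol %d\n" (fst before) (snd before);
  Printf.printf "after new_line:  line %d, bol %d\n" (fst after) (snd after)
```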

As a result, 'abc\n+d' returns the following output:

IDENTIFIER(abc)
:1:4
PLUS
:1:6
IDENTIFIER(d)
:1:7
EOF
:1:7
Expression:
line 1, characters 0-6:
PLUS symbol:
line 1, characters 4-5:

Then I rewrote the lexer with sedlex, and I have the following function to print location information:

let debug rule = fun lexbuf ->
  let result = rule lexbuf in
  print_endline (string_of_token result);
  let posS, posE = Sedlexing.lexing_positions lexbuf in
  pffo "%s:(%d:%d) to %s:(%d:%d)\n" 
    posS.pos_fname 
    posS.pos_lnum (posS.pos_cnum - posS.pos_bol + 1)
    posE.pos_fname posE.pos_lnum (posE.pos_cnum - posE.pos_bol + 1);
  result

As a result, 'abc\n+d' returns the following output:

IDENTIFIER(abc)
:(0:1) to :(0:4)
PLUS
:(0:1) to :(0:2)
IDENTIFIER(d)
:(0:2) to :(0:3)
EOF
:(0:3) to :(0:3)
Expression:
line 0, characters 0-6:
PLUS symbol:
line 0, characters 0-1:

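Note that, because of the newline, the locations of + and d are not computed correctly here. As a result, the location information for my expression is no longer correct.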
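
The columns in both traces can be reproduced from the same text by a single scanner that either does or does not reset the line at '\n'. This is a minimal hypothetical sketch, not sedlex or ocamllex: the pos record, scan and col are made-up names, and it only recognizes lowercase identifiers and single-character tokens. It is consistent with the sedlex lexer moving pos_bol at the newline while the ocamllex lexer did not; it does not account for the line number 0 printed by the sedlex version.

```ocaml
(* Simplified stand-in for Lexing.position. *)
type pos = { lnum : int; bol : int; cnum : int }

let is_ident c = c >= 'a' && c <= 'z'

(* Returns (lexeme, start, stop) for each token. When [track] is true, a '\n'
   bumps lnum and sets bol to the next offset, as Lexing.new_line would. *)
let scan ~track s =
  let n = String.length s in
  let rec go i lnum bol acc =
    if i >= n then List.rev acc
    else if s.[i] = '\n' then
      if track then go (i + 1) (lnum + 1) (i + 1) acc
      else go (i + 1) lnum bol acc
    else begin
      let j = ref (i + 1) in
      if is_ident s.[i] then
        while !j < n && is_ident s.[!j] do incr j done;
      let mk cnum = { lnum; bol; cnum } in
      go !j lnum bol ((String.sub s i (!j - i), mk i, mk !j) :: acc)
    end
  in
  go 0 1 0 []

(* 1-based column, as in both debug functions: pos_cnum - pos_bol + 1. *)
let col p = p.cnum - p.bol + 1

let () =
  List.iter
    (fun track ->
      Printf.printf "newline tracking: %b\n" track;
      List.iter
        (fun (lexeme, a, b) ->
          Printf.printf "  %s: line %d, col %d to col %d\n" lexeme a.lnum (col a) (col b))
        (scan ~track "abc\n+d"))
    [ false; true ]
```

Without tracking, the end columns are 4, 6 and 7, as in the ocamllex trace; with tracking, + and d span columns 1-2 and 2-3, as in the sedlex trace.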

Does anyone know how to fix this?
