erlang - Appending items to a map with a Yecc parser in Elixir/Erlang

Question

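I am trying to parse a specific log file with Leex/Yecc in Elixir. After many hours I got the simplest case working, but I want to take the next step and cannot figure out how.

First, here is an example of the log format: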

[!] plugin error detected
 |  check the version of the plugin

My simple attempt handles only the first line, but with multiple entries of it, for example:

[!] plugin error detected
[!] plugin error 2 detected
[!] plugin error 3 detected

This works and gives me a nice map containing the text and the type of the log line (warning):

iex(20)> LogParser.parse("[!] a big warning\n[!] another warning")
[%{text: "a big warning", type: :warning},
 %{text: "another warning", type: :warning}]

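That is perfect. But as shown above, a log line can continue on the next line, indicated by a pipe character |. My lexer has the pipe character and the parser understands it, but what I want is for the next line to be appended to the text value of my map. For now it just ends up as a separate string alongside the map. So instead of: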

[%{text: "a big warning ", type: :warning}, " continues on next line"]

I need:

[%{text: "a big warning continues on next line", type: :warning}]

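I looked at examples online, but most of them have a very explicit "end" token, such as a closing tag or a closing bracket, and it is still not clear to me how to add attributes so that the final AST is correct.

For completeness, here is my lexer: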

Definitions.

Char          = [a-zA-Z0-9\.\s\,\[\]]
Word          = [^\t\s\.#"=]+
Space         = [\s\t]
New_Line      = [\n]
%New_Line      = \n|\r\n|\r
Type_Regular  = \[\s\]\s
Type_Warning  = \[!\]\s
Pipe          = \|

Rules.

{Type_Regular}  : {token, {type_regular,  TokenLine}}.
{Type_Warning}  : {token, {type_warning,  TokenLine}}.
{Char}          : {token, {char, TokenLine, TokenChars}}.
{Space}         : skip_token.
{Pipe}          : {token, {pipe, TokenLine}}.
{New_Line}      : skip_token.

Erlang code.

And my parser:

Nonterminals lines line line_content chars.
Terminals type_regular type_warning char pipe.
Rootsymbol lines.

lines -> line lines : ['$1'|['$2']].
lines -> line : '$1'.

line -> pipe line_content : '$2'.
line -> type_regular line_content : #{type => regular, text => '$2'}.
line -> type_warning line_content : #{type => warning, text => '$2'}.

line_content -> chars : '$1'.
line_content -> pipe chars : '$1'.

chars -> char chars : unicode:characters_to_binary([get_value('$1')] ++ '$2').
chars -> char : unicode:characters_to_binary([get_value('$1')]).

Erlang code.

get_value({_, _, Value}) -> Value.

If you got this far, thank you already! If anyone can help, even more thanks!

score 1 · Accepted Answer

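I would suggest adding a line_content rule that handles multiple lines separated by a pipe, and removing the rule line -> pipe line_content : '$2'.

You also have an unnecessary [] around '$2' in the lines clause, and the single-line clause should return a list to be consistent with the return value of the previous clause, so that you do not end up with improper lists.

With these four changes,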

-lines -> line lines : ['$1'|['$2']].
+lines -> line lines : ['$1'|'$2'].
-lines -> line : '$1'.
+lines -> line : ['$1'].

-line -> pipe line_content : '$2'.
 line -> type_regular line_content : #{type => regular, text => '$2'}.
 line -> type_warning line_content : #{type => warning, text => '$2'}.

 line_content -> chars : '$1'.
-line_content -> pipe chars : '$1'.
+line_content -> line_content pipe chars : <<'$1'/binary, '$3'/binary>>.

I can parse multiline text just fine:

Belino.parse("[!] Look at the error")
Belino.parse("[!] plugin error detected
 | check the version of the plugin")
Belino.parse("[!] a
 | warning
 [ ] a
 | regular
 [ ] another
 | regular
 [!] and another
 | warning")

Output:

[%{text: "Look at the error", type: :warning}]
[%{text: "plugin error detected  check the version of the plugin",
   type: :warning}]
[%{text: "a  warning ", type: :warning}, %{text: "a  regular ", type: :regular},
 %{text: "another  regular ", type: :regular},
 %{text: "and another  warning", type: :warning}]
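
For reference, merging the diff above into the parser from the question gives roughly the following grammar file. This is only a sketch of the question's own file with the four changes applied. The comments are added here for illustration and nothing else is altered:

```erlang
Nonterminals lines line line_content chars.
Terminals type_regular type_warning char pipe.
Rootsymbol lines.

% Both clauses now return a proper list of line maps.
lines -> line lines : ['$1'|'$2'].
lines -> line : ['$1'].

% The rule "line -> pipe line_content" is gone: a pipe can no longer
% start a line of its own, it can only continue the text of an entry.
line -> type_regular line_content : #{type => regular, text => '$2'}.
line -> type_warning line_content : #{type => warning, text => '$2'}.

% Left-recursive rule: each "pipe chars" continuation is appended to
% the binary built so far for the same entry.
line_content -> chars : '$1'.
line_content -> line_content pipe chars : <<'$1'/binary, '$3'/binary>>.

chars -> char chars : unicode:characters_to_binary([get_value('$1')] ++ '$2').
chars -> char : unicode:characters_to_binary([get_value('$1')]).

Erlang code.

get_value({_, _, Value}) -> Value.
```

Because the continuation is folded into line_content, the map is built only once per entry, after all of its text has been collected. The doubled spaces in the output come from the lexer, which emits the spaces around the pipe as char tokens.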

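The post never shows the module behind LogParser.parse or Belino.parse. A minimal sketch of such a wrapper, written in Erlang, might look like the following. The module names log_lexer and log_parser are assumptions (they would be generated from hypothetical files log_lexer.xrl and log_parser.yrl), and the wrapper name log_parse is made up for this example:

```erlang
-module(log_parse).
-export([parse/1]).

%% Accepts a binary (which is what an Elixir string is) or a charlist.
%% Assumes the input is valid UTF-8; invalid input is not handled here.
%% log_lexer and log_parser are hypothetical names for the modules
%% generated by leex and yecc.
parse(Input) when is_binary(Input) ->
    parse(unicode:characters_to_list(Input));
parse(Input) when is_list(Input) ->
    case log_lexer:string(Input) of
        {ok, Tokens, _End} ->
            %% A yecc-generated parse/1 returns {ok, Result}
            %% or {error, {Line, Module, Message}}.
            log_parser:parse(Tokens);
        {error, ErrorInfo, _End} ->
            {error, ErrorInfo}
    end.
```

From Elixir this would be called as :log_parse.parse("[!] a big warning") and would return an {:ok, list} tuple on success. The calls shown in the post return the bare list, so the original wrapper presumably unwraps that tuple.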