I want to write a parser with Parslet in Ruby that understands a simple configuration syntax:

alpha = one
beta = two\
three
gamma = four

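From the parser's point of view, the backslash escapes the newline, so the parsed value of beta is twothree. The backslash is literally present in the configuration file (the text above is shown exactly as it appears; it is not something you would put inside Ruby string quotes). In Ruby, it could be represented as "alpha = one\nbeta = two\\\nthree\ngamma = four".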
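To make the escaping concrete, here is a small check of what that Ruby literal actually contains. It is only an illustrative sketch; the variable names are hypothetical and not part of the original question.

```ruby
# Double-quoted literal from the question: "\\" is one backslash, "\n" is a newline.
config = "alpha = one\nbeta = two\\\nthree\ngamma = four"

# Splitting on newlines shows the beta setting spread over two physical lines.
lines = config.split("\n")
puts lines.inspect

# Locate the backslash-newline pair and show its two bytes.
escape_at = config.index("\\\n")
puts escape_at
puts config[escape_at, 2].bytes.inspect # byte 92 is the backslash, byte 10 the newline
```

So the parser has to treat that two-character pair as "continue the value on the next line".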

My current attempt works for single-line settings, but fails on the multi-line case:

require "parslet"

class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) }
  rule(:value) do
    (match("[^\n]").repeat(1) >> match("[^\\\n]") >> str("\\\n")).repeat(0) >>
      match("[^\n]").repeat(0)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:setting) do
    term.as(:key) >> space.maybe >> str("=") >> space.maybe >>
      value.as(:value)
  end

  rule(:input) { setting.repeat >> space.maybe }
  root(:input)
end

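I wonder whether the problem is related to the way Parslet parses things. Does the first part of my value rule grab as many characters as possible, regardless of what the later parts need?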

2 Answers

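Yes. Parslet rules consume eagerly and do not give characters back, so you need to try the escaped case first and only consume a non-escaped character when that fails.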

require "parslet"
require "pp"


class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) }
  rule(:char) { str("\\\n") | match("[^\n]").as(:keep) }
  rule(:value) do
    char.repeat(1)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:setting) do
    term.as(:key) >> space.maybe >> str("=") >> space.maybe >>
      value.as(:value) >> str("\n")
  end

  rule(:input) { setting.repeat.as(:settings) >> space.maybe }
  root(:input)
end

s = SettingParser.new

tree =  s.parse("alpha = one\nbeta = two\\\nthree\ngamma = four\n")
pp tree

This produces the following:

{:settings=>
  [{:key=>"alpha"@0,
    :value=>[{:keep=>"o"@8}, {:keep=>"n"@9}, {:keep=>"e"@10}]},
   {:key=>"beta"@12,
    :value=>
     [{:keep=>"t"@19},
      {:keep=>"w"@20},
      {:keep=>"o"@21},
      {:keep=>"t"@24},
      {:keep=>"h"@25},
      {:keep=>"r"@26},
      {:keep=>"e"@27},
      {:keep=>"e"@28}]},
   {:key=>"gamma"@30,
    :value=>
     [{:keep=>"f"@38}, {:keep=>"o"@39}, {:keep=>"u"@40}, {:keep=>"r"@41}]}]}

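Here I have tagged every character that is not part of an escaped newline, so that I can transform them later. Alternatively, you could capture the whole string, escapes included, and strip them out with a search and replace in post-processing.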
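The post-processing alternative might look roughly like this, assuming the parser has handed back the raw value with the backslash-newline pairs still in it. `raw_value` is a hypothetical stand-in for that captured string.

```ruby
# Stand-in for a value captured verbatim by the parser, escapes included.
raw_value = "two\\\nthree"

# A String pattern is matched literally, so this drops each backslash-newline pair.
joined = raw_value.gsub("\\\n", "")

puts joined
```

This keeps the grammar simpler at the cost of a second pass over every value.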

Either way, you can now pull the data out of the tree with a transform.

class SettingTransform < Parslet::Transform
  # Unwrap each kept character.
  rule(:keep => simple(:c)) { c }
  # Join the kept characters into one value per setting.
  rule(:key => simple(:k), :value => sequence(:v)) { { k => v.join } }
  rule(:settings => subtree(:s)) do
    s.each_with_object({}) { |pair, acc| acc[pair.keys[0]] = pair.values[0] }
  end
end

pp SettingTransform.new.apply(tree)
# => {"alpha"@0=>"one", "beta"@12=>"twothree", "gamma"@30=>"four"}

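You may need to add some end-of-line logic. At the moment I assume your config ends with "\n". You can detect EOF with 'any.absent?' (or just always append a '\n' to the input ;)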

Answered 2018-10-03T22:44:25.040

You need to begin the setting rule with optional whitespace (space?).

The following snippet works for me. I have added pp and space? to make it easier to follow.

require "parslet"
require 'pp'

class SettingParser < Parslet::Parser
  rule(:term) { match("[a-zA-Z0-9_]").repeat(1) >> space? }
  rule(:value) do
    (match("[^\n]").repeat(1) >> match("[^\\\n]") >> str("\\\n")).repeat(0) >>
      match("[^\n]").repeat(0)
  end
  rule(:space) { match("\\s").repeat(1) }
  rule(:space?)     { space.maybe }
  rule(:setting) do
    space? >> term.as(:key) >> space? >> str("=") >> space? >>
      value.as(:value)
  end

  rule(:input) { setting.repeat >> space.maybe }
  root(:input)
end

str = %{
alpha = one
beta = two\
three
gamma = four
}

begin
  pp SettingParser.new.parse(str, reporter: Parslet::ErrorReporter::Deepest.new)
rescue Parslet::ParseFailed => error
  puts error.parse_failure_cause.ascii_tree
end

The output is

[{:key=>"alpha "@1, :value=>"one"@9},
 {:key=>"beta "@13, :value=>"twothree"@20},
 {:key=>"gamma "@29, :value=>"four"@37}]
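
One caveat, judging from the offsets in the output above (gamma starts at 29, immediately after an eight-character twothree and one newline): a %{...} literal behaves like a double-quoted string, so Ruby itself drops a backslash that is followed by a newline before the parser ever sees it. The sketch below, with hypothetical variable names, compares it against a single-quoted heredoc, which keeps the backslash.

```ruby
# %{...} processes escapes: backslash-newline is a line continuation and vanishes.
sample = %{
alpha = one
beta = two\
three
gamma = four
}

# A single-quoted heredoc performs no escape processing, so the backslash survives.
raw = <<'CFG'
alpha = one
beta = two\
three
gamma = four
CFG

puts sample.inspect
puts sample.include?("\\")
puts raw.include?("\\")
```

To exercise the escaped-newline case, the input probably needs to come from a file, a single-quoted heredoc, or a literal with the backslash doubled.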
Answered 2018-01-05T07:09:38.040