parsing - How to define the grammar

Question

I'm new to language processing, and I want to use Irony to create a parser for the following syntax:

name1:value1 name2:value2 name3:value ...

where name1 is the name of an XML element and value is the element's value, which can also contain spaces.

I tried modifying the included sample as follows:

    using System.Globalization;   // UnicodeCategory
    using Irony.Parsing;          // Grammar, NonTerminal, IdentifierTerminal (namespace in recent Irony releases)

    public class TestGrammar : Grammar
    {
        public TestGrammar()
        {
            var name = CreateTerm("name");
            // An identifier cannot contain spaces, so a multi-word value ends at the first space.
            var value = new IdentifierTerminal("value");

            var queries = new NonTerminal("queries");
            var query = new NonTerminal("query");
            queries.Rule = MakePlusRule(queries, null, query); // queries := query+ (no delimiter)
            query.Rule = name + ":" + value;                   // query := name ":" value
            Root = queries;
        }

        private IdentifierTerminal CreateTerm(string name)
        {
            IdentifierTerminal term = new IdentifierTerminal(name, "!@#$%^*_'.?-", "!@#$%^*_'.?0123456789");
            term.CharCategories.AddRange(new[]
            {
                UnicodeCategory.UppercaseLetter,      //Lu
                UnicodeCategory.LowercaseLetter,      //Ll
                UnicodeCategory.TitlecaseLetter,      //Lt
                UnicodeCategory.ModifierLetter,       //Lm
                UnicodeCategory.OtherLetter,          //Lo
                UnicodeCategory.LetterNumber,         //Nl
                UnicodeCategory.DecimalDigitNumber,   //Nd
                UnicodeCategory.ConnectorPunctuation, //Pc
                UnicodeCategory.SpacingCombiningMark, //Mc
                UnicodeCategory.NonSpacingMark,       //Mn
                UnicodeCategory.Format                //Cf
            });
            //StartCharCategories are the same
            term.StartCharCategories.AddRange(term.CharCategories);
            return term;
        }
    }

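But this doesn't work if the value contains spaces. Can this be done (with Irony) without modifying the syntax (for example, by adding quotes around the value)?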

Many thanks!

score 0 · Accepted Answer

If the key-value pairs were separated by newlines, this would be easy to achieve. I have no knowledge of Irony, but my initial feeling is that almost no parser or lexer generator will handle this given only a naive grammar description, because it requires essentially unbounded lookahead.
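To make the newline case concrete, here is a minimal C# sketch that does not use Irony at all. It assumes a hypothetical variant of the input with one `name:value` pair per line, and the `LinePerPairParser` name is made up for illustration. With that layout, the first colon on each line is the only delimiter that matters, so the value can keep its spaces.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: parses input where every pair sits on its own line.
public static class LinePerPairParser
{
    // The first ':' on each line separates the name from the value;
    // everything after it (spaces included) is the value.
    public static List<(string Name, string Value)> Parse(string input)
    {
        var pairs = new List<(string Name, string Value)>();
        foreach (string rawLine in input.Split('\n'))
        {
            string line = rawLine.Trim();            // also strips a trailing '\r'
            if (line.Length == 0) continue;          // skip blank lines
            int colon = line.IndexOf(':');
            if (colon <= 0)
                throw new FormatException($"Expected 'name:value' but got '{line}'.");
            pairs.Add((line.Substring(0, colon).Trim(), line.Substring(colon + 1).Trim()));
        }
        return pairs;
    }
}
```

Since the line break, not the space, ends a value, no lookahead is needed to decide where one pair stops and the next begins.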

Conceptually (because I know nothing about this product), here's how I would do it:

Tokenise based on spaces and colons (i.e. every contiguous sequence of characters that is neither a space nor a colon is an "identifier" token of some sort).
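As a rough illustration of this tokenising step, here is a hand-written C# sketch; it is not Irony code, and `KeyValueLexer`, `Token` and `TokenKind` are hypothetical names. Words and colons become tokens, while spaces only separate words.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical token model for this sketch: either a word or a colon.
public enum TokenKind { Word, Colon }

public sealed record Token(TokenKind Kind, string Text);

public static class KeyValueLexer
{
    // Every maximal run of characters that is neither whitespace nor ':'
    // becomes a Word token; each ':' becomes its own Colon token.
    // Whitespace only separates words and is discarded.
    public static List<Token> Tokenize(string input)
    {
        var tokens = new List<Token>();
        var current = new StringBuilder();

        // Emit the word collected so far, if any.
        void FlushWord()
        {
            if (current.Length > 0)
            {
                tokens.Add(new Token(TokenKind.Word, current.ToString()));
                current.Clear();
            }
        }

        foreach (char c in input)
        {
            if (char.IsWhiteSpace(c))
            {
                FlushWord();
            }
            else if (c == ':')
            {
                FlushWord();
                tokens.Add(new Token(TokenKind.Colon, ":"));
            }
            else
            {
                current.Append(c);
            }
        }
        FlushWord();
        return tokens;
    }
}
```

For `name1:value one name2:v2` this yields seven tokens: `name1`, `:`, `value`, `one`, `name2`, `:`, `v2`. At this stage nothing distinguishes a name from a value word; that decision is left to the grammar.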

You then need to structure the grammar so that every "sentence" runs from one colon to the next:

    sentence = identifier_list
             | : identifier_list identifier : sentence
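Outside a parser generator, the colon-to-colon reading can be tried out with a small hand-written C# sketch (not Irony; `ColonToColon.Parse` is a hypothetical name). It splits the input at colons. Each segment between two colons holds the previous pair's value followed by the next pair's name, so the last word of a segment is taken as the name and the words before it as the value, mirroring the `identifier_list identifier` split in the rule above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ColonToColon
{
    // Reads "name1:value1 name2:value with spaces ..." segment by segment.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        string[] segments = input.Split(':');
        var pairs = new List<KeyValuePair<string, string>>();
        if (segments.Length < 2) return pairs; // no colon: nothing to pair

        // Split a segment into words on any whitespace, dropping empties.
        string[] Words(string s) =>
            s.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

        // Text before the first colon must be exactly one name.
        string[] first = Words(segments[0]);
        if (first.Length != 1)
            throw new FormatException("Input must start with a single name.");
        string name = first[0];

        for (int i = 1; i < segments.Length; i++)
        {
            string[] words = Words(segments[i]);
            bool isLast = i == segments.Length - 1;
            // Middle segments end with the next name; the last one is all value.
            int valueCount = isLast ? words.Length : words.Length - 1;
            if (valueCount < 1)
                throw new FormatException($"Missing value for '{name}'.");
            pairs.Add(new KeyValuePair<string, string>(
                name, string.Join(" ", words.Take(valueCount))));
            if (!isLast) name = words[words.Length - 1];
        }
        return pairs;
    }
}
```

One consequence of this design: words are re-joined with single spaces, so runs of several spaces inside a value collapse to one, and a value can never contain a colon.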

That's not enough to make it work, but I hope it at least conveys the idea. You would need to be very careful to distinguish an identifier_list from a single identifier so that the two can be parsed unambiguously. Alternatively, if your tool lets you define precedence and associativity, you might be able to get away with making ":" bind very tightly to the left, so that your grammar is simply:

    sentence = identifier : identifier_list

That is, it must be read as (identifier :) identifier_list.
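Making ":" bind to the identifier on its left amounts to moving the decision into the lexer: a word immediately followed by a colon is a name token, and every other word belongs to the current value. The C# sketch below shows that idea with a regular expression rather than Irony (it does not attempt to model Irony's terminal API); `TightColonParser` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TightColonParser
{
    // "key"  = a run of non-space, non-colon characters directly followed by ':'
    //          (the colon binds to the word on its left);
    // "word" = any other run of non-space, non-colon characters.
    private static readonly Regex TokenPattern =
        new Regex(@"(?<key>[^\s:]+):|(?<word>[^\s:]+)");

    public static List<(string Name, string Value)> Parse(string input)
    {
        var pairs = new List<(string Name, string Value)>();
        string name = null;
        var valueWords = new List<string>();

        // Close the current pair; an empty value is rejected.
        void Emit()
        {
            if (valueWords.Count == 0)
                throw new FormatException($"Missing value for '{name}'.");
            pairs.Add((name, string.Join(" ", valueWords)));
            valueWords.Clear();
        }

        foreach (Match m in TokenPattern.Matches(input))
        {
            if (m.Groups["key"].Success)
            {
                if (name != null) Emit();          // a new key ends the previous pair
                name = m.Groups["key"].Value;
            }
            else if (name == null)
            {
                throw new FormatException("Input must start with 'name:'.");
            }
            else
            {
                valueWords.Add(m.Groups["word"].Value);
            }
        }
        if (name != null) Emit();
        return pairs;
    }
}
```

Because the name token carries its colon, the grammar side reduces to "a name followed by one or more value words", with only one character of lookahead in the lexer. Stray colons that do not follow a word (as in `a: :b`) are silently skipped by this pattern; a stricter version would reject them.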
