.net - Parsing a list of lists with Superpower

Question

I want to parse a library of books represented in the following format:

#Book title 1
Chapter 1
Chapter 2
#Book title 2
Chapter 1
Chapter 2
Chapter 3

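As you can see, each book title starts with #, and the chapters of that book are on the lines that follow. Creating a parser for this should be fairly easy.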

So far I have this code (parser + tokenizer):

void Main()
{
    var tokenizer = new TokenizerBuilder<PrjToken>()
                    .Match(Superpower.Parsers.Character.EqualTo('#'), PrjToken.Hash)
                    .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
                    .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
                    .Build();


    var input = @"#Book 1
Chapter 1
Chapter 2
#Book 2
Chapter 1
Chapter 2
Chapter 3";

    var library = MyParsers.Library.Parse(tokenizer.Tokenize(input));
}


public enum PrjToken
{
    WhiteSpace,
    Hash,
    Text
}


public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }

    public Book(string title, string[] chapters)
    {
        Title = title;
        Chapters = chapters;
    }
}

public class Library
{
    public Book[] Books { get; }

    public Library(Book[] books)
    {
        Books = books;
    }
}


public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text = from text in Token.EqualTo(PrjToken.Text)
                                                                    select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace = from text in Token.EqualTo(PrjToken.WhiteSpace)
                                                                                   select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Text.ManyDelimitedBy(Whitespace)
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.ManyDelimitedBy(Whitespace)
        select new Library(books);
}

The code above is ready to run in .NET Fiddle at this link: https://dotnetfiddle.net/3P5dAJ

Everything looks fine. However, something is wrong with the parser, because I get this error:

Syntax error (line 4, column 1): unexpected hash #, expected text.

What is wrong with my parser?

score 1 · Accepted Answer

You can fix this by parsing the chapters as a separate list, where each chapter ends with a Whitespace token:

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

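Essentially, I believe that when Text.ManyDelimitedBy(Whitespace) encounters the trailing whitespace (the newline) after Chapter 2, it expects another chapter name rather than the start of a new book.

The parser cannot distinguish the delimiter between chapters from the delimiter between books (both are whitespace, that is, newlines), so after consuming the newline it expects another chapter instead of the start of a new Book.

By defining the chapter parser as a Text token followed by a Whitespace token, you remove that ambiguity.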
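To see the ambiguity concretely, it can help to look at the token stream itself. The following is a minimal sketch, assuming the Superpower NuGet package is referenced; the TokenDump class and its Kinds helper are hypothetical names introduced only for this illustration. It reuses the tokenizer from the question and lists the kind of every token, which shows that the token following the last chapter of a book is the same WhiteSpace kind as the one between two chapters.

```csharp
using System;
using System.Linq;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum PrjToken { WhiteSpace, Hash, Text }

// Hypothetical helper class, not part of the original question or answer.
public static class TokenDump
{
    // Tokenizes the input with the tokenizer from the question
    // and returns the kind of each token in order.
    public static PrjToken[] Kinds(string input)
    {
        var tokenizer = new TokenizerBuilder<PrjToken>()
            .Match(Character.EqualTo('#'), PrjToken.Hash)
            .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
            .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
            .Build();

        return tokenizer.Tokenize(input).Select(t => t.Kind).ToArray();
    }

    public static void Main()
    {
        // Shortened sample input: one book with one chapter, then a second title.
        var kinds = Kinds("#Book 1\nChapter 1\n#Book 2");

        // One token kind per line; the newline after "Chapter 1" shows up
        // as a plain WhiteSpace token directly before the next Hash.
        foreach (var kind in kinds)
            Console.WriteLine(kind);
    }
}
```

Since ManyDelimitedBy has already consumed that WhiteSpace token as a delimiter, it commits to finding another Text token and reports the Hash as unexpected.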

Because the Whitespace at the end of each chapter is now consumed by the Chapter parser, books are no longer separated by a Whitespace token, so you also have to change how the Library parser works:

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);

In addition, if you want to parse files that do not end with a newline, you also have to make the Whitespace at the end of Chapter optional:

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

In the end we get (the complete parser):

public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text = from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace = from text in Token.EqualTo(PrjToken.WhiteSpace)
        select text;

    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);
}
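
As a usage check, the corrected parser can be run end to end against the sample input. This is a sketch under the assumption that the Superpower NuGet package is referenced; the Demo class and its ParseLibrary helper are hypothetical names added for illustration, while the tokenizer, model classes and parsers are the ones shown above.

```csharp
using System;
using Superpower;
using Superpower.Parsers;
using Superpower.Tokenizers;

public enum PrjToken { WhiteSpace, Hash, Text }

public class Book
{
    public string Title { get; }
    public string[] Chapters { get; }
    public Book(string title, string[] chapters) { Title = title; Chapters = chapters; }
}

public class Library
{
    public Book[] Books { get; }
    public Library(Book[] books) { Books = books; }
}

public class MyParsers
{
    public static readonly TokenListParser<PrjToken, string> Text =
        from text in Token.EqualTo(PrjToken.Text)
        select text.ToStringValue();

    public static readonly TokenListParser<PrjToken, Superpower.Model.Token<PrjToken>> Whitespace =
        Token.EqualTo(PrjToken.WhiteSpace);

    // A title is '#', the title text, and the newline that ends the line.
    public static readonly TokenListParser<PrjToken, string> Title =
        from hash in Token.EqualTo(PrjToken.Hash)
        from text in Text
        from wh in Whitespace
        select text;

    // A chapter consumes its own trailing newline, if there is one.
    public static readonly TokenListParser<PrjToken, string> Chapter =
        from chapterName in Text
        from wh in Whitespace.Optional()
        select chapterName;

    public static readonly TokenListParser<PrjToken, Book> Book =
        from title in Title
        from chapters in Chapter.Many()
        select new Book(title, chapters);

    public static readonly TokenListParser<PrjToken, Library> Library =
        from books in Book.Many()
        select new Library(books);
}

// Hypothetical driver class, not part of the original answer.
public static class Demo
{
    public static Library ParseLibrary(string input)
    {
        var tokenizer = new TokenizerBuilder<PrjToken>()
            .Match(Character.EqualTo('#'), PrjToken.Hash)
            .Match(Span.Regex("[^\r\n#:=-]*"), PrjToken.Text)
            .Match(Span.WhiteSpace, PrjToken.WhiteSpace)
            .Build();

        return MyParsers.Library.Parse(tokenizer.Tokenize(input));
    }

    public static void Main()
    {
        // Sample input from the question, without a trailing newline.
        var library = ParseLibrary(
            "#Book 1\nChapter 1\nChapter 2\n#Book 2\nChapter 1\nChapter 2\nChapter 3");

        foreach (var book in library.Books)
            Console.WriteLine($"{book.Title}: {string.Join(", ", book.Chapters)}");
    }
}
```

Note that Title still requires a Whitespace token after the title text, so a book whose title is the very last line of a file without a trailing newline would not parse with this grammar.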
