parsing - PEGKit: combining matched symbols on the stack

Question

I'm writing a grammar for PEGKit to parse Twee files exported from Twine. This is my first time using PEGKit, and I'm trying to understand how it works.

This is the twee source file I'm parsing:

:: Passage One
P1 Line One
P1 Line Two

:: Passage Two
P2 Line One
P2 Line Two

So far I've worked out how to parse the above using the following grammar:

@before {
    PKTokenizer *t = self.tokenizer;
    [t.symbolState add:@"::"];
    [t.commentState addSingleLineStartMarker:@"::"];

    // New lines as symbols
    [t.whitespaceState setWhitespaceChars:NO from:'\n' to:'\n'];
    [t.whitespaceState setWhitespaceChars:NO from:'\r' to:'\r'];
    [t setTokenizerState:t.symbolState from:'\n' to:'\n'];
    [t setTokenizerState:t.symbolState from:'\r' to:'\r'];
}

start                   = passage+;
passage                 = passageTitle contentLine*;
passageTitle            = passageStart Word+ eol+;
contentLine             = singleLine eol+;
singleLine              = Word+;
passageStart            = '::'!;
eol                     = '\n'! | '\r'!;

The result I get is:

[Passage, One, P1, Line, One, P1, Line, Two, Passage, Two, P2, Line, One, P2, Line, Two]::/Passage/One/
/P1/Line/One/
/P1/Line/Two/
/
/::/Passage/Two/
/P2/Line/One/
/P2/Line/Two/
^

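Ideally, I'd like the parser to combine the words matched by passageTitle into a single string, similar to how PEGKit's built-in QuotedString works. I'd also like the words matched by a contentLine to be combined.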

So, ultimately, I'd end up with this on the stack:

[Passage One, P1 Line One, P1 Line Two, Passage Two, P2 Line One, P2 Line Two]

Any ideas on how to achieve this would be greatly appreciated.

score 2 · Accepted Answer

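Creator of PEGKit here.

I understand your ultimate goal (collecting/combining lines into single string objects) and agree that it makes sense. However, I disagree with your proposed strategy for achieving it (changing tokenization to try to combine what are essentially multiple separate tokens into a single token).

Combining lines into convenient string objects makes sense, but changing tokenization to do so does not make sense IMO (at least not with a recursive descent parsing toolkit like PEGKit) when the lines in question have no obvious "bracketing" characters such as quotes or brackets.

You could treat the passageTitle lines starting with :: as single-line Comment tokens, but I probably wouldn't, since I don't think they are semantically comments.

So rather than merging multiple tokens in the tokenizer, you should merge them in the way that is more natural for PEGKit: in parser delegate callbacks.

We have two distinct cases to handle here: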

passageTitle lines
contentLine lines

In your grammar, remove this line so that we don't treat passageTitles as Comment tokens (you didn't have that configured quite correctly anyway, but never mind):

[t.commentState addSingleLineStartMarker:@"::"];

Also in your grammar, remove the ! from your passageStart rule so that those tokens are not discarded:

passageStart            = '::';

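That's all for the grammar. Now, in your parser delegate, implement the two necessary callback methods, one for titles and one for content lines. In each callback, pull all the necessary tokens off the PKAssembly's stack and merge them into a single string. The tokens come off the stack most recent first, so iterate over them in reverse to restore source order.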

#import <PEGKit/PEGKit.h>

@interface TweeDelegate : NSObject
@end

@implementation TweeDelegate

- (void)parser:(PKParser *)p didMatchPassageTitle:(PKAssembly *)a {
    NSArray *toks = [a objectsAbove:[PKToken tokenWithTokenType:PKTokenTypeSymbol stringValue:@"::" doubleValue:0.0]];
    [a pop]; // discard `::`

    NSMutableString *buf = [NSMutableString string];

    for (PKToken *tok in [toks reverseObjectEnumerator]) {
        [buf appendFormat:@"%@ ", tok.stringValue];
    }

    CFStringTrimWhitespace((CFMutableStringRef)buf);

    NSLog(@"Title: %@", buf); // Passage One
}

- (void)parser:(PKParser *)p didMatchContentLine:(PKAssembly *)a {
    NSArray *toks = [a objectsAbove:nil];

    NSMutableString *buf = [NSMutableString string];

    for (PKToken *tok in [toks reverseObjectEnumerator]) {
        [buf appendFormat:@"%@ ", tok.stringValue];
    }

    CFStringTrimWhitespace((CFMutableStringRef)buf);

    NSLog(@"Content: %@", buf); // P1 Line One
}

@end

I get the following output:

Title: Passage One
Content: P1 Line One
Content: P1 Line Two
Title: Passage Two
Content: P2 Line One
Content: P2 Line Two

As for what to do with these strings once you've created them, I'll leave that up to you :). Hope that helps.
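
One possible follow-up, sketched here and not taken from the answer above: to end up with the stack the question asks for, each callback could push its merged string back onto the assembly instead of logging it. The class name TweeStackDelegate and the helper popWordsFromAssembly: are hypothetical names introduced for illustration. The sketch assumes the grammar changes described above and uses only PKAssembly's pop, push: and isStackEmpty. Note that once strings are pushed back, a plain objectsAbove:nil would also pop those earlier strings, so the helper stops at the first object that is not a PKToken.

```objective-c
#import <Foundation/Foundation.h>
#import <PEGKit/PEGKit.h>

// Hypothetical delegate: replaces the tokens of each matched line with a
// single NSString on the assembly's stack.
@interface TweeStackDelegate : NSObject
@end

@implementation TweeStackDelegate

// Pops consecutive PKTokens off the top of the stack and returns their
// string values in source order. Stops at the first non-token object
// (for example a string pushed by an earlier callback) and puts it back.
- (NSMutableArray *)popWordsFromAssembly:(PKAssembly *)a {
    NSMutableArray *words = [NSMutableArray array];
    while (![a isStackEmpty]) {
        id obj = [a pop];
        if (![obj isKindOfClass:[PKToken class]]) {
            [a push:obj]; // not ours; leave it where it was
            break;
        }
        PKToken *tok = obj;
        // Tokens are popped last-first, so prepend to restore source order.
        [words insertObject:tok.stringValue atIndex:0];
    }
    return words;
}

- (void)parser:(PKParser *)p didMatchPassageTitle:(PKAssembly *)a {
    NSMutableArray *words = [self popWordsFromAssembly:a];
    // The `::` marker is a token too, so it arrives as the first word.
    if ([words.firstObject isEqualToString:@"::"]) {
        [words removeObjectAtIndex:0];
    }
    [a push:[words componentsJoinedByString:@" "]];
}

- (void)parser:(PKParser *)p didMatchContentLine:(PKAssembly *)a {
    NSMutableArray *words = [self popWordsFromAssembly:a];
    [a push:[words componentsJoinedByString:@" "]];
}

@end
```

With this delegate, the stack after a successful parse of the sample file should hold the six strings listed in the question, in source order. Joining with componentsJoinedByString: also avoids the trailing space that the appendFormat: loop has to trim.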
