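I find that using the moo lexer makes my grammars simpler, so I generally spend less time fixing ambiguous grammars.
I'm no expert at designing grammars, but here is what I would do: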
Lexer
- word matches a run of one or more ASCII letters
- comma matches " , ", " ,", ", " and ","
- space matches a single space " "
- period matches a period "."
- nl matches one or more newlines
const moo = require('moo');
const lexer =
moo.compile
( { word: /[a-zA-Z]+/
, comma:/ ?, ?/
, space: / /
, period: /\./
, nl: {match: /\n+/, lineBreaks: true}
}
);
module.exports = lexer;
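To make the token stream concrete, here is a rough stand-in that needs no dependencies. It is not moo: it reuses the five patterns above as sticky regexes and assumes the rules are tried in the order they are declared, which is why comma has to come before space. The names rules and tokenize are made up for this sketch, and it skips what the real lexer gives you, such as line and column tracking.

```javascript
// Hypothetical stand-in for the moo lexer above: same five patterns, tried in
// declaration order, implemented with plain sticky regexes so it runs without moo.
const rules = [
  ['word', /[a-zA-Z]+/y],
  ['comma', / ?, ?/y],
  ['space', / /y],
  ['period', /\./y],
  ['nl', /\n+/y],
];

function tokenize(input) {
  const tokens = [];
  let pos = 0;
  while (pos < input.length) {
    let matched = false;
    for (const [type, re] of rules) {
      re.lastIndex = pos; // a sticky regex only matches exactly at lastIndex
      const m = re.exec(input);
      if (m) {
        tokens.push({ type, value: m[0] });
        pos += m[0].length;
        matched = true;
        break; // first rule that matches wins
      }
    }
    if (!matched) throw new Error(`unexpected character at offset ${pos}`);
  }
  return tokens;
}

const types = tokenize('After lunch , I went.\n').map((t) => t.type);
console.log(types.join(' '));
// prints: word space word comma word space word period nl
```

Note how " , " comes out as a single comma token, so the grammar never has to reason about optional spaces around commas.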
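Grammar
Here we say:
- A text has one or more sentences.
- Newlines can appear before and after each sentence. (As written, the grammar expects a newline token before the first sentence and after every sentence.)
- A sentence can start with a sequence of words, each followed by either a %comma or a %space, and must end with a %word followed by a %period.
All the post-processing rules do is flatten the token lists and extract .value from the tokens, so that we end up with lists of words.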
@{% const lexer = require("./lexer.js"); %}
@lexer lexer
text
-> %nl sentence:+ {% ([_, sentences]) => sentences %}
sentence
-> seq:* %word %period %nl {% ([seq, w, p, n]) => [...seq, w.value] %}
seq
-> (%word %space) {% ([[w]]) => w.value %}
| (%word %comma) {% ([[w]]) => w.value %}
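If the destructuring in the postprocessors looks cryptic, the sketch below runs the same three functions by hand on mock data, without nearley. It assumes tokens are objects carrying at least type and value, and that a parenthesized group such as (%word %space) arrives as a nested array, which is why the seq rule destructures with [[w]]. The tok helper and the variable names are invented for illustration.

```javascript
// The three postprocessors copied from the grammar above, as plain functions.
const seqPost = ([[w]]) => w.value;
const sentencePost = ([seq, w, p, n]) => [...seq, w.value];
const textPost = ([_, sentences]) => sentences;

// Hypothetical helper: builds a minimal token-like object.
const tok = (type, value) => ({ type, value });

// Simulate the parse of "\nI went.\n" by hand.
// seq:* yields an array with one already-processed entry per seq match.
const seq = [seqPost([[tok('word', 'I'), tok('space', ' ')]])];

const sentence = sentencePost([
  seq,
  tok('word', 'went'),
  tok('period', '.'),
  tok('nl', '\n'),
]);

// text -> %nl sentence:+ keeps only the list of sentences.
const text = textPost([tok('nl', '\n'), [sentence]]);

console.log(JSON.stringify(text));
// prints: [["I","went"]]
```

The separators (space, comma, period, newline) are parsed but dropped, so only the words survive.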
This grammar can parse the following text:
After breakfast, I went to work.
After lunch , I went to my desk.
After the pub,I went home.
sleep.
Example:
const nearley = require('nearley');
const grammar = require('./grammar.js');
const parser = new nearley.Parser(nearley.Grammar.fromCompiled(grammar));
parser.feed(`
After breakfast, I went to work.
After lunch , I went to my desk.
After the pub,I went home.
sleep.
`);
if (parser.results.length > 1) throw new Error('grammar is ambiguous');
console.log(JSON.stringify(parser.results[0], null, 2));
Output:
[
[
"After",
"breakfast",
"I",
"went",
"to",
"work"
],
[
"After",
"lunch",
"I",
"went",
"to",
"my",
"desk"
],
[
"After",
"the",
"pub",
"I",
"went",
"home"
],
[
"sleep"
]
]