parsing - How to group non-empty lines with PEG.js

Question

How can I group categories (a group of non-empty lines followed by an empty line)?

stopwords:fr:aux,au,de,le,du,la,a,et,avec

synonyms:en:flavoured, flavored

synonyms:en:sorbets, sherbets

en:Artisan products
fr:Produits artisanaux

< en:Artisan products
fr:Gressins artisanaux

en:Baby foods
fr:Aliments pour bébé, aliment pour bébé, alimentation pour bébé, aliment bébé, alimentation bébé, aliments bébé

< en:Baby foods
fr:Céréales pour bébé, céréales bébé

< en:Whisky
fr:Whisky écossais
es:Whiskies escoceses
wikipediacategory:Q8718387

Right now I can parse the file line by line with this grammar:

start = stopwords* synonyms* category+

language_and_words = l:[^:]+ ":" w:[^\n]+ {return {language: l.join(''), words: w.join('')};}

stopwords = "stopwords:" w:language_and_words "\n"+ {return {stopwords: w};}

synonyms = "synonyms:" w:language_and_words "\n"+ {return {synonyms: w};}

category_line = "< "? w:language_and_words "\n"+ {return w;}

category = c:category_line+ {return c;}

I get:

{
    "language": "en",
    "words": "Artisan products"
},
{
    "language": "fr",
    "words": "Produits artisanaux"
}

But I want (for each group):

[
    {
        "language": "en",
        "words": "Artisan products"
    },
    {
        "language": "fr",
        "words": "Produits artisanaux"
    }
]
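To have something to check a parser's output against, the target structure can be pinned down without a grammar. This is a minimal plain-JavaScript sketch, not PEG.js; `expectedGroups` is a hypothetical name, and it assumes every non-blank line has the form `lang:words` (optionally preceded by `< `) and that blank lines only ever separate groups. Like the desired output above, it drops the `< ` marker.

```javascript
// Build the desired structure directly, without a grammar: one array per
// group of non-empty lines, each line turned into {language, words}.
function expectedGroups(text) {
  return text
    .split(/\n\s*\n/) // one or more blank (or whitespace-only) lines separate groups
    .map((block) =>
      block
        .split("\n")
        .filter((line) => line.trim() !== "")
        .map((line) => {
          const body = line.startsWith("< ") ? line.slice(2) : line; // drop the parent marker
          const colon = body.indexOf(":");
          if (colon < 0) throw new Error(`Missing "lang:" in ${JSON.stringify(line)}`);
          return { language: body.slice(0, colon), words: body.slice(colon + 1) };
        })
    )
    .filter((group) => group.length > 0);
}

const sample =
  "en:Artisan products\nfr:Produits artisanaux\n\n< en:Artisan products\nfr:Gressins artisanaux\n";
console.log(JSON.stringify(expectedGroups(sample), null, 2));
```

Splitting on `\n\s*\n` treats several consecutive blank lines as a single separator, which is the behavior the grammar needs to reproduce.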

I also tried this, but it doesn't group the lines, and I get a \n at the beginning of some lines.

category_line = "< "? w:language_and_words "\n" {return w;}

category = c:category_line+ "\n" {return c;}
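A likely cause of both symptoms: in `language_and_words`, the class `[^:]+` also matches `\n`. When `category_line+` tries for one more line after the last line of a group, `[^:]+` consumes the blank line and runs on into the next line up to its colon, so the language comes back with a leading `\n` (for example `"\nen"`) and the group never ends at the blank line. (In the first attempt, the trailing `"\n"+` in `category_line` also swallows the blank line.) Excluding the newline, for example `l:[^:\n]+`, should let `category_line+` stop at the blank line. The JavaScript below is not PEG.js: it is a hand-written recursive-descent sketch that mirrors the second attempt with that one change, so the effect can be checked without installing PEG.js. Two assumptions differ from the question: the group terminator is relaxed to any number of newlines (so the last group needs no blank line after it), and the input must end with `\n`. `parseCategories` is a hypothetical name.

```javascript
// The question's class [^:]+ also matches "\n", so it runs across a blank line:
console.log(JSON.stringify(/^[^:]+/.exec("\nen:Baby foods")[0])); // prints "\nen"

// Hand-written recursive-descent mirror (not PEG.js) of the second attempt,
// with newline excluded from the language class:
//   category           = category_line+ "\n"*
//   category_line      = "< "? language_and_words "\n"
//   language_and_words = [^:\n]+ ":" [^\n]+
function parseCategories(input) {
  let pos = 0;

  // Match a sticky regex at the current position; advance only on success.
  function match(re) {
    re.lastIndex = pos;
    const m = re.exec(input);
    if (m !== null) pos = re.lastIndex;
    return m;
  }

  // category_line: returns {language, words}, or null after restoring pos.
  function categoryLine() {
    const start = pos;
    match(/< /y); // optional parent marker, discarded as in the question
    const m = match(/([^:\n]+):([^\n]+)\n/y);
    if (m === null) {
      pos = start; // undo the "< " match, as a failed PEG sequence would
      return null;
    }
    return { language: m[1], words: m[2] };
  }

  // category: one or more lines, then any blank lines that end the group.
  function category() {
    const lines = [];
    for (let line = categoryLine(); line !== null; line = categoryLine()) {
      lines.push(line);
    }
    if (lines.length === 0) return null;
    match(/\n*/y);
    return lines;
  }

  const groups = [];
  for (let g = category(); g !== null; g = category()) groups.push(g);
  if (pos !== input.length) throw new Error(`Unexpected input at offset ${pos}`);
  return groups;
}
```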

Answer

I found a partial solution:

start = category+

word = c:[^,\n]+ {return c.join('');}

words = w:word [,]? {return w.trim();}

parent = p:"< "? {return (p !== null);}

line = p:parent w:words+ "\n" {return {parent: p, words: w};}

category = l:line+ "\n"? {return l;}
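In this grammar, `word = [^,\n]+` also accepts `:`, so the first word of `< fr:a,b` comes back as `fr:a`. One option (untested) is to read the language as its own token before `words+`, using a class that excludes both `:` and `\n`, such as `[^:\n]+`, followed by `":"`; as long as no class inside `line` can match `\n`, the blank line still ends the group. The JavaScript below is not PEG.js; it is a hypothetical `parseLine` helper showing the per-line result such a rule would aim for.

```javascript
// Hypothetical helper (not PEG.js): parse one category line such as
// "< fr:a,b" or "fr:dd,ee, ffff" into {parent, language, words}.
function parseLine(line) {
  // (< )? optional parent marker; ([^:\n]+) language; ([^\n]*) comma-separated words
  const m = /^(< )?([^:\n]+):([^\n]*)$/.exec(line);
  if (m === null) throw new Error(`Not a category line: ${JSON.stringify(line)}`);
  return {
    parent: m[1] !== undefined, // true when the line starts with "< "
    language: m[2],
    words: m[3]
      .split(",")
      .map((w) => w.trim()) // same trimming as the answer's words rule
      .filter((w) => w !== ""),
  };
}

console.log(parseLine("< fr:a,b"));
console.log(parseLine("fr:dd,ee, ffff"));
```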

I can parse this...

< fr:a,b
fr:aa,bb

en:d,e,f
fr:dd,ee, ffff

and group it:

[
    [ {...}, {...} ],
    [ {...}, {...} ]
]

But the "lang:" at the start of each category line is a problem: if I try to parse "lang:", my categories are no longer grouped...

Answer

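I find it useful to break the parse down iteratively (problem decomposition, old school à la Wirth). Here is a partial solution that I think can get you moving in the right direction (I did not parse the elements of the Line category).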

start = 
  stopwords 
  synonyms 
  category+

category "category"
  = category:(Line)+ categorySeparator { return category }

stopwords "stopwords"
  = stopwordLine*

stopwordLine "stopword line"
  = stopwordLine:StopWordMatch EndOfLine* { return stopwordLine }

StopWordMatch 
  = "stopwords:" match:Text { return match }

synonyms "synonyms"
  = synonymLine*

synonymLine "synonym line"
  = synonymLine:SynonymMatch EndOfLine* { return synonymLine }

SynonymMatch 
  = "synonyms:" match:Text { return match }

Line "line"
  = line:Text [\n] { return line }

Text "text"
  = [^\n]+ { return text() }

EndOfLine "(end of line)"
  = '\n'

EndOfFile 
  = !. { return "EOF"; }

categorySeparator "separator"
  = EndOfLine EndOfLine* / EndOfLine? EndOfFile
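The same decomposition can be mirrored line by line in plain JavaScript, which is handy for checking what the grammar should return. This is a minimal sketch, not PEG.js, assuming `\n` line endings; `parseTaxonomy` is a hypothetical name, and, like the answer, it leaves category lines unparsed and expects stopword lines before synonym lines.

```javascript
// Line-based mirror (not PEG.js) of the answer's decomposition:
//   start = stopwords synonyms category+
// Stopword and synonym lines may be followed by blank lines; a category is one
// or more non-blank lines ended by blank line(s) or the end of input.
function parseTaxonomy(text) {
  const lines = text.split("\n");
  let i = 0;

  // Like EndOfLine*: skip any number of blank lines.
  const skipBlank = () => {
    while (i < lines.length && lines[i] === "") i++;
  };

  // Collect consecutive lines starting with `prefix`, keeping the text after it.
  const prefixed = (prefix) => {
    const out = [];
    while (i < lines.length && lines[i].startsWith(prefix)) {
      out.push(lines[i].slice(prefix.length));
      i++;
      skipBlank();
    }
    return out;
  };

  skipBlank();
  const stopwords = prefixed("stopwords:");
  const synonyms = prefixed("synonyms:");

  const categories = [];
  while (i < lines.length) {
    // lines[i] is non-blank here, so every category gets at least one line.
    const category = [];
    while (i < lines.length && lines[i] !== "") category.push(lines[i++]);
    categories.push(category);
    skipBlank();
  }
  if (categories.length === 0) throw new Error("expected at least one category");
  return { stopwords, synonyms, categories };
}

const sample = [
  "stopwords:fr:aux,au,de",
  "",
  "synonyms:en:flavoured, flavored",
  "",
  "synonyms:en:sorbets, sherbets",
  "",
  "en:Artisan products",
  "fr:Produits artisanaux",
  "",
  "< en:Whisky",
  "fr:Whisky écossais",
  "es:Whiskies escoceses",
  "wikipediacategory:Q8718387",
  "",
].join("\n");

console.log(JSON.stringify(parseTaxonomy(sample), null, 2));
```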

My use of mixed case is arbitrary and not very stylish. There is also a way to save solutions online: http://peg.arcanis.fr/2WQ7CZ/
