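I'm trying to parse a categories file with PEG.js.
How can I group the categories (a group is a block of non-empty lines followed by an empty line)? A sample of the file: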
stopwords:fr:aux,au,de,le,du,la,a,et,avec

synonyms:en:flavoured, flavored

synonyms:en:sorbets, sherbets

en:Artisan products
fr:Produits artisanaux

< en:Artisan products
fr:Gressins artisanaux

en:Baby foods
fr:Aliments pour bébé, aliment pour bébé, alimentation pour bébé, aliment bébé, alimentation bébé, aliments bébé

< en:Baby foods
fr:Céréales pour bébé, céréales bébé

< en:Whisky
fr:Whisky écossais
es:Whiskies escoceses
wikipediacategory:Q8718387
Right now I can parse it line by line with this grammar:
start = stopwords* synonyms* category+
language_and_words = l:[^:]+ ":" w:[^\n]+ {return {language: l.join(''), words: w.join('')};}
stopwords = "stopwords:" w:language_and_words "\n"+ {return {stopwords: w};}
synonyms = "synonyms:" w:language_and_words "\n"+ {return {synonyms: w};}
category_line = "< "? w:language_and_words "\n"+ {return w;}
category = c:category_line+ {return c;}
What I get:
{
  "language": "en",
  "words": "Artisan products"
},
{
  "language": "fr",
  "words": "Produits artisanaux"
}
But what I want (for each group) is:
[
  {
    "language": "en",
    "words": "Artisan products"
  },
  {
    "language": "fr",
    "words": "Produits artisanaux"
  }
]
I also tried the following, but it doesn't group the lines, and I get \n at the start of some lines:
category_line = "< "? w:language_and_words "\n" {return w;}
category = c:category_line+ "\n" {return c;}
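A likely cause of both symptoms in the second attempt is that [^:]+ in language_and_words also matches "\n". When category_line+ reaches the blank line between two groups, "< "? matches nothing and [^:]+ reads the blank line's "\n" together with the next line's language code, so the line comes back with a language of "\nen" instead of the repetition stopping there. The separator gets swallowed, so no groups form. Below is an untested sketch of a fix in PEG.js syntax: it excludes "\n" from the character class, makes each category line consume exactly one line break so the blank line is left to end the group, and lets the last group end at end of input. The helper rules newline and blank_line are introduced here for illustration, and the sketch assumes Unix "\n" line endings.

```pegjs
start = stopwords* synonyms* category+

// Exclude "\n" from the language part so a match can never run across a line break.
language_and_words = l:[^:\n]+ ":" w:[^\n]+ {return {language: l.join(''), words: w.join('')};}

// Header lines may be followed by any number of blank lines.
stopwords = "stopwords:" w:language_and_words newline blank_line* {return {stopwords: w};}
synonyms = "synonyms:" w:language_and_words newline blank_line* {return {synonyms: w};}

// Exactly one line break per category line, so the blank line stays in the input.
category_line = "< "? w:language_and_words newline {return w;}

// A group: one or more category lines, then the blank separator line(s), if any.
category = c:category_line+ blank_line* {return c;}

// Helper rules (hypothetical names, introduced for this sketch).
newline = "\n" / !.   // the last line of the file may have no trailing "\n"
blank_line = "\n"
```

Because start has no action, its result is a three-element array: the stopwords objects, the synonyms objects, and the list of groups, where each group is an array of {language, words} objects such as the "Artisan products" / "Produits artisanaux" pair. In PEG.js 0.10 the grammar text is compiled with peg.generate(grammarText), and the returned parser's parse(input) method runs it; Peggy exposes the same generate function.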