I am parsing the Dutch Wikipedia, which contains category markup like this:

[[Categorie:Nederlands beeldhouwer]]

But the English Wikipedia uses the following markup:

[[Category:Japanese diplomats]]

So the markup (Categorie/Category) depends on the language. Is it possible to use the Lucene WikipediaTokenizer for non-English wikis? If so, how?

1 Answer

I think Wikipedia markup is language-dependent, and API results also differ by language.

Following http://www.mediawiki.org/wiki/API, I ran a quick experiment with the same query and got different results from http://en.wikipedia.org/w/api.php and http://nl.wikipedia.org/w/api.php.
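The exact query is not recorded here. As one illustration of that kind of comparison, the sketch below builds a `siteinfo` namespaces request for each host and reads the localized category prefix out of a response. The helper names `namespaces_url` and `category_prefix` are hypothetical, and the code assumes the default (legacy) JSON format, where the local namespace name sits under the `"*"` key and namespace 14 is the category namespace.

```python
from urllib.parse import urlencode

# Hosts are the two endpoints mentioned above; the query itself is an assumption.
API_ENDPOINTS = {
    "en": "http://en.wikipedia.org/w/api.php",
    "nl": "http://nl.wikipedia.org/w/api.php",
}


def namespaces_url(lang):
    """Build a siteinfo request asking a wiki for its namespace names."""
    params = {
        "action": "query",
        "meta": "siteinfo",
        "siprop": "namespaces",
        "format": "json",
    }
    return API_ENDPOINTS[lang] + "?" + urlencode(params)


def category_prefix(siteinfo):
    """Return the localized category namespace name from a parsed response.

    Assumes the legacy JSON layout: namespaces keyed by id as strings,
    with the local name stored under "*".
    """
    return siteinfo["query"]["namespaces"]["14"]["*"]
```

Fetching `namespaces_url("nl")` and passing the decoded JSON to `category_prefix` would be expected to give the Dutch prefix, while the same call against `"en"` gives the English one.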

Lucene's WikipediaTokenizer is an extension of StandardTokenizer, so it should be able to tokenize and index text in any language.
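Tokenizing the text is a separate matter from recognizing the category links. If the tokenizer turns out to recognize only the English `[[Category:` prefix (an assumption worth checking against the Lucene version in use), one possible workaround is to rewrite the localized prefix before the text reaches the tokenizer. This is a minimal sketch of that preprocessing step; `normalize_category_links` is a hypothetical helper, not part of Lucene.

```python
import re


def normalize_category_links(wikitext, local_prefix):
    """Rewrite '[[<local_prefix>:' to '[[Category:' in raw wikitext.

    local_prefix is the wiki's own category namespace name,
    for example "Categorie" on the Dutch Wikipedia.
    """
    # Match the opening brackets, the localized prefix (case-insensitive),
    # and the colon, allowing stray whitespace around the prefix.
    pattern = re.compile(
        r"\[\[\s*" + re.escape(local_prefix) + r"\s*:",
        re.IGNORECASE,
    )
    return pattern.sub("[[Category:", wikitext)
```

For example, `normalize_category_links("[[Categorie:Nederlands beeldhouwer]]", "Categorie")` rewrites the Dutch link to use the `Category:` prefix and leaves links that already use the English prefix untouched.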

answered 2013-06-04T21:10:45.580