
I'm shopping for an open-source framework for writing natural language grammar rules for pattern matching over annotations. You could think of it like regexps but matching at the token rather than character level. Such a framework should enable the match criteria to reference other attributes attached to the input tokens or spans, as well as modify such attributes in an action.
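To make the requirement concrete, here is a minimal sketch in plain Python of what token-level matching with attribute tests and an action could look like. It is not taken from any of the frameworks discussed here; the token layout (dicts with `text` and `pos` keys), the helper names `find_matches` and `label_span`, and the `entity` attribute are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Token = Dict[str, str]
Predicate = Callable[[Token], bool]


def find_matches(tokens: List[Token], pattern: List[Predicate]) -> List[Tuple[int, int]]:
    """Return (start, end) spans where each predicate accepts one consecutive token."""
    spans = []
    n = len(pattern)
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(pred(tok) for pred, tok in zip(pattern, window)):
            spans.append((start, start + n))
    return spans


def label_span(tokens: List[Token], span: Tuple[int, int], label: str) -> None:
    """Action: attach a new attribute to every token in the matched span."""
    start, end = span
    for tok in tokens[start:end]:
        tok["entity"] = label


# Hypothetical input: tokens that already carry a part-of-speech attribute.
tokens = [
    {"text": "Dr.", "pos": "NNP"},
    {"text": "Jane", "pos": "NNP"},
    {"text": "Smith", "pos": "NNP"},
    {"text": "arrived", "pos": "VBD"},
]

# Rule: the literal token "Dr." followed by two proper nouns.
pattern = [
    lambda t: t["text"] == "Dr.",
    lambda t: t["pos"] == "NNP",
    lambda t: t["pos"] == "NNP",
]

spans = find_matches(tokens, pattern)
for span in spans:
    label_span(tokens, span, "PERSON")
```

A real rule language would add quantifiers, alternation, and named captures on top of this; the sketch only shows fixed-length sequences of attribute tests plus an action that writes a new attribute back.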

There are three options I know of which fit this description:

Are there any other options like these available at this time?

Related Tools

  • While I know that general parser generators like Antlr can also serve this purpose, I'm looking for something that is more specifically tailored for natural language processing or information extraction.
  • UIMA includes a Regex Annotator plugin for declaring rules in XML, but it appears to operate at the character level rather than on higher-level objects.
  • I know that this kind of task is often performed with statistical models, but for narrow, structured domains there's benefit in hand-crafting rules.

* With GExp, 'rules' are actually implemented in code, but since there are so few options I chose to include it.


2 Answers


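You could also check out HTQL. It supports regular expression searches over tokens. An example that searches for the state and ZIP code in a US address is: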

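import htql

# `states` (a list of state names) and `address` (the address string) must be defined beforehand.
a = htql.RegEx()
a.setNameSet('states', states)
a.reSearchList(address.split(), r"&[ws:states]<,>?<\d{5}>", case=False)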
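
For readers without HTQL installed, the intent of that pattern (a state name, an optional comma token, then a five-digit ZIP token) can be sketched in plain Python. This is only an illustration of the token-level idea, not HTQL's implementation or its return format; the helper `find_state_zip`, the sample `states` list, and the sample `address` are assumptions.

```python
import re
from typing import List, Optional, Tuple

ZIP = re.compile(r"\d{5}")


def find_state_zip(tokens: List[str], states: List[str]) -> Optional[Tuple[str, str]]:
    """Find a state name, an optional "," token, then a 5-digit ZIP token."""
    lowered = [t.lower() for t in tokens]
    for state in states:
        name = state.lower().split()  # a state name may span several tokens
        for i in range(len(tokens) - len(name) + 1):
            if lowered[i:i + len(name)] != name:
                continue
            j = i + len(name)
            if j < len(tokens) and tokens[j] == ",":
                j += 1  # skip the optional comma token
            if j < len(tokens) and ZIP.fullmatch(tokens[j]):
                return state, tokens[j]
    return None


# Hypothetical sample data, chosen only for illustration.
states = ["New Jersey", "New York", "Ohio"]
address = "88-21 64th st , Rego Park , New York 11374"
result = find_state_zip(address.split(), states)
```

The match is case-insensitive on the state name, mirroring the `case=False` argument in the snippet above.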
answered 2013-08-25T00:26:50.863

Unitex, a French academic software package from University Paris East, also fits your description (http://www-igm.univ-mlv.fr/~unitex/).

It is written in C++ and includes many optional preprocessing rules and dictionaries for more than 20 languages.

The GUI is graph-based (you design automata, which serve as the "grammars").

answered 2014-02-04T10:59:15.387