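I am a beginner with UIMA Ruta. I followed this to use Ruta together with Maven. I understand that part, but I want to do more.
Below is the simple test case I use to run the example: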
import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.util.CasUtil;
import org.junit.Before;
import org.junit.Test;

/**
 * Test Annotators on default input to ascertain their proper functioning.
 */
public class TestAnnotator {

    /**
     * UIMA type holding dates.
     */
    public static final String DATE_TYPE = "uima.ruta.example.TestAnnotator.DateValue";

    /**
     * UIMA type holding Eight Digit Numbers.
     */
    public static final String EIGHT_DIGIT_TYPE = "uima.ruta.example.TestAnnotator.EightDigit";

    /**
     * UIMA type holding Number block.
     */
    public static final String NUMBER_VALUE_BLOCK = "uima.ruta.example.TestAnnotator.NumberValueBlock";

    /**
     * Ruta basic type for number tokens.
     */
    public static final String NUM = "org.apache.uima.ruta.type.NUM";

    /**
     * Text to process.
     */
    public static final String TEXT = "I will be out of office from September 2nd, 2013 to September 20th, 2013.\n"
            + "Dates like 9/2/2013 and 9/20/13 are also recognized. Eight Digit Number 99192293 and Nine digit Number <Span> 113993910"
            + " <Value> 900839806 </Value> <Coordinates> <x0> 916 </x0> <y0> 268 </y0> <x1> 1069 </x1> <y1> 290 </y1> </Coordinates> <OcrConfidence /></Span>";

    /**
     * Engine instance used for UIMA analysis
     */
    private AnalysisEngine engine;

    /**
     * CAS instance used for analysis
     */
    private CAS cas;

    /**
     * Creates the analysis engine from its descriptor and runs it on TEXT.
     *
     * @throws Exception if the engine cannot be created or processing fails
     */
    @Before
    public void setup() throws Exception {
        System.out.println("Starting Analysis Engine");
        // The descriptor is looked up by name, not by file path.
        engine = AnalysisEngineFactory
                .createEngine("com.demo.extraction.uima.annotators.TestAnnotatorEngine");
        cas = engine.newCAS();
        cas.setDocumentText(TEXT);
        engine.process(cas);
    }

    /**
     * Prints the DateValue and NUM annotations found in TEXT.
     *
     * @throws Exception if the annotations cannot be read
     */
    @Test
    public void test() throws Exception {
        System.out.println("Extracting date from");
        System.out.println(TEXT);
        for (AnnotationFS date : CasUtil.select(cas, cas.getTypeSystem()
                .getType(DATE_TYPE))) {
            System.out.println("Found: " + date.getCoveredText());
        }
        System.out.println("Extracting Number Block from");
        System.out.println(TEXT);
        for (AnnotationFS number : CasUtil.select(cas, cas.getTypeSystem()
                .getType(NUM))) {
            System.out.println("Found: " + number.getCoveredText());
        }
    }
}
Here is the script I am using:
DECLARE DateValue, DateRange, DayValue, MonthValue, YearValue, DateValueBlock;
// Number Related Types
// A date is a month, followed by a day, optionally an ordinal suffix, and a year
// examples: "january 1st 2008", "february 28, 2010"
W{REGEXP("(?i)(january|february|march|april|may|june|july|august|september|october|november|december)") -> MARK(MonthValue)}
NUM{-> MARK(DayValue)}
W?{REGEXP("(?i)(th|st|nd|rd)")}
COMMA?
NUM{-> MARK(YearValue), MARK(DateValue, 1, 5)};
// A date can also be specified in the MM/DD/YYYY format
NUM{-> MARK(MonthValue)} "/" NUM{-> MARK(DayValue)} "/" NUM{-> MARK(YearValue), MARK(DateValue, 1, 5)};
// A date range
"from" DateValue "to" DateValue{-> MARK(DateRange, 1, 4)};
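As a sanity check on what I expect the two DateValue rules to find in my test text, here is a rough approximation using plain java.util.regex. This is not Ruta and does not use the Ruta engine: the class name DateRuleSketch and the two patterns are my own illustration, and they model whitespace explicitly, whereas Ruta skips whitespace tokens by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, only an approximation of the Ruta rules.
public class DateRuleSketch {

    // Month name, day number, optional ordinal suffix, optional comma, year number.
    static final Pattern WRITTEN_DATE = Pattern.compile(
            "(?i)\\b(january|february|march|april|may|june|july|august|september"
            + "|october|november|december)\\s+(\\d+)(?:th|st|nd|rd)?\\s*,?\\s*(\\d+)");

    // Three numbers separated by slashes (the MM/DD/YYYY rule).
    static final Pattern SLASH_DATE = Pattern.compile("\\b(\\d+)/(\\d+)/(\\d+)\\b");

    // Collects the full text of every match of the pattern, in document order.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "I will be out of office from September 2nd, 2013 to September 20th, 2013.\n"
                + "Dates like 9/2/2013 and 9/20/13 are also recognized.";
        System.out.println(findAll(WRITTEN_DATE, text));
        System.out.println(findAll(SLASH_DATE, text));
    }
}
```

Comparing this output with the "Found:" lines printed by the test case gives a quick way to tell whether the script that actually ran is the one shown here.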
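As suggested in the tutorial, I created this script in a separate UIMA Ruta project (call it the example project) in the same Eclipse workspace. I ran it as a UIMA Ruta file and got the corresponding descriptor file (TestAnnotator.xml). I pasted the same content, changing only the import to the following: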
<imports><import name="org.apache.uima.ruta.engine.BasicTypeSystem"/></imports>
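It originally pointed to the BasicTypeSystem.xml file. At the moment I can build the project and execute this script, but the generated descriptor file (TestAnnotator.xml) contains local file paths. These paths point to folders in the earlier UIMA Ruta project (the example project).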
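As far as I understand it, an import by name is resolved by replacing the dots with slashes, appending .xml, and looking the result up on the classpath (or the UIMA datapath), which is why it does not depend on a local folder. A minimal sketch of that lookup using only the JDK follows; the class name ImportByNameSketch and its methods are hypothetical and only mimic the naming convention, they are not part of UIMA.

```java
import java.net.URL;

// Hypothetical helper that mimics how a name import maps to a classpath resource.
public class ImportByNameSketch {

    // "a.b.C" becomes "a/b/C.xml".
    static String toResourcePath(String importName) {
        return importName.replace('.', '/') + ".xml";
    }

    // Returns the URL of the descriptor if it is visible on the classpath, otherwise null.
    static URL locate(String importName) {
        return ImportByNameSketch.class.getClassLoader()
                .getResource(toResourcePath(importName));
    }

    public static void main(String[] args) {
        String name = "org.apache.uima.ruta.engine.BasicTypeSystem";
        System.out.println("Resource path: " + toResourcePath(name));
        URL url = locate(name);
        System.out.println(url != null ? "Found at " + url : "Not on the classpath");
    }
}
```

Running this from inside the Maven project shows whether a given descriptor is reachable by name, in contrast to the absolute paths in the generated TestAnnotator.xml.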
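When I try to set up a run configuration for the script file at its new location, that is, inside my Maven project, the configuration cannot find the script at the given path.
How can I make the Ruta script load dynamically in a Maven project, without relying on these fixed local paths?
Please help.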