I am a beginner with UIMA Ruta. I followed this example to use Ruta with Maven. I understand that part, but I want to do more.

Below is the simple test case I use to run the example:

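import org.apache.uima.analysis_engine.AnalysisEngine;
import org.apache.uima.cas.CAS;
import org.apache.uima.cas.text.AnnotationFS;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.util.CasUtil;
import org.junit.Before;
import org.junit.Test;

/**
 * Test Annotators on default input to ascertain their proper functioning.
 */
public class TestAnnotator {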

/**
 * UIMA type holding dates.
 */
public static final String DATE_TYPE = "uima.ruta.example.TestAnnotator.DateValue";

/**
 * UIMA type holding Eight Digit Numbers.
 */
public static final String EIGHT_DIGIT_TYPE = "uima.ruta.example.TestAnnotator.EightDigit";

/**
 * UIMA type holding Number block.
 */
public static final String NUMBER_VALUE_BLOCK = "uima.ruta.example.TestAnnotator.NumberValueBlock";

/**
 * UIMA Ruta basic type for numeric tokens.
 */
public static final String NUM = "org.apache.uima.ruta.type.NUM";
/**
 * Text to process.
 */
public static final String TEXT = "I will be out of office from September 2nd, 2013 to September 20th, 2013.\n"
        + "Dates like 9/2/2013 and 9/20/13 are also recognized. Eight Digit Number 99192293 and Nine digit Number <Span> 113993910"
        + " <Value> 900839806 </Value> <Coordinates> <x0> 916 </x0> <y0> 268 </y0> <x1> 1069 </x1> <y1> 290 </y1> </Coordinates> <OcrConfidence /></Span>";

/**
 * Engine instance used for UIMA analysis
 */
private AnalysisEngine engine;

/**
 * CAS instance used for analysis
 */
private CAS cas;

/**
 * Sets Annotator Engine
 * 
 * @throws Exception
 */
@Before
public void setup() throws Exception {
    System.out.println("Starting Analysis Engine");
    engine = AnalysisEngineFactory
            .createEngine("com.demo.extraction.uima.annotators.TestAnnotatorEngine");
    cas = engine.newCAS();
    cas.setDocumentText(TEXT);
    engine.process(cas);
}

/**
 * Tests the input
 * 
 * @throws Exception
 */
@Test
public void test() throws Exception {

    System.out.println("Extracting date from");
    System.out.println(TEXT);
    for (AnnotationFS date : CasUtil.select(cas, cas.getTypeSystem()
            .getType(DATE_TYPE))) {
        System.out.println("Found: " + date.getCoveredText());
    }

    System.out.println("Extracting Number Block from");
    System.out.println(TEXT);
    for (AnnotationFS number : CasUtil.select(cas, cas.getTypeSystem()
            .getType(NUM))) {
        System.out.println("Found: " + number.getCoveredText());
    }
}
}

Here is the script I mentioned:

DECLARE DateValue, DateRange, DayValue, MonthValue, YearValue, DateValueBlock;
// Number Related Types

// A date is a month, followed by a day, optionally an ordinal suffix, and a year
// examples: "january 1st 2008", "february 28, 2010"
W{REGEXP("(?i)(january|february|march|april|may|june|july|august|september|october|november|december)")
    -> MARK(MonthValue)}
    NUM{-> MARK(DayValue)}
    W?{REGEXP("(?i)(th|st|nd|rd)")}
    COMMA?
    NUM{-> MARK(YearValue), MARK(DateValue, 1, 5)};

// A date can also be specified in the MM/DD/YYYY format
NUM{-> MARK(MonthValue)} "/" NUM{-> MARK(DayValue)} "/" NUM{-> MARK(YearValue), MARK(DateValue, 1, 5)};

// A date range
"from" DateValue "to" DateValue{-> MARK(DateRange, 1, 4)};

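To sanity-check which strings the two date rules should pick up, the patterns can be roughly approximated with plain `java.util.regex`. This is only a sketch: the class name `DateRegexCheck`, the shortened `SAMPLE` text, and the regular expressions are assumptions made for illustration, and Ruta's token-based matching (W, NUM, COMMA) is not identical to character-level regex matching.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA or Ruta.
class DateRegexCheck {

    // Excerpt of the TEXT constant from the test case above.
    static final String SAMPLE =
            "I will be out of office from September 2nd, 2013 to September 20th, 2013.\n"
            + "Dates like 9/2/2013 and 9/20/13 are also recognized.";

    // Approximates: month word, day number, optional ordinal suffix, optional comma, year number.
    static final Pattern WRITTEN_DATE = Pattern.compile(
            "(?i)\\b(?:january|february|march|april|may|june|july|august"
            + "|september|october|november|december)"
            + "\\s+\\d+\\s*(?:th|st|nd|rd)?\\s*,?\\s*\\d+");

    // Approximates: NUM "/" NUM "/" NUM.
    static final Pattern SLASH_DATE = Pattern.compile("\\d+/\\d+/\\d+");

    // Returns all written-form matches first, then all slash-form matches.
    static List<String> findDates(String text) {
        List<String> found = new ArrayList<>();
        for (Pattern pattern : new Pattern[] {WRITTEN_DATE, SLASH_DATE}) {
            Matcher matcher = pattern.matcher(text);
            while (matcher.find()) {
                found.add(matcher.group());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (String date : findDates(SAMPLE)) {
            System.out.println("Found: " + date);
        }
    }
}
```

On the sample text this finds "September 2nd, 2013", "September 20th, 2013", "9/2/2013" and "9/20/13", which is what the DateValue annotations from the script would be expected to cover.
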
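As suggested in the tutorial, I wrote this script in a separate UIMA Ruta project (call it the example project) in the same Eclipse workspace. I ran it as a UIMA Ruta file and got the corresponding descriptor file (TestAnnotator.xml). I pasted the same content into my Maven project, changing only the import to the following: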

<imports><import name="org.apache.uima.ruta.engine.BasicTypeSystem"/></imports>

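It originally pointed to the BasicTypeSystem.xml file. I can currently build the project and execute the script, but the generated descriptor file (TestAnnotator.xml) uses local file paths in its XML. These local file paths are folders in the earlier UIMA project (the example project).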

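When I try to set up a run configuration for the script file at the corresponding path inside my Maven project instead, the configuration cannot find the script at the given path.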

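How can I get this dynamic behavior for Ruta scripts in a Maven project, so that the scripts are found without hard-coded local paths?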

Please help.
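
One way to narrow the problem down is to check whether the script is visible on the Maven project's classpath at all, independent of any absolute path in the descriptor. The sketch below uses only the JDK. The class `RutaScriptLocator` and its methods are hypothetical, and it assumes that the script is stored under a resource folder in a directory layout matching its package (for example `uima/ruta/example/TestAnnotator.ruta`) and that the engine is configured to resolve scripts by name from the classpath.

```java
// Hypothetical diagnostic helper, not part of UIMA or Ruta.
class RutaScriptLocator {

    // Converts a qualified script name such as "uima.ruta.example.TestAnnotator"
    // into the classpath resource path "uima/ruta/example/TestAnnotator.ruta".
    static String toResourcePath(String qualifiedScriptName) {
        return qualifiedScriptName.replace('.', '/') + ".ruta";
    }

    // Returns true if a resource with the derived path is visible to the
    // context class loader (falling back to the system class loader).
    static boolean isOnClasspath(String qualifiedScriptName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(toResourcePath(qualifiedScriptName)) != null;
    }

    public static void main(String[] args) {
        String script = "uima.ruta.example.TestAnnotator";
        System.out.println(toResourcePath(script) + " on classpath: " + isOnClasspath(script));
    }
}
```

If this reports that the script is not on the classpath when run from the Maven project, the script file is probably not being copied into the build output, and that would need fixing before any descriptor change can help.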
