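I am new to ClearTK and UIMA. So far, I have not been able to find any example of how to create a pipeline that does not involve files.
I am trying to use ClearTK and UIMA to process a small text stored in a Java String variable and return an XML string (the result of the ClearTK TimeML annotators).
I am able to supply a string as input (see the code excerpt), but the code is far from elegant (it requires setting an empty URI on the CAS). In addition, the output is being saved to a file, but I would like to return a string (saving the output to a file and then reading the file back into memory makes no sense).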
import static org.apache.uima.fit.factory.JCasFactory.createJCas;

import java.net.URI;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.pipeline.SimplePipeline;
import org.apache.uima.jcas.JCas;
import org.cleartk.corpus.timeml.TempEval2007Writer;
import org.cleartk.opennlp.tools.PosTaggerAnnotator;
import org.cleartk.snowball.DefaultSnowballStemmer;
import org.cleartk.timeml.event.*;
import org.cleartk.timeml.time.TimeTypeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToDocumentCreationTimeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToSameSentenceTimeAnnotator;
import org.cleartk.timeml.tlink.TemporalLinkEventToSubordinatedEventAnnotator;
import org.cleartk.timeml.type.DocumentCreationTime;
import org.cleartk.token.tokenizer.TokenAnnotator;
import org.cleartk.util.ViewUriUtil;
import org.cleartk.util.cr.FilesCollectionReader;
...
// Input: the text lives in a String, not in a file.
String documentText = "First make sure that you are using eggs that are several days old...";
JCas sourceCas = createJCas();
sourceCas.setDocumentText(documentText);
// The text does not come from a file, so an empty URI is set as a placeholder.
ViewUriUtil.setURI(sourceCas, new URI(""));
SimplePipeline.runPipeline(
    sourceCas,
    // Preprocessing: sentences, tokens, part-of-speech tags, stems, parses.
    org.cleartk.opennlp.tools.SentenceAnnotator.getDescription(),
    TokenAnnotator.getDescription(),
    PosTaggerAnnotator.getDescription(),
    DefaultSnowballStemmer.getDescription("English"),
    org.cleartk.opennlp.tools.ParserAnnotator.getDescription(),
    // TimeML time expressions and events.
    org.cleartk.timeml.time.TimeAnnotator.FACTORY.getAnnotatorDescription(),
    TimeTypeAnnotator.FACTORY.getAnnotatorDescription(),
    EventAnnotator.FACTORY.getAnnotatorDescription(),
    EventTenseAnnotator.FACTORY.getAnnotatorDescription(),
    EventAspectAnnotator.FACTORY.getAnnotatorDescription(),
    EventClassAnnotator.FACTORY.getAnnotatorDescription(),
    EventPolarityAnnotator.FACTORY.getAnnotatorDescription(),
    EventModalityAnnotator.FACTORY.getAnnotatorDescription(),
    // AddEmptyDCT is my own annotator (not shown).
    AnalysisEngineFactory.createEngineDescription(AddEmptyDCT.class),
    // Temporal links.
    TemporalLinkEventToDocumentCreationTimeAnnotator.FACTORY.getAnnotatorDescription(),
    TemporalLinkEventToSameSentenceTimeAnnotator.FACTORY.getAnnotatorDescription(),
    TemporalLinkEventToSubordinatedEventAnnotator.FACTORY.getAnnotatorDescription(),
    // Output: currently written to a file, which is the part I would like to avoid.
    TempEval2007Writer.getDescription("file:///tmp/out.tml"));
What is the recommended way to have a pipeline take a string as input and produce another string as the result of its execution?
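To make the shape of what I am after concrete, here is a minimal sketch that uses only the JDK. It is not ClearTK or UIMA code: `XmlSink`, `toXmlString` and `writeToyXml` are hypothetical names made up for illustration, and the toy serializer just wraps the text in a `<doc>` element. The idea is that whatever component produces the XML would write into an in-memory buffer instead of a file. As far as I can tell, UIMA's `XmiCasSerializer.serialize(CAS, OutputStream)` has this shape, but it emits XMI rather than TimeML, so it is probably not the whole answer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Sketch only: every name in this class is hypothetical, not ClearTK or UIMA API.
final class InMemoryXmlSketch {

    /** Stand-in for any serializer that writes XML to an OutputStream. */
    interface XmlSink {
        void write(String text, OutputStream out) throws IOException;
    }

    /** Runs the sink against an in-memory buffer and returns what it wrote as a String. */
    static String toXmlString(String text, XmlSink sink) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            sink.write(text, buffer);
        } catch (IOException e) {
            // Not expected for an in-memory buffer; rethrown unchecked to keep callers simple.
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Toy serializer: wraps the text in a <doc> element, escaping XML metacharacters. */
    static void writeToyXml(String text, OutputStream out) throws IOException {
        String escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        out.write(("<doc>" + escaped + "</doc>").getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // String in, string out, no file involved.
        String xml = toXmlString("eggs & bacon", InMemoryXmlSketch::writeToyXml);
        System.out.println(xml); // prints <doc>eggs &amp; bacon</doc>
    }
}
```

In the real pipeline, the role of `writeToyXml` would have to be played by something that can emit TimeML to a stream or string, which is exactly the piece I have not found.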