
I am writing a custom Java annotator for our UIMA pipeline in Watson Explorer Content Analytics.

There are two places (that I know of) where I can try to get the URL or filename of the document that is currently being processed.

Initialize

public class CustomAnnotator extends JCasAnnotator_ImplBase {

@Override
public void initialize(UimaContext aContext)
        throws ResourceInitializationException {
    super.initialize(aContext);
    // .... HERE MAYBE? ....

Or

Process

@Override
public void process(JCas jcas) throws AnalysisEngineProcessException {
    try {
        // .... HERE ....

I have tried several options:

  • via the context in the initialize method (running the pipeline on the server, I could get the PearID, for example),
  • via the Sofa in the process method (e.g. jcas.getSofa().getSofaURI()).

I also found SourceDocumentInformation, but that is only an example type, and although its getUri() method seems promising, I depend on IBM to implement the setUri(String) method...

So far I have not been successful. I hope I have overlooked something...


1 Answer

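I asked the same question on IBM dWAnswers. In short, when the pipeline runs inside the Watson Explorer Content Analytics server, you have access to multiple views. For metadata, we need to check the _InitialView instead of the rlw-view, which contains all annotations created by the custom pipeline that you built in Content Analytics Studio. More details can be found here; also check out the responses! https://www.ibm.com/developerworks/community/blogs/ibmandgoogle/entry/Exporting_annotations_from_Watson_Explorer_Content_Analytics?lang=en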
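
To make the "check the _InitialView" advice concrete, here is a minimal sketch in plain Apache UIMA that lists the views available in the CAS and the annotation types indexed in _InitialView. The class name InitialViewInspector and its two methods are hypothetical helpers, not part of UIMA or Watson Explorer. Which annotation type in _InitialView actually carries the URL or filename is not stated above, so the sketch only lists type names for you to inspect. It also assumes a UIMA version whose index and iterator types are generic.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.uima.cas.CAS;
import org.apache.uima.cas.CASException;
import org.apache.uima.cas.FSIterator;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

// Hypothetical helper class, for illustration only.
public class InitialViewInspector {

    /** Returns the names of all views in the CAS that the given JCas belongs to. */
    public static List<String> viewNames(JCas jcas) throws CASException {
        List<String> names = new ArrayList<String>();
        Iterator<JCas> views = jcas.getViewIterator();
        while (views.hasNext()) {
            names.add(views.next().getViewName());
        }
        return names;
    }

    /**
     * Returns the type name of every annotation indexed in _InitialView,
     * regardless of which view the annotator was handed.
     */
    public static List<String> initialViewAnnotationTypes(JCas jcas) throws CASException {
        // CAS.NAME_DEFAULT_SOFA is the constant for "_InitialView".
        JCas initialView = jcas.getView(CAS.NAME_DEFAULT_SOFA);
        List<String> typeNames = new ArrayList<String>();
        FSIterator<Annotation> it = initialView.getAnnotationIndex().iterator();
        while (it.hasNext()) {
            typeNames.add(it.next().getType().getName());
        }
        return typeNames;
    }
}
```

Calling both methods from process(JCas) and logging the results once on the server should show which views exist and which types in _InitialView are candidates for holding the document metadata.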

answered 2017-09-29T06:04:00.997