java - How can I access the document filename or URL in a custom UIMA annotator using IBM Content Analytics?

Question

I am writing a custom Java annotator for our UIMA pipeline in Watson Explorer Content Analytics.

There are two places (that I know of) where I can try to get the URL or filename of the document that is currently being processed.

Initialize

public class CustomAnnotator extends JCasAnnotator_ImplBase {

    @Override
    public void initialize(UimaContext aContext)
            throws ResourceInitializationException {
        super.initialize(aContext);
        // .... HERE MAYBE ? ....

Or

Process

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        try {
            // .... HERE ....

I have tried several options:

via the context in the initialize method (running the pipeline on the server, I could get the PearID, for example),
via the Sofa in the process method (e.g. jcas.getSofa().getSofaURI()).

I also found SourceDocumentInformation, but it is only an example type, and although its getUri() method seems promising, I would depend on IBM calling setUri(String) to populate it...

So far I have not been successful. I hope I have overlooked something...

score 1 · Accepted Answer

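I asked the same question on IBM dWAnswers. In short, when the pipeline runs in the Watson Explorer Content Analytics server, you have access to multiple views. For the metadata, we need to check the _InitialView instead of the rlw-view, which contains all the annotations created by the custom pipeline you built in Content Analytics Studio. More details can be found here; please also look at the responses! https://www.ibm.com/developerworks/community/blogs/ibmandgoogle/entry/Exporting_annotations_from_Watson_Explorer_Content_Analytics?lang=en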
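
To make the view switch concrete, here is a minimal sketch of a process method that lists the available views and then dumps the annotations found in _InitialView. It uses only standard Apache UIMA calls (getViewIterator, getView, getAnnotationIndex). The class name ViewInspectingAnnotator and the helper describe are hypothetical, and because the answer does not say which annotation type or feature carries the filename or URL, the sketch only prints what is there so you can look for it in the output.

```java
import java.util.Iterator;

import org.apache.uima.analysis_component.JCasAnnotator_ImplBase;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CASException;
import org.apache.uima.cas.Feature;
import org.apache.uima.jcas.JCas;
import org.apache.uima.jcas.tcas.Annotation;

// Hypothetical annotator used only to explore the views; not IBM's API.
public class ViewInspectingAnnotator extends JCasAnnotator_ImplBase {

    // View name taken from the answer above.
    private static final String METADATA_VIEW = "_InitialView";

    @Override
    public void process(JCas jcas) throws AnalysisEngineProcessException {
        try {
            // List every view that is visible to this annotator.
            Iterator<JCas> views = jcas.getViewIterator();
            while (views.hasNext()) {
                System.out.println("view: " + views.next().getViewName());
            }

            // Switch to the view that is said to hold the document metadata.
            JCas initialView = jcas.getView(METADATA_VIEW);
            System.out.print(describe(initialView));
        } catch (CASException e) {
            throw new AnalysisEngineProcessException(e);
        }
    }

    // Returns one line per annotation type name, followed by its
    // primitive-valued features (strings, numbers, booleans).
    static String describe(JCas view) {
        StringBuilder out = new StringBuilder();
        for (Annotation a : view.getAnnotationIndex()) {
            out.append(a.getType().getName()).append('\n');
            for (Feature f : a.getType().getFeatures()) {
                // Skip features that reference other feature structures.
                if (f.getRange().isPrimitive()) {
                    out.append("  ").append(f.getShortName()).append(" = ")
                       .append(a.getFeatureValueAsString(f)).append('\n');
                }
            }
        }
        return out.toString();
    }
}
```

Once the dump shows which type holds the filename or URL, the generic loop can be replaced with a lookup for that specific type in the same view.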
