I am writing a custom Java annotator for our UIMA pipeline in Watson Explorer Content Analytics.
There are two places (that I know of) where I could try to get the URL or file name of the document that is currently being processed.
Initialize
    public class CustomAnnotator extends JCasAnnotator_ImplBase {
        @Override
        public void initialize(UimaContext aContext)
                throws ResourceInitializationException {
            super.initialize(aContext);
            .... HERE MAYBE ? ....
Or
Process
        @Override
        public void process(JCas jcas) throws AnalysisEngineProcessException {
            try {
                .... HERE ....
I have tried several options:
- via the context in the initialize method (running the pipeline on the server, I could get the PearID, for example),
- via the Sofa in the process method (e.g. jcas.getSofa().getSofaURI()).
I also found SourceDocumentInformation, but this is only an example, and although its getUri() method seems promising, I depend on IBM to implement the setUri(String) method...
So far I have not been successful. I hope I have overlooked something...
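
For reference, this is roughly what I would do with the value once I can get hold of it. It is only a minimal sketch, assuming the URI arrives as a plain string (for example from getSofaURI() or getUri(), if either is ever populated). DocumentNames and fileNameFromUri are hypothetical names of my own and are not part of UIMA or Watson Explorer.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Optional;

// Hypothetical helper, not part of UIMA: derives a file name from a URI string.
final class DocumentNames {

    private DocumentNames() {
    }

    /**
     * Returns the last path segment of the given URI string,
     * or an empty Optional if the input is null, blank, or has no path segment.
     */
    static Optional<String> fileNameFromUri(String uri) {
        if (uri == null || uri.trim().isEmpty()) {
            return Optional.empty();
        }
        String trimmed = uri.trim();
        String path;
        try {
            // getPath() drops the query and fragment and decodes %xx escapes
            path = new URI(trimmed).getPath();
        } catch (URISyntaxException e) {
            // Not a valid URI (e.g. a Windows path): treat it as a plain path
            path = trimmed;
        }
        if (path == null) {
            // Opaque URIs such as "mailto:someone" have no path
            return Optional.empty();
        }
        path = path.replace('\\', '/');

        // Ignore trailing slashes so "dir/" yields "dir"
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        if (end == 0) {
            return Optional.empty();
        }
        int start = path.lastIndexOf('/', end - 1) + 1;
        return Optional.of(path.substring(start, end));
    }
}
```

In process(...) I would then call something like DocumentNames.fileNameFromUri(jcas.getSofa().getSofaURI()) and fall back to another source when the result is empty, since I cannot assume the pipeline sets the Sofa URI at all.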