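I am trying to integrate UIMA with Solr, following the steps described at https://cwiki.apache.org/confluence/display/solr/UIMA+Integration. When I try to index a document, an exception is thrown in the terminal and an error trace is also written to the Solr log. I have been trying to resolve this for a while but have not found a proper solution. I have included all the jars mentioned in the documentation, and I have generated valid keys for the APIs.
Analyzed fields: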
<arr name="fields">
<str>content</str>
</arr>
The content field is of field type text_general. It is not a copy field; it holds the content of the corresponding document.
<field name="content" type="text_general" indexed="true" termOffsets="true" stored="true" termPositions="true" termVectors="true" multiValued="true" required="true"/>
solrconfig.xml:
<updateRequestProcessorChain name="uima" >
<processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
<lst name="uimaConfig">
<lst name="runtimeParameters">
<str name="keyword_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="concept_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="lang_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="cat_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="entities_apikey">VALID_ALCHEMYAPI_KEY</str>
<str name="oc_licenseID">VALID_OPENCALAIS_KEY</str>
</lst>
<str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
<!-- Set to true if you want to continue indexing even if text processing fails.
Default is false, in which case Solr throws a RuntimeException and
none of the documents in your session are indexed. -->
<bool name="ignoreErrors">true</bool>
<str name="logField">fileName</str>
<!-- This is optional. It is used for logging when text processing fails.
If logField is not specified, uniqueKey will be used as logField.
<str name="logField">id</str>
-->
<lst name="analyzeFields">
<bool name="merge">false</bool>
<arr name="fields">
<str>content</str>
</arr>
</lst>
<lst name="fieldMappings">
<lst name="type">
<str name="name">org.apache.uima.alchemy.ts.concept.ConceptFS</str>
<lst name="mapping">
<str name="feature">text</str>
<str name="field">concept</str>
</lst>
</lst>
<lst name="type">
<str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
<lst name="mapping">
<str name="feature">language</str>
<str name="field">language</str>
</lst>
</lst>
<lst name="type">
<str name="name">org.apache.uima.SentenceAnnotation</str>
<lst name="mapping">
<str name="feature">coveredText</str>
<str name="field">sentence</str>
</lst>
</lst>
</lst>
</lst>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<lst name="defaults">
<str name="update.chain">uima</str>
</lst>
</requestHandler>
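The fieldMappings above write to fields named concept, language and sentence, so the schema needs matching field definitions. A minimal sketch of what those could look like follows. The types and the multiValued settings are assumptions for illustration, not copied from my actual schema:

```xml
<!-- Hypothetical target fields for the UIMA fieldMappings; types and attributes are assumed. -->
<field name="language" type="string" indexed="true" stored="true" required="false"/>
<field name="concept" type="text_general" indexed="true" stored="true" multiValued="true" required="false"/>
<field name="sentence" type="text_general" indexed="true" stored="true" multiValued="true" required="false"/>
```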
Terminal error trace:
Mar 18, 2017 2:51:53 PM WhitespaceTokenizer typeSystemInit
INFO: "Whitespace tokenizer typesystem initialized"
Mar 18, 2017 2:51:53 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer starts processing"
Mar 18, 2017 2:51:53 PM WhitespaceTokenizer process
INFO: "Whitespace tokenizer finished processing"
Mar 18, 2017 2:51:53 PM org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl callAnalysisComponentProcess(405)
SEVERE: Exception occurred
org.apache.uima.analysis_engine.AnalysisEngineProcessException
at org.apache.uima.annotator.calais.OpenCalaisAnnotator.process(OpenCalaisAnnotator.java:206)
at org.apache.uima.analysis_component.CasAnnotator_ImplBase.process(CasAnnotator_ImplBase.java:56)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.callAnalysisComponentProcess(PrimitiveAnalysisEngine_impl.java:377)
at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.processAndOutputNewCASes(PrimitiveAnalysisEngine_impl.java:295)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.processUntilNextOutputCas(ASB_impl.java:567)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl$AggregateCasIterator.<init>(ASB_impl.java:409)
at org.apache.uima.analysis_engine.asb.impl.ASB_impl.process(ASB_impl.java:342)
at org.apache.uima.analysis_engine.impl.AggregateAnalysisEngine_impl.processAndOutputNewCASes(AggregateAnalysisEngine_impl.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:267)
at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.process(AnalysisEngineImplBase.java:280)
.....
solr.log:
2017-03-19 05:41:24.466 WARN (qtp1389647288-13) [ x:star] o.a.s.u.p.UIMAUpdateRequestProcessor skip the text processing due to null. id=3aedc166-c9ad-4b30-8bcb-d27177d2ae16, text="nullget acquainted with ams application release readiness confidential – not for distribution 1 ..."
2017-03-19 05:41:24.492 INFO (qtp1389647288-13) [ x:star] o.a.s.u.p.LogUpdateProcessorFactory [star] webapp=/solr path=/update params={wt=javabin&version=2}{add=[3aedc166-c9ad-4b30-8bcb-d27177d2ae16 (1562275568121020416)]} 0 12088
2017-03-19 05:41:39.493 INFO (commitScheduler-10-thread-1) [ x:star] o.a.s.u.DirectUpdateHandler2 start
...
I have been struggling with this issue for a while.
Thanks and regards.