I am trying to use Solr's langid UpdateRequestProcessor. Here is the configuration:

<updateRequestProcessorChain name="languages">
    <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
        <lst name="invariants">
            <str name="langid.fl">focus, expertise, platforms, partners, participation, additional</str>
            <str name="langid.whitelist">en,fr</str>
            <str name="langid.fallback">en</str>
            <str name="langid.langField">detectedlang</str>
            <bool name="langid.map">true</bool>
            <bool name="langid.map.keepOrig">false</bool>
        </lst>
    </processor>
    <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

My fields look like this:

<fields>
    <field name="_root_" type="string" indexed="true" stored="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>

    <field name="id" type="string" indexed="true" stored="true" required="true" />

    <!-- raw fields from sql db -->
    <field name="expertise_id" type="int" indexed="true" stored="true" />
    <field name="person_id" type="int" indexed="true" stored="true" />
    <field name="mod_date" type="date" indexed="true" stored="true" />
    <field name="lang" type="string" indexed="true" stored="true" />
    <field name="focus" type="text_general" indexed="true" stored="true" />
    <field name="expertise" type="text_general" indexed="true" stored="true" />
    <field name="platforms" type="text_general" indexed="true" stored="true" />
    <field name="partners" type="text_general" indexed="true" stored="true" />
    <field name="participation" type="text_general" indexed="true" stored="true" />
    <field name="additional" type="text_general" indexed="true" stored="true" />
    <field name="tag" type="text_general" termVectors="true" multiValued="true" />      
    <field name="facet_tag" type="string" stored="false" indexed="false" docValues="true" multiValued="true" default=""/>

    <!-- language detected by solr -->
    <field name="detectedlang" type="string" indexed="true" stored="true" />

    <!-- defined locale fields -->
    <dynamicField name="*_en" type="text_en" indexed="true" stored="true" />
    <dynamicField name="*_fr" type="text_fr" indexed="true" stored="true" />

    <copyField source="tag" target="facet_tag"/>

</fields>

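When I run an update or a data import, I know the "languages" update chain is being used, because focus gets mapped to focus_en and detectedlang is set. However, none of the other fields listed in langid.fl are mapped. Why?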

Example update query:

{
  "additional": "here is some other information about me.",
  "expertise_id": "10000",
  "id": "foo_10000",
  "focus": "this is my new focus. It is very exciting. When I am done I expect to be super experienced."
}

Here is the result of querying for expertise_id=10000. Note that additional has not been moved to additional_en:

  "response":{"numFound":1,"start":0,"docs":[
      {
        "additional":"here is some other information about me.",
        "expertise_id":10000,
        "id":"foo_10000",
        "detectedlang":"en",
        "focus_en":"this is my new focus. It is very exciting. When I am done I expect to be super experienced.",
        "_version_":1447088846110982144}]
  }
1 Answer

It turned out the problem was a syntax error. This line:

<str name="langid.fl">focus, expertise, platforms, partners, participation, additional</str>

must be:

<str name="langid.fl">focus,expertise,platforms,partners,participation,additional</str>

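The documentation states that the field list should be comma- or space-delimited. Apparently, a comma followed by a space breaks it (even though that form works fine in other Solr contexts, such as the fl parameter of a requestHandler, which langid.fl is presumably modeled on). I also tried the space-delimited syntax, but it did not solve my problem either.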
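One explanation that fits what I saw, though I have not checked it against the Solr source, is that the value is split on commas only and the pieces are not trimmed. Under that assumption every name after the first keeps a leading space and no longer matches a field in the document, which is why only focus was mapped. It would also explain why the space-delimited form did nothing: the whole string stays one unmatched name. The Python sketch below only illustrates that hypothesis; naive_split and normalize_field_list are made-up helper names, not Solr APIs.

```python
def naive_split(field_list):
    """Split on commas only, without trimming.

    This is an ASSUMED model of the parsing behavior, not Solr's actual code.
    """
    return field_list.split(",")


def normalize_field_list(field_list):
    """Rewrite a comma- and/or whitespace-separated list as strict 'a,b,c'."""
    names = field_list.replace(",", " ").split()
    return ",".join(names)


# Field names present in the example update document from the question.
doc_fields = {"focus", "additional", "expertise_id", "id"}

# The langid.fl value exactly as it appeared in the broken configuration.
configured = "focus, expertise, platforms, partners, participation, additional"

# With the naive split, ' additional' (leading space) never matches 'additional'.
matched_before = [n for n in naive_split(configured) if n in doc_fields]

# After normalizing, the same split yields clean names.
fixed = normalize_field_list(configured)
matched_after = [n for n in naive_split(fixed) if n in doc_fields]

print(matched_before)
print(fixed)
print(matched_after)
```

Running the value through something like normalize_field_list before pasting it into solrconfig.xml gives the comma-only form that worked for me.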

I hope this helps someone.

Answered 2013-09-25T12:41:40.107