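I'm implementing a Solr MoreLikeThis handler to find similar customers.
I have 2 customers with different names who live at the same address. I want to give Solr an entity_id and get back all customers with a similar name/address. The client will then be able to link the two customers together with the click of a button.
I'm using SolariumBundle to do this in code, but getting it to work with a raw query first should be enough; if that works, I can adapt it to Solarium myself.
This is my solrconfig.xml: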
<?xml version="1.0" encoding="UTF-8" ?>
<config>
<luceneMatchVersion>LUCENE_36</luceneMatchVersion>
<directoryFactory name="DirectoryFactory" class="${solr.directoryFactory:solr.StandardDirectoryFactory}"/>
<updateHandler class="solr.DirectUpdateHandler2" />
<requestDispatcher handleSelect="true" >
<requestParsers enableRemoteStreaming="false" multipartUploadLimitInKB="2048" />
</requestDispatcher>
<!-- request handlers -->
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true" />
<requestHandler name="/update" class="solr.XmlUpdateRequestHandler" />
<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
<lst name="defaults">
<int name="mlt.mintf">2</int>
<int name="mlt.mindf">1</int>
<int name="mlt.minwl">5</int>
<int name="mlt.maxwl">1000</int>
<int name="mlt.maxqt">50</int>
<int name="mlt.maxntp">50000</int>
<bool name="mlt.boost">true</bool>
<str name="mlt.fl">customer_data,entity_data,street</str>
<bool name="mlt.match.include">false</bool>
</lst>
</requestHandler>
<requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers" />
<!-- config for the admin interface -->
<admin>
<defaultQuery>solr</defaultQuery>
</admin>
</config>
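To make the `/mlt` defaults above easier to reason about, here is a rough Python sketch of the kind of term filtering those thresholds describe. It is only an illustration under my own assumptions, not Lucene's actual implementation: the function name `candidate_terms`, the sample tokens and the document frequencies are all hypothetical, and the real handler ranks terms by a tf-idf style score rather than by raw term frequency.

```python
from collections import Counter

# Thresholds copied from the <lst name="defaults"> block of the /mlt handler.
MLT_DEFAULTS = {"mintf": 2, "mindf": 1, "minwl": 5, "maxwl": 1000, "maxqt": 50}


def candidate_terms(source_terms, doc_freq, params=MLT_DEFAULTS):
    """Return the source-document terms that survive the MLT thresholds.

    source_terms: analyzed tokens of the source document (hypothetical input).
    doc_freq: mapping term -> number of indexed documents containing it.
    """
    tf = Counter(source_terms)
    kept = []
    for term, freq in tf.items():
        if freq < params["mintf"]:
            continue  # occurs too rarely in the source document
        if doc_freq.get(term, 0) < params["mindf"]:
            continue  # occurs in too few documents of the index
        if not (params["minwl"] <= len(term) <= params["maxwl"]):
            continue  # word is too short or too long
        kept.append(term)
    # Simplification: order by term frequency only, then cap at maxqt.
    kept.sort(key=lambda t: -tf[t])
    return kept[: params["maxqt"]]


if __name__ == "__main__":
    tokens = ["mainstreet", "mainstreet", "lane", "lane", "harbour"]
    freqs = {"mainstreet": 3, "lane": 5, "harbour": 1}
    print(candidate_terms(tokens, freqs))
```

With these made-up inputs, a term that appears only once in the source document, or that is shorter than five characters, does not make it into the candidate list.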
The relevant part of my schema.xml is:
<fields>
<!-- general -->
<field name="id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
<field name="type" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="entity_id" type="string" indexed="true" stored="true" multiValued="false" required="true"/>
<field name="sort_id" type="int" indexed="true" stored="true" multiValued="false"/>
<field name="external_id" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="status" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="language" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="created" type="int" indexed="true" stored="true" multiValued="false"/>
<field name="name" type="text" indexed="true" stored="true" multiValued="false"/>
<field name="email" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="city" type="string" indexed="true" stored="false" multiValued="false"/>
<field name="country" type="string" indexed="true" stored="false" multiValued="false"/>
<field name="street" type="string" indexed="true" stored="false" multiValued="false"/>
<field name="zipcode" type="string" indexed="true" stored="false" multiValued="false"/>
<field name="entity_data" type="text_ngrm" indexed="true" stored="true" multiValued="true"/>
<field name="customer_data" type="text_ngrm" indexed="true" stored="true" multiValued="true" termVectors="true" />
<!-- Entity data filling -->
<copyField source="entity_id" dest="entity_data"/>
<copyField source="briljant_id" dest="entity_data"/>
<copyField source="name" dest="entity_data"/>
<copyField source="email" dest="entity_data"/>
<!-- End entity data -->
<!-- Customer data -->
<copyField source="name" dest="customer_data"/>
<copyField source="email" dest="customer_data"/>
<copyField source="city" dest="customer_data"/>
<copyField source="country" dest="customer_data"/>
<copyField source="street" dest="customer_data"/>
<copyField source="zipcode" dest="customer_data"/>
<!-- End customer data -->
</fields>
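For readers less familiar with `copyField`, the sketch below mimics in plain Python what the customer data rules above do to an incoming document: the raw source values are appended to the multi-valued `customer_data` field, which is then analyzed by whatever the `text_ngrm` type defines (not shown here). The helper `apply_copy_fields` and the sample document are hypothetical and only meant as an illustration; Solr does this internally at index time.

```python
# (source, dest) pairs mirroring the "Customer data" copyField rules above.
COPY_FIELDS = [
    ("name", "customer_data"),
    ("email", "customer_data"),
    ("city", "customer_data"),
    ("country", "customer_data"),
    ("street", "customer_data"),
    ("zipcode", "customer_data"),
]


def apply_copy_fields(doc, rules=COPY_FIELDS):
    """Return a copy of doc with each present source value appended to its dest."""
    out = dict(doc)
    for source, dest in rules:
        value = doc.get(source)
        if value is None:
            continue  # fields missing from the document are simply skipped
        # Build a fresh list so the input document is never mutated.
        out[dest] = list(out.get(dest, [])) + [value]
    return out


if __name__ == "__main__":
    # Hypothetical customer; only some of the source fields are filled in.
    customer = {
        "entity_id": "50",
        "name": "Foo Bar",
        "email": "foo@example.com",
        "city": "Antwerp",
        "street": "teststreet",
    }
    print(apply_copy_fields(customer)["customer_data"])
```

In this simplified model the address values end up next to the name in `customer_data`, regardless of whether the source fields themselves are stored.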
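I currently execute this query: http://localhost:8983/solr/core0/mlt?q=entity_id%3A50&wt=json&indent=true&mlt.fl:customer_data
and it does return results for customers with a similar name. For example, if customer_id:50 (the one I'm querying for) has the name "Foo Bar", it returns customers named "Foo Bar", "Bar Foo" and "John Foo". Similarity on street/country/zipcode does not work.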
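For anyone who wants to reproduce the raw request, here is a minimal Python sketch that assembles the URL with proper encoding. Two things in it are assumptions on my part: it writes the last parameter as `mlt.fl=customer_data` (the URL above ends in `mlt.fl:customer_data`, which may or may not be what was intended), and it adds `debugQuery=true` so that the response includes the parsed query. The helper name `build_mlt_url` is hypothetical.

```python
from urllib.parse import urlencode

# Host, port and core name are taken from the query quoted above.
BASE_URL = "http://localhost:8983/solr/core0/mlt"


def build_mlt_url(entity_id, fields="customer_data", debug=True):
    """Build the raw MoreLikeThis request URL for one entity_id."""
    params = [
        ("q", f"entity_id:{entity_id}"),  # the ':' is percent-encoded to %3A
        ("wt", "json"),
        ("indent", "true"),
        ("mlt.fl", fields),  # assumption: key=value form was intended
    ]
    if debug:
        params.append(("debugQuery", "true"))  # ask Solr for debug output
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_mlt_url(50))
```

Building the query string from key/value pairs makes it harder to end up with a parameter that Solr silently ignores because of a malformed separator.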
In debug:parsedquery I can see customer_data:Foo customer_data:Bar customer_data oo Bar, ...
These are different variations of the name, but nothing from the address part.
How can I make sure the query is built as: customer_data:Foo customer_data:Bar customer_data:teststreet customer_data:Antwerp?