mongodb - Solr index of a MongoDB collection

Question

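Suppose I have a test application that represents a list of friends. The application uses a single collection in which every document has the following format: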

_id : ObjectId("someString"),
name : "George",
description : "some text",
age : 35,
friends : [
    {
        name: "Peter",
        age: 30,
        town: {
            name_town: "Paris",
            country: "France"
        }
    },
    {
        name: "Thomas",
        age: 25,
        town: {
            name_town: "Berlin",
            country: "Germany"
        }
    }, ...                // more friends
]
...                          // more documents

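How can I describe such a collection in schema.xml? I need to produce facet queries such as "give me the countries where George's friends live". Another use case might be "return all documents (persons) whose friends are 30 years old", and so on.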

My initial idea was to mark the "friends" attribute as a text field with this schema.xml definition:

<fieldType name="text_wslc" class="solr.TextField" positionIncrementGap="100">
....
<field name="friends" type="text_wslc" indexed="true" stored="true" />

and then to search the text for, e.g., the words "age" and "30", but that is not a very reliable solution.

Please set aside the illogical structure of the collection. It is only an example of a similar problem I am currently facing.

Any help or ideas would be much appreciated.

Edit: sample 'schema.xml'

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="text-schema" version="1.5">
    <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
        <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0" />
        <fieldType name="trInt" class="solr.TrieIntField" precisionStep="0" omitNorms="true" />
        <fieldType name="text_p" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.TrimFilterFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
            <analyzer type="query">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.TrimFilterFactory"/>
                <filter class="solr.WordDelimiterFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory"/>
            </analyzer>
        </fieldType>
    </types>

    <fields>
            <field name="_id" type="string" indexed="true" stored="true" required="true" />
            <field name="_version_" type="long" indexed="true" stored="true"/>
            <field name="_ts" type="long" indexed="true" stored="true"/>
            <field name="ns" type="string" indexed="true" stored="true"/>               
            <field name="description" type="text_p" indexed="true" stored="true" />
            <field name="name" type="text_p" indexed="true" stored="true" />
            <field name="age" type="trInt" indexed="true" stored="true" />  
            <field name="friends" type="text_p" indexed="true" stored="true" />         <!-- Here is the problem: when the type is text_p, all nested fields are treated as plain text. An optimal solution would be something like a "collection" tag that marks name_town and town as descendants of the field 'friends', but unfortunately this is not how Solr works. -->

            <field name="town" type="text_p" indexed="true" stored="true"/> 
            <field name="name_town" type="string" indexed="true" stored="true"/>    
            <field name="town" type="string" indexed="true" stored="true"/> 
    </fields>

    <uniqueKey>_id</uniqueKey>
</schema>

score 0 · Accepted Answer

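Since Solr is document-centric, you need to flatten your data as much as possible. Based on the sample you provided, I would create a schema.xml like the one below.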

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="friends" version="1.0">

    <fields>
        <field name="id" 
            type="int" indexed="true" stored="true" multiValued="false" />
        <field name="name" 
            type="text" indexed="true" stored="true" multiValued="false" />
        <field name="description" 
            type="text" indexed="true" stored="true" multiValued="false" />
        <field name="age" 
            type="int" indexed="true" stored="true" multiValued="false" />
        <field name="town" 
            type="text" indexed="true" stored="true" multiValued="false" />
        <field name="townRaw" 
            type="string" indexed="true" stored="true" multiValued="false" />
        <field name="country" 
            type="text" indexed="true" stored="true" multiValued="false" />
        <field name="countryRaw" 
            type="string" indexed="true" stored="true" multiValued="false" />
        <field name="friends" 
            type="int" indexed="true" stored="true" multiValued="true" />
    </fields>
    <copyField source="country" dest="countryRaw" />
    <copyField source="town" dest="townRaw" />

    <types>
        <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
        <fieldType name="int" class="solr.TrieIntField" 
            precisionStep="0" positionIncrementGap="0" />
        <fieldType name="text" class="solr.TextField" 
            positionIncrementGap="100">
            <analyzer>
                <tokenizer class="solr.StandardTokenizerFactory" />
                <filter class="solr.LowerCaseFilterFactory" />
            </analyzer>
        </fieldType>
    </types>
</schema>

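I would model each person as a document of its own. The relation between two persons is modelled through the attribute friends, which translates to an array of IDs. So at index time you need to fetch the IDs of all friends of a person and put them into that field.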
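The indexing step described above could be sketched roughly like this. It is only an illustration under assumptions: every person is assumed to carry an integer `id` already (the question's documents use ObjectId, so the mapping to integers is hypothetical), and `nested` and `flatten` are names made up for this example. The `townRaw` and `countryRaw` fields are not set here because the copyField rules in the schema fill them.

```python
# Hypothetical input: the nested documents from the question, with an
# integer "id" added to every person (an assumption, see the text above).
nested = [
    {
        "id": 1, "name": "George", "description": "some text", "age": 35,
        "friends": [
            {"id": 2, "name": "Peter", "age": 30,
             "town": {"name_town": "Paris", "country": "France"}},
            {"id": 3, "name": "Thomas", "age": 25,
             "town": {"name_town": "Berlin", "country": "Germany"}},
        ],
    },
]


def flatten(people):
    """Return one flat document per person, ordered by id."""
    docs = {}

    def doc_for(person_id):
        # Create the flat document on first sight, reuse it afterwards.
        return docs.setdefault(person_id, {"id": person_id, "friends": []})

    for person in people:
        doc = doc_for(person["id"])
        for key in ("name", "description", "age"):
            if key in person:
                doc[key] = person[key]
        for friend in person.get("friends", []):
            # The relation is stored as an id, not as an embedded document.
            if friend["id"] not in doc["friends"]:
                doc["friends"].append(friend["id"])
            friend_doc = doc_for(friend["id"])
            friend_doc["name"] = friend["name"]
            friend_doc["age"] = friend["age"]
            town = friend.get("town", {})
            if "name_town" in town:
                friend_doc["town"] = town["name_town"]
            if "country" in town:
                friend_doc["country"] = town["country"]
    return [docs[i] for i in sorted(docs)]


flat_docs = flatten(nested)
for flat_doc in flat_docs:
    print(flat_doc)
```

Each resulting dictionary corresponds to one Solr document in the schema above. Note that this sketch only records the direction "George lists Peter"; whether the reverse link is also stored is a modelling decision.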

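Most of the other fields are straightforward. The interesting ones are the two Raw fields. Since you said you want to facet on the country, you need a version of the country that is suited to faceting. In general, the type of a field depends on what it is used for (searching on it, faceting on it, auto-suggesting from it, and so on). In this case, the raw copies of country and town are indexed exactly as they are given.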
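To see why faceting wants the untouched value, here is a small simulation. It is not Solr code: `analyze_text` is only a rough stand-in for the text type above (the real StandardTokenizer follows Unicode segmentation rules, a plain regex does not), and the country values are made up.

```python
import re
from collections import Counter


def analyze_text(value):
    """Rough stand-in for the 'text' type: split into words, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", value)]


def analyze_string(value):
    """Stand-in for the 'string' type: the whole value is one term."""
    return [value]


def facet_counts(values, analyze):
    """Count, per indexed term, how many documents contain it."""
    counts = Counter()
    for value in values:
        counts.update(set(analyze(value)))
    return counts


# Hypothetical country values of three documents.
countries = ["United Kingdom", "France", "United Kingdom"]

text_facet = facet_counts(countries, analyze_text)
raw_facet = facet_counts(countries, analyze_string)
print(text_facet)
print(raw_facet)
```

Faceting on the analyzed field yields buckets per token (`united`, `kingdom`, `france`), while the raw field yields one bucket per full country name, which is what a facet list should show.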

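Now to your use cases.

Give me the countries where George's friends live

This can be done with faceting. You would query for

George's ID
facets on countryRaw

Such a query would look like q=friends:1&rows=0&facet=true&facet.field=countryRaw&facet.mincount=1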
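If you build that request in code, something along these lines should do. The host, port, and core name in `SOLR_SELECT` are assumptions for illustration, as is the helper name `country_facet_url`. Keep in mind that friends:1 matches the documents that list id 1 in their own friends field, so the facet reflects George's friends only if friendships are indexed in both directions.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: adjust host, port and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/friends/select"


def country_facet_url(person_id):
    """Build the facet request for everybody who lists person_id as a friend."""
    params = [
        ("q", f"friends:{person_id}"),
        ("rows", 0),                      # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", "countryRaw"),    # facet on the unanalyzed copy
        ("facet.mincount", 1),            # hide countries with zero hits
    ]
    return SOLR_SELECT + "?" + urlencode(params)


facet_url = country_facet_url(1)
print(facet_url)
```

`urlencode` takes care of escaping, so the colon in the query ends up percent-encoded in the final URL.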

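Return all documents (persons) whose friends are 30 years old.

This one is harder. First, you will need Solr's join feature. You need to configure it in solrconfig.xml.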

<config>
    <!-- loads of other stuff -->
    <queryParser name="join" class="org.apache.solr.search.JoinQParserPlugin" />
    <!-- loads of other stuff -->
</config>

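The corresponding join query looks like this: q={!join from=id to=friends}age:[30 TO *]

It works as follows

With age:[30 TO *] you search for all persons who are 30 years old or older
Then you join their id to the friends attribute of all other persons
This returns all persons whose friends attribute contains an id that matched the initial query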
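The three steps above can be emulated in memory to check your understanding. This is a toy model of the join semantics, not how Solr implements it; the `people` list and the `join` helper are hypothetical.

```python
# Hypothetical flat documents, shaped like the schema above.
people = [
    {"id": 1, "name": "George", "age": 35, "friends": [2, 3]},
    {"id": 2, "name": "Peter", "age": 30, "friends": [1]},
    {"id": 3, "name": "Thomas", "age": 25, "friends": [1]},
    {"id": 4, "name": "Anna", "age": 20, "friends": [3]},
]


def as_list(value):
    """Treat single values and multi-valued fields uniformly."""
    return value if isinstance(value, list) else [value]


def join(docs, from_field, to_field, inner_query):
    """Emulate {!join from=... to=...}inner_query on a list of dicts."""
    # Steps 1 and 2: run the inner query and collect the from_field values.
    from_values = set()
    for doc in docs:
        if inner_query(doc):
            from_values.update(as_list(doc[from_field]))
    # Step 3: keep the documents whose to_field contains one of those values.
    return [
        doc for doc in docs
        if from_values.intersection(as_list(doc.get(to_field, [])))
    ]


# age:[30 TO *] means 30 or older.
older = join(people, "id", "friends", lambda d: d["age"] >= 30)
# age:30 would restrict the inner query to friends who are exactly 30.
exact = join(people, "id", "friends", lambda d: d["age"] == 30)

names_older = [d["name"] for d in older]
names_exact = [d["name"] for d in exact]
print(names_older)
print(names_exact)
```

Anna is left out of the first result because her only friend, Thomas, is 25. With the exact match only George remains, since Peter is the only 30-year-old and only George lists him.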

Since I have not written all of this out here, you can have a look at my solrsample project on GitHub. I added a test case there that covers this question:

https://github.com/chriseverty/solrsample/blob/master/src/main/java/de/cheffe/solrsample/FriendJoinTest.java
