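I am trying to index a nested structure like the one shown below, and I am having trouble doing it with either SolrJ or DIH. I have been struggling with this for a while and would really appreciate some help.

How can I solve this using SolrJ or DIH? Thanks.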

I want my data to look like this in the index:

"docs": [

{
    "name": "MR INCREDIBLE ",
    "id": 101,
    "job": "super hero",
    "_version_": "1483934897344086016",
    "children": [
            {
                "c_name":"Violet",
                "c_age":10,
                "c_gender":"female"
            },
            {
                "c_name":"Dash",
                "c_age":8,
                "c_gender":"male"
            }
    ]
}

]

My schema.xml

<schema name="datasearch" version="1.5">
<uniqueKey>id</uniqueKey>
<fields>
    <field name="_version_" type="long" indexed="true" stored="true" />
    <field name="_root_" type="string" indexed="true" stored="false"/>

    <field name="id" type="string" indexed="true" stored="true" />
    <field name="name" type="text" indexed="true" stored="true" />
    <field name="job" type="string" indexed="true" stored="true"/>

    <!-- I want to add children here -->
    <!-- <field name="children" indexed="true" stored="true"/> -->
    <field name="c_name" type="string" indexed="true" stored="true"/>
    <field name="c_age" type="int" indexed="true" stored="true"/>
    <field name="c_gender" type="string" indexed="true" stored="true"/>
</fields>

<types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true"/>
    <fieldType name="int" class="solr.TrieIntField" />
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" />
    <fieldType name="long" class="solr.TrieLongField" />
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
    </fieldType>
</types>

<defaultSearchField>name</defaultSearchField>

</schema>

SolrJ attempt

val serverUrl = current.configuration.getString("solr.server.url").get
val solr = new HttpSolrServer(serverUrl)

def testAddChildDoc={
val doc = {
  new SolrInputDocument(){
    addField("id", "101")
    addField("name", "Mr Incredible")
  }
}
val c1 = new SolrInputDocument(){
    addField("c_name", "violet")
    addField("c_age", 10)
}
val c2 = new SolrInputDocument(){
    addField("c_name", "dash")
    addField("c_age", 8)
}

doc.addChildDocument(c1)
doc.addChildDocument(c2)

solr.deleteByQuery("*:*")
solr.add(doc)
solr.commit(true, true)
}

Response

=>ERROR org.apache.solr.core.SolrCore  – org.apache.solr.common.SolrException: [doc=null] missing required field: id
[RemoteSolrException: [doc=null] missing required field: id]

So I went ahead and added an id to the child documents, changing the code above to:

...    
val c1 = new SolrInputDocument(){
    addField("id", "101")
    addField("c_name", "violet")
    addField("c_age", 10)
}
val c2 = new SolrInputDocument(){
    addField("id", "101")
    addField("c_name", "dash")
    addField("c_age", 8)
}
.....

Then I re-ran the get-all query, and now I get the following result:

SolrJ attempt 2 plus get-all query

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1415194092582",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "docs": [
      {
        "id": 101,
        "c_name": "violet",
        "c_age": "10"
      },
      {
        "id": 101,
        "c_name": "dash",
        "c_age": "8"
      },
      {
        "id": 101,
        "name": "Mr Incredible",
        "_version_": "1483938552238571520"
      }
    ]
  }
}

So I gave up on that approach and tried DIH, as follows:

db-dataconfig.xml

<dataConfig>
    <dataSource type="JdbcDataSource"
                driver="org.postgresql.Driver"
                url="jdbc:postgresql://xxx:5432/xxxx"
                user="xx" password="xx"
                readOnly="true" autoCommit="false" transactionIsolation="TRANSACTION_READ_COMMITTED" holdability="CLOSE_CURSORS_AT_COMMIT" />
    <document>
        <entity name="parent" query="select id,name, job from PARENTS LIMIT 1" >
            <field column="name"/>
            <field column="id"/>
            <field column="job"/>

                <entity child="true" name="children" query="select c_name, c_gender, c_age from CHILDREN" where="pid = ${parent.id}" processor="CachedSqlEntityProcessor">
                    <field column="c_age" />
                    <field column="c_gender" />
                    <field column="c_name"/>
                </entity>
        </entity>
    </document>
</dataConfig>

After a full import with the DIH configuration above, the get-all query shows that no children were indexed:

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "indent": "true",
      "q": "*:*",
      "_": "1415195060664",
      "wt": "json"
    }
  },
  "response": {
    "numFound": 1,
    "start": 0,
    "docs": [
      {
        "name": "Mr Incredible",
        "id": 101,
        "_version_": "1483939357483073536"
      }
    ]
  }
}

2 Answers

To get the following response from Solr 4.10.1:

{
    "name": "MR INCREDIBLE ",
    "id": 101,
    "job": "super hero",
    "type": "parent",
    "_root_":"101",
    "_version_": "1483934897344086016"
    "childDocuments": [
        {
            "c_name":"Violet",
            "c_age":10,
            "c_gender":"female",
            "id":"101_Violet",
            "_root_":"101"
        },
        {
            "c_name":"Dash",
            "c_age":8,
            "c_gender":"male",
            "id":"101_Dash",
            "_root_":"101"
        }
    ]
}

a "type" field needs to be defined in the schema to distinguish parent documents from child documents:

<fields>
    <field name="_version_" type="long" indexed="true" stored="true" />
    <field name="_root_" type="string" indexed="true" stored="false"/>

    <field name="id" type="string" indexed="true" stored="true" />
    <field name="name" type="text" indexed="true" stored="true" />
    <field name="job" type="string" indexed="true" stored="true"/>

    <field name="c_name" type="string" indexed="true" stored="true"/>
    <field name="c_age" type="int" indexed="true" stored="true"/>
    <field name="c_gender" type="string" indexed="true" stored="true"/>

    <field name="type" type="string" indexed="true" stored="true" />
</fields>

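Child documents also need a unique "id", just like any other document. Every document in the index should be part of a parent-child relationship, otherwise queries may return unexpected results. If you need documents that are neither parents nor children, assign them a dummy parent.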

SolrJ

Working with child/parent documents requires solrj.jar version 4.5 or later.

SolrServer solr = new HttpSolrServer(serverUrl);

SolrInputDocument doc = new SolrInputDocument();
String id = "101";
doc.addField("id", id);
doc.addField("name", "Mr Incredible");
doc.addField("job", "super hero");
doc.addField("type", "parent");

SolrInputDocument childDoc1 = new SolrInputDocument();
String name1 = "Violet";
childDoc1.addField("id", id + "_" + name1);
childDoc1.addField("c_name", name1);
childDoc1.addField("c_age", 10);
childDoc1.addField("c_gender", "female");
doc.addChildDocument(childDoc1);

SolrInputDocument childDoc2 = new SolrInputDocument();
String name2 = "Dash";
childDoc2.addField("id", id + "_" + name2);
childDoc2.addField("c_name", name2);
childDoc2.addField("c_age", 8);
childDoc2.addField("c_gender", "male");
doc.addChildDocument(childDoc2);

solr.add(doc);
solr.commit();

Finally, the query looks like this:

http://localhost/solr/core/select?q={!parent which='type:parent'}&fl=*,[child parentFilter=type:parent]&wt=json&indent=true

To get only the results where the child's gender is female:

http://localhost/solr/core/select?q={!parent which='type:parent'}c_gender:female&fl=*,[child parentFilter=type:parent childFilter=c_gender:female]&wt=json&indent=true
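
The two URLs above contain braces, quotes, spaces and brackets, which usually need to be URL-encoded when the request is sent from code rather than pasted into a browser. The following is only a minimal sketch of how such a URL could be assembled with the Java standard library; the class name `BlockJoinUrl` and its methods `build` and `enc` are hypothetical helpers (not part of SolrJ), and the `Charset` overload of `URLEncoder.encode` assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class BlockJoinUrl {

    // Hypothetical helper: assembles a select URL for a block-join parent query.
    // parentFilter identifies parent documents (e.g. "type:parent");
    // childQuery restricts the children (e.g. "c_gender:female"), or "" for none.
    static String build(String baseUrl, String parentFilter, String childQuery) {
        // {!parent which='...'} selects the parents of children matching childQuery.
        String q = "{!parent which='" + parentFilter + "'}" + childQuery;
        // The [child ...] transformer attaches child documents to each returned parent.
        String fl = "*,[child parentFilter=" + parentFilter
                + (childQuery.isEmpty() ? "" : " childFilter=" + childQuery) + "]";
        return baseUrl + "/select?q=" + enc(q) + "&fl=" + enc(fl) + "&wt=json&indent=true";
    }

    // Form-encodes a single parameter value (space becomes '+', '{' becomes %7B, ...).
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // All parents with all of their children.
        System.out.println(build("http://localhost/solr/core", "type:parent", ""));
        // Only parents that have a female child, with only those children attached.
        System.out.println(build("http://localhost/solr/core", "type:parent", "c_gender:female"));
    }
}
```

Keeping the parent filter in a single argument means the `which` and `parentFilter` values cannot drift apart; both should match all parent documents and nothing else.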
answered 2014-11-28T11:32:37.270

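To be able to use child="true" in DIH, apply the patch from https://issues.apache.org/jira/browse/SOLR-5147 (I think it is the same as the DIH patch from SOLR-3076).

The patch itself does not seem to apply cleanly to the current trunk, but only because of minor details.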

answered 2014-11-05T20:10:08.130