
I'm still new to Solr. I'm trying to index a nested structure like the one shown below, and I'm having trouble doing it with SolrJ 6.1.

schema.xml

<?xml version="1.0" encoding="UTF-8"?>
<schema name="example" version="1.6">
  <uniqueKey>id</uniqueKey>
  <defaultSearchField>title</defaultSearchField>
  ...
  <!-- All the fieldType definitions are described here -->
  ...
  <field name="_root_" type="string" indexed="true" stored="false"/>
  <field name="_version_" type="long" indexed="true" stored="false"/>
  <field name="id" type="string" multiValued="false" indexed="true" required="true" stored="true"/>
  <field name="imdbId" type="string" indexed="true" stored="true"/>
  <field name="rating" type="float" indexed="true" stored="true"/>
  <field name="title" type="text_en" indexed="true" stored="true"/>
  <field name="type" type="string" indexed="true" stored="true"/>
  <field name="userId" type="string" indexed="true" stored="true"/>
</schema>

SolrJ attempt

I do it in three steps.

SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/ml_core").build();

SolrInputDocument doc, childDoc;
String[] line;
CSVReader reader;

// Step 1: Create a document (works as expected)
reader = new CSVReader(new FileReader("movies.csv")); // structure of the file: movieId,title
while ((line = reader.readNext()) != null) {
    doc = new SolrInputDocument();
    doc.addField("id", line[0]);
    doc.addField("title", line[1]);
    doc.addField("type", "film");
    solr.add(doc); 
}

// Step 2: Update a document created in step 1 (works as expected)
reader = new CSVReader(new FileReader("links.csv")); // structure of the file: movieId,imdbId 
while ((line = reader.readNext()) != null) {

    doc = new SolrInputDocument();
    doc.addField("id", line[0]);

    Map<String, Object> imdbIdModifier = new HashMap<>(1);
    imdbIdModifier.put("set", line[1]);
    doc.addField("imdbId", imdbIdModifier);  // add the map as the field value

    solr.add(doc); 
}

// Step 3: Add nested child documents to an existing document (this is where the problem occurs)
reader = new CSVReader(new FileReader("ratings.csv")); // structure of the file: movieId,userId,rating
while ((line = reader.readNext()) != null) {
    doc = new SolrInputDocument();
    doc.addField("id", line[0]);

    childDoc = new SolrInputDocument();
    childDoc.addField("id", line[0] + "_" + line[1]);
    childDoc.addField("userId", line[1]);
    childDoc.addField("type", "user");
    childDoc.addField("rating", line[2]);
    doc.addChildDocument(childDoc);

    solr.add(doc); 
}

solr.commit();
solr.optimize();

This is what I get:

My query: http://localhost:8983/solr/ml_core/select?indent=on&q=id:1&wt=json

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "q":"id:1",
      "indent":"on",
      "wt":"json",
      "_":"1471440200579"}},
  "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"1",
        "title":"Toy Story (1995)",
        "type":"film",
        "imdbId":"0114709",
        "_version_":1542910355358875648},
      {
        "id":"1",
        "_version_":1542910730357964800,
        "_root_":"1"}]
  }}

Response: incorrect. The "id" field is duplicated, even though it is declared as the unique key in schema.xml.

My query: http://localhost:8983/solr/ml_core/select?fl= *,[child%20parentFilter=type:film]&indent=on&q={!parent%20which=%27type:film%27}&wt=json

{
  "error":{
    "msg":"Parent query yields document which is not matched by parents filter, docID=19957",
    "trace":"java.lang.IllegalStateException: Parent query yields document which is not matched by parents filter, docID=19957\r\n\tat org.apache.lucene.search.join.ToChildBlockJoinQuery$ToChildBlockJoinScorer.validateParentDoc(ToChildBlockJoinQuery.java:305)\r\n\tat org.apache.lucene.search.join.ToChildBlockJoinQuery$ToChildBlockJoinScorer.access$300(ToChildBlockJoinQuery.java:158)\r\n\tat 
...
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\r\n\tat java.lang.Thread.run(Thread.java:745)\r\n",
    "code":500}
}

Response: incorrect.
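
One possible reading of both results, stated as an assumption rather than a confirmed diagnosis: Step 3 sends a second document with the same "id" carrying a single child, whereas block join in Solr expects a parent and all of its children to be indexed together as one block. If that is the cause, the ratings would first have to be grouped per film. Below is a minimal sketch of that grouping only, in plain Java without any SolrJ calls. The names RatingGrouper, Rating, groupByMovie and childId are hypothetical, and the "Ann" row is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of SolrJ: groups ratings.csv rows by film.
final class RatingGrouper {

    // One row of ratings.csv (movieId,userId,rating) without the movieId.
    static final class Rating {
        final String userId;
        final float rating;

        Rating(String userId, float rating) {
            this.userId = userId;
            this.rating = rating;
        }
    }

    // Collects all ratings of a film under its movieId, keeping first-seen order.
    static Map<String, List<Rating>> groupByMovie(List<String[]> rows) {
        Map<String, List<Rating>> byMovie = new LinkedHashMap<>();
        for (String[] row : rows) {
            byMovie.computeIfAbsent(row[0], k -> new ArrayList<>())
                   .add(new Rating(row[1], Float.parseFloat(row[2])));
        }
        return byMovie;
    }

    // Same "<movieId>_<userId>" child id scheme as in Step 3.
    static String childId(String movieId, String userId) {
        return movieId + "_" + userId;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "Violet", "5.0"},
            new String[] {"2", "Ann", "3.5"},   // made-up row for illustration
            new String[] {"1", "Mcka", "4.0"});

        // Print each film followed by the children that would belong to its block.
        for (Map.Entry<String, List<Rating>> film : groupByMovie(rows).entrySet()) {
            System.out.println("film " + film.getKey());
            for (Rating r : film.getValue()) {
                System.out.println("  child " + childId(film.getKey(), r.userId)
                        + " rating=" + r.rating);
            }
        }
    }
}
```

Under the same assumption, each group could then become a single SolrInputDocument that carries the film's own fields (id, title, type, imdbId) plus one addChildDocument call per rating, sent with one solr.add per film instead of one per rating row.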

What I expect:

My query: http://localhost:8983/solr/ml_core/select?indent=on&q=id:1&wt=json

I need the following correct response:

{
  "responseHeader":{
    "status":0,
    "QTime":0,
    "params":{
      "q":"id:1",
      "indent":"on",
      "wt":"json",
      "_":"1471440410850"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"1",
        "title":"Toy Story (1995)",
        "type":"film",
        "imdbId":"0114709",
        "_version_":1542910355358875648,
        "_root_":"1"}]
  }}

My query: http://localhost:8983/solr/ml_core/select?fl= *,[child%20parentFilter=type:film]&indent=on&q={!parent%20which=%27type:film%27}&wt=json

I need the following correct response:

{
  "responseHeader":{
    "status":0,
    "QTime":7,
    "params":{
      "q":"{!parent which='type:film'}",
      "indent":"on",
      "fl":"*,[child parentFilter=type:film]",
      "wt":"json",
      "_":"1471440410850"}},
  "response":{"numFound":1,"start":0,"docs":[
      {
        "id":"1",
        "title":"Toy Story (1995)",
        "type":"film",
        "imdbId":"0114709",
        "_version_":1542910355358875648,
        "_root_":"1",
        "_childDocuments_":[
        {
          "id":"1_Violet",
          "userId":"Violet",
          "type":"user",
          "rating":5.0},
        {
          "id":"1_Mcka",
          "userId":"Mcka",
          "type":"user",
          "rating":4.0}]}]
  }}

What do I need to do to get the desired document structure? How can I solve this with SolrJ? Thanks.
