
I'm using ElasticSearch along with the tire gem to power the search functionality of my site. I'm having trouble figuring out how to map and query the data to get the results I need.

Relevant code is below. I will explain the desired output below that as well.

# models/product.rb

class Product < ActiveRecord::Base
  include Tire::Model::Search
  include Tire::Model::Callbacks

  has_many :categorizations
  has_many :categories, :through => :categorizations
  has_many :product_traits
  has_many :traits, :through => :product_traits

  mapping do
    indexes :id, type: 'integer'
    indexes :name, boost: 10
    indexes :description, analyzer: 'snowball'
    indexes :categories do
      indexes :id, type: 'integer'
      indexes :name, type: 'string', index: 'not_analyzed'
    end
    indexes :product_traits, type: 'string', index: 'not_analyzed'
  end

  def self.search(params={})

    out = tire.search(page: params[:page], per_page: 12, load: true) do
      query do
        boolean do

          must { string params[:query], default_operator: "OR" } if params[:query].present?
          must { term 'categories.id', params[:category_id] } if params[:category_id].present?

          # if we aren't browsing a category, search results are "drill-down"
          unless params[:category_id].present?
            must { term 'categories.name', params[:categories] } if params[:categories].present?
          end
          params.select { |p| p[0,2] == 't_' }.each do |name,value|
            must { term :product_traits, "#{name[2..-1]}##{value}" }
          end

        end
      end

      # don't show the category facets if we are browsing a category
      facet("categories") { terms 'categories.name', size: 20 } unless params[:category_id].present?
      facet("traits") {
        terms :product_traits, size: 1000 #, all_terms: true
      }

      # raise to_curl
    end

    # process the trait facet results into a hash of arrays
    if out.facets['traits']
      facets = {}
      out.facets['traits']['terms'].each do |f|
        split = f['term'].partition('#')
        facets[split[0]] ||= []
        facets[split[0]] << { 'term' => split[2], 'count' => f['count'] }
      end
      out.facets['traits']['terms'] = facets
    end

    out
  end

  def to_indexed_json
    {
      id: id,
      name: name,
      description: description,
      categories: categories.all(:select => 'categories.id, categories.name, categories.keywords'),
      product_traits: product_traits.includes(:trait).collect { |t| "#{t.trait.name}##{t.value}" }
    }.to_json
  end

end

As you can see above, I'm doing some pre/post-processing of the data going to and from Elasticsearch in order to get what I want from the 'product_traits' field. This is what doesn't feel right and where my questions originate.

I have a large catalog of products, each with a handful of 'traits' such as color, material and brand. Since these traits are so varied, I modeled the data to include a Trait model which relates to the Product model via a ProductTrait model, which holds the value of the trait for the given product.

First question: How can I create the Elasticsearch mapping to index these traits properly? I assume that this involves a nested type, but I can't make enough sense of the docs to figure it out.

Second question: I want the facets to come back in groups (the way I am processing them at the end of the search method above), but with counts that reflect how many matches there are without taking into account the currently selected value for each trait. For example: if the user searches for 'Glitter' and then clicks the link corresponding to the 'Blue Color' facet, I want all the 'Color' facets to remain visible and show counts corresponding to the query results without the 'Blue Color' filter. I hope that is a good explanation; sorry if it needs more clarification.


1 Answer


If you index your traits as:

[
    {
        trait: 'color', 
        value: 'green'
    },
    {
        trait: 'material', 
        value: 'plastic'
    }
]

this would be indexed internally as:

{
    trait: ['color', 'material' ],
    value: ['green', 'plastic' ]
}

which means that you can only query for docs which have a trait with value 'color' and a value with value 'green'. There is no relationship between the trait and the value.
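The loss of the pairing can be illustrated without Elasticsearch at all. The following is only a plain-Ruby simulation of the flattening described above, not how Elasticsearch is implemented; the helper name `flat_match?` is made up for the illustration.

```ruby
# Simulates how an array of objects is flattened when it is NOT mapped as nested.
traits = [
  { trait: 'color',    value: 'green'   },
  { trait: 'material', value: 'plastic' }
]

# Collect every value per field name, losing which value belonged to which trait.
flattened = traits.each_with_object(Hash.new { |h, k| h[k] = [] }) do |pair, acc|
  pair.each { |field, v| acc[field] << v }
end

# Hypothetical helper: a "trait == x AND value == y" check against the flattened form.
def flat_match?(doc, trait, value)
  doc[:trait].include?(trait) && doc[:value].include?(value)
end

correct_pair = flat_match?(flattened, 'color', 'green')   # a pair that was really indexed
crossed_pair = flat_match?(flattened, 'color', 'plastic') # a pair that never existed

puts "color/green matches:   #{correct_pair}"
puts "color/plastic matches: #{crossed_pair}"
```

Both checks succeed, which is the problem: the flattened form cannot tell the real pair from the crossed one.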

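You have a few choices to resolve this.

As single terms

The first you are already doing, and it is a good solution, ie storing the traits as single terms like:

['color#green','material#plastic']

As objects

Alternatively (assuming you have a limited number of trait names), you could store them as: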

{
    traits: {
        color:    'green',
        material: 'plastic'
    }
}

Then you could run your queries against traits.color or traits.material.
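As a rough sketch of how a document in this object form could be built in Ruby: `TraitRow` below is a hypothetical stand-in for the ProductTrait records from the question, `traits_as_object` is a made-up helper, and the sample product is invented. In the real model this logic would live in `to_indexed_json`.

```ruby
require 'json'

# Hypothetical stand-in for a ProductTrait record joined to its Trait name.
TraitRow = Struct.new(:name, :value)

# Turns a list of name/value rows into a single { name => value } object.
# Assumption: each product has at most one value per trait name; a second
# value for the same name would overwrite the first.
def traits_as_object(rows)
  rows.each_with_object({}) { |row, acc| acc[row.name] = row.value }
end

rows = [TraitRow.new('color', 'green'), TraitRow.new('material', 'plastic')]
doc  = { id: 1, name: 'Example product', traits: traits_as_object(rows) }
json = doc.to_json

puts json
```

If a product can carry several values for one trait, the hash values would need to be arrays instead.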

As nested

If you want to keep your array structure, then you can use the nested type, eg:

{
   "mappings" : {
      "product" : {
         "properties" : {

            ... other fields ...

            "traits" : {
               "type" : "nested",
               "properties" : {
                  "trait" : {
                     "index" : "not_analyzed",
                     "type" : "string"
                  },
                  "value" : {
                     "index" : "not_analyzed",
                     "type" : "string"
                  }
               }
            }
         }
      }
   }
}

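Each trait/value pair would be indexed internally as a separate (but related) document, meaning that there would be a relationship between a trait and its value. You'd need to use nested queries or nested filters to query them, eg: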

curl -XGET 'http://127.0.0.1:9200/test/product/_search?pretty=1'  -d '
{
   "query" : {
      "filtered" : {
         "query" : {
            "text" : {
               "name" : "my query terms"
            }
         },
         "filter" : {
            "nested" : {
               "path" : "traits",
               "filter" : {
                  "and" : [
                     {
                        "term" : {
                           "trait" : "color"
                        }
                     },
                     {
                        "term" : {
                           "value" : "green"
                        }
                     }
                  ]
               }
            }
         }
      }
   }
}
'
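For use from Ruby, the same request body can be assembled as a plain hash and serialized to JSON. This is only a sketch that mirrors the curl example above field for field; the helper names `nested_trait_filter` and `filtered_search_body` are invented here and are not part of Tire or Elasticsearch.

```ruby
require 'json'

# Hypothetical helper: the nested filter from the example, for one trait/value pair.
def nested_trait_filter(trait, value)
  {
    nested: {
      path: 'traits',
      filter: {
        and: [
          { term: { trait: trait } },
          { term: { value: value } }
        ]
      }
    }
  }
end

# Hypothetical helper: wraps a text query and the nested filter in a filtered query.
def filtered_search_body(query_text, trait, value)
  {
    query: {
      filtered: {
        query:  { text: { name: query_text } },
        filter: nested_trait_filter(trait, value)
      }
    }
  }
end

body = filtered_search_body('my query terms', 'color', 'green')
puts JSON.pretty_generate(body)
```

The printed JSON can be posted to the `_search` endpoint in place of the hand-written body.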

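Combining facets, filtering and nested docs

You state that, when a user filters on eg color == green, you want to show only results where color == green, but you still want to show the counts for all colors.

To do that, you need to use the filter param of the search API rather than a filtered query. A filtered query filters out results BEFORE the facets are calculated. The filter param is applied to the query results AFTER the facets are calculated.

Here's an example where the final query results are limited to docs where color == green, but the facets are calculated for all colors: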

curl -XGET 'http://127.0.0.1:9200/test/product/_search?pretty=1'  -d '
{
   "query" : {
      "text" : {
         "name" : "my query terms"
      }
   },
   "filter" : {
      "nested" : {
         "path" : "traits",
         "filter" : {
            "and" : [
               {
                  "term" : {
                     "trait" : "color"
                  }
               },
               {
                  "term" : {
                     "value" : "green"
                  }
               }
            ]
         }
      }
   },
   "facets" : {
      "color" : {
         "nested" : "traits",
         "terms" : { "field" : "value" },
         "facet_filter" : {
            "term" : {
               "trait" : "color"
            }
         }
      }
   }
}
'
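The difference in ordering can be seen in a small plain-Ruby simulation. It only imitates the before/after behaviour described above on invented in-memory documents; it does not talk to Elasticsearch, and `facet_counts` is a made-up helper.

```ruby
# Invented documents standing in for everything the text query matched.
matched = [
  { name: 'Glitter A', traits: { 'color' => 'blue',  'material' => 'plastic' } },
  { name: 'Glitter B', traits: { 'color' => 'green', 'material' => 'plastic' } },
  { name: 'Glitter C', traits: { 'color' => 'green', 'material' => 'paper'   } }
]

# Hypothetical helper: counts the values of one trait across a set of documents.
def facet_counts(docs, trait)
  docs.each_with_object(Hash.new(0)) { |doc, counts| counts[doc[:traits][trait]] += 1 }
end

# The user's current selection: color == green.
selected = ->(doc) { doc[:traits]['color'] == 'green' }

# Filtered query: the filter runs first, so the facet only ever sees green docs.
filtered_first = facet_counts(matched.select(&selected), 'color')

# Top-level filter: facets are counted over every query match, then hits are narrowed.
facet_first = facet_counts(matched, 'color')
hits        = matched.select(&selected)

puts "filtered query facet:   #{filtered_first}"
puts "top-level filter facet: #{facet_first}"
puts "hits: #{hits.map { |d| d[:name] }.join(', ')}"
```

With the top-level filter the blue count is still available for display even though only the green products are returned as hits.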
answered 2012-09-21T13:16:45.037