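I have indexed some PDF attachments in Elasticsearch using the Tire gem. That all works well, but I am going to have many GB of PDFs, and we will probably store the PDFs in S3 for access. Right now the base64-encoded PDFs are stored in the Elasticsearch _source, which will make the index huge. I want the attachments to be indexed but not stored, and I have not yet found the right incantation to put in Tire's "mapping" block to prevent it. The block currently looks like this: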

mapping do
  indexes :id, :type => 'integer'
  indexes :title
  indexes :last_update, :type => 'date'
  indexes :attachment, :type => 'attachment'
end

I have tried a few variations, such as:

indexes :attachment, :type => 'attachment', :_source => { :enabled => false }

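It looks fine when I run the tire:import rake task, but it does not seem to make any difference. Does anyone know A) whether this is possible, and B) how to do it?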

Thanks in advance.

2 Answers

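The _source field settings contain a list of fields that should be excluded from the source. I would guess that in the case of Tire, something like this should do it: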

mapping :_source => { :excludes => ['attachment'] } do
  indexes :id, :type => 'integer'
  indexes :title
  indexes :last_update, :type => 'date'
  indexes :attachment, :type => 'attachment'
end
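
To make the intent of that block concrete, here is a minimal sketch of the mapping body it is expected to send to Elasticsearch. It assumes that Tire merges the options hash passed to `mapping` into the type mapping next to `properties`, and that a field declared without a type defaults to `string`. The type name `document` and the helper `attachment_mapping` are hypothetical and only serve the illustration.

```ruby
require 'json'

# Hypothetical type name; Tire derives the real one from the model class.
DOCUMENT_TYPE = 'document'

# Builds the mapping body that the `mapping` block above is expected to
# produce, assuming Tire merges the options hash into the type mapping.
def attachment_mapping(excluded_fields)
  {
    DOCUMENT_TYPE => {
      # Type-level setting: these fields are indexed but left out of _source.
      '_source'    => { 'excludes' => excluded_fields },
      'properties' => {
        'id'          => { 'type' => 'integer' },
        # Assumed default type for a field declared without :type.
        'title'       => { 'type' => 'string' },
        'last_update' => { 'type' => 'date' },
        'attachment'  => { 'type' => 'attachment' }
      }
    }
  }
end

puts JSON.pretty_generate(attachment_mapping(['attachment']))
```

Comparing this output with what the index's `_mapping` endpoint returns shows whether the `_source` setting actually reached Elasticsearch. If the index already existed before the mapping was changed, it may need to be deleted and recreated before the new setting takes effect.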
answered 2012-08-09T17:35:16.407

@imotov's solution did not work for me. When I execute the curl command

curl -X GET "http://localhost:9200/user_files/user_file/_search?pretty=true" -d '{"query":{"query_string":{"query":"rspec"}}}'

I can still see the content of the attachment file included in the search results.

"_source" : {"user_file":{"id":5,"folder_id":1,"updated_at":"2012-08-16T11:32:41Z","attachment_file_size":179895,"attachment_updated_at":"2012-08-16T11:32:41Z","attachment_file_name":"hw4.pdf","attachment_content_type":"application/pdf","created_at":"2012-08-16T11:32:41Z","attachment_original":"JVBERi0xL .....

Here is my implementation:

include Tire::Model::Search
include Tire::Model::Callbacks

def self.search(folder, params)
  tire.search() do
    query { string params[:query], default_operator: "AND"} if params[:query].present?
    filter :term, folder_id: folder.id
    highlight :attachment_original, :options => {:tag => "<em>"}
  end
end

mapping :_source => { :excludes => ['attachment_original'] } do
  indexes :id, :type => 'integer'
  indexes :folder_id, :type => 'integer'
  indexes :attachment_file_name
  indexes :attachment_updated_at, :type => 'date'
  indexes :attachment_original, :type => 'attachment'
end

def to_indexed_json
  to_json(:methods => [:attachment_original])
end

def attachment_original
  if attachment_file_name.present?
    path_to_original = attachment.path
    # Open in binary mode so the PDF bytes are read unchanged on every platform.
    Base64.encode64(File.open(path_to_original, 'rb') { |f| f.read })
  end
end
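
The encoding step in `attachment_original` can also be pulled out into a small standalone function, which makes it easy to check in isolation. This is only a sketch: `encode_file_for_index` is a hypothetical helper, not part of Tire or Paperclip, and it swaps in `Base64.strict_encode64` (which emits no line breaks) for the `Base64.encode64` call used above.

```ruby
require 'base64'

# Hypothetical helper, not part of Tire or Paperclip: returns the file at
# +path+ as a single-line Base64 string, or nil when there is no such file.
def encode_file_for_index(path)
  return nil if path.nil? || !File.file?(path)

  # File.binread reads in binary mode, so the bytes of a PDF stay unchanged.
  Base64.strict_encode64(File.binread(path))
end
```

With this in place, `attachment_original` would reduce to a call such as `encode_file_for_index(attachment.path)`.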
answered 2012-08-16T16:01:28.143