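Does anyone know of any good gems or documentation on how to index static pages in a Rails application to add search functionality? So far my searching has led me to Sunspot and Cobweb, but both seem a bit more complex than what I'm trying to achieve.
Here is an example of what my views directory looks like: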
views
|
|__ Folder_1
|   |
|   |__ View-1
|   |__ View-2
|
|__ Folder_2
    |
    |__ View-3
    |__ View-4
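Each folder is a controller, with its views as the defined actions, in case that makes any difference to how this should be set up. The end goal is to return a list of links to the pages that contain the search term.
Edit:
Each search query is meant to crawl the HTML content of all the static pages and return a list of links to the pages that match any of the non-stop-word terms searched. I also plan to weight the relevance of results based on the frequency and placement of the search terms within the static pages.
Example:
Search query: "scrambled eggs recipe" - would return links to any pages containing the words "recipe", "scrambled", and "eggs", with the most relevant links at the top of the returned list: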
Search Results:
Page 1 (Most relevant because it includes all 3 terms)
Page 2 (Includes 2 terms)
Page 3 (Includes 1 term)
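Preferably, the search would only try to match the search terms against the text of each view, so that if a user enters "div" as a search term it does not return every single page just because div elements are present in the HTML content.
Answer:
After a few weeks of learning Ruby, here is what I came up with. Basically, I iterate over each subdirectory of /app/views/, read every file in each subdirectory, process the text to remove HTML tags and common stop words, and store it in a search index hash.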
search_controller.rb
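class SearchController < ApplicationController
  # Include the sanitize helper inside the controller (not at the top level,
  # which would mix it into every object) so strip_tags is available here
  include ActionView::Helpers::SanitizeHelper

  # Note: on Rails 5.1 and later this callback is named prepend_before_action
  prepend_before_filter :search

  def search
    if params[:q]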
      stopwords = ["a", "about", "above", "after", "again", "against", "all", "am", "an", "and", "any", "are", "aren't", "as", "at", "be", "because", "been", "before", "being", "below", "between", "both", "but", "by", "can't", "cannot", "could", "couldn't", "did", "didn't", "do", "does", "doesn't", "doing", "don't", "down", "during", "each", "few", "for", "from", "further", "had", "hadn't", "has", "hasn't", "have", "haven't", "having", "he", "he'd", "he'll", "he's", "her", "here", "here's", "hers", "herself", "him", "himself", "his", "how", "how's", "i", "i'd", "i'll", "i'm", "i've", "if", "in", "into", "is", "isn't", "it", "it's", "its", "itself", "let's", "me", "more", "most", "mustn't", "my", "myself", "no", "nor", "not", "of", "off", "on", "once", "only", "or", "other", "ought", "our", "ours", "ourselves", "out", "over", "own", "same", "shan't", "she", "she'd", "she'll", "she's", "should", "shouldn't", "so", "some", "such", "than", "that", "that's", "the", "their", "theirs", "them", "themselves", "then", "there", "there's", "these", "they", "they'd", "they'll", "they're", "they've", "this", "those", "through", "to", "too", "under", "until", "up", "very", "was", "wasn't", "we", "we'd", "we'll", "we're", "we've", "were", "weren't", "what", "what's", "when", "when's", "where", "where's", "which", "while", "who", "who's", "whom", "why", "why's", "with", "won't", "would", "wouldn't", "you", "you'd", "you'll", "you're", "you've", "your", "yours", "yourself"]
      # Remove HTML tags and stop words from the search query
      @search_terms = strip_tags(params[:q]).downcase.split.delete_if { |x| stopwords.include?(x) }

      # Start with an empty index hash
      @search_index = {}

      # Walk each view directory and add each view's text to the index
      Rails.root.join('app', 'views').entries.each do |view_dir|
        unless %w(. .. search shared layouts).include?(view_dir.to_s)
          Rails.root.join('app', 'views', view_dir.to_s).entries.each do |view|
            unless %w(. ..).include?(view.to_s)
              # Key: the view's relative path; value: its processed text
              @search_index["/" + view_dir.to_s + "/" + view.to_s.gsub('.html.erb', '')] = strip_tags(IO.read(Rails.root.join('app', 'views', view_dir.to_s, view.to_s))).downcase.squish.split.delete_if { |x| stopwords.include?(x) }.join(" ")
            end
          end
        end
      end
    end
  end
end
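The controller above only builds the index; it does not yet rank pages against the query. As a rough sketch of the ranking described in the question (pages matching more terms first), something like the following could work. The `rank_pages` helper and the sample index are hypothetical and not part of the original code; the sketch assumes the index maps each path to a string of already-processed words, as `@search_index` does.

```ruby
# Hypothetical helper: rank pages in a search index against the query terms.
# index: Hash of { "/path" => "processed page text" }
# terms: Array of lowercase search terms with stop words already removed
def rank_pages(index, terms)
  unique_terms = terms.uniq

  scored = index.map do |path, text|
    words = text.split
    # Number of distinct search terms that occur on the page
    matched = unique_terms.count { |term| words.include?(term) }
    # Total occurrences of any search term, used as a tie-breaker
    frequency = words.count { |word| unique_terms.include?(word) }
    [path, matched, frequency]
  end

  # Drop pages with no matches, then sort by distinct terms matched,
  # then by total frequency, then alphabetically by path
  scored
    .select { |_path, matched, _frequency| matched > 0 }
    .sort_by { |path, matched, frequency| [-matched, -frequency, path] }
    .map(&:first)
end

# Made-up sample data standing in for @search_index
sample_index = {
  "/recipes/eggs"  => "scrambled eggs recipe whisk eggs butter",
  "/recipes/toast" => "toast recipe bread butter",
  "/guides/frying" => "scrambled tofu eggs guide",
  "/about/contact" => "contact email"
}

puts rank_pages(sample_index, %w[scrambled eggs recipe])
```

In the controller this might be called as `rank_pages(@search_index, @search_terms)`. Matching here is on exact words, so "egg" would not match "eggs"; stemming or substring matching would be a separate design choice.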
If anyone has any improvements or suggestions, I'd love to hear them!