ruby-on-rails - Looping through text and extracting predefined words and word pairs in Rails

Question

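I have a large string of text, description, that can be up to 500 words long. I'd like to do the following:

Loop through description and look for a large number of predefined words from an array, keywords, which contains single words, word pairs and word triples.
Each time a match is found, add it to a new array, matches (unless it was already added earlier in the process), and remove it from description.

I've looked around for solutions, but most of them seem either to dive into the depths of natural language processing, which is too complex for my current needs, or to simply split the text string on spaces, which makes it impossible to look for word pairs.

Any ideas on how to do this efficiently would be much appreciated.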

score 1 · Accepted Answer

description = "The quick brown fox jumped over the lazy dog, and another brown dog"

keywords = ["brown", "lazy", "apple"]

matches = []

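# Collect each keyword that occurs in the description (each one at most once)
keywords.each do |keyword|
  matches << keyword if description.include?(keyword)
end

p matches
 #=> ["brown", "lazy"]

# Remove every occurrence of the matched keywords (literal match, not a regex)
matches.each do |keyword|
  description.gsub!(keyword, '')
end

# Collapse the double spaces left behind by the removals
description.gsub!('  ', ' ')

p description
 #=> "The quick fox jumped over the dog, and another dog"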

score 0 · Accepted Answer

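Here's a rough hack that came to mind :)

# Option 1: keep only the keywords that occur in description as whole words
matches = keywords.select do |keyword|
  description =~ /\b#{Regexp.escape(keyword)}\b/
end

# -or-
# Option 2: visit every whole-word occurrence of each keyword
keywords.each do |keyword|
  description.gsub(/\b#{Regexp.escape(keyword)}\b/) do |match|
    # whatever
  end
end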
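
Building on the word-boundary idea above, here is a minimal sketch of the whole task from the question: collect each keyword once and strip it from the text. The method name `extract_keywords` and the longest-first ordering are assumptions of this sketch, not part of the answer. Sorting by length is only there so that a phrase such as "brown fox" is tried before the single word "brown".

```ruby
# Hypothetical helper (name and behaviour are assumptions, not from the answer).
# Returns [matches, remaining_text] and leaves the input string untouched.
def extract_keywords(description, keywords)
  remaining = description.dup
  matches = []

  # Try longer phrases first so "brown fox" wins over "brown".
  keywords.uniq.sort_by { |keyword| -keyword.length }.each do |keyword|
    pattern = /\b#{Regexp.escape(keyword)}\b/
    next unless remaining.match?(pattern)

    matches << keyword
    remaining = remaining.gsub(pattern, "")
  end

  # Collapse the runs of spaces left behind by removed phrases.
  [matches, remaining.squeeze(" ").strip]
end

description = "The quick brown fox jumped over the lazy dog, and another brown dog"
keywords = ["brown", "brown fox", "lazy", "apple"]

matches, remaining = extract_keywords(description, keywords)
p matches    #=> ["brown fox", "brown", "lazy"]
p remaining  #=> "The quick jumped over the dog, and another dog"
```

Note that `\b` only behaves as expected when each keyword starts and ends with a word character. For very large keyword lists, a single pattern built with `Regexp.union` may be worth trying instead of one pass per keyword.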

score 0 · Accepted Answer

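You can keep a frequency counter for each word in the array.

Loop through the text in description.

If a word matches the description text exactly, increase its frequency by 1.

Finally, put every word whose frequency is greater than 0 into the new array matches and remove it from description.

For example,

a word's frequency should initially be 0.
If that word occurs 2 times,
its frequency will be 0 + 2 = 2.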
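
The counting idea could be sketched as follows. This is only an illustration under assumptions: it uses `String#scan` with a whole-word pattern rather than a word-by-word loop, so that multi-word keywords are counted too, and the variable name `frequency` is made up for the example.

```ruby
description = "The quick brown fox jumped over the lazy dog, and another brown dog"
keywords = ["brown", "lazy", "apple"]

# Every keyword starts with a frequency of 0.
frequency = keywords.map { |keyword| [keyword, 0] }.to_h

# Add one point per whole-word occurrence of the keyword in the description.
keywords.each do |keyword|
  frequency[keyword] += description.scan(/\b#{Regexp.escape(keyword)}\b/).length
end

# Keywords with a frequency above 0 become the matches.
matches = frequency.select { |_keyword, count| count > 0 }.keys

# Remove the matched keywords and tidy up the leftover spaces.
remaining = matches.reduce(description) do |text, keyword|
  text.gsub(/\b#{Regexp.escape(keyword)}\b/, "")
end.squeeze(" ").strip

p matches    #=> ["brown", "lazy"]
p remaining  #=> "The quick fox jumped over the dog, and another dog"
```

Here "brown" ends with a frequency of 2, "lazy" with 1, and "apple" stays at 0, so only the first two end up in matches.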
