ruby - How do I find acronyms in a text?

Question

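My project reads many files (each file has a title, text, and sections) and should find the titles of the files that contain a given acronym. This is my document class: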

class Doc
  def initialize(id, secciones)
    @id, @secciones = id, secciones
  end
  def to_s
    result = "" + @id.to_s + "\n" + @secciones.to_s
    return result
  end
  def tiene_acronimo(acr)
    puts "a ver si tiene acronimos el docu.."
    tiene_acronimo = false
    secciones.each do |seccion|
      if seccion.tiene_acronimo(acr)
        tiene_acronimo = true
      end
    end
    return tiene_acronimo
  end
  attr_accessor :id
  attr_accessor :secciones
end

This is my Section class:

class Section
  # Readers are needed because tiene_acronimo calls title and text.
  attr_accessor :title, :text

  def initialize
    @title = ""
    @text = ""
  end
  def tiene_acronimo(acr)
    return title.include?(acr) || text.include?(acr)
  end
end

This is my method:

def test()
  results = Array.new
  puts "Dame el acronimo"
  acr = gets.chomp # drop the trailing newline that gets keeps
  documentos_cientificos.each do |d|
    if d.tiene_acronimo(acr)
      results << d
    end
  end
  return results
end

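This method gets an acronym and should find all the documents that contain it. The `include?` method takes the uppercase acronym and returns `true` if a document contains it as any substring, not only as a standalone acronym. For example: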

Multiple sclerosis (**MS**), also known as # => `true`
Presenting signs and sympto**ms** # => `false` (but `include?` returns `true`)

How can I find acronyms more easily?

score 1 · Accepted Answer

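You can use a regular expression with `match`. The following regex matches only if the content contains the supplied string as a whole word, because `\b` anchors the pattern to word boundaries on both sides. It ignores substrings and is case-sensitive.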

acr = "MS"
title = "Multiple sclerosis (MS), also known as"
text = "Presenting signs and symptoms"

title.match(/\b#{Regexp.escape(acr)}\b/) # => #<MatchData "MS">
text.match(/\b#{Regexp.escape(acr)}\b/) # => nil

Or, equivalently:

title.match(/\b#{Regexp.escape(acr)}\b/).to_a.size > 0 # => true
text.match(/\b#{Regexp.escape(acr)}\b/).to_a.size > 0 # => false
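When only a true/false answer is needed, `String#match?` (available since Ruby 2.4) returns a boolean directly and skips building a `MatchData` object. A minimal sketch; the helper name `contains_acronym?` is hypothetical and not part of the question's code:

```ruby
# Hypothetical helper: true when acr occurs in str as a whole word.
# Regexp.escape keeps any regex metacharacters in acr literal.
def contains_acronym?(str, acr)
  str.match?(/\b#{Regexp.escape(acr)}\b/)
end

contains_acronym?("Multiple sclerosis (MS), also known as", "MS") # => true
contains_acronym?("Presenting signs and symptoms", "MS")          # => false
contains_acronym?("SYMPTOMS", "MS")                               # => false
```

The last call shows that an uppercase substring inside a longer word is still rejected, since there is no word boundary before the `M`.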

...so you can redefine your method as:

def tiene_acronimo(acr)
  regex_to_match = /\b#{Regexp.escape(acr)}\b/
  has_acr = false
  if (title.match(regex_to_match)) || (text.match(regex_to_match))
    has_acr = true
  end

  return has_acr
end
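For a fuller picture, here is a rough end-to-end sketch of how the word-boundary check might fit into the document search. The classes are simplified stand-ins for the ones in the question (the constructors taking `title` and `text` are an assumption), and the sample data is made up for illustration:

```ruby
# Sketch only: simplified stand-ins for the question's classes.
class Section
  attr_reader :title, :text

  def initialize(title, text)
    @title = title
    @text = text
  end

  # True when acr appears as a whole word in the title or the text.
  def tiene_acronimo(acr)
    regex = /\b#{Regexp.escape(acr)}\b/
    title.match?(regex) || text.match?(regex)
  end
end

class Doc
  attr_reader :id, :secciones

  def initialize(id, secciones)
    @id = id
    @secciones = secciones
  end

  # any? stops at the first section that matches.
  def tiene_acronimo(acr)
    secciones.any? { |seccion| seccion.tiene_acronimo(acr) }
  end
end

# Hypothetical sample data.
docs = [
  Doc.new(1, [Section.new("Multiple sclerosis (MS), also known as", "...")]),
  Doc.new(2, [Section.new("Overview", "Presenting signs and symptoms")])
]

acr = "MS" # with user input, gets.chomp drops the trailing newline
matching_ids = docs.select { |d| d.tiene_acronimo(acr) }.map(&:id)
p matching_ids # => [1]
```

Using `any?` and `select` replaces the manual flag variable and result array. One caveat: `\b` relies on word characters, so an acronym that starts or ends with a non-word character (for example `C++`) would need a different boundary check.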
