So I'm trying to define "#titleize", a method that capitalizes the first letter of every word in a string, except for fluff words such as "the", "and", and "if".

My code so far:

def titleize(string)
words = []
stopwords = %w{the a by on for of are with just but and to the my had some in} 

string.scan(/\w+/) do |word|
    if !stopwords.include?(word) 
        words << word.capitalize
    else 
        words << word 
    end

words.join(' ')
end

My trouble is with the if/else part - when I run the method on a string, I get "syntax error, unexpected $end, expecting keyword_end".

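I think the code would work if I used the shorthand version of if/else, which usually goes inside a block in {curly braces}. I know the syntax for this shorthand looks something like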

string.scan(/\w+/) { |word| !stopwords.include?(word) words << word.capitalize : words       
    <<  word }

...where

words << word.capitalize 

happens if !stopwords.include?(word) returns true, and

words << word

happens if !stopwords.include?(word) returns false. But this doesn't work either!

It might also look something like this (which is a different/more efficient approach - no separate array is instantiated):

string.scan(/\w+/) do |word|
    !stopwords.include?(word) word.capitalize : word
end.join(' ')

(from Calling methods within methods to Titleize in Ruby)... but when I run this code, I also get a "syntax error" message.

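So! Does anyone know the syntax I'm referring to? Can you help me remember it? Alternatively, can you point out another reason why this code isn't working?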

4 Answers

I think you're missing an end:

string.scan(/\w+/) do |word|
    if !stopwords.include?(word) 
        words << word.capitalize
    else 
        words << word 
    end
end #<<<<add this

For the shorthand version, do this:

string.scan(/\w+/).map{|w| stopwords.include?(w) ? w : w.capitalize}.join(' ')
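
As a minimal sketch of the shorthand the question is reaching for: Ruby's ternary operator has the form condition ? value_if_true : value_if_false, and the question's attempts leave out the "?". The names STOP_WORDS and titleize_with_ternary below are made up for illustration. The sketch also shows why map is used here: when scan is given a block it returns the original string rather than an array, so chaining join onto the block would not work even with the ternary fixed.

```ruby
# Hypothetical constant name; the word list is the one from the question.
STOP_WORDS = %w{the a by on for of are with just but and to my had some in}

# Hypothetical method name, used here to avoid clashing with any existing titleize.
def titleize_with_ternary(string)
  # scan without a block returns an array of matches, which map can transform.
  string.scan(/\w+/).map { |word|
    # condition ? value_if_true : value_if_false
    STOP_WORDS.include?(word) ? word : word.capitalize
  }.join(' ')
end

# scan with a block returns the receiver (a String), not an array of block results.
SCAN_BLOCK_RESULT = 'war and peace'.scan(/\w+/) { |word| word.capitalize }

puts titleize_with_ternary('war and peace')
puts SCAN_BLOCK_RESULT.class
```

Because scan(/\w+/) only returns the word characters, punctuation between words is dropped when the pieces are joined with spaces.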
answered 2013-10-21T23:11:23.390

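Active Support has a titleize method, which is useful as a starting point because it capitalizes the words in a string, but it isn't entirely intelligent; it lays waste to the stop words. A little post-processing to restore them fixes that nicely, though.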

Here's how I'd do it:

require 'active_support/core_ext/string/inflections'

STOPWORDS = Hash[
  %w{the a by on for of are with just but and to the my had some in}.map{ |w| 
    [w.capitalize, w]
  }
]


def my_titlize(str)
  str.titleize.gsub(
    /(?!^)\b(?:#{ STOPWORDS.keys.join('|') })\b/,
    STOPWORDS
  )
end
# => /(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/

my_titlize('Jackdaws love my giant sphinx of quartz.')
# => "Jackdaws Love my Giant Sphinx of Quartz."

my_titlize('the rain in spain stays mainly in the plain.')
# => "The Rain in Spain Stays Mainly in the Plain."

my_titlize('Negative lookahead is indispensable')
# => "Negative Lookahead Is Indispensable"

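The reason I'd do it this way is that it's very easy to build a YAML file or database table to supply the list of stop words. From that array of words, it's easy to build a hash and a regular expression, which are fed to gsub, which then uses the regex engine to touch up the stop words.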

The hash created is:

{
  "The"=>"the",
  "A"=>"a",
  "By"=>"by",
  "On"=>"on",
  "For"=>"for",
  "Of"=>"of",
  "Are"=>"are",
  "With"=>"with",
  "Just"=>"just",
  "But"=>"but",
  "And"=>"and",
  "To"=>"to",
  "My"=>"my",
  "Had"=>"had",
  "Some"=>"some",
  "In"=>"in"
}

The regular expression created is:

/(?!^)\b(?:The|A|By|On|For|Of|Are|With|Just|But|And|To|My|Had|Some|In)\b/

When gsub gets a hit on a word in the regex pattern, it looks the word up in the hash and substitutes the value back into the string.
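
The path from a word list to a hash, a regex, and a gsub call can be sketched end to end without Active Support. This is only an illustration under assumptions: the YAML is an inline string standing in for a file, the list is shortened, and every constant and method name here is hypothetical. The input is assumed to be title-cased already, which is the state titleize would leave it in.

```ruby
require 'yaml'

# Hypothetical inline YAML standing in for a stop-word file.
# Entries are quoted so the parser keeps them as plain strings.
STOPWORD_YAML = <<~LIST
  - "the"
  - "of"
  - "in"
  - "my"
LIST

YAML_STOPWORDS = YAML.safe_load(STOPWORD_YAML)

# Map each capitalized form back to its lowercase form: {"The" => "the", ...}
YAML_STOPWORD_MAP = YAML_STOPWORDS.map { |w| [w.capitalize, w] }.to_h

# Alternation of the capitalized words, escaped in case an entry has regex metacharacters.
YAML_STOPWORD_ALTERNATION = YAML_STOPWORD_MAP.keys.map { |k| Regexp.escape(k) }.join('|')

# (?!^) skips a match at the start of a line, so a leading stop word stays capitalized.
YAML_STOPWORD_REGEX = /(?!^)\b(?:#{YAML_STOPWORD_ALTERNATION})\b/

# Hypothetical helper: gsub looks up each matched word in the hash for its replacement.
def restore_stopwords(title_cased)
  title_cased.gsub(YAML_STOPWORD_REGEX, YAML_STOPWORD_MAP)
end

puts restore_stopwords('The Rain In Spain Stays Mainly In The Plain.')
# prints: The Rain in Spain Stays Mainly in the Plain.
```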

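The code could use downcase or some other calculated way to reverse the capitalized words, but that adds overhead. gsub and the regex engine are very fast. Part of the reason is that the hash and regex avoid looping over the list of stop words, so the list can be huge without slowing the code down much. Of course, the engine has changed across versions of Ruby, so older versions don't do as well; run benchmarks if you're on Ruby < 2.0.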
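
One rough way to check the overhead claim on a given Ruby version is the standard Benchmark module. The sketch below compares the hash replacement with a block that calls downcase; the method names, constants, sample text, and iteration counts are all arbitrary choices for illustration, and the timings will vary by machine and Ruby version.

```ruby
require 'benchmark'

# Hypothetical names throughout; the word list is the one used in this answer.
BENCH_WORDS = %w{the a by on for of are with just but and to my had some in}
BENCH_MAP = BENCH_WORDS.map { |w| [w.capitalize, w] }.to_h
BENCH_PATTERN = /(?!^)\b(?:#{BENCH_MAP.keys.join('|')})\b/

# Replacement is looked up in the hash by gsub itself.
def restore_with_hash(str)
  str.gsub(BENCH_PATTERN, BENCH_MAP)
end

# Replacement is computed in a block for every match.
def restore_with_downcase(str)
  str.gsub(BENCH_PATTERN) { |match| match.downcase }
end

# Arbitrary sample: one long title-cased line.
BENCH_SAMPLE = 'The Rain In Spain Stays Mainly In The Plain. ' * 1_000

Benchmark.bm(10) do |x|
  x.report('hash:')     { 100.times { restore_with_hash(BENCH_SAMPLE) } }
  x.report('downcase:') { 100.times { restore_with_downcase(BENCH_SAMPLE) } }
end
```

Both methods should produce the same string, so any difference in the report comes from how the replacement is obtained.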

answered 2013-10-22T03:07:40.733

Not only are you missing an end (to close the method), but your words.join(' ') is inside the scan block, which means words is being joined every time you iterate through scan.

I think you want this:

def titleize(string)
  words = []
  stopwords = %w{the a by on for of are with just but and to the my had some in} 

  string.scan(/\w+/) do |word|
      if !stopwords.include?(word) 
          words << word.capitalize
      else 
          words << word 
      end
  end

  words.join(' ')
end

While your code could be cleaned up, the basic flow is sound at this point.

answered 2013-10-22T15:21:35.280

It's hard to hunt for errors in suboptimal code. Do it the canonical way, and make possible errors easy to spot.

class String
  SQUELCH_WORDS = %w{the a by on for of are with just but and to the my had some in}

  def titleize
    gsub(/\w+/) do |s|
      SQUELCH_WORDS.include?( s ) ? s : s.capitalize
    end
  end
end

"20,000 miles under the sea".titleize #=> "20,000 Miles Under the Sea"
answered 2013-10-21T23:18:52.873
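
One gap in the scan- and gsub-based versions above is that a stop word at the very start of the string stays lowercase, whereas the Active Support answer keeps it capitalized via (?!^). A minimal sketch of one way to handle that with a plain gsub block follows; the names SMALL_WORDS and titleize_keeping_first are hypothetical, and this is a plain method rather than a String patch.

```ruby
# Hypothetical constant name; the word list is the one from the question.
SMALL_WORDS = %w{the a by on for of are with just but and to my had some in}

# Hypothetical method: capitalizes every word except stop words,
# but always capitalizes the first word of the string.
def titleize_keeping_first(str)
  first = true
  str.gsub(/\w+/) do |word|
    keep_lowercase = !first && SMALL_WORDS.include?(word)
    first = false
    keep_lowercase ? word : word.capitalize
  end
end

puts titleize_keeping_first('the rain in spain')
# prints: The Rain in Spain
```

Because gsub replaces words in place, punctuation and spacing in the original string are left untouched.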