
I want to do multiple regular expression replacements on an array. I have this working code, but it doesn't feel like the Ruby way. Does anyone have a better solution?

# files contains the strings that need cleaning
files = [
  "Beatles - The Word ",
  "The Beatles - The Word",
  "Beatles - Tell Me Why",
  "Beatles - Tell Me Why (remastered)",
  "Beatles - Love me do"
]

# ignore contains the regular expressions to strip from each string
ignore = [/the/,/\(.*\)/,/remastered/,/live/,/remix/,/mix/,/acoustic/,/version/,/  +/]

files.each do |file|
  ignore.each do |e|
    file.downcase!
    file.gsub!(e," ")
    file.strip!
  end
end
p files
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - love me do"]

3 Answers

3
ignore = ["the", "(", ".",  "*", ")", "remastered", "live", "remix",  "mix", "acoustic", "version", "+"]
re = Regexp.union(ignore)
p re #=> /the|\(|\.|\*|\)|remastered|live|remix|mix|acoustic|version|\+/

Regexp.union takes care of the escaping.
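
Note that escaping applies only to string arguments; Regexp objects passed to Regexp.union are kept as patterns. The following is a minimal sketch, assuming you want to keep the question's original patterns (the parenthesised group and the run of spaces) while listing the plain words as strings. The variable names are illustrative only.

```ruby
# Strings are escaped literally; Regexp objects are embedded as patterns.
words    = %w[the remastered live remix mix acoustic version] # plain words
patterns = [/\(.*\)/, /  +/]                                   # regexes from the question

re = Regexp.union(*words, *patterns)

cleaned = "Beatles - Tell Me Why (remastered)".downcase.gsub(re, " ").strip
p cleaned

# A lone "(" given as a string is escaped, so it matches a literal parenthesis.
p Regexp.union("(").source
```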

answered 2012-04-30T10:05:25.383
1

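You can fold most of these into a single regex substitution. You should also use word boundary anchors (\b); otherwise, for example, "the" would also match inside "There's a Place".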

file.gsub!(/(?:\b(?:the|remastered|live|remix|mix|acoustic|version)\b)|\([^()]*\)/, ' ')

That should take care of it.
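
To see why the word boundaries matter, here is a small sketch comparing an unanchored pattern with an anchored one on the title mentioned above. The variable names are made up for illustration.

```ruby
title = "there's a place"

naive   = /the/      # matches "the" anywhere, even inside a longer word
bounded = /\bthe\b/  # matches "the" only as a whole word

p title.gsub(naive, " ")    # the "the" inside "there's" is removed
p title.gsub(bounded, " ")  # the title is left alone

# A standalone "the" is still removed; squeeze and strip tidy the leftover spaces.
cleaned = "the beatles - the word".gsub(bounded, " ").squeeze(" ").strip
p cleaned
```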

Then you can collapse runs of multiple spaces in a second step:

file.gsub!(/  +/, ' ')

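If you want to keep the regexes in an array, you need to iterate over the array and do a replacement for each regex. But you can at least move some of the commands out of the inner loop: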

files.each do |file|
  file.downcase!
  ignore.each do |e|
    file.gsub!(e," ")
  end
  file.strip!
end

Of course, you then need to put word boundaries around each word in your ignore list:

ignore = [/\bthe\b/, /\([^()]*\)/, /\bremastered\b/, ...]
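
Writing each bounded regex by hand gets repetitive. As a rough sketch, and assuming the elided entries are the remaining words from the question's list, the array can be generated from a word list. The names words and cleaned are hypothetical, and this version returns new strings instead of mutating files.

```ruby
words = %w[the remastered live remix mix acoustic version]

# Wrap each word in \b anchors. Regexp.escape is not needed for these plain
# words; it is kept as a safeguard in case a word contains regex metacharacters.
ignore = words.map { |w| /\b#{Regexp.escape(w)}\b/ } + [/\([^()]*\)/]

files = [
  "Beatles - The Word ",
  "The Beatles - The Word",
  "Beatles - Tell Me Why",
  "Beatles - Tell Me Why (remastered)",
  "Beatles - Love me do"
]

# Downcase once, apply every regex in turn, then collapse and trim spaces.
cleaned = files.map do |file|
  ignore.reduce(file.downcase) { |s, re| s.gsub(re, " ") }.squeeze(" ").strip
end

p cleaned
```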
answered 2012-04-30T09:40:20.137
0

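Based on your answers I came up with this solution, in two versions: one that converts the array to a string (and does not change the files array), and one that extends Array and does change the files array itself. The class approach is roughly 2 times faster. If anyone has further suggestions, please share.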

files = [
  "Beatles - The Word ",
  "The Beatles - The Word",
  "Beatles - Tell Me Why",
  "The Beatles - Tell Me Why (remastered)",
  "Beatles - wordwiththein wordwithlivein"
]

ignore = /\(.*\)|[_]|\b(the|remastered|live|remix|mix|acoustic|version)\b/

class Array
  def cleanup ignore
    self.each do |e|
      e.downcase!
      e.gsub!(ignore," ")
      e.gsub!(/  +/," ")
      e.strip!
    end
  end
end

# downcase (not downcase!) is used here because downcase! returns nil when nothing changes
p files.join("#").downcase.gsub(ignore," ").gsub(/  +/," ").split(/ *# */)
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - wordwiththein wordwithlivein"]

require 'benchmark'

Benchmark.bm do |x|
  x.report("string method") { 10000.times { files.join("#").downcase.gsub(ignore," ").gsub(/  +/," ").split(/ *# */) } }
  x.report("class  method") { 10000.times { files.cleanup ignore } }
end

=begin
       user     system      total        real
string method  0.328000   0.000000   0.328000 (  0.327600)
class  method  0.187000   0.000000   0.187000 (  0.187200)
=end
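
A possible third variant, sketched here as an assumption and not benchmarked: a plain method that returns cleaned copies, so the files array stays untouched and the core Array class does not need to be reopened. The method name cleanup_titles is hypothetical.

```ruby
# Hypothetical helper: returns cleaned copies and leaves the input array as it was.
def cleanup_titles(titles, ignore)
  titles.map { |t| t.downcase.gsub(ignore, " ").squeeze(" ").strip }
end

ignore = /\(.*\)|[_]|\b(?:the|remastered|live|remix|mix|acoustic|version)\b/

files = [
  "Beatles - The Word ",
  "The Beatles - The Word",
  "Beatles - Tell Me Why",
  "The Beatles - Tell Me Why (remastered)",
  "Beatles - wordwiththein wordwithlivein"
]

result = cleanup_titles(files, ignore)
p result
```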
answered 2012-04-30T11:52:48.483