ruby - How to capture items from StringScanner?

Question

I'm using Ruby's StringScanner to normalize some English text.

require 'strscan'

def normalize text
  s = ''
  ss = StringScanner.new text
  while ! ss.eos? do
    s += ' ' if ss.scan(/\s+/)             # multiple whitespace => single space
    s += 'mice' if ss.scan(/\bmouses\b/)   # mouses => mice
    s += '' if ss.scan(/\bthe\b/)          # remove 'the'
    s += "#$1 #$2" if ss.scan(/(\d)(\w+)/) # should split 3blind => 3 blind
  end
  s
end

normalize("3blind the   mouses")  #=> should return "3 blind mice"

Instead, I just get " mice".

StringScanner#scan is not capturing (\d) and (\w+).

score 4 · Accepted Answer

To access the captures from a StringScanner match (in Ruby 1.9 and later), use StringScanner#[]:

  s += "#{ss[1]} #{ss[2]}" if ss.scan(/(\d)(\w+)/) # splits 3blind => 3 blind
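For illustration, here is a minimal sketch of the whole method with that fix applied. Three changes go beyond the answer and are assumptions of this sketch, not part of the original code: the `the` rule also consumes the whitespace after it (with only the `ss[]` fix, the example input produces "3 blind  mice" with two spaces), the rules are chained with `elsif` so each pass applies at most one of them, and an `else` branch copies any unmatched character with `StringScanner#getch` so the loop always advances instead of spinning forever on input that no rule matches.

```ruby
require 'strscan'

# Sketch: the question's normalize, reading captures via StringScanner#[].
# Assumed changes: 'the' swallows trailing whitespace, and unmatched
# characters are copied through so the loop cannot stall.
def normalize(text)
  out = +''
  ss = StringScanner.new(text)
  until ss.eos?
    if ss.scan(/\s+/)
      out << ' '                    # collapse a run of whitespace to one space
    elsif ss.scan(/\bmouses\b/)
      out << 'mice'                 # mouses => mice
    elsif ss.scan(/\bthe\b\s*/)
      # drop 'the' together with the whitespace that follows it
    elsif ss.scan(/(\d)(\w+)/)
      out << "#{ss[1]} #{ss[2]}"    # captures come from ss[], not $1/$2
    else
      out << ss.getch               # copy one unmatched character unchanged
    end
  end
  out
end

p normalize("3blind the   mouses")  # → "3 blind mice"
```

Using `elsif` means at most one rule fires per loop iteration, which makes the rule order explicit; the original chained `if`s could apply several rules in a single iteration.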

In Ruby 2.1 and later, you should be able to access captures by name (see Peter Alfvin's link):

  s += "#{ss[:num]} #{ss[:word]}" if ss.scan(/(?<num>\d)(?<word>\w+)/)
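A small self-contained sketch of the named-capture form, assuming Ruby 2.1 or later. `split_number_word` is a hypothetical helper introduced only for this illustration; it returns `nil` when the scan does not match at the scanner's current position.

```ruby
require 'strscan'

# Hypothetical helper: one scan step that reads named groups via StringScanner#[].
def split_number_word(text)
  ss = StringScanner.new(text)
  return nil unless ss.scan(/(?<num>\d)(?<word>\w+)/)
  [ss[:num], ss[:word]]  # group names passed as symbols (Ruby 2.1+)
end

p split_number_word('3blind mice')  # → ["3", "blind"]
p split_number_word('blind mice')   # → nil (no digit at the start)
```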

score 2 · Accepted Answer

Note: As the comment thread shows, the first version of this answer was completely off base. Apologies.

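Based on experimentation and a review of http://ruby-doc.org/stdlib-1.9.2/libdoc/strscan/rdoc/StringScanner.html, it appears that StringScanner does not set the match variables $1, $2, etc., so the last s += ... statement just appends a single space to s.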
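A short sketch that checks this claim directly, assuming the behavior described above: after a successful `StringScanner#scan`, the global `$1` is still `nil` inside a fresh method, so the question's `"#$1 #$2"` interpolates to a single space, while `StringScanner#[]` does return the groups. `scan_globals_demo` is a hypothetical name used only for this illustration.

```ruby
require 'strscan'

# Hypothetical demo: compare $1 with StringScanner#[] after a successful scan.
def scan_globals_demo
  ss = StringScanner.new('3blind')
  ss.scan(/(\d)(\w+)/)             # matches, but does not set $~ / $1 / $2
  ["#$1 #$2", $1, ss[1], ss[2]]    # interpolated globals vs. scanner captures
end

p scan_globals_demo  # → [" ", nil, "3", "blind"]
```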

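It does look like strscan.c doesn't support providing match information for captures, but I did find https://www.ruby-forum.com/topic/4413436, which seems to be some kind of ongoing effort to implement this.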
