
Given a string, what is the most efficient way to return an array of the character positions at which lines begin?

text =<<_
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu
fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in 
culpa qui officia deserunt mollit anim id est laborum.
_

Expected:

find_newlines(text) # => [0, 80, 155, 233, 313, 393]
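To pin down the expected semantics on a smaller input, here is a minimal non-regex baseline. It is not taken from any of the answers, and the name find_newlines_index is hypothetical. It starts at 0, repeatedly uses String#index to find the next "\n", and records the position just after it. It skips a position equal to the string length, so a trailing newline does not produce an extra entry, which is consistent with the expected output above.

```ruby
# Hypothetical baseline (not from any answer): walks the string with
# String#index instead of a regex. Offsets are character offsets.
def find_newlines_index(text)
  starts = [0]
  pos = 0
  # String#index returns nil once no further "\n" exists.
  while (nl = text.index("\n", pos))
    pos = nl + 1
    # A newline at the very end of the string does not start a new line.
    starts << pos if pos < text.length
  end
  starts
end

find_newlines_index("ab\ncd\n")  # => [0, 3]
find_newlines_index("ab\ncd")    # => [0, 3]
```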

I am posting my own answer. I would like to accept the fastest approach as the accepted answer.


The benchmark results here will be updated as new answers are added.

require "fruity"

compare do
  padde1 {find_newlines_padde1(text)}
  digitalross1 {find_newlines_digitalross1(text)}
  sawa1 {find_newlines_sawa1(text)}
  sawa2 {find_newlines_sawa2(text)}
end

# Running each test 512 times. Test will take about 1 second.
# digitalross1 is faster than sawa2 by 5x ± 0.1
# sawa2 is faster than sawa1 by 21.999999999999996% ± 1.0%
# sawa1 is faster than padde1 by 4.0000000000000036% ± 1.0%

3 Answers

def find_newlines text
  s = 0
  [0] + text.to_a[0..-2].map { |e| s += e.size }
end

As noted, use text.each_line.to_a on 1.9. Calling each_line also works on 1.8.7, but there it is about 20% slower than calling to_a alone.
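On Ruby 1.9 and later String#to_a no longer exists, so the same running-sum idea has to go through each_line. A minimal sketch under that assumption; the name find_newlines_19 is hypothetical and the logic is the same as the answer above.

```ruby
# Ruby 1.9+ variant of the answer above: String#to_a was removed in 1.9,
# so lines come from each_line (each line keeps its trailing "\n").
def find_newlines_19(text)
  s = 0
  # Every line except the last contributes its length to the next start offset.
  [0] + text.each_line.to_a[0..-2].map { |line| s += line.size }
end

find_newlines_19("ab\ncd\n")  # => [0, 3]
```

Dropping the last line with [0..-2] is what keeps the total string length out of the result.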

answered 2013-02-18T19:05:20.660

Similar to your answer:

def find_newlines_padde1 text
  text.enum_for(:scan, /^/).map do
    $~.begin(0)
  end
end

You can still squeeze out some extra performance with RubyInline:

require "inline"
module Kernel
  inline :C do |builder|
    builder.add_compile_flags '-std=c99'  # needed for the `long pos` loop declaration
    builder.c %q{
      static VALUE find_newlines_padde2(VALUE str) {
        char newline = '\n';
        str = StringValue(str);          /* coerce to String before reading its buffer */
        char* s = RSTRING_PTR(str);
        VALUE res = rb_ary_new();
        rb_ary_push(res, LONG2FIX(0));   /* the first line always starts at 0 */
        /* stop one byte early so a trailing newline adds no extra entry */
        for (long pos=0; pos<RSTRING_LEN(str)-1; pos++) {
          if (s[pos] == newline) {
            rb_ary_push(res, LONG2FIX(pos+1));
          }
        }
        return res;
      }
    }
  end
end

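Note that I deliberately stop the loop early with pos<RSTRING_LEN(str)-1 to get the same result you asked for. If you prefer, you can change it to pos<RSTRING_LEN(str), so that the empty line after a final newline also counts as a line start. It's up to you which one fits your needs.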

Fruity says padde2 is faster than sawa2 by 22x ± 0.1.
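For readers without a C toolchain, the loop above can be mirrored in plain Ruby to see what the -1 changes. This is only an illustrative sketch, not the RubyInline code itself, and find_newlines_bytes and its count_trailing flag are hypothetical names. Like the C version it scans bytes, so on multibyte UTF-8 text its offsets are byte offsets rather than the character offsets that $~.begin(0) returns.

```ruby
# Plain-Ruby mirror of the C loop above (hypothetical helper, for illustration).
# count_trailing = false mimics `pos < RSTRING_LEN(str) - 1`,
# count_trailing = true  mimics `pos < RSTRING_LEN(str)`.
def find_newlines_bytes(str, count_trailing = false)
  bytes = str.bytes.to_a             # raw byte values, like RSTRING_PTR
  limit = count_trailing ? bytes.size : bytes.size - 1
  res = [0]
  pos = 0
  while pos < limit
    res << pos + 1 if bytes[pos] == 10   # 10 is the byte value of "\n"
    pos += 1
  end
  res
end

find_newlines_bytes("ab\ncd\n")        # => [0, 3]
find_newlines_bytes("ab\ncd\n", true)  # => [0, 3, 6]
```

With the flag on, a string ending in "\n" gains one extra entry equal to its length, which is the "empty last line" case described above.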

answered 2013-02-18T17:25:03.507
def find_newlines_sawa1 s
  a = []
  s.scan(/^/){a.push($~.offset(0)[0])}
  a
end

find_newlines_sawa1(text) # => [0, 80, 155, 233, 313, 393]

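def find_newlines_sawa2 s
  a = [0]
  # split(/^/) keeps each line's trailing newline, so running sums of lengths give line starts
  s.split(/^/).each{|line| a.push(a.last + line.length)}
  a.pop  # the final sum is the string length, not a line start
  a
end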

find_newlines_sawa2(text) # => [0, 80, 155, 233, 313, 393]

answered 2013-02-18T16:40:00.677