I have a string containing many newlines and spaces. I need to split it into fixed-length substrings. For example:
a = "This is some\nText\nThis is some text"
Now I want to split it into strings of length 17, so the result should be:
["This is some\nText", "\nThis is some tex", "t"]
Comment: my string may contain any characters (spaces, words, etc.).
"This is some\nText\nThis is some text".scan(/.{1,17}/m)
# => ["This is some\nText", "\nThis is some tex", "t"]
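The m flag is what makes this work on text with line breaks: in Ruby, the dot does not match a newline unless the regex is multiline. A minimal sketch of the difference, using the string from the question (the variable names are only illustrative):

```ruby
text = "This is some\nText\nThis is some text"

# With /m the dot also matches "\n", so every character ends up in a chunk.
with_m = text.scan(/.{1,17}/m)

# Without /m each match stops at a newline and the newlines are dropped.
without_m = text.scan(/.{1,17}/)

p with_m     # => ["This is some\nText", "\nThis is some tex", "t"]
p without_m  # => ["This is some", "Text", "This is some text"]
```

Joining the first result gives back the original string; joining the second does not.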
Here is another way:
(0..(a.length / 17)).map{|i| a[i * 17,17] }
#=> ["This is some\nText", "\nThis is some tex", "t"]
Update
And a benchmark:
require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100
Benchmark.bm do |x|
  x.report("slice") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
  x.report("regex") { n.times do ; a.scan(/.{1,17}/m) ; end}
  x.report("eachc") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
end
Results:
         user     system      total        real
slice  0.090000   0.000000   0.090000 (  0.091065)
regex  0.230000   0.000000   0.230000 (  0.233831)
eachc  1.420000   0.010000   1.430000 (  1.442033)
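Timings are only comparable if the three variants produce the same chunks. A quick sanity check on the benchmark input, as a sketch (the local variable names are assumptions, not part of the original answer):

```ruby
a = "This is some\nText\nThis is some text" * 1000

by_slice = (0..(a.length / 17)).map { |i| a[i * 17, 17] }
by_regex = a.scan(/.{1,17}/m)
by_eachc = a.each_char.each_slice(17).map(&:join)

# The input is 35000 characters long, which is not a multiple of 17,
# so all three approaches agree on this particular string.
all_equal = (by_slice == by_regex) && (by_regex == by_eachc)
puts all_equal
```

This holds for this input; it says nothing about strings whose length is an exact multiple of the chunk size.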
A solution with Enumerable: split the string into single characters with each_char, then use each_slice to partition them, and join each slice:
"This is some\nText\nThis is some text"
  .each_char # => ["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t", "\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x", "t"]
  .each_slice(17) # => [["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t"], ["\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x"], ["t"]]
  .map(&:join) # => ["This is some\nText", "\nThis is some tex", "t"]
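Another solution: unpack.
You need to build a format string for it, such as a17a17a17a17a8 (the last chunk needs to be shorter if the string's length is not an exact multiple of 17).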
a = "This is some\nText\nThis is some text\nThis is some more text"
a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}"))
# => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]
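One caveat, offered as a general property of String#unpack rather than something stated in this answer: the a directive counts bytes, not characters, and returns binary-encoded strings. For ASCII-only input that makes no difference, but multibyte text can come out in differently sized pieces. A small sketch with an illustrative UTF-8 string:

```ruby
s = "\u00e9\u00e9\u00e9"        # three "é" characters, six bytes in UTF-8

by_chars = s.scan(/.{1,2}/m)    # chunks of up to 2 characters
by_bytes = s.unpack("a2a2a2")   # chunks of exactly 2 bytes

p by_chars.map(&:length)        # => [2, 1]
p by_bytes.map(&:bytesize)      # => [2, 2, 2]
p by_bytes.first.encoding       # binary (ASCII-8BIT), not UTF-8
```

If the input may contain non-ASCII characters, the character-based variants are probably the safer choice.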
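This seems to be one of the fastest suggestions so far. Of course, if the input string is huge, the unpack format string will be huge too. In that case you will want a buffered reader that reads the input in chunks of x * 17 and applies the above to each chunk.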
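The buffered-reader idea could look roughly like the following. This is a simplified sketch, not the answer's own code: it reads 17 bytes at a time straight from the IO instead of reading x * 17 and unpacking, the helper name each_chunk is hypothetical, and a StringIO stands in for a real file.

```ruby
require 'stringio'

# Hypothetical helper: yields successive chunks of `size` bytes read from `io`.
# IO#read(size) returns nil at end of input, which ends the loop.
def each_chunk(io, size)
  return enum_for(:each_chunk, io, size) unless block_given?

  while (chunk = io.read(size))
    yield chunk
  end
end

io = StringIO.new("This is some\nText\nThis is some text")
chunks = each_chunk(io, 17).to_a
p chunks  # => ["This is some\nText", "\nThis is some tex", "t"]
```

Because read works on bytes, the same caveat about multibyte characters applies as with unpack.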
require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100
Benchmark.bm do |x|
  x.report("slice ") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
  x.report("regex ") { n.times do ; a.scan(/.{1,17}/m) ; end}
  x.report("eachc ") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
  x.report("unpack") { n.times do ; a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}")) ; end }
end
Results:
          user     system      total        real
slice   0.120000   0.000000   0.120000 (  0.130709)
regex   0.190000   0.000000   0.190000 (  0.186407)
eachc   1.430000   0.000000   1.430000 (  1.427662)
unpack  0.030000   0.000000   0.030000 (  0.032807)
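I noticed a problem with @yevgeniy's answer above (I would have commented directly, but I lack the reputation).
If the string length divides with no remainder (a.length % divisor == 0), you end up with an extra array element, "".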
a = "123456789"
(0..(a.length / 3)).map{|i| a[i * 3,3] }
# => ["123", "456", "789", ""]
I fixed this and generalized the solution into a function (it uses keyword arguments, one of them required, which needs Ruby 2.1+):
def string_prettifier(a_string: , split_char_count: 3)
  splits = (0...(a_string.length / split_char_count.to_f).ceil).map{|i| a_string[i * split_char_count, split_char_count] }
  return splits
end
s = "123456789"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789"]
s = "12345678"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "78"]
s = "1234567890"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789", "0"]
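For comparison, the same fix can be written without floating-point division by stepping over the start offsets with an exclusive range. This is an alternative sketch, not the author's function; chunk_string is a hypothetical name.

```ruby
# Hypothetical helper: walks the start offsets 0, size, 2 * size, ...
# The exclusive range stops before str.length, so no empty trailing
# chunk is produced when the length is an exact multiple of size.
def chunk_string(str, size)
  (0...str.length).step(size).map { |i| str[i, size] }
end

p chunk_string("123456789", 3)   # => ["123", "456", "789"]
p chunk_string("1234567890", 3)  # => ["123", "456", "789", "0"]
```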