ruby - Chop a string in Ruby into fixed-length strings, ignoring (not treating specially) newline or space characters

Question

I have a string that contains many newlines and spaces. I need to split it into fixed-length substrings. For example:

a = "This is some\nText\nThis is some text"

Now I want to split it into strings of length 17, so the result should be:

["This is some\nText", "\nThis is some tex", "t"]

Comment: my string may contain any characters (whitespace, words, etc.)

score 8 · Accepted Answer

"This is some\nText\nThis is some text".scan(/.{1,17}/m)
# => ["This is some\nText", "\nThis is some tex", "t"]
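
A note on why the /m flag matters here, as a small sketch: in Ruby, "." does not match a newline unless the regex is in multiline mode, so without /m the chunks would break at every "\n" and the newlines themselves would be dropped. The variable names below are only for illustration.

```ruby
text = "This is some\nText\nThis is some text"

# With /m, "." also matches "\n", so newlines count toward the 17 characters.
with_m = text.scan(/.{1,17}/m)

# Without /m, "." stops at each newline and the "\n" characters are skipped.
without_m = text.scan(/.{1,17}/)

p with_m     # => ["This is some\nText", "\nThis is some tex", "t"]
p without_m  # => ["This is some", "Text", "This is some text"]
```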

score 4 · Accepted Answer

Another way:

(0..(a.length / 17)).map{|i| a[i * 17,17] }
#=> ["This is some\nText", "\nThis is some tex", "t"]

Update

And a benchmark:

require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100

Benchmark.bm do |x|
  x.report("slice") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
  x.report("regex") { n.times do ; a.scan(/.{1,17}/m) ; end}
  x.report("eachc") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
end

Results:

         user     system      total        real
slice  0.090000   0.000000   0.090000 (  0.091065)
regex  0.230000   0.000000   0.230000 (  0.233831)
eachc  1.420000   0.010000   1.430000 (  1.442033)

score 1 · Accepted Answer

A solution using Enumerable: split the string into individual characters with each_char, partition them with each_slice, then join each group:

"This is some\nText\nThis is some text"
  .each_char # => ["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t", "\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x", "t"]
  .each_slice(17) # => [["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t"], ["\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x"], ["t"]]
  .map(&:join) # => ["This is some\nText", "\nThis is some tex", "t"]

score 0 · Accepted Answer

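Another solution: unpack.

You need to construct a format string for it, such as a17a17a17a17a8 (if the string's length is not an exact multiple of 17 characters, the last chunk needs to be shorter).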

a = "This is some\nText\nThis is some text\nThis is some more text"
a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}"))
# => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]

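This seems to be one of the fastest suggestions so far. Of course, if the input string is huge, the unpack format string will be huge too. In that case you would want a buffered reader that reads the input in blocks of x * 17 and performs the above on each block.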
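
A rough sketch of that buffered approach, under assumptions that are not in the original answer: the helper name each_fixed_chunk and the buffer size are made up for illustration, and StringIO stands in for a real file or socket. Note that unpack's "a" directive counts bytes, so this assumes single-byte (for example ASCII) text.

```ruby
require 'stringio'

# Hypothetical helper: reads `io` in blocks of `buffer_size` bytes
# (a multiple of `chunk`) and unpacks each block separately, so the
# format string never grows beyond one block's worth of "a17" entries.
def each_fixed_chunk(io, chunk, buffer_size)
  while (buf = io.read(buffer_size))      # read returns nil at end of input
    full = buf.bytesize / chunk           # number of complete chunks
    rest = buf.bytesize % chunk           # leftover bytes (last block only)
    format = "a#{chunk}" * full
    format += "a#{rest}" if rest > 0
    buf.unpack(format).each { |piece| yield piece }
  end
end

io = StringIO.new("This is some\nText\nThis is some text\nThis is some more text")
chunks = []
each_fixed_chunk(io, 17, 17 * 2) { |piece| chunks << piece }
p chunks
# => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]
```

Because the buffer size is a multiple of the chunk length, only the final block can produce a short chunk.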

require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100

Benchmark.bm do |x|
  x.report("slice ") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
  x.report("regex ") { n.times do ; a.scan(/.{1,17}/m) ; end}
  x.report("eachc ") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
  x.report("unpack") { n.times do ; a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}")) ; end }
end

Results:

            user     system      total        real
slice   0.120000   0.000000   0.120000 (  0.130709)
regex   0.190000   0.000000   0.190000 (  0.186407)
eachc   1.430000   0.000000   1.430000 (  1.427662)
unpack  0.030000   0.000000   0.030000 (  0.032807)

score 0 · Accepted Answer

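I noticed a problem with @yevgeniy's answer above (I would comment directly, but I lack the reputation).

If the string's length divides evenly by the chunk size (a.length % divisor == 0), you end up with an extra array element, "", at the end.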

a = "123456789"
(0..(a.length / 3)).map{|i| a[i * 3,3] }
# => ["123", "456", "789", ""]

I've fixed this and generalized the solution into a function (it uses keyword arguments with a required keyword, which needs Ruby 2.1+):

def string_prettifier(a_string: , split_char_count: 3)
  splits = (0...(a_string.length / split_char_count.to_f).ceil).map{|i| a_string[i * split_char_count, split_char_count] }
  return splits
end

s = "123456789"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789"]

s = "12345678"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "78"]

s = "1234567890"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789", "0"]
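
As a possible alternative (a sketch only, and chunk_fixed is a made-up name), stepping through the start offsets directly avoids both the trailing "" and the float division, since the range ends before the string's length:

```ruby
# Hypothetical helper: visits the start offsets 0, n, 2n, ... below str.length
# and takes up to n characters from each offset. n must be positive.
def chunk_fixed(str, n)
  (0...str.length).step(n).map { |i| str[i, n] }
end

p chunk_fixed("123456789", 3)   # => ["123", "456", "789"]
p chunk_fixed("12345678", 3)    # => ["123", "456", "78"]
p chunk_fixed("1234567890", 3)  # => ["123", "456", "789", "0"]
p chunk_fixed("", 3)            # => []
```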
