I have a string that contains many newlines and spaces, and I need to split it into fixed-length substrings. For example:
a = "This is some\nText\nThis is some text"
Now I want to split it into strings of length 17, so the result should be:
["This is some\nText", "\nThis is some tex", "t"]
Comment: my string may contain any characters (whitespace, words, etc.).
"This is some\nText\nThis is some text".scan(/.{1,17}/m)
# => ["This is some\nText", "\nThis is some tex", "t"]
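The `m` flag matters here: in Ruby it lets `.` match newline characters as well, so a chunk can span a line break. A minimal sketch of the difference (the variable names are only illustrative):

```ruby
text = "This is some\nText\nThis is some text"

# With /m, "." also matches "\n", so every chunk is 17 characters long
# except possibly the last one.
with_m = text.scan(/.{1,17}/m)

# Without /m, "." stops at each newline, so chunks break at line ends
# and the newline characters themselves are dropped.
without_m = text.scan(/.{1,17}/)

p with_m     # => ["This is some\nText", "\nThis is some tex", "t"]
p without_m  # => ["This is some", "Text", "This is some text"]
```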
Another way:
(0..(a.length / 17)).map{|i| a[i * 17,17] }
#=> ["This is some\nText", "\nThis is some tex", "t"]
Update
And a benchmark:
require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100
Benchmark.bm do |x|
x.report("slice") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
x.report("regex") { n.times do ; a.scan(/.{1,17}/m) ; end}
x.report("eachc") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
end
Results:
            user     system      total        real
slice   0.090000   0.000000   0.090000 (  0.091065)
regex   0.230000   0.000000   0.230000 (  0.233831)
eachc   1.420000   0.010000   1.430000 (  1.442033)
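A solution using Enumerable: split the string into individual characters with each_char, partition them with each_slice, and join each group back together with join. Step by step: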
"This is some\nText\nThis is some text"
.each_char # => ["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t", "\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x", "t"]
.each_slice(17) # => [["T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", "\n", "T", "e", "x", "t"], ["\n", "T", "h", "i", "s", " ", "i", "s", " ", "s", "o", "m", "e", " ", "t", "e", "x"], ["t"]]
.map(&:join) # => ["This is some\nText", "\nThis is some tex", "t"]
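Another solution: unpack.
You need to build a format string for it, such as a17a17a17a17a8 (if the string's length is not an exact multiple of 17, the last chunk needs to be shorter).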
a = "This is some\nText\nThis is some text\nThis is some more text"
a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}"))
# => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]
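This appears to be one of the fastest suggestions so far. Of course, if the input string is large, the unpack format string will be large too. In that case you will want a buffered reader that reads the input in blocks of x * 17 and applies the above to each block.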
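The buffered approach could look roughly like the sketch below. The helper name `each_fixed_chunk` and its parameters are hypothetical, not part of the answer, and `StringIO` merely stands in for a real file or socket. Note that `IO#read` and the `a` directive both count bytes, so this assumes single-byte characters.

```ruby
require 'stringio'

# Hypothetical helper: reads `io` in buffers of width * chunks_per_read bytes
# and yields each fixed-width piece in turn.
def each_fixed_chunk(io, width: 17, chunks_per_read: 1024)
  format_unit = "a#{width}"
  # IO#read(n) returns nil at end of input, which ends the loop.
  while (buffer = io.read(width * chunks_per_read))
    # Ceiling division, so a trailing partial chunk still gets a directive.
    count = (buffer.bytesize + width - 1) / width
    buffer.unpack(format_unit * count).each { |chunk| yield chunk }
  end
end

io = StringIO.new("This is some\nText\nThis is some text\nThis is some more text")
buffered_chunks = []
each_fixed_chunk(io, width: 17, chunks_per_read: 2) { |chunk| buffered_chunks << chunk }
p buffered_chunks
```

Because every buffer except the last is an exact multiple of the width, chunk boundaries stay aligned across reads and the format string never grows beyond `chunks_per_read` directives.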
require 'benchmark'
a = "This is some\nText\nThis is some text" * 1000
n = 100
Benchmark.bm do |x|
x.report("slice ") { n.times do ; (0..(a.length / 17)).map{|i| a[i * 17,17] } ; end}
x.report("regex ") { n.times do ; a.scan(/.{1,17}/m) ; end}
x.report("eachc ") { n.times do ; a.each_char.each_slice(17).map(&:join) ; end }
x.report("unpack") { n.times do ; a.unpack(('a17' * (a.length / 17)) + (a.size % 17 == 0 ? "" : "a#{a.length - (a.length / 17) * 17}")) ; end }
end
Results:
             user     system      total        real
slice    0.120000   0.000000   0.120000 (  0.130709)
regex    0.190000   0.000000   0.190000 (  0.186407)
eachc    1.430000   0.000000   1.430000 (  1.427662)
unpack   0.030000   0.000000   0.030000 (  0.032807)
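As a side note, the format string can probably be built more simply. The `a` directive takes whatever bytes remain when fewer than its count are left, so repeating `a17` a rounded-up number of times should cover the shorter last chunk without a special case. This is a sketch rather than a benchmarked replacement, and `width`, `count` and `pieces` are illustrative names. Since `a` counts bytes, strings with multibyte characters would be split mid-character.

```ruby
a = "This is some\nText\nThis is some text\nThis is some more text"
width = 17

# Ceiling division: number of chunks, including a shorter final one.
count = (a.bytesize + width - 1) / width

# "a17a17a17a17" here; the last directive just takes the 7 remaining bytes.
pieces = a.unpack("a#{width}" * count)

p pieces
# => ["This is some\nText", "\nThis is some tex", "t\nThis is some mo", "re text"]
```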
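I noticed a problem with @yevgeniy's answer above (I would comment directly, but I lack the reputation).
If the string length divides with no remainder (a.length % divisor == 0), you end up with an extra array element, "".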
a = "123456789"
(0..(a.length / 3)).map{|i| a[i * 3,3] }
# => ["123", "456", "789", ""]
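The extra element comes from the inclusive range combined with how String#[] treats a start index equal to the string's length. A small illustration (the variable names are only for this sketch):

```ruby
a = "123456789"

# The inclusive range 0..3 produces four indices for a 9-character string.
indices = (0..(a.length / 3)).to_a

# A start index equal to the length returns "" rather than nil ...
at_end = a[9, 3]

# ... while a start index past the end returns nil.
past_end = a[10, 3]

p indices   # => [0, 1, 2, 3]
p at_end    # => ""
p past_end  # => nil
```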
I have fixed this and generalized the solution into a function (it uses keyword arguments, including a required keyword, so it needs Ruby 2.1+):
def string_prettifier(a_string: , split_char_count: 3)
splits = (0...(a_string.length / split_char_count.to_f).ceil).map{|i| a_string[i * split_char_count, split_char_count] }
return splits
end
s = "123456789"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789"]
s = "12345678"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "78"]
s = "1234567890"
string_prettifier(a_string: s, split_char_count: 3)
# => ["123", "456", "789", "0"]