Ruby's Array inherits group_by from Enumerable, which does this nicely:
Hash[*data.group_by{ |v| v }.flat_map{ |k, v| [k, v.size] }]
Which returns:
{
0 => 1,
1 => 1,
2 => 5,
3 => 6,
4 => 4,
5 => 2,
6 => 3,
7 => 5,
8 => 1,
9 => 2,
10 => 1
}
That's just a nice 'n clean hash. If you want an array of each bin and frequency pair, you can shorten it and use:
data = [0,1,2,2,3,3,3,4]
data.group_by{ |v| v }.map{ |k, v| [k, v.size] }
# => [[0, 1], [1, 1], [2, 2], [3, 3], [4, 1]]
Here's what the code and group_by are doing, shown on the smaller data set:
data.group_by{ |v| v }
# => {0=>[0], 1=>[1], 2=>[2, 2], 3=>[3, 3, 3], 4=>[4]}
data.group_by{ |v| v }.flat_map{ |k, v| [k, v.size] }
# => [0, 1, 1, 1, 2, 2, 3, 3, 4, 1]
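The last step is turning that flat array into a hash. As a minimal sketch on the same small data set: the splat passes the alternating key/count elements to Hash[] as separate arguments, and Hash[] pairs them up. The variable names flat, counts and pairs_counts are only for illustration.

```ruby
data = [0, 1, 2, 2, 3, 3, 3, 4]

# Alternating key, count, key, count, ...
flat = data.group_by { |v| v }.flat_map { |k, v| [k, v.size] }

# Hash[] called with an even number of arguments pairs them up as key => value.
counts = Hash[*flat]

# The same result without the splat: build [key, count] pairs and call to_h.
pairs_counts = data.group_by { |v| v }.map { |k, v| [k, v.size] }.to_h
```

Both forms give the same hash. The to_h version avoids splatting a long array into an argument list, which is probably the safer choice for large inputs.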
As Telmo Costa mentioned in the comments, Ruby introduced tally in v2.7.0. Running a quick benchmark shows tally is roughly 2x faster than the group_by-based approaches:
require 'fruity'
puts "Ruby v#{RUBY_VERSION}"
data = [0,1,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,6,6,6,7,7,7,7,7,8,9,9,10]
data.group_by{ |v| v }.map{ |k, v| [k, v.size] }.to_h
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
data.group_by { |v| v }.transform_values(&:size)
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
data.tally
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
data.group_by{ |v| v }.keys.sort.map { |key| [key, data.group_by{ |v| v }[key].size] }.to_h
# => {0=>1, 1=>1, 2=>5, 3=>6, 4=>4, 5=>2, 6=>3, 7=>5, 8=>1, 9=>2, 10=>1}
compare do
gb { data.group_by{ |v| v }.map{ |k, v| [k, v.size] }.to_h }
rriemann { data.group_by { |v| v }.transform_values(&:size) }
telmo_costa { data.tally }
CBK {data.group_by{ |v| v }.keys.sort.map { |key| [key, data.group_by{ |v| v }[key].size] }.to_h }
end
Which results in:
# >> Ruby v2.7.0
# >> Running each test 1024 times. Test will take about 2 seconds.
# >> telmo_costa is faster than rriemann by 2x ± 0.1
# >> rriemann is similar to gb
# >> gb is faster than CBK by 8x ± 1.0
So use tally.
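If the code also has to run on a Ruby older than 2.7, one possible approach is to use tally when it exists and count by hand otherwise. This is only a sketch: frequencies is a hypothetical helper name, not something Ruby provides.

```ruby
# Hypothetical helper (not part of Ruby): counts occurrences of each element.
def frequencies(values)
  # Use the built-in when it is available (Ruby 2.7+).
  return values.tally if values.respond_to?(:tally)

  # Fallback: a default value of 0 lets us increment without checking for the key.
  values.each_with_object(Hash.new(0)) { |v, counts| counts[v] += 1 }
end

data = [0, 1, 2, 2, 3, 3, 3, 4]
counts = frequencies(data)

# Keys come out in order of first appearance; sort them if the bins must be ordered.
sorted_counts = counts.sort.to_h
```

One difference to keep in mind: the fallback hash returns 0 for missing keys, while the hash from tally returns nil.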