ruby-on-rails - Binning data from a hash in Ruby

Question

I'm trying to group users so I can create a scatter plot from the data in an array of Ruby hashes, which looks like this:

[{"userid"=>"1275", "num"=>"1", "amount"=>"15.00"}, 
 {"userid"=>"1286", "num"=>"3", "amount"=>"26.67"}, .... ]

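Basically, the value in num can be an integer from 1 to 4, and amount goes up to ~100. I want to bin two levels deep: first group by num, then subdivide each of the 4 resulting bins by amount (0-20, 20-50, 50-80, 80+), for 16 groups in total.

The end product should be an array of hashes or an array of arrays that I can pass to my view to plot in d3. I have a working version that uses case statements and basic flow-control conditionals, but I would like to use group_by for more elegant/shorter code.

I don't really understand the documentation on group_by, so any help would be appreciated.

Edit: the output should look more or less like this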

[[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}],
 [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}],
 [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]

Basically an array containing 16 subarrays of key-value hashes.

score 0 · Accepted Answer

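Maybe something like this?

I'm using Array's group_by function, but I also take the amount into account by binning it and folding that bin into the group_by condition.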

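# Map an amount string to a bin index: 0 (<= 20), 1 (<= 50), 2 (<= 80), 3 (above 80).
# Defined before the group_by call so the method exists when it is used.
def bin(val)
  amount = val.to_f # to_f keeps the fraction; to_i would truncate "20.50" to 20
  return 0 if amount <= 20
  return 1 if amount <= 50
  return 2 if amount <= 80
  3
end

arr = [{"userid"=>"1", "num"=>"1", "amount"=>"15.00"},{"userid"=>"2", "num"=>"1", "amount"=>"19.00"},{"userid"=>"3", "num"=>"1", "amount"=>"25.00"},{"userid"=>"4", "num"=>"1", "amount"=>"30.00"},{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]

# Combine num (1..4) and the amount bin (0..3) into a single key in 0..15.
a2 = arr.group_by { |i| (i['num'].to_i - 1) + 4 * bin(i['amount']) }.values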

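The result contains exactly the groups you wanted (the order of the groups and of the keys inside each hash depends on the Ruby version):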

[[{"amount"=>"15.00", "num"=>"1", "userid"=>"1"}, {"amount"=>"19.00", "num"=>"1", "userid"=>"2"}], [{"amount"=>"15.00", "num"=>"2", "userid"=>"5"}], [{"amount"=>"25.00", "num"=>"1", "userid"=>"3"}, {"amount"=>"30.00", "num"=>"1", "userid"=>"4"}]]

I am effectively mapping the two parameters onto a single one-dimensional key (like a hash function), so the function is really

<max value of num>*<bin according to amount>+<num-1>

If the maximum value of num is 4, then bin 0 maps to 0..3, bin 1 maps to 4..7, bin 2 maps to 8..11, and bin 3 maps to 12..15. As you can see, there is no overlap, and that is the important part.
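To check the no-overlap claim, here is a small sketch that enumerates every (num, bin) pair and computes the combined key. The names `max_num`, `bin_count`, and `combined_key` are made up for this illustration and are not part of the answer's code.

```ruby
# Assumed for illustration: num ranges over 1..4 and there are four
# amount bins (indices 0..3), as described in the question.
max_num = 4
bin_count = 4

# Combine the two values into one index: max_num * bin + (num - 1).
combined_key = ->(num, bin_index) { max_num * bin_index + (num - 1) }

# Walk through every bin, and every num within that bin.
keys = (0...bin_count).flat_map do |bin_index|
  (1..max_num).map { |num| combined_key.call(num, bin_index) }
end

p keys           # the 16 combined keys
p keys.uniq.size # equals keys.size when no two pairs collide
```

Because each bin gets its own block of `max_num` consecutive keys, two different (num, bin) pairs can never share a key as long as num stays within 1..max_num.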

score 0 · Accepted Answer

It looks like you can do this by applying two separate group_by operations:

data = [
  {"userid"=>"1", "num"=>"1", "amount"=>"15.00"},
  {"userid"=>"2", "num"=>"1", "amount"=>"19.00"},
  {"userid"=>"3", "num"=>"1", "amount"=>"25.00"},
  {"userid"=>"4", "num"=>"1", "amount"=>"30.00"},
  {"userid"=>"5", "num"=>"2", "amount"=>"15.00"}
]

# Establish the arbitrary groupings as a set of functions which
# can be evaluated. If these overlap in ranges, the first match
# will be used.
groupings = [
  lambda { |v| v >= 0 && v <= 20 },
  lambda { |v| v > 20 && v <= 50 },
  lambda { |v| v > 50 && v <= 80 },
  lambda { |v| v > 80 }
]

data.group_by do |element|
  # Group by the 'num' key first
  element['num']
end.flat_map do |num, elements|
  # Then group these sets by which of the range buckets
  # they should be sorted into.
  elements.group_by do |element|
    # Create an array that looks like [ false, true, false, ... ]
    # based on the test results, then find the index of the
    # first true entry.
    groupings.map do |fn|
      fn.call(element['amount'].to_f)
    end.index(true)
  end.values
end

# => [[{"userid"=>"1", "num"=>"1", "amount"=>"15.00"}, {"userid"=>"2", "num"=>"1", "amount"=>"19.00"}], [{"userid"=>"3", "num"=>"1", "amount"=>"25.00"}, {"userid"=>"4", "num"=>"1", "amount"=>"30.00"}], [{"userid"=>"5", "num"=>"2", "amount"=>"15.00"}]]

Calling .values on the result of a group_by gives you only the grouped sets, not the keys that indicate which group each one is.
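A minimal sketch of that difference, using a trimmed-down sample (the variable names `records`, `grouped`, and `groups_only` are chosen here for illustration):

```ruby
records = [
  { "userid" => "1", "num" => "1" },
  { "userid" => "2", "num" => "1" },
  { "userid" => "5", "num" => "2" }
]

# group_by returns a Hash keyed by whatever the block returns.
grouped = records.group_by { |record| record["num"] }

# .values keeps only the arrays of records and drops the "1" / "2" labels.
groups_only = grouped.values

p grouped.keys           # the group labels
p groups_only.map(&:size) # how many records ended up in each group
```

If the view needs to know which bucket each set belongs to, keep the Hash and skip the `.values` call.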

score 0 · Accepted Answer

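I figured out one way to do it, and added another piece of code that dereferences the hashes and returns just the user ID values in each group: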

users_by_number = firstMonth.group_by { |i| i["num"] }
users_by_number.each_pair do |key, value|
  users_by_number[key] = value.group_by do |j|
    case
    when j["amount"].to_f <= 20 then :twenty
    when j["amount"].to_f <= 50 then :twenty_fifty
    when j["amount"].to_f <= 80 then :fifty_eighty
    when j["amount"].to_f > 80 then :eighty_plus
    end
  end

  users_by_number[key].each_pair do |group, users|
    users_by_number[key][group] = users.map! { |user| user["userid"].to_i }
  end
end
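For comparison, the same nested result can probably be built without reassigning hash entries during iteration. This is only a sketch: it assumes Ruby 2.4 or later for `Hash#transform_values`, uses a few rows from the question as stand-in data for `firstMonth`, and the `bucket_for` lambda is a hypothetical helper, not part of the answer above.

```ruby
# Stand-in for firstMonth, borrowed from the question's sample data.
first_month = [
  { "userid" => "1", "num" => "1", "amount" => "15.00" },
  { "userid" => "3", "num" => "1", "amount" => "25.00" },
  { "userid" => "5", "num" => "2", "amount" => "15.00" }
]

# Hypothetical helper: map a numeric amount to a bucket label.
bucket_for = lambda do |amount|
  if amount <= 20 then :twenty
  elsif amount <= 50 then :twenty_fifty
  elsif amount <= 80 then :fifty_eighty
  else :eighty_plus
  end
end

# First level: group by num. Second level: group by amount bucket,
# then reduce each group to its integer user IDs.
user_ids_by_group = first_month
  .group_by { |user| user["num"] }
  .transform_values do |users|
    users
      .group_by { |user| bucket_for.call(user["amount"].to_f) }
      .transform_values { |group| group.map { |user| user["userid"].to_i } }
  end

p user_ids_by_group
```

The outer hash is keyed by num and each inner hash by bucket label, so buckets with no users are simply absent rather than present as empty arrays.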
