
Using the schema defined below, I can use map-reduce to aggregate the delivered_count field across all days (the days are embedded in the campaign document).

  {
    campaign_id: 1,
    status: 'running',
    dates: {
      '20130926' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      },
      '20130927' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      },
      '20130928' => {
        delivered: 1,
        failed: 1,
        queued: 1,
        clicked: 1,
        males_count: 1,
        females_count: 1,
        pacific_region: { clicked_count: 10 },
        america_region: { clicked_count: 10 },
        atlantic_region: { clicked_count: 10 },
        europe_region: { clicked_count: 10 },
        africa_region: { clicked_count: 10 },
        etc_region: { clicked_count: 10 },
        asia_region: { clicked_count: 10 },
        australia_region: { clicked_count: 10 }
      }
    }
  }

The code below reads the asia_region field and outputs clicked_count => 30 (the combined value across all dates).

$rethinkdb.table(:daily_stat_campaigns).filter { |daily_stat_campaign| daily_stat_campaign[:campaign_id].eq 1 }[0][:dates].do { |doc|
  doc.keys.map { |key|
    doc.get_field(key)[:asia_region][:clicked_count].default(0)
  }.reduce { |left, right|
    left+right
  }
}.run
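For readers without a RethinkDB instance at hand, here is a minimal plain-Ruby sketch of what that query computes. The `campaign` hash is a hypothetical, trimmed copy of the document above (two regions only), and `dig(...) || 0` merely stands in for `.default(0)`; it is an illustration, not the driver's API.

```ruby
# Hypothetical, trimmed stand-in for one campaign document.
campaign = {
  campaign_id: 1,
  dates: {
    '20130926' => { asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } },
    '20130927' => { asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } },
    '20130928' => { asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } }
  }
}

# Same shape as the query: map each date key to its count, then reduce with +.
asia_total = campaign[:dates].keys.map { |key|
  campaign[:dates][key].dig(:asia_region, :clicked_count) || 0 # stands in for .default(0)
}.reduce { |left, right| left + right }

puts asia_total
```

The open question is how to get the same sum for several regions in one pass.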

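Is it possible to run the code above for multiple regions, so that a single query returns multiple sums? The output I'm trying to achieve looks like the pseudo-result below.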

[{ asia_region: {clicked_count: 30}}, {america_region: {clicked_count: 30} }]

2 Answers


I'm a bit confused by the code you posted. Why is everything inside a filter? To get output like what you want, do the following:

regions = [:pacific_region, :america_region, ...]
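reg_clicks = r.table(:daily_stat_campaigns).concat_map { |row|
                 row[:dates]
                 .coerce_to("ARRAY")            # [[date, stats], ...]
                 .concat_map { |date|
                   date[1]                      # stats for that date (date[0] is the key)
                   .pluck(regions)
                   .coerce_to("ARRAY")          # [[region, {clicked_count: n}], ...]
                 }
              }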

You can now run reg_clicks, and the result should look something like this:

$ reg_clicks.run()
[[:asia_region, {clicked_count: 30}], [:etc_region, {clicked_count: 30}], ...]
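To make the flattening step concrete, here is a rough plain-Ruby analogue, assuming a hypothetical two-day `dates` hash. `Hash#to_a` plays the role of `coerce_to("ARRAY")` on an object, `Hash#slice` the role of `pluck`, and `flat_map` the role of `concat_map`; none of this touches RethinkDB itself.

```ruby
regions = [:asia_region, :america_region] # illustrative subset

# Hypothetical stand-in for the embedded dates object.
dates = {
  '20130926' => { clicked: 1, asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } },
  '20130927' => { clicked: 1, asia_region: { clicked_count: 10 }, america_region: { clicked_count: 10 } }
}

# to_a yields [date, stats] pairs; pair[1] is the stats half.
# slice keeps only the region keys, and the inner to_a yields
# [region, counts] pairs, which flat_map concatenates across dates.
reg_clicks = dates.to_a.flat_map { |pair| pair[1].slice(*regions).to_a }

p reg_clicks
```

Each date contributes one pair per region, so the totals still have to be summed in a later step.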

Now we need one last transformation to aggregate it:

aggregate = reg_clicks.map{ |reg|
                # reg is a [region, counts] pair, so the counts are at reg[1]
                {reg: reg[0], clicked_count: reg[1][:clicked_count]}
            }
            .group_by(:reg, r.sum(:clicked_count))

This will give you output that looks like this:

[{group: :asia_region, reduction: 150} ...]
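As a sanity check on the grouping logic, the same map-then-group-and-sum step can be sketched in plain Ruby. The input pairs are made-up sample data, and `Enumerable#group_by` with a per-group `sum` only approximates what the server-side grouping does.

```ruby
# Made-up [region, counts] pairs, two dates' worth.
reg_clicks = [
  [:asia_region, { clicked_count: 10 }],
  [:america_region, { clicked_count: 10 }],
  [:asia_region, { clicked_count: 10 }],
  [:america_region, { clicked_count: 10 }]
]

# Turn each pair into a flat row; the counts live in the second element.
rows = reg_clicks.map { |reg| { reg: reg[0], clicked_count: reg[1][:clicked_count] } }

# Group rows by region and sum clicked_count within each group.
aggregate = rows.group_by { |row| row[:reg] }.map do |region, group|
  { group: region, reduction: group.sum { |row| row[:clicked_count] } }
end

p aggregate
```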

If you want it to look exactly like what you asked for, you can apply a final transformation:

aggregate.map{ |row|
    [row[:group], row[:reduction]]
}
.coerce_to("OBJECT")
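In plain Ruby terms, turning a list of [key, value] pairs into an object is what `Array#to_h` does. The sketch below assumes a made-up `aggregate` array in the group/reduction shape shown above; it only illustrates the pairing idea and is not RethinkDB code.

```ruby
# Made-up aggregate rows in the group/reduction shape.
aggregate = [
  { group: :asia_region, reduction: 30 },
  { group: :america_region, reduction: 30 }
]

# Build [key, value] pairs, then collapse them into a single hash.
totals = aggregate.map { |row| [row[:group], row[:reduction]] }.to_h

p totals
```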

These queries would certainly be nicer if you normalized your data a bit. Break things out into 2 more tables named :dates and :region_clicks that look like this:

#dates
{
    id: 0,
    campaign_id: 1,
    date: '20130927',
    delivered: 1,
    failed: 1,
    queued: 1,
    clicked: 1,
    males_count: 1
}

#region_clicks
{
    region: "asia_region",
    click_count: 30,
    date_id: 0
}

Then your query becomes very simple:

r.table(:region_clicks).group_by(:region, r.sum(:click_count)).run()
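To show why the normalized layout makes the aggregation trivial, here is a small plain-Ruby sketch over hypothetical :region_clicks rows (the values and `date_id`s are invented for illustration). With one row per region per date, the whole job is a single group-and-sum.

```ruby
# Invented rows shaped like the :region_clicks table.
region_clicks = [
  { region: 'asia_region', click_count: 10, date_id: 0 },
  { region: 'america_region', click_count: 10, date_id: 0 },
  { region: 'asia_region', click_count: 20, date_id: 1 }
]

# Group by region, then sum click_count inside each group.
totals = region_clicks
         .group_by { |row| row[:region] }
         .transform_values { |rows| rows.sum { |row| row[:click_count] } }

p totals
```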
answered 2013-09-27T16:22:09.700

This seems to work:

require 'awesome_print' # For better readability on output

regions = [:pacific_region, :america_region]
reg_clicks = $rethinkdb.table(:daily_stat_campaigns).filter { |daily_stat_campaign| daily_stat_campaign[:campaign_id].eq 1 }[0][:dates].do { |doc|
  doc.keys.concat_map { |key|
    doc
    .get_field(key)
    .pluck(regions)
    .coerce_to("ARRAY")
  }
}
ap reg_clicks.run

This outputs something like:

[["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}], ["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}], ["america_region", {"clicked_count"=>10}], ["pacific_region", {"clicked_count"=>10}]]

aggregate = reg_clicks.map { |reg|
  { reg: reg[0], clicked_count: reg[1][:clicked_count] }
}
ap aggregate.run

This outputs:

[{"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}, {"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}, {"reg"=>"america_region", "clicked_count"=>10}, {"reg"=>"pacific_region", "clicked_count"=>10}]

ap aggregate.group_by(:reg, $rethinkdb_rql.sum(:clicked_count)).run

Output:

[{"reduction"=>30, "group"=>{"reg"=>"america_region"}}, {"reduction"=>30, "group"=>{"reg"=>"pacific_region"}}]
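If the exact shape from the question is needed, the grouped result can be reshaped on the client once it has been fetched. This is only a sketch: `grouped` is a hard-coded copy of the output above, and converting the region names to symbols is my own assumption about the desired keys.

```ruby
# Hard-coded copy of the grouped result returned by the query.
grouped = [
  { 'reduction' => 30, 'group' => { 'reg' => 'america_region' } },
  { 'reduction' => 30, 'group' => { 'reg' => 'pacific_region' } }
]

# One single-key hash per region: { region => { clicked_count: total } }.
result = grouped.map do |row|
  { row['group']['reg'].to_sym => { clicked_count: row['reduction'] } }
end

p result
```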

answered 2013-09-28T14:37:59.740