java - Java: How to aggregate a list with support for min, max, avg, and last aggregations in each group

Question

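I did this earlier in MySQL itself, since that seemed the proper place for it, but I now have to do some business logic calculations first and then apply a group by to the resulting list. Any suggestions for doing this in Java without compromising performance? (I have looked at lambdaj; it seems to be slow because of its heavy use of proxies, though I have not tried it.)

List<Item> contains name, value, and unixtimestamp as properties and is returned by the database. The records are 5 minutes apart.

I should be able to group by a dynamic sample time, for example 1 hour, which means every 12 records must be grouped into one record, and then min, max, avg, and last must be applied to each group.

Any suggestions are appreciated.

[Update] I have the following working, but it does not yet aggregate the list elements stored under each key of the indexed map. As you can see, I create a map of lists where the key is the integer representation of the requested sample time (30 minutes is the sample requested here).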

private List<Item> performConsolidation(List<Item> items) {
    ListMultimap<Integer, Item> groupByTimestamp = ArrayListMultimap.create();
    List<Item> consolidatedItems = new ArrayList<>();
    for (Item item : items) {
        groupByTimestamp.put((int) floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30)), item);
    }
    return consolidatedItems;
}

score 1 · Accepted Answer

Here is a suggestion:

// Groups items into buckets of sample_period seconds, keyed by timestamp / sample_period.
public Map<Long,List<Item>> group_items(List<Item> items, long sample_period) {
  Map<Long,List<Item>> grouped_result = new HashMap<Long,List<Item>>();

  for (Item item : items) {
    long group_key = item.timestamp / sample_period; // integer division
    List<Item> group = grouped_result.get(group_key);
    if (group == null) {
      // First item seen for this bucket: create its list.
      group = new ArrayList<Item>();
      grouped_result.put(group_key, group);
    }
    group.add(item);
  }
  return grouped_result;
}

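sample_period is the group size in seconds: 3600 = one hour, 900 = 15 minutes. This assumes timestamp is also in seconds.

The keys in the map can of course be fairly large numbers (depending on the sample period), but this grouping preserves the time order of the groups: lower keys belong to the groups that come first chronologically. Note that a HashMap does not iterate in key order, so use a TreeMap if you need to walk the groups chronologically. If we assume the data in the original list is in chronological order, we can take the key of the first item and subtract it from every key. That way we get the keys 0, 1, and so on. In that case, before the for loop starts, we need: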

long subtract = items.get(0).timestamp / sample_period; // note: both operands are integers/longs, so this is an integer division

Then, inside the for loop:

group_key = item.timestamp / sample_period - subtract;
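
To make the key arithmetic concrete, here is a minimal sketch. The class and method names (BucketKeyDemo, bucketKey, keysFor) are hypothetical and not part of the answer, and it assumes timestamps in seconds with one record every 5 minutes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo of the normalized group key; not part of the original answer.
class BucketKeyDemo {

  // Zero-based bucket index of a timestamp, relative to the bucket of the first timestamp.
  static long bucketKey(long timestamp, long firstTimestamp, long samplePeriod) {
    long subtract = firstTimestamp / samplePeriod; // integer division
    return timestamp / samplePeriod - subtract;
  }

  // Keys for `count` records spaced `step` seconds apart, starting at `start`.
  static List<Long> keysFor(long start, long step, int count, long samplePeriod) {
    List<Long> keys = new ArrayList<Long>();
    for (int i = 0; i < count; i++) {
      keys.add(bucketKey(start + i * step, start, samplePeriod));
    }
    return keys;
  }

  public static void main(String[] args) {
    // 24 records, 5 minutes apart, starting exactly on an hour boundary, grouped per hour.
    System.out.println(keysFor(7200L, 300L, 24, 3600L));
  }
}
```

With a start on an hour boundary, the first 12 records get key 0 and the next 12 get key 1. If the first record does not fall on a bucket boundary, the first group simply holds fewer than 12 records, because the buckets are aligned to multiples of the sample period and not to the first record.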

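Something along these lines should work, i.e. it groups your data set the way you describe. You can then apply min, max, avg, etc. to the resulting lists. However, since those functions would have to iterate over each group's list again, it may be better to fold the calculations into this solution and have the function return something like a Map<Long, Aggregates>, where Aggregates is a new type containing avg, min, and max fields plus the list of items in the group. As for performance, I think it is acceptable: this is a straightforward O(N) solution.

Edit:

OK, I just wanted to add a more complete solution/suggestion that also computes the min, max, and avg: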

public class Aggregate {
  public double avg;
  public double min;
  public double max;

  public List<Item> items = new ArrayList<Item>();

  public Aggregate(Item item) {
    min = item.value;
    max = item.value;
    avg = item.value;
    items.add(item);
  }

  public void addItem(Item item) {
    items.add(item);
    if (item.value < this.min) {
      this.min = item.value;
    }
    else if (item.value > this.max) {
      this.max = item.value;
    }
    this.avg = (this.avg * (this.items.size() - 1) + item.value) / this.items.size(); 
  }
}

public Map<Long,Aggregate> group_items(List<Item> items,long sample_period) {

  Map<Long,Aggregate> grouped_result = new HashMap<Long,Aggregate>();
  long group_key;

  // Assumes items is non-empty and in chronological order, so that keys start at 0.
  long subtract = items.get(0).timestamp / sample_period;
  for (Item item: items) {
    group_key = item.timestamp / sample_period - subtract;
    if (grouped_result.containsKey(group_key)) {  
      grouped_result.get(group_key).addItem(item);
    }
    else {
      grouped_result.put(group_key, new Aggregate(item));
    }
  }
  return grouped_result;
}

This is only a rough solution. We may want to add more properties to Aggregate, and so on.
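
As one possible extension, the sketch below also tracks the "last" value the question asks for. It is a variant, not the code above: it keeps a running sum and count instead of the item list, and it uses a TreeMap so the groups come back in time order. The names ConsolidationSketch, Stats, and consolidate are hypothetical, and Item here is a stand-in with a value and a timestamp in seconds.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ConsolidationSketch {

  // Hypothetical stand-in for the question's Item: a value plus a Unix timestamp in seconds.
  static final class Item {
    final double value;
    final long timestamp;
    Item(double value, long timestamp) { this.value = value; this.timestamp = timestamp; }
  }

  // Running statistics for one group; stores a sum and a count instead of the items.
  static final class Stats {
    double min, max, sum, last;
    long count;

    void add(double v) {
      if (count == 0) { min = v; max = v; }
      else { min = Math.min(min, v); max = Math.max(max, v); }
      sum += v;
      last = v; // "last" is last in input order, so the input should be time-ordered
      count++;
    }

    double avg() { return sum / count; }
  }

  // Single pass over the items; TreeMap iterates the groups in ascending key (time) order.
  static Map<Long, Stats> consolidate(List<Item> items, long samplePeriod) {
    Map<Long, Stats> result = new TreeMap<Long, Stats>();
    for (Item item : items) {
      long key = item.timestamp / samplePeriod;
      Stats stats = result.get(key);
      if (stats == null) { stats = new Stats(); result.put(key, stats); }
      stats.add(item.value);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Item> items = Arrays.asList(
        new Item(2.0, 0), new Item(6.0, 300), new Item(4.0, 600), new Item(10.0, 3600));
    for (Map.Entry<Long, Stats> e : consolidate(items, 3600).entrySet()) {
      Stats s = e.getValue();
      System.out.println(e.getKey() + ": min=" + s.min + " max=" + s.max
          + " avg=" + s.avg() + " last=" + s.last);
    }
  }
}
```

Keeping only the sum and count keeps memory per group constant; if the individual items are still needed afterwards, the list from Aggregate can be kept alongside these fields.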

score 0 · Accepted Answer

Setting aside the computation of min/max/etc., I notice that your performConsolidation method looks like it could use Multimaps.index. Just pass it items and a Function<Item, Integer> that computes the value you want:

return (int) floor(((Double.valueOf(item.getItem()[2])) / 1000) / (60 * 30));

This won't save a lot of code, but it may make it easier to see at a glance what is going on: index(items, timeBucketer).
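
For readers without Guava on the classpath, the same "index by a key function" shape can be sketched with the JDK's Collectors.groupingBy (Java 8+). This is a plain-JDK analogue, not Multimaps.index itself. TimeBucketIndex, TIME_BUCKETER, and this Item class are hypothetical, and the timestamp is assumed to be in milliseconds, as the division by 1000 in the question suggests.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class TimeBucketIndex {

  // Hypothetical stand-in for the question's Item, with the timestamp in milliseconds.
  static final class Item {
    final String name;
    final double value;
    final long timestampMillis;
    Item(String name, double value, long timestampMillis) {
      this.name = name;
      this.value = value;
      this.timestampMillis = timestampMillis;
    }
  }

  // Key function: 30-minute bucket number, mirroring the expression from the question.
  static final Function<Item, Integer> TIME_BUCKETER =
      item -> (int) Math.floor((item.timestampMillis / 1000.0) / (60 * 30));

  // Plain-JDK analogue of index(items, timeBucketer): one list of items per bucket key.
  static Map<Integer, List<Item>> index(List<Item> items, Function<Item, Integer> keyFunction) {
    return items.stream()
        .collect(Collectors.groupingBy(keyFunction, TreeMap::new, Collectors.toList()));
  }

  public static void main(String[] args) {
    List<Item> items = Arrays.asList(
        new Item("a", 1.0, 0L), new Item("a", 2.0, 1_799_999L), new Item("a", 3.0, 1_800_000L));
    for (Map.Entry<Integer, List<Item>> e : index(items, TIME_BUCKETER).entrySet()) {
      System.out.println(e.getKey() + " -> " + e.getValue().size() + " item(s)");
    }
  }
}
```

The result here is a mutable, key-sorted Map of lists, whereas Guava's Multimaps.index returns an ImmutableListMultimap; either way the bucketing logic lives in one named function.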

score 0 · Accepted Answer

If you can use my xpresso project, you can do the following.

Let your input list be:

list<tuple> items = x.list(x.tuple("name1",1d,100),x.tuple("name2",3d,105),x.tuple("name1",4d,210));

You first unzip your list of tuples to get a tuple of lists:

tuple3<list<String>,list<Double>,list<Integer>> unzipped = x.unzip(items, String.class, Double.class, Integer.class);

Then you can aggregate any way you want:

x.print(x.tuple(x.last(unzipped.value0), x.avg(unzipped.value1), x.max(unzipped.value2)));

The above will produce:

(name1,2.67,210)
