2

我有 100 个组,每个组里面都有一些元素。对于交叉验证,我想制作五个大小尽可能相等的箱子。

有没有为此目的的算法。

5 个组和 2 个箱的示例:

Group_1: 5
Group_2: 6
Group_3: 2
Group_4: 7
Group_5: 1

这两个垃圾箱将是:

G1 和 G2 -> 它们的和等于 11。

G3、G4 和 G5 -> 它们的和等于 10。

4

3 回答 3

2

This is not a cluster analysis problem (I rewrote the question to use the more appropriate wording for you). Cluster analysis is a structure discovery task.

Instead, have a look at the following two related problems from computer science:

  • Multiprocessor scheduling seems to be what you need: given n processors, distribute the tasks such that the least time is unused
  • Bin packing problem is a classic NP-hard problem, solving the reverse problem: use as few bins of fixed size to accomodate all tasks.
  • k-Partition Problem this is probably what you want to do.

All of these appear to be NP-hard, so you will want to use an approximation only (if you have large data, with just 5 examples you can easily brute-force all combinations)

于 2014-12-07T16:33:39.107 回答
2

这似乎与集合划分问题有关,它是NP难的,但幸运的是承认了许多好的近似算法和伪多项式时间动态规划算法。你可能想把这些作为一个起点,因为在这个领域已经做了很多工作。

希望这可以帮助!

于 2014-12-07T02:30:02.263 回答
1

If you're looking for a clustering algorithm (partitioning method) with equal size constraint, I would suggest the Spectral Clustering. It will satisfy your demand for clusters with almost the same sizes because it solves the normalized cut problem, which try to find a balanced cut.

于 2014-12-07T05:59:41.920 回答