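I have 100 groups, each containing some number of elements. For cross-validation, I want to form five bins whose sizes are as equal as possible.
Is there an algorithm for this purpose?
Example with 5 groups and 2 bins:
Group_1: 5
Group_2: 6
Group_3: 2
Group_4: 7
Group_5: 1
The two bins would be:
G1 and G2 -> their sum is 11.
G3, G4 and G5 -> their sum is 10.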
This is not a cluster analysis problem (I rewrote the question to use the more appropriate wording for you). Cluster analysis is a structure discovery task.
Instead, have a look at the following two related problems from computer science:
All of these appear to be NP-hard, so for large data you will want to use an approximation only. With just 5 groups, as in your example, you can easily brute-force all combinations.
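As one possible approximation, here is a minimal sketch of a greedy "largest group first" heuristic: sort the groups by size and always put the next group into the currently lightest bin. The function name `greedy_bins` and the dictionary layout are my own assumptions for illustration, and the result is not guaranteed to be optimal.

```python
import heapq

def greedy_bins(group_sizes, n_bins):
    """Greedy heuristic: largest group first, always into the lightest bin.

    group_sizes: dict mapping group name -> number of elements (assumed layout).
    Returns (bins, totals), where bins[i] lists the group names in bin i.
    """
    # Min-heap of (current bin total, bin index), so the lightest bin pops first.
    heap = [(0, i) for i in range(n_bins)]
    heapq.heapify(heap)
    bins = [[] for _ in range(n_bins)]

    # Visit the groups from largest to smallest.
    for name, size in sorted(group_sizes.items(), key=lambda kv: kv[1], reverse=True):
        total, idx = heapq.heappop(heap)
        bins[idx].append(name)
        heapq.heappush(heap, (total + size, idx))

    totals = [sum(group_sizes[g] for g in b) for b in bins]
    return bins, totals

# The example from the question: 5 groups, 2 bins.
sizes = {"Group_1": 5, "Group_2": 6, "Group_3": 2, "Group_4": 7, "Group_5": 1}
bins, totals = greedy_bins(sizes, 2)
print(bins, totals)
```

On the question's example this yields bin totals of 10 and 11. For the real case you would call it with your 100 group sizes and `n_bins=5`.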
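This appears to be related to the set partition problem, which is NP-hard but fortunately admits many good approximation algorithms as well as pseudo-polynomial-time dynamic programming algorithms. You may want to use these as a starting point, since a lot of work has already been done in this area.
Hope this helps!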
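To illustrate the pseudo-polynomial dynamic programming idea, here is a minimal sketch for the two-bin case only. It is a subset-sum DP that finds the subset whose total is as close as possible to half of the overall total. The function name `two_way_partition` is hypothetical, and extending this to five bins is not as straightforward, so treat it as an illustration rather than a complete solution.

```python
def two_way_partition(sizes):
    """Split sizes into two subsets whose sums are as close as possible.

    Subset-sum DP: runtime grows with len(sizes) * sum(sizes), which is
    pseudo-polynomial. Returns (first, second, sum_first, sum_second),
    where first and second are lists of indices into sizes.
    """
    total = sum(sizes)
    half = total // 2

    # reachable[s] = one list of indices whose sizes add up to s.
    reachable = {0: []}
    for i, size in enumerate(sizes):
        # Iterate over a snapshot so each group is used at most once.
        for s, idxs in list(reachable.items()):
            new_sum = s + size
            if new_sum <= half and new_sum not in reachable:
                reachable[new_sum] = idxs + [i]

    best = max(reachable)  # largest reachable sum not exceeding half
    first = reachable[best]
    second = [i for i in range(len(sizes)) if i not in first]
    return first, second, best, total - best

# Group sizes from the question, in order Group_1 .. Group_5.
sizes = [5, 6, 2, 7, 1]
first, second, sum_a, sum_b = two_way_partition(sizes)
print(first, second, sum_a, sum_b)
```

For the question's example this recovers a 10 / 11 split, which is the best possible since the total of 21 is odd.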
If you're looking for a clustering algorithm (partitioning method) with an equal-size constraint, I would suggest spectral clustering. It tends to produce clusters of roughly similar size because it approximately solves the normalized cut problem, which tries to find a balanced cut.