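I have a list of numbers that totals 540000. I would like to sort this list into 3 lists that each total 180000. What is the most efficient programming method to do this, assuming the list of numbers is a flat file with one number per line?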
5 Answers
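Sounds like a variation on the knapsack problem. It would be useful to know the size and count of these numbers: is there a huge variation in size or are they all of similar scale, and are there lots of them (> 1000) or only a few (< 100)?
A quick and dirty approach would be to sort them in size order, largest to smallest, then loop over them, putting the first into the first list, the second into the second list, the third into the third list, then going back and putting the fourth into the first list, and so on. This does not guarantee exact totals, but it might work pretty well for lots of small numbers. There are other approaches for different types of data set.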
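As a rough illustration, the quick and dirty approach could look like the following Java sketch. The class name `RoundRobinSplit` and the methods `split` and `sum` are made up for this example, and the input is a small stand-in list rather than the real file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RoundRobinSplit {
    // Sort largest to smallest, then deal the numbers out to the lists in turn.
    static List<List<Integer>> split(List<Integer> numbers, int parts) {
        List<Integer> sorted = new ArrayList<>(numbers);
        Collections.sort(sorted, Collections.reverseOrder());
        List<List<Integer>> lists = new ArrayList<>();
        for (int i = 0; i < parts; i++) lists.add(new ArrayList<>());
        for (int i = 0; i < sorted.size(); i++) {
            lists.get(i % parts).add(sorted.get(i)); // 1st list, 2nd, 3rd, back to 1st...
        }
        return lists;
    }

    // Adds up one list so the totals can be compared.
    static int sum(List<Integer> list) {
        int total = 0;
        for (int n : list) total += n;
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sample data: total 45, so the ideal per-list total is 15.
        List<Integer> numbers = Arrays.asList(9, 8, 7, 6, 5, 4, 3, 2, 1);
        for (List<Integer> list : split(numbers, 3)) {
            System.out.println(list + " -> " + sum(list));
        }
    }
}
```

On this sample the three lists total 18, 15 and 12, which shows that the heuristic only gets close; it does not guarantee equal sums.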
for i as integer = 1 to 180000
    put data in array 1
next i
for i as integer = 180001 to 360000
    put data in array 2
next i
for i as integer = 360001 to 540000
    put data in array 3
next i
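This has the smell of NP-hardness to me, in which case there is no "efficient" way to do it, although you can probably come up with a number of heuristics that solve it quite well.
That said, you will still have problems with lists like [179998, 180001, 180001] :)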
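A cheap sanity check can rule out inputs like that one before any search starts. The sketch below is only an illustration: `FeasibilityCheck` and `mightBeSplittable` are hypothetical names, it assumes the numbers are non-negative, and it tests necessary conditions only, so a `true` result does not prove that a split exists.

```java
import java.util.Arrays;
import java.util.List;

class FeasibilityCheck {
    // Necessary (not sufficient) conditions for an exact split into `parts` lists.
    // Assumes all numbers are non-negative.
    static boolean mightBeSplittable(List<Integer> numbers, int parts) {
        long total = 0;
        int largest = 0;
        for (int n : numbers) {
            total += n;
            largest = Math.max(largest, n);
        }
        if (total % parts != 0) return false; // the total cannot be divided evenly
        long target = total / parts;
        return largest <= target;             // no single number may exceed the target
    }

    public static void main(String[] args) {
        // Totals 540000, but 180001 is larger than the target of 180000.
        System.out.println(mightBeSplittable(Arrays.asList(179998, 180001, 180001), 3));
        // Passes both checks; an exact split may or may not exist in general.
        System.out.println(mightBeSplittable(Arrays.asList(180000, 100000, 80000, 180000), 3));
    }
}
```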
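I've written some Java code to do most of the work for you.
The shorter of the two makeFrom methods takes a list of numbers and a total to reach, and returns a set of lists of numbers that add up to that total. You could run it with 180000 and your list of numbers.
For each list of numbers returned, you need to create a new list that lacks the numbers already used, and run the method on that with 180000 again.
If that second call returns one or more lists, you know the problem is solvable, because the remaining numbers will also add up to 180000.
Anyway, here's the code. Yes, it's just recursive brute force. Very likely there is no proven method that consistently does better. Don't blame me for long run times; you may want to try it with smaller examples first.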
import java.util.*;

public class Listen {

    // Returns every distinct combination of entries in numbers that adds up to total.
    private static Set<List<Integer>> makeFrom(int total, List<Integer> numbers) {
        Set<List<Integer>> results = new HashSet<List<Integer>>();
        List<Integer> soFar = new ArrayList<Integer>();
        makeFrom(results, total, soFar, numbers, 0);
        return results;
    }

    // Recursive brute force: extends soFar with each entry at index startingAt or later.
    private static void makeFrom(Set<List<Integer>> results, int total, List<Integer> soFar,
                                 List<Integer> numbers, int startingAt) {
        for (int p = startingAt; p < numbers.size(); p++) {
            Integer number = numbers.get(p);
            int newTotal = total - number;
            if (newTotal < 0) continue; // overshoots the total (assumes no negative numbers)
            List<Integer> newSoFar = new ArrayList<Integer>(soFar);
            newSoFar.add(number);
            if (newTotal == 0) {
                Collections.sort(newSoFar); // sorted, so equal combinations collapse in the set
                results.add(newSoFar);
            } else {
                // Recurse on the same list from p + 1: each position is used at most once
                // and no later entry is skipped.
                makeFrom(results, newTotal, newSoFar, numbers, p + 1);
            }
        }
    }

    public static void main(String[] args) {
        // Small example: every combination of the numbers 1 to 10 that adds up to 25.
        List<Integer> numbers = new ArrayList<Integer>();
        for (int j = 1; j < 11; j++) numbers.add(j);
        for (List<Integer> result : makeFrom(25, numbers)) {
            System.out.println(result);
        }
    }
}
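The follow-up procedure described before the listing (take each sublist found, remove its numbers, search the remainder again) might be sketched like this. It is a self-contained illustration, not part of the code above: `ThreeWaySplit`, `findAll`, `split` and `without` are hypothetical names, the numbers are assumed to be positive, and the third list only reaches the target if the whole input totals three times the target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ThreeWaySplit {
    // Collects every sublist of numbers (from index start onward) that sums to total.
    static void findAll(List<Integer> numbers, int total, int start,
                        List<Integer> soFar, List<List<Integer>> out) {
        if (total == 0) { out.add(new ArrayList<>(soFar)); return; }
        for (int p = start; p < numbers.size(); p++) {
            int n = numbers.get(p);
            if (n > total) continue;          // would overshoot
            soFar.add(n);
            findAll(numbers, total - n, p + 1, soFar, out);
            soFar.remove(soFar.size() - 1);   // backtrack: drop the last pick
        }
    }

    // Copy of numbers with one occurrence of each used value removed.
    static List<Integer> without(List<Integer> numbers, List<Integer> used) {
        List<Integer> copy = new ArrayList<>(numbers);
        for (Integer u : used) copy.remove(u); // remove(Object), not remove(int index)
        return copy;
    }

    // Returns three lists, or null if no first/second pair hitting the target exists.
    static List<List<Integer>> split(List<Integer> numbers, int target) {
        List<List<Integer>> firsts = new ArrayList<>();
        findAll(numbers, target, 0, new ArrayList<>(), firsts);
        for (List<Integer> first : firsts) {
            List<Integer> rest = without(numbers, first);
            List<List<Integer>> seconds = new ArrayList<>();
            findAll(rest, target, 0, new ArrayList<>(), seconds);
            if (!seconds.isEmpty()) {
                List<Integer> second = seconds.get(0);
                return Arrays.asList(first, second, without(rest, second));
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 1..9 totals 45, so each list should total 15.
        System.out.println(split(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9), 15));
    }
}
```

Like the listing above, this is brute force, so expect it to be slow on anything but small inputs.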
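As ian-witz has already pointed out, this is probably a problem of the NP-complete sort: that means there is no really good solution for the general case other than trying all the possibilities. Algorithms that do this tend to become excruciatingly slow as the amount of data grows.
That said, here is my algorithm for forming sub-lists with a specified sum from a given list of integers: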
Set up a place to hold your results. The results will all be lists of numbers, each some sub-set of your original list. We don't know how many such lists will result; possibly none.
Put your list of numbers into an array so you can refer to them and access them by index. In the following, I'm assuming array indices starting at 1. Say you have 10 numbers, so you put them into a 10 element array, indexed by positions 1 through 10.
For performance reasons, it may help to sort your array in descending order. It's not necessary though.
Run a first index, call it i, through this array, i.e. through index values 1 through 10.
For each index value:
select the number at index position i, call it n1.
set up a new list of numbers, where we will be assembling a sub-list. call it sublist.
add n1 to the (so far empty) sublist.
If i is already at 10, there's nothing more we can do. Otherwise,
Run a second index, call it j, through the array, starting at i+1 and going up to 10.
For each value of j:
select the number at index position j, call it n2.
add n2 to the sublist containing n1
calculate the sum of our sublist so far: Does it add up to 180000?
If the exact total is reached, add the current sublist to our result list.
If the total is exceeded, there's nothing we can add to make it better, so skip to the next value of j.
If the total is less than 180000, you need to pick a third number.
Run a third index, call it k, through the array, starting at j+1 and going up to 10. Skip this if j is already at 10 and there's no place to go.
For each value of k:
select the number at k, call it n3
add n3 to the sublist
check the sublist total against the expected total
if the exact total is reached, store the sublist as a result;
if it's less than the expected, start a 4th loop, and so on.
When you're done with checking a value for a loop, e.g. n4, you need to take your latest n4, n3 or whatever back out of the sublist because you'll be trying a different number next.
Whenever you find a combination of numbers with the correct sum, store it in your results set.
When you've run all your loop counters into the wall (i.e. i is 10 and there's nowhere left to go), your "results" set will contain all sub-lists of the original list that added up to the desired total. It's possible there will be none, in that case there's no (exact) solution to your problem.
If you have 3 or more sub-lists in your results set, you can check if you can find a pair of them that use non-overlapping sets of numbers from the original list. If you have 2, then there should also be a 3rd sub-list containing exactly all the numbers not contained in the first 2 lists, and you have your solution.
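The final pairing step might be sketched as follows, under some assumptions that are not in the description above: each result is stored as a `java.util.BitSet` of the positions it uses (0-based here, unlike the 1-based indices in the steps), the class `DisjointTriple` and its methods `of` and `pickThree` are hypothetical names, and the whole list is taken to total three times the target so that the leftover positions form the third sub-list.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class DisjointTriple {
    // Convenience helper: builds a BitSet with the given positions set.
    static BitSet of(int... positions) {
        BitSet bits = new BitSet();
        for (int p : positions) bits.set(p);
        return bits;
    }

    // Looks for two results that share no position; the third list is whatever
    // positions are left over. Returns null if no such pair exists.
    static List<BitSet> pickThree(List<BitSet> results, int size) {
        for (int a = 0; a < results.size(); a++) {
            for (int b = a + 1; b < results.size(); b++) {
                BitSet first = results.get(a);
                BitSet second = results.get(b);
                if (first.intersects(second)) continue; // they reuse a number
                BitSet third = new BitSet(size);
                third.set(0, size);   // start with every position 0..size-1
                third.andNot(first);  // remove those used by the first list
                third.andNot(second); // and those used by the second
                List<BitSet> out = new ArrayList<>();
                out.add(first);
                out.add(second);
                out.add(third);
                return out;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical results for the numbers 1..9 (positions 0..8) and a target of 15.
        List<BitSet> results = new ArrayList<>();
        results.add(of(0, 4, 8));       // 1 + 5 + 9
        results.add(of(0, 1, 2, 3, 4)); // 1 + 2 + 3 + 4 + 5
        results.add(of(5, 8));          // 6 + 9
        System.out.println(pickThree(results, 9));
    }
}
```

Comparing bit sets with `intersects` avoids walking two sub-lists element by element for every candidate pair.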
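My sample code doesn't do a series of loops; instead, it does a single loop from 1 to (say) 10, looking for 180000. Then, say the first number chosen is 2000, the function calls itself again recursively with a hint to start at 2 (= i + 1) and to try to assemble a total of 178000. That invocation of the function then calls itself again with a starting position of (whatever + 1) and a total of (178000 - whatever), and it goes on calling itself with ever smaller parts of the original problem until there is no more room for the indexes to go up. If it finds a "good" sublist on the way, it stores it in the result set.
How to implement this efficiently depends on the language you're using. FORTRAN 77 doesn't have recursion, Lua doesn't implement lists or sets efficiently, and Lisp may have trouble indexing into a list efficiently. In Java, I'd probably use a bitset rather than a sublist. I know nothing about P4GL, so: for implementation, you're on your own!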