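For those of you familiar with merge sort, I'm trying to figure out the minimum number of comparisons needed to merge two subarrays of size n/2, where n is the number of items in the original unsorted array.
I know that the average and worst-case time complexity of the algorithm is O(n log n), but I can't figure out the exact minimum number of comparisons required (in terms of n).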
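The minimum number of comparisons for the merge step is approximately n/2 (which, by the way, is still O(n)), assuming a sane implementation that stops comparing once one of the lists has been fully traversed.
For example, if the two lists being merged are effectively already sorted, then the first member of the list with the larger values is compared n/2 times, once against each member of the list with the smaller values, until that list is exhausted; the remaining list can then be copied over without further comparisons.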
    List 1       List 2       Merged List            Last Comparison
    [1, 2, 3]    [4, 5, 6]    []                     N/A
    [2, 3]       [4, 5, 6]    [1]                    1 < 4
    [3]          [4, 5, 6]    [1, 2]                 2 < 4
    []           [4, 5, 6]    [1, 2, 3]              3 < 4
    []           [5, 6]       [1, 2, 3, 4]           N/A
    []           [6]          [1, 2, 3, 4, 5]        N/A
    []           []           [1, 2, 3, 4, 5, 6]     N/A
Note that 3 comparisons were made, and there are 6 members in the lists.
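The trace above can be reproduced with a small counting merge. This is only a minimal sketch: the name `merge_counting` is made up for illustration, and it assumes the "sane implementation" described earlier, i.e. one that stops comparing as soon as either input is exhausted.

```python
def merge_counting(left, right):
    """Merge two sorted lists; return (merged list, number of comparisons)."""
    merged = []
    comparisons = 0
    i = j = 0
    # Compare the two heads only while both lists still have elements.
    while i < len(left) and j < len(right):
        comparisons += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One list is exhausted: copy the rest without any further comparison.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comparisons


merged, comparisons = merge_counting([1, 2, 3], [4, 5, 6])
print(merged, comparisons)  # [1, 2, 3, 4, 5, 6] 3
```

Swapping in interleaved inputs such as `[1, 3, 5]` and `[2, 4, 6]` raises the count to 5, which shows that n/2 is a best case rather than a typical one.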
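Again, note that the merge step is still effectively O(n), even in the best case. The merge sort algorithm has time complexity O(n*lg(n)) because the merge step is O(n) across the whole list, and the divide/merge happens over O(lg(n)) levels of recursion.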
This answer gives an exact result, not only the asymptotic behaviour written using some Landau symbol.
Merging lists of lengths m and n takes at least min(m, n) comparisons. The reason is that you can stop comparing elements only when one of the input lists has been completely processed, i.e. you'll need to iterate over at least the smaller of the two lists. Note that this number of comparisons will only be sufficient for some inputs, so it is minimal in the sense that it assumes the best case of possible input data. For worst case input, you will find higher numbers, namely n ⌈lg n⌉ − 2^⌈lg n⌉ + 1.
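As a sanity check on the worst-case figure, here is a hedged sketch that compares a recurrence against the closed form, reading the formula as n ⌈lg n⌉ − 2^⌈lg n⌉ + 1 (the power of two has exponent ⌈lg n⌉). The function names are invented for this example, and the recurrence assumes a top-down merge sort that splits at n // 2 and whose merge of lengths a and b costs a + b − 1 comparisons in the worst case.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def worst_case(n):
    """Worst-case comparisons for a top-down merge sort of n items."""
    if n < 2:
        return 0
    a = n // 2
    b = n - a
    # Sort both halves, then merge them with a + b - 1 = n - 1 comparisons.
    return worst_case(a) + worst_case(b) + n - 1


def closed_form(n):
    """n * ceil(lg n) - 2**ceil(lg n) + 1, valid for n >= 1."""
    k = (n - 1).bit_length()  # equals ceil(lg n) for every n >= 1
    return n * k - 2 ** k + 1


# The recurrence and the closed form agree on this whole range.
assert all(worst_case(n) == closed_form(n) for n in range(1, 1025))
print(worst_case(3), worst_case(8))  # 3 17
```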
Let n = 2^k be a power of two. Let i be a merge level, with 0 ≤ i < k. At level i you execute 2^(k − i − 1) merges, each of which requires 2^i comparisons. Multiplying these two numbers gives you 2^(k − 1) comparisons, which is equal to n/2. Summing over the k levels of merges you get nk/2 = (n lg n)/2 comparisons.
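To make the level-by-level count concrete, the following sketch lists the best-case comparisons per merge level for n = 2^k. The helper name `best_case_by_level` is hypothetical; it just multiplies the number of merges at level i by the comparisons each one needs.

```python
def best_case_by_level(k):
    """Best-case comparisons at each merge level i = 0..k-1 for n = 2**k."""
    levels = []
    for i in range(k):
        merges = 2 ** (k - i - 1)      # number of merges at level i
        per_merge = 2 ** i             # each merges two lists of length 2**i
        levels.append(merges * per_merge)
    return levels


levels = best_case_by_level(3)  # n = 8
print(levels)       # [4, 4, 4]
print(sum(levels))  # 12, i.e. n * k / 2 = 8 * 3 / 2
```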
Now let n be 1 less than a power of two. Let k = ⌈lg n⌉ still denote the number of merge levels. Compared to the 2^k case, you now have one less comparison at each level. So the total number of comparisons reduces by k, resulting in 2^k k/2 − k = (2^k/2 − 1)k comparisons. However, if you remove one more element, leading to n = 2^k − 2, then you won't reduce the number of comparisons in the topmost merge, since the other list already is the shorter one. Which suggests that things might become more difficult around here.
So let's have a little demo program, which we can use both to check our previous result and to compute the number of comparisons for other values:
    mc = [0, 0]  # dynamic programming, cache previous results
    k = 1  # ceil(lg n) in the loop
    for n in range(2, 128):
        a = n // 2  # split list near center
        b = n - a  # compute length of other half list
        mc.append(mc[a] + mc[b] + min(a, b))  # need to sort these and then merge
        if (n & (n - 1)) == 0:  # if n is a power of two
            assert mc[-1] == n*k/2  # check previous result
            k += 1  # increment k = ceil(lg n)
    print(', '.join(str(m) for m in mc))  # print sequence of comparison counts, starting at n = 0
This gives you the following sequence:
0, 0, 1, 2, 4, 5, 7, 9, 12, 13, 15, 17, 20, 22, 25, 28, 32, 33, 35,
37, 40, 42, 45, 48, 52, 54, 57, 60, 64, 67, 71, 75, 80, 81, 83, 85,
88, 90, 93, 96, 100, 102, 105, 108, 112, 115, 119, 123, 128, 130, 133,
136, 140, 143, 147, 151, 156, 159, 163, 167, 172, 176, 181, 186, 192,
193, 195, 197, 200, 202, 205, 208, 212, 214, 217, 220, 224, 227, 231,
235, 240, 242, 245, 248, 252, 255, 259, 263, 268, 271, 275, 279, 284,
288, 293, 298, 304, 306, 309, 312, 316, 319, 323, 327, 332, 335, 339,
343, 348, 352, 357, 362, 368, 371, 375, 379, 384, 388, 393, 398, 404,
408, 413, 418, 424, 429, 435, 441
which you can look up in the On-Line Encyclopedia of Integer Sequences to find that this sequence describes the total number of 1's in binary expansions of 0, ..., n. There are some formulas there as well, but either they are inexact (involve some Landau symbol term), or they rely on some other non-trivial sequence, or they are pretty complex. The one I like most expresses just what my program above did:
a(0) = 0, a(2n) = a(n)+a(n-1)+n, a(2n+1) = 2a(n)+n+1. - Ralf Stephan, Sep 13 2003
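The link between the script and the quoted recurrence can be checked directly. In the sketch below (function names are made up for illustration), note the offset: the list `mc` starts at n = 0 items, so `mc[n]` lines up with a(n − 1), which is the number of 1 bits in the binary expansions of 0, ..., n − 1.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def a(n):
    """Quoted recurrence: a(0)=0, a(2n)=a(n)+a(n-1)+n, a(2n+1)=2a(n)+n+1."""
    if n == 0:
        return 0
    half = n // 2
    if n % 2 == 0:
        return a(half) + a(half - 1) + half
    return 2 * a(half) + half + 1


def ones_up_to(n):
    """Total number of 1 bits in the binary expansions of 0, ..., n."""
    return sum(bin(i).count("1") for i in range(n + 1))


# Best-case comparison counts, computed as in the script above.
mc = [0, 0]
for n in range(2, 128):
    half = n // 2
    mc.append(mc[half] + mc[n - half] + half)

# mc[n] matches a(n - 1) and the direct bit count for every n >= 1 tested here.
assert all(mc[n] == a(n - 1) == ones_up_to(n - 1) for n in range(1, 128))
print(mc[8], a(7), ones_up_to(7))  # 12 12 12
```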
Given these alternatives I guess I'd stick with the above script to compute these numbers. You can remove the assertion and everything related to it, rely on the fact that a ≤ b, and drop the output as well if you include this in a larger program. The result should look like this:
    mc = [0, 0]
    for n in range(2, 1024):
        a = n // 2
        mc.append(mc[a] + mc[n - a] + a)
Notice that e.g. for n = 3 you get only two comparisons. Clearly this can only work if you compare both extremal elements to the median one, so that you don't have to compare the extremal ones to one another any more. This illustrates why the above computation only works for best case input. Worst case input would have you comparing the minimal and maximal elements with one another at some point, leading to three comparisons as computed by that n ⌈lg n⌉ − 2^⌈lg n⌉ + 1 formula.
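For small n the best and worst cases can also be found by brute force. The sketch below is an illustration, not part of the original script: it assumes a top-down merge sort that splits at n // 2 (shorter half on the left) and stops comparing once one half is exhausted, then tries every permutation. The names `merge_sort_count` and `extremes` are invented here.

```python
from itertools import permutations


def merge_sort_count(items):
    """Return (sorted list, comparisons used) for a top-down merge sort."""
    n = len(items)
    if n < 2:
        return list(items), 0
    a = n // 2  # same split point as in the script above
    left, left_count = merge_sort_count(items[:a])
    right, right_count = merge_sort_count(items[a:])
    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        count += 1  # one comparison per element moved while both remain
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count


def extremes(n):
    """Fewest and most comparisons over all permutations of n distinct items."""
    counts = [merge_sort_count(p)[1] for p in permutations(range(n))]
    return min(counts), max(counts)


print(extremes(3))  # (2, 3)
```

With this particular split, already sorted input reaches the minimum, because at every merge the shorter left half is drained first.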
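For each comparison, you drain one element from one of the two lists. So the number of comparisons is at most the sum of the lengths of the two lists. As Platinum shows, it may be fewer if you reach the end of one array while the other still has items in it.
So the number of comparisons is between n/2 and n.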
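A quick way to see this range is to enumerate every way the values 0..n−1 can be divided into two sorted halves and count the comparisons for each. This is a rough sketch with hypothetical helper names; it assumes n is even and that comparing stops when either half runs out.

```python
from itertools import combinations


def count_comparisons(left, right):
    """Comparisons used when merging two sorted lists."""
    i = j = count = 0
    while i < len(left) and j < len(right):
        count += 1
        if left[i] <= right[j]:
            i += 1
        else:
            j += 1
    return count


def comparison_range(n):
    """Min and max comparisons over all splits of 0..n-1 into two sorted halves."""
    counts = []
    for chosen in combinations(range(n), n // 2):
        left = list(chosen)  # combinations() yields values in sorted order
        right = [x for x in range(n) if x not in chosen]
        counts.append(count_comparisons(left, right))
    return min(counts), max(counts)


print(comparison_range(6))  # (3, 5)
```

The largest count found is n − 1 (the last remaining element never needs a comparison), which is consistent with the upper bound of n stated above.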