3

This is a question from the book 'Database Systems The complete book, 2nd Edition' - Chapter 15: Two-pass algorithm based on sorting. "Sometimes, it is possible to save some disk I/O ’s if we leave the last sublist in memory. It may even make sense to use sublists of fewer than blocks to take advantage of this effect. How many disk I/O ’s can be saved this way?"

I figured out that you divide the original relation into sub-lists and and sort them in the first pass, and keep the last list in memory, which will occupy fewer than M-1 block. Then how do you progress with sorting?

4

1 回答 1

1

这只是一个猜测,但我怀疑答案可以描述如下。标准的“一次级别”合并排序如下所示:

1 1 1 1 1 1 1 1
--- --- --- ---  -- pass 1
 2   2   2   2
 -----   -----   -- pass 2
   4       4
   ---------     -- pass 3
       8

请注意,在进入下一个级别之前,我们会执行完整的输入数据传递。

另一种方法是“一次子树”合并排序,如下所示:

1 1 1 1 1 1 1 1
--- | | | | | |
 2  --- | | | |
 |   2  | | | |
 -----  | | | |
   4    --- | |
   |     2  ---
   |     |   2
   |     -----
   |       4
   ---------
       8

在这里,一旦构建了邻居,我们就将每个子树与其具有相同高度的邻居合并。我们做了同样多的工作,但局部性得到了改善。

干杯。

于 2013-08-20T23:35:52.433 回答