Let's assume we have a very large file which contains billions of integers, and we want to find the k largest of these values. The tricky part is that k itself is very large too, which means we cannot keep k elements in memory (for example, the file has 100 billion elements and we want to find the 10 billion largest). How can we do this in O(n)?
What I thought:

We start reading the file and compare each element against another file which keeps the k largest elements seen so far, sorted in increasing order. If the element we just read is larger than the first line of the second file, we delete that first line and insert the new element into the second file. The time complexity would be O(N log k) (if we have random access to that file; otherwise it would be O(Nk)).
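For reference, if k did fit in memory, the same idea would just be the usual min-heap approach, with the same O(N log k) bound. This is only a sketch of the in-memory analogue, not a solution to the actual question, since here k is too big for RAM:

```python
import heapq

def k_largest_in_memory(values, k):
    # In-memory analogue of the sorted-file idea: keep the k largest seen so
    # far in a min-heap; each new element either replaces the current minimum
    # or is discarded.  O(N log k) time, O(k) memory -- only workable when k
    # fits in RAM, which the question rules out.
    heap = []
    for x in values:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)
    return heap  # the k largest values, in heap order
```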
Any idea how to do this in O(n)? I guess if we had an external-memory version of the selection algorithm (the partitioning scheme used in quicksort) we would be able to do it in O(n), but I couldn't find one anywhere.
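For what it's worth, here is a rough sketch of what such an external selection could look like: sample a pivot, stream the file once to split it into a "greater than pivot" file and a "less than pivot" file, and recurse into whichever partition contains the k-th largest element. Each round keeps only one partition, so the expected total work is O(n). The one-integer-per-line file layout and the function name are just assumptions for illustration, not a known library routine:

```python
import os
import random
import tempfile

def find_kth_largest(path, k, delete_input=False):
    # Sketch of external quickselect over a file with one integer per line
    # (an assumed layout).  Each round streams the current file and recurses
    # into a single partition, so the expected total work is O(n).

    # Pass 1: pick a uniformly random pivot via reservoir sampling.
    pivot, n = None, 0
    with open(path) as f:
        for line in f:
            n += 1
            if random.randrange(n) == 0:
                pivot = int(line)

    # Pass 2: partition into "greater than pivot" / "less than pivot" files,
    # counting elements equal to the pivot so duplicates cannot cause loops.
    hi = tempfile.NamedTemporaryFile("w", delete=False)
    lo = tempfile.NamedTemporaryFile("w", delete=False)
    n_hi = n_eq = 0
    with open(path) as f, hi, lo:
        for line in f:
            x = int(line)
            if x > pivot:
                hi.write(line)
                n_hi += 1
            elif x == pivot:
                n_eq += 1
            else:
                lo.write(line)
    if delete_input:
        os.remove(path)

    # Recurse into the partition that holds the k-th largest element.
    if k <= n_hi:
        os.remove(lo.name)
        return find_kth_largest(hi.name, k, delete_input=True)
    if k <= n_hi + n_eq:
        os.remove(hi.name)
        os.remove(lo.name)
        return pivot
    os.remove(hi.name)
    return find_kth_largest(lo.name, k - n_hi - n_eq, delete_input=True)
```

Once the k-th largest value t is known, one more pass over the original file can write out every element greater than t, plus enough copies equal to t to reach exactly k elements.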