I am working with a text file of about 12*10^6 rows stored on my hard disk. The structure of the file is:
data|data|data|...|data\n
data|data|data|...|data\n
data|data|data|...|data\n
...
data|data|data|...|data\n
There's no header, and there's no id to uniquely identify the rows.
Since I want to use it for machine learning purposes, I need to make sure there's no ordering in the file that could bias the stochastic learning.
Usually I load such files into memory and shuffle them before writing them back to disk. Unfortunately this time it is not possible, due to the size of the file, so I have to manage the shuffling directly on disk (assume I don't have a problem with disk space). Any idea how to handle this task effectively (i.e. with the lowest possible complexity, in terms of writing to disk) in Python?
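For reference, my usual in-memory approach looks roughly like this (function name is just for illustration); the problem is that `readlines()` pulls the whole file into memory, which doesn't work at this size:

```python
import random

def shuffle_in_memory(path):
    # Load every line into memory -- infeasible for a 12-million-row file.
    with open(path) as f:
        lines = f.readlines()
    # Shuffle in place, then rewrite the whole file.
    random.shuffle(lines)
    with open(path, "w") as f:
        f.writelines(lines)
```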