We are dumping a lot of data (in terms of volume rather than frequency, i.e. dumping 100K+ records, or over 400 MB, at once) from a database to Excel files. The dumping process is currently performed in Python, R, and Java (using the POI library). As part of the dump, we read the data from the database into an intermediate file (a pipe-delimited text file), which is then picked up by the code that updates the Excel files. Recently, we ran into issues where text from the database containing newline characters made the pipe-delimited files invalid, since one record spanned multiple lines instead of just one. For example,
| Col1 | Col2 | Col3 |
| Val | Val2 | Val3 |
is a valid example of a pipe-delimited file. If the data contains any newlines, we instead get something like:
| Col1 | Col2 | Col3 |
| Val1
| Val2 | Val3
|
Such scenarios are harder to catch and require writing more validation code than should be necessary.
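One way to stay within the pipe-delimited format is to use a CSV writer that quotes any field containing the delimiter or a newline, so that embedded newlines no longer break records. Here is a minimal sketch in Python using the standard library csv module; the file name and sample data are hypothetical:

```python
import csv

# Hypothetical sample rows; Col2 of the data row contains an embedded newline.
rows = [
    ("Col1", "Col2", "Col3"),
    ("Val1", "line one\nline two", "Val3"),
]

# Write a pipe-delimited file; QUOTE_MINIMAL wraps any field containing
# the delimiter or a newline in double quotes instead of breaking the record.
with open("dump.txt", "w", newline="") as f:
    writer = csv.writer(f, delimiter="|", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)

# Read it back: csv.reader reassembles the quoted field, newline and all.
with open("dump.txt", "r", newline="") as f:
    for record in csv.reader(f, delimiter="|"):
        print(record)
```

The downstream Excel-updating code would then need to parse the file with a CSV reader that understands quoting, rather than splitting lines on the pipe character.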
I was wondering if there are any libraries/techniques that can be used to write out such temporary data. I am not sure XML would be a solution, considering that performance might become an issue for such a large volume of data. JSON seems a better fit, but I don't know all my options here.
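For the JSON route, one option worth noting is newline-delimited JSON (one object per physical line, often called JSON Lines): the serializer escapes embedded newlines as \n, so every record is guaranteed to occupy exactly one line, and the file can be streamed record by record rather than parsed as one large document. A hedged sketch, again with hypothetical file and record names:

```python
import json

# Hypothetical records as they might come off a database cursor.
records = [
    {"Col1": "Val1", "Col2": "line one\nline two", "Col3": "Val3"},
    {"Col1": "Val4", "Col2": "Val5", "Col3": "Val6"},
]

# JSON Lines: one object per line; json.dumps escapes embedded newlines
# as \n, so each record occupies exactly one physical line in the file.
with open("dump.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Reading back streams one record at a time, which keeps memory flat
# even for dumps in the 400 MB range.
with open("dump.jsonl", "r", encoding="utf-8") as f:
    for line in f:
        rec = json.loads(line)
        print(rec["Col2"])
```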