I'm trying to build a Java application that streams very large result sets from arbitrary SQL SELECT queries into JSONL files. The target is SQL Server specifically, but I'd like it to work with any JDBC DataSource. In Python this would be easy: treat the SQL client result as a generator and call json.dumps() on each row. In my Java code, however, it seems that everything is loaded into memory before anything is written out, which typically causes heap and garbage collection exceptions. The queries I need to run are very large, returning up to 10GB of raw data. Execution time is not the primary concern, as long as it works every time.
I've tried calling flush after every row (which is ridiculous); that seems to help with small datasets but not with large ones. Can anyone suggest a strategy I can use to pull this off easily?
In my SQL client class I use Apache DbUtils QueryRunner and MapListHandler to create a list of Maps, which gives me the flexibility I need (versus more traditional Java approaches, which require specifying schema and types):
    public List<Map<String, Object>> query(String queryText) {
        try {
            DbUtils.loadDriver("com.microsoft.sqlserver.jdbc.SQLServerDriver");
            // This function just sets up all the connection properties. Omitted for clarity.
            DataSource ds = this.initDataSource();
            StatementConfiguration sc = new StatementConfiguration.Builder().fetchSize(10000).build();
            QueryRunner queryRunner = new QueryRunner(ds, sc);
            MapListHandler handler = new MapListHandler();
            return queryRunner.query(queryText, handler);
        } catch (Exception e) {
            logger.error(e.getMessage());
            e.printStackTrace();
            return null;
        }
    }
JsonLOutputWriter class:
    JsonLOutputWriter(String filename) {
        GsonBuilder gsonBuilder = new GsonBuilder();
        gsonBuilder.serializeNulls();
        this.gson = gsonBuilder.create();
        try {
            this.writer = new PrintWriter(new File(filename), ENCODING);
        } catch (FileNotFoundException | UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    void writeRow(Map row) {
        this.writer.println(this.gson.toJson(row));
    }

    void flush() {
        this.writer.flush();
    }
Main method:
    JsonLOutputWriter writer = new JsonLOutputWriter(outputFile);
    for (Map row : client.query(inputSql)) {
        writer.writeRow(row);
    }
    writer.flush();
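To make the goal concrete, this is roughly the row-at-a-time shape I'm after, the Java analogue of the Python generator idea. It is only a minimal sketch using plain JDBC interfaces; the class name RowStreamer, the method streamRows, and the toJson parameter are hypothetical names I made up for illustration, and I have not verified whether the driver still buffers the full result underneath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical helper: not part of DbUtils, Gson, or any library.
final class RowStreamer {

    /**
     * Walks the ResultSet one row at a time and writes each row as a single line.
     * Only the current row's Map is held on the heap; nothing is collected into a List.
     *
     * @param toJson serializer for one row (assumption: something like row -> gson.toJson(row))
     * @return the number of rows written
     */
    static long streamRows(ResultSet rs,
                           Function<Map<String, Object>, String> toJson,
                           Writer out) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columnCount = meta.getColumnCount();
        long rows = 0;
        try {
            while (rs.next()) {
                // Build a schema-agnostic map for the current row only.
                Map<String, Object> row = new LinkedHashMap<>();
                for (int i = 1; i <= columnCount; i++) {
                    row.put(meta.getColumnLabel(i), rs.getObject(i));
                }
                out.write(toJson.apply(row));
                out.write('\n');
                rows++;
            }
            out.flush();
        } catch (IOException e) {
            // Rethrow unchecked so the method only declares SQLException.
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

If I understand DbUtils correctly, ResultSetHandler has a single handle(ResultSet) method that may throw SQLException, so something like queryRunner.query(inputSql, rs -> RowStreamer.streamRows(rs, row -> gson.toJson(row), writer)) might slot in where MapListHandler is used now, but I'm not sure that is the right approach.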