Try using the java-diff-utils library
Example
I use Groovy for quick demos of Java libraries. Running the script below reports the following differences between the two sample files:
$ groovy diff
[ChangeDelta, position: 0, lines: [1,11,21,31,41,51] to [1,11,99,31,41,51]]
[DeleteDelta, position: 2, lines: [3,13,23,33,43,53]]
[InsertDelta, position: 5, lines: [6,16,26,36,46,56]]
file1.csv
1,11,21,31,41,51
2,12,22,32,42,52
3,13,23,33,43,53
4,14,24,34,44,54
5,15,25,35,45,55
file2.csv
1,11,99,31,41,51
2,12,22,32,42,52
4,14,24,34,44,54
5,15,25,35,45,55
6,16,26,36,46,56
diff.groovy
//
// Dependencies
// ============
import difflib.*
@Grapes([
    @Grab(group='com.googlecode.java-diff-utils', module='diffutils', version='1.2.1'),
])
//
// Main program
// ============
def original = new File("file1.csv").readLines()
def revised = new File("file2.csv").readLines()
Patch patch = DiffUtils.diff(original, revised)
patch.getDeltas().each {
    println it
}
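To make the three delta types and their positions in the output above less opaque, here is a rough, library-free sketch in Java. It is not how java-diff-utils works internally (the library ships a Myers diff implementation); this version uses a plain longest-common-subsequence table, and the class name `LineDiffSketch`, its methods, and its output format are made up for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of java-diff-utils.
public class LineDiffSketch {

    // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..].
    static int[][] lcsTable(List<String> a, List<String> b) {
        int[][] lcs = new int[a.size() + 1][b.size() + 1];
        for (int i = a.size() - 1; i >= 0; i--) {
            for (int j = b.size() - 1; j >= 0; j--) {
                lcs[i][j] = a.get(i).equals(b.get(j))
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }
        return lcs;
    }

    // Returns one description per run of differing lines.
    // The reported position is the index in the original list where the run starts.
    static List<String> diff(List<String> a, List<String> b) {
        int[][] lcs = lcsTable(a, b);
        List<String> deltas = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() || j < b.size()) {
            if (i < a.size() && j < b.size() && a.get(i).equals(b.get(j))) {
                i++; // identical line on both sides: skip it
                j++;
                continue;
            }
            int start = i;
            List<String> removed = new ArrayList<>();
            List<String> added = new ArrayList<>();
            // Consume lines until both sides line up again (or both are exhausted).
            while ((i < a.size() || j < b.size())
                    && !(i < a.size() && j < b.size() && a.get(i).equals(b.get(j)))) {
                // Drop an original line when the revised side is exhausted or when
                // doing so keeps the remaining common subsequence at least as long.
                if (j == b.size() || (i < a.size() && lcs[i + 1][j] >= lcs[i][j + 1])) {
                    removed.add(a.get(i++));
                } else {
                    added.add(b.get(j++));
                }
            }
            String kind = removed.isEmpty() ? "Insert" : added.isEmpty() ? "Delete" : "Change";
            deltas.add(kind + " at " + start + ": " + removed + " -> " + added);
        }
        return deltas;
    }

    public static void main(String[] args) {
        // Same rows as the two sample CSV files in this answer.
        List<String> original = Arrays.asList("1,11,21,31,41,51", "2,12,22,32,42,52",
                "3,13,23,33,43,53", "4,14,24,34,44,54", "5,15,25,35,45,55");
        List<String> revised = Arrays.asList("1,11,99,31,41,51", "2,12,22,32,42,52",
                "4,14,24,34,44,54", "5,15,25,35,45,55", "6,16,26,36,46,56");
        diff(original, revised).forEach(System.out::println);
    }
}
```

On the sample data this reports a change at position 0, a delete at position 2 and an insert at position 5, matching the positions in the library output. The table costs memory proportional to the product of the two line counts, so for real files the library is the better choice.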
Update
According to the DbUnit FAQ, performance of this solution can be improved for very large datasets by using a streaming implementation of the ResultSetTableFactory interface (ForwardOnlyResultSetTableFactory). This is enabled within the Ant task as follows:
ant.dbunit(driver:driver, url:url, userid:user, password:pass) {
    compare(src:"dbunit.xml", format:"flat")
    dbconfig {
        property(name:"datatypeFactory", value:"org.dbunit.ext.h2.H2DataTypeFactory")
        property(name:"resultSetTableFactory", value:"org.dbunit.database.ForwardOnlyResultSetTableFactory")
    }
}