I'm no Java expert, but the program I'm making is going to be dealing with high throughput, so I thought I'd do a little crowd-sourcing for opinions. Here's the situation.
A Java process will be watching a directory for files to process. The files come in pairs: a data file to be stored and an XML file with meta information to be cataloged. So I need to get the list of current files, check for the required twins, and then process them.
Files will always have matching filenames and differ only by file extension, e.g. filename1.jpg / filename1.xml, filename2.jpg / filename2.xml.
I have three options I've thought of so far:

1. Use a FilenameFilter with the File.list(FilenameFilter) call to check whether the total number of files sharing a filename is greater than 1 (sketched after this list).
2. Use two FilenameFilters to generate a list of files with the .xml extension and a list without it, convert the non-XML list to an ArrayList, sort it, and call Collections.binarySearch() for each expected twin (also sketched below).
3. Generate a list of all files without the .xml extension, use it as the keys of a HashMap whose values are the assumed .xml filenames, then run through the map and check for the existence of each .xml twin before processing.
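For what it's worth, here's a rough sketch of what I mean by option 1. The directory path and class name are just placeholders; the inner File.list(FilenameFilter) call counts every file that shares a base name, and a count greater than 1 means the twin is present:

```java
import java.io.File;
import java.io.FilenameFilter;

public class PairCheckOption1 {
    public static void main(String[] args) {
        File dir = new File("/path/to/watch"); // placeholder directory

        // Grab the data files first (everything that isn't .xml)
        File[] dataFiles = dir.listFiles(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return !name.toLowerCase().endsWith(".xml");
            }
        });
        if (dataFiles == null) {
            return; // directory missing or I/O error
        }

        for (File dataFile : dataFiles) {
            String name = dataFile.getName();
            final String base = name.contains(".")
                    ? name.substring(0, name.lastIndexOf('.'))
                    : name;

            // Count every file that shares this base name, regardless of extension
            String[] twins = dir.list(new FilenameFilter() {
                public boolean accept(File d, String n) {
                    return n.startsWith(base + ".");
                }
            });

            if (twins != null && twins.length > 1) {
                // both the data file and its .xml twin are present
                System.out.println("Ready to process: " + base);
            }
        }
    }
}
```

And a rough sketch of option 2 under the same assumptions. One detail worth noting is that Collections.binarySearch() only gives a meaningful answer on a sorted list, so the data filenames get their extensions stripped and are sorted before searching:

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PairCheckOption2 {
    public static void main(String[] args) {
        File dir = new File("/path/to/watch"); // placeholder directory

        // Filter 1: just the .xml metadata files
        String[] xmlNames = dir.list(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return name.toLowerCase().endsWith(".xml");
            }
        });
        // Filter 2: everything else, i.e. the data files
        String[] dataNames = dir.list(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return !name.toLowerCase().endsWith(".xml");
            }
        });
        if (xmlNames == null || dataNames == null) {
            return; // directory missing or I/O error
        }

        // binarySearch requires a sorted list, so strip the extensions
        // off the data names and sort them first
        List<String> dataBases = new ArrayList<String>();
        for (String dataName : dataNames) {
            int dot = dataName.lastIndexOf('.');
            dataBases.add(dot > 0 ? dataName.substring(0, dot) : dataName);
        }
        Collections.sort(dataBases);

        for (String xmlName : xmlNames) {
            // strip ".xml" and look for a data file with the same base name
            String base = xmlName.substring(0, xmlName.length() - 4);
            if (Collections.binarySearch(dataBases, base) >= 0) {
                System.out.println("Ready to process: " + base);
            }
        }
    }
}
```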
Any thoughts?
EDITS/COMMENTS
After looking at the suggestions and tinkering, I'm going for now with two FilenameFilters: one that lists the XML files and one that lists everything else. The list of XML files is stripped of the .xml extension and dumped into a HashSet. Then the list of data files is iterated through, calling contains() on that set to see if a match exists before proceeding.
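Roughly, that looks like the following (the directory path, class name, and the commented-out processPair() call are placeholders for the real processing code):

```java
import java.io.File;
import java.io.FilenameFilter;
import java.util.HashSet;
import java.util.Set;

public class PairCheck {
    public static void main(String[] args) {
        File dir = new File("/path/to/watch"); // placeholder directory

        // One filter for the .xml metadata files, one for everything else
        String[] xmlNames = dir.list(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return name.toLowerCase().endsWith(".xml");
            }
        });
        String[] dataNames = dir.list(new FilenameFilter() {
            public boolean accept(File d, String name) {
                return !name.toLowerCase().endsWith(".xml");
            }
        });
        if (xmlNames == null || dataNames == null) {
            return; // directory missing or I/O error
        }

        // Strip ".xml" and dump the base names into a HashSet for cheap lookups
        Set<String> xmlBases = new HashSet<String>();
        for (String xmlName : xmlNames) {
            xmlBases.add(xmlName.substring(0, xmlName.length() - 4));
        }

        // Walk the data files and only process the ones whose twin is present
        for (String dataName : dataNames) {
            int dot = dataName.lastIndexOf('.');
            String base = dot > 0 ? dataName.substring(0, dot) : dataName;
            if (xmlBases.contains(base)) {
                System.out.println("Processing pair: " + base);
                // processPair(new File(dir, dataName), new File(dir, base + ".xml"));
            }
        }
    }
}
```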
There is the concern, as mentioned below, of processing incomplete files. As I said in the comments, I'm assuming that a newly written file is not visible to non-writing processes until the write is complete (new files, not files open for editing).