I am working on an application that retrieves files from different URLs.
There is a TreeMap (file name mapped to URL) that contains the targets to download. It is processed in a loop, with each item being submitted to an ExecutorService. Here's some code:
    private void retrieveDataFiles() {
        if (this.urlsToRetrieve.size() > 0) {
            System.out.println("Target URLs to retrieve: " + this.urlsToRetrieve.size());
            // one thread per URL, so every download should be able to start at once
            ExecutorService executorProcessUrls = Executors.newFixedThreadPool(this.urlsToRetrieve.size());
            for (Entry target : this.urlsToRetrieve.entrySet()) {
                final String fileName = (String) target.getKey();
                final String url = (String) target.getValue();
                String localFile = localDirectory + File.separator + fileName;
                System.out.println(localFile);
                executorProcessUrls.submit(new WikiDumpRetriever(url, localFile));
                dumpFiles.add(localFile);
                //TODO: figure out why only 2 files download
            }
            executorProcessUrls.shutdown();
            try {
                executorProcessUrls.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
            } catch (InterruptedException ex) {
                System.out.println("retrieveDataFiles InterruptedException: " + ex.getMessage());
            }
        } else {
            System.out.println("No target URLs were retrieved");
        }
    }
Then the WikiDumpRetriever:
    private static class WikiDumpRetriever implements Runnable {
        private String wikiUrl;
        private String downloadTo;

        public WikiDumpRetriever(String targetUrl, String localDirectory) {
            this.downloadTo = localDirectory;
            this.wikiUrl = targetUrl;
        }

        public void downloadFile() throws FileNotFoundException, IOException, URISyntaxException {
            HTTPCommunicationGet httpGet = new HTTPCommunicationGet(wikiUrl, "");
            httpGet.downloadFiles(downloadTo);
        }

        @Override
        public void run() {
            try {
                downloadFile();
            } catch (FileNotFoundException ex) {
                System.out.println("WDR: FileNotFound " + ex.getMessage());
            } catch (IOException ex) {
                System.out.println("WDR: IOException " + ex.getMessage());
            } catch (URISyntaxException ex) {
                System.out.println("WDR: URISyntaxException " + ex.getMessage());
            }
        }
    }
As you can see, this is a static nested class. The TreeMap contains:
Key : Value
enwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2
elwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/elwiki-latest-pages-articles.xml.bz2
zhwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/zhwiki-latest-pages-articles.xml.bz2
hewiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/hewiki-latest-pages-articles.xml.bz2
The problem is that this process downloads only two of the four files. I know that all four are available and that they can be downloaded. However, only two of them are processed at any time.
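As a sanity check on the executor side, the same submit / shutdown / awaitTermination pattern can be run with stub tasks instead of real downloads. This is only a minimal sketch using nothing beyond java.util.concurrent; the class name ExecutorConcurrencyCheck and the method runAndMeasurePeak are made up for illustration and are not part of my application. Each stub task waits until every other task has started, so the recorded peak shows how many tasks the pool really runs at once.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-alone check, not part of the application above.
public class ExecutorConcurrencyCheck {

    // Submits taskCount stub tasks to a fixed pool of the same size and
    // returns the highest number of tasks that were running simultaneously.
    static int runAndMeasurePeak(int taskCount) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        CountDownLatch allStarted = new CountDownLatch(taskCount);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                // Record that this task is running and update the peak.
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                allStarted.countDown();
                try {
                    // Stand-in for the download: wait until every task has started
                    // (or give up after 5 seconds if some never start).
                    allStarted.await(5, TimeUnit.SECONDS);
                } catch (InterruptedException ex) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        // Same shutdown sequence as retrieveDataFiles(), with a bounded wait.
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Peak concurrent tasks: " + runAndMeasurePeak(4));
    }
}
```

If the reported peak equals the number of submitted tasks, the pool is starting every task, and the limit is more likely to come from something inside the download code than from the ExecutorService.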
Can anyone shed some light on this for me, please? What am I missing, or what am I getting wrong?
Thanks nathj07