I am working on an application that retrieves files from different URLs.

There is a map that contains the targets to download (file name to URL). It is processed in a loop, with each entry submitted to an ExecutorService. Here's some code:

private void retrieveDataFiles() {
    if (this.urlsToRetrieve.size() > 0) {
        System.out.println("Target URLs to retrieve: " + this.urlsToRetrieve.size());
        ExecutorService executorProcessUrls = Executors.newFixedThreadPool(this.urlsToRetrieve.size());//could use fixed pool based on size of urls to retrieve
        for (Entry target : this.urlsToRetrieve.entrySet()) {
            final String fileName = (String) target.getKey();
            final String url = (String) target.getValue();

            String localFile = localDirectory + File.separator + fileName;
            System.out.println(localFile);
            executorProcessUrls.submit(new WikiDumpRetriever(url, localFile));
            dumpFiles.add(localFile); 
            //TODO: figure out why only 2 files download
        }
        executorProcessUrls.shutdown();
        try {
            executorProcessUrls.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        } catch (InterruptedException ex) {
            System.out.println("retrieveDataFiles InterruptedException: " + ex.getMessage());
            Thread.currentThread().interrupt(); // restore the interrupt flag for callers
        }
    } else {
        System.out.println("No target URL's were retrieved");
    }
}

Then the WikiDumpRetriever:

private static class WikiDumpRetriever implements Runnable {

    private String wikiUrl;
    private String downloadTo;

    public WikiDumpRetriever(String targetUrl, String localDirectory) {
        this.downloadTo = localDirectory;
        this.wikiUrl = targetUrl;
    }

    public void downloadFile() throws FileNotFoundException, IOException, URISyntaxException {
        HTTPCommunicationGet httpGet = new HTTPCommunicationGet(wikiUrl, "");
        httpGet.downloadFiles(downloadTo);
    }

    @Override
    public void run() {
        try {
            downloadFile();
        } catch (FileNotFoundException ex) {
            System.out.println("WDR: FileNotFound " + ex.getMessage());
        } catch (IOException ex) {
            System.out.println("WDR: IOException " + ex.getMessage());
        } catch (URISyntaxException ex) {
            System.out.println("WDR: URISyntaxException " + ex.getMessage());
        }
    }
}

As you can see, this is a static nested class. The map contains:

Key : Value

enwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

elwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/elwiki-latest-pages-articles.xml.bz2

zhwiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/zhwiki-latest-pages-articles.xml.bz2

hewiki-latest-pages-articles.xml.bz2 : http://dumps.wikimedia.org/enwiki/latest/hewiki-latest-pages-articles.xml.bz2

The problem is that this process downloads only 2 of the 4 files. I know that all four are available and that they can be downloaded. However, only 2 of them are processed at any one time.

Can anyone shed any light on this for me please - what am I missing or what am I getting wrong?

Thanks, nathj07

1 Answer

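Thanks to ppeterka - this is a limit imposed by the source. To work around it, I set the fixed thread pool size to 2, which means only 2 files are downloaded at the same time.

So the answer is to find the limit imposed by the provider and size the thread pool accordingly: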

ExecutorService executorProcessUrls = Executors.newFixedThreadPool(2);
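
For anyone who wants to see the effect in isolation, here is a minimal sketch of capping the pool at the provider's limit. The class name, the `MAX_CONCURRENT_PER_HOST` constant, and the simulated download (just a sleep) are assumptions made for illustration and are not part of the original code. The value 2 is simply what worked for this source, and other servers may allow more or fewer connections.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedDownloads {

    // Hypothetical per-host limit; the real value depends on the server.
    static final int MAX_CONCURRENT_PER_HOST = 2;

    /** Runs taskCount simulated downloads and returns the peak number that ran at once. */
    static int runSimulatedDownloads(int taskCount, int maxConcurrent) {
        // Never create more threads than tasks, and never exceed the limit.
        int poolSize = Math.max(1, Math.min(taskCount, maxConcurrent));
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();

        for (int i = 0; i < taskCount; i++) {
            pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // record the highest concurrency seen
                try {
                    Thread.sleep(50); // stand-in for the actual download
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            });
        }

        pool.shutdown(); // queued tasks still run; no new ones are accepted
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        int peak = runSimulatedDownloads(4, MAX_CONCURRENT_PER_HOST);
        System.out.println("Peak concurrent downloads: " + peak);
    }
}
```

With a fixed pool, the extra tasks wait in the executor's queue until a thread frees up, so all four files are still downloaded, just no more than two at a time.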

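I would have liked to accept an answer, but it seems I can't do that with a comment. Apologies if this is the wrong way to go about it.

Thanks for all the pointers - the "group think" really helped me solve this.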

answered 2012-11-05T10:17:20.957