
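I wrote a function that reads some data from an external API. It calls that API for each record while reading a file from disk. I want to optimize my code for large files (35,000 records). Can you give me any advice on this?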

Here is my code.

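import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStreamReader;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;

public void readCSVFile() {
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader br = new BufferedReader(new FileReader(getFileName()))) {
        String line;
        while ((line = br.readLine()) != null) {
            String[] splitLine = line.split(cvsSplitBy);

            String campaign = splitLine[0];
            String adGroup = splitLine[1];
            String url = splitLine[2];
            // one blocking HTTP request per CSV line
            long searchCount = getSearchCount(url);

            StringBuilder sb = new StringBuilder();
            sb.append(campaign).append(',');
            sb.append(adGroup).append(',');
            sb.append(searchCount).append(',');
            writeToFile(sb, getNewFileName());
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}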

private long getSearchCount(String url) {
    long recordCount = 0;
    DefaultHttpClient httpClient = new DefaultHttpClient();
    try {
        // URL-encode the value so characters such as '&' or '?' survive in the query string
        HttpGet getRequest = new HttpGet(
                "api.com/querysearch?q="
                        + java.net.URLEncoder.encode(url, "UTF-8"));
        getRequest.addHeader("accept", "application/json");

        HttpResponse response = httpClient.execute(getRequest);

        if (response.getStatusLine().getStatusCode() != 200) {
            throw new RuntimeException("Failed : HTTP error code : "
                    + response.getStatusLine().getStatusCode());
        }

        BufferedReader br = new BufferedReader(new InputStreamReader(
                response.getEntity().getContent()));

        String output;
        while ((output = br.readLine()) != null) {
            try {
                // expected body shape: {"result": {"count": <number>}}
                JSONObject json = (JSONObject) new JSONParser().parse(output);
                JSONObject result = (JSONObject) json.get("result");
                recordCount = (long) result.get("count");
                System.out.println(url + "=" + recordCount);
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        // release the connection even when the request fails
        httpClient.getConnectionManager().shutdown();
    }
    return recordCount;
}

1 Answer


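Since remote calls are much slower than local disk access, you need to parallelize or batch the remote calls somehow. If the remote API does not support batch calls but does allow multiple concurrent reads, you might use something like a thread pool to make the remote calls: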

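import java.io.*;
import java.util.*;
import java.util.concurrent.*;

public void readCSVFile() throws IOException, InterruptedException, ExecutionException {
    // checked exceptions are propagated to the caller to save space
    List<Future<String>> futures = new ArrayList<Future<String>>();
    ExecutorService pool = Executors.newFixedThreadPool(5);

    try (BufferedReader br = new BufferedReader(new FileReader(getFileName()))) {
        String line;
        while ((line = br.readLine()) != null) {
            final String[] splitLine = line.split(cvsSplitBy);
            // each task makes one remote call; at most 5 run at the same time
            futures.add(pool.submit(new Callable<String>() {
                public String call() {
                    long searchCount = getSearchCount(splitLine[2]);
                    return new StringBuilder()
                        .append(splitLine[0]).append(',')
                        .append(splitLine[1]).append(',')
                        .append(searchCount).append(',')
                        .toString();
                }
            }));
        }

        // Future.get() blocks until that task finishes, so rows are
        // written in the same order as the input file
        for (Future<String> fs : futures) {
            writeToFile(fs.get(), getNewFileName());
        }
    } finally {
        pool.shutdown();
    }
}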
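The thread-pool pattern can be tried without the real API. Below is a minimal, self-contained sketch: `fakeSearchCount` is a hypothetical stand-in that sleeps 50 ms to imitate network latency, and the input lines are made up. With a pool of N threads, roughly N lookups are in flight at once, while collecting the futures in submission order keeps the output in input order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLookupDemo {

    // HYPOTHETICAL stand-in for the remote getSearchCount(url) call:
    // sleeps to simulate network latency, then returns a fake count.
    static long fakeSearchCount(String url) throws InterruptedException {
        Thread.sleep(50);
        return url.length();
    }

    // Submits one task per CSV line to a fixed-size pool and collects the
    // results in input order (Future.get() blocks until that task is done).
    static List<String> process(List<String> lines, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String line : lines) {
                final String[] parts = line.split(",");
                futures.add(pool.submit(() -> {
                    long count = fakeSearchCount(parts[2]);
                    return parts[0] + "," + parts[1] + "," + count;
                }));
            }
            List<String> out = new ArrayList<>();
            for (Future<String> f : futures) {
                out.add(f.get());
            }
            return out;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> lines = Arrays.asList(
                "c1,g1,example.com/a",
                "c2,g2,example.com/bb",
                "c3,g3,example.com/ccc");
        long start = System.nanoTime();
        List<String> out = process(lines, 3);
        long ms = (System.nanoTime() - start) / 1_000_000;
        out.forEach(System.out::println);
        System.out.println("took ~" + ms + " ms");
    }
}
```

The pool size is a trade-off: more threads hide more latency, but the remote API may rate-limit or reject too many concurrent requests, so a small fixed pool such as 5 is a cautious starting point.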

Ideally, though, if at all possible, you really want to fetch all the counts from the remote API in a single batch read.
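If the API does accept many queries per request, the client side reduces to chunking the URLs. A minimal sketch, assuming a hypothetical `fetchCountsBatch` endpoint that returns a count for every URL in one round trip (the real API may have no such call, or a different size limit than the 500 used here):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class BatchLookupDemo {

    // Counts simulated round trips so the saving is visible.
    static final AtomicInteger ROUND_TRIPS = new AtomicInteger();

    // HYPOTHETICAL batch endpoint: one request returns counts for many URLs.
    // Whether the real API offers anything like this is an assumption.
    static Map<String, Long> fetchCountsBatch(List<String> urls) {
        ROUND_TRIPS.incrementAndGet();
        Map<String, Long> counts = new HashMap<>();
        for (String u : urls) {
            counts.put(u, (long) u.length()); // fake count
        }
        return counts;
    }

    // Splits the URLs into chunks of at most batchSize and issues one
    // batch call per chunk, merging the partial results into one map.
    static Map<String, Long> lookupAll(List<String> urls, int batchSize) {
        Map<String, Long> all = new HashMap<>();
        for (int i = 0; i < urls.size(); i += batchSize) {
            List<String> chunk = urls.subList(i, Math.min(i + batchSize, urls.size()));
            all.putAll(fetchCountsBatch(chunk));
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 35000; i++) {
            urls.add("example.com/page" + i);
        }
        Map<String, Long> counts = lookupAll(urls, 500);
        // prints "35000 counts in 70 round trips"
        System.out.println(counts.size() + " counts in " + ROUND_TRIPS.get() + " round trips");
    }
}
```

The CSV loop can then look up each row's count with `counts.get(url)` instead of making one request per row; for 35,000 rows and batches of 500 that is 70 requests instead of 35,000.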

answered 2013-08-16T03:08:51.540