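I have the following method, which I run from my map task in a multi-threaded execution. It works fine in standalone mode, but when I run it on Hadoop YARN it exhausts the 1 GB of physical memory, and virtual memory usage shoots up as well.
From a programming standpoint, I need to know whether I am doing something wrong. I believe I am closing every stream I open as soon as possible, so I see no reason for a memory leak. Please advise.
Thanks.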
public static void manageTheCurrentURL(String url) {
    logger.trace("Entering the method manageTheCurrentURL ");
    InputStream stream = null;
    InputStream is = null;
    ByteArrayOutputStream out = null;
    WebDriver driver = null;
    try {
        if (StringUtils.isNotBlank(url)) {
            caps.setJavascriptEnabled(true); // not really needed: JS
                                             // enabled by default
            caps.setCapability(
                    PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
                    "/usr/local/bin/phantomjs");
            // Launch driver (will take care and ownership of the phantomjs
            // process)
            driver = new PhantomJSDriver(caps);
            driver.get(url);
            String htmlContent = driver.getPageSource();
            if (htmlContent != null) {
                is = new ByteArrayInputStream(htmlContent.getBytes());
                ByteArrayDocumentSource byteArrayDocumentSource = new ByteArrayDocumentSource(
                        is, url, "text/html");
                Any23 runner = new Any23();
                runner.setHTTPUserAgent("test-user-agent");
                out = new ByteArrayOutputStream();
                TripleHandler handler = new NTriplesWriter(out);
                try {
                    runner.extract(byteArrayDocumentSource, handler);
                } catch (ExtractionException e) {
                } finally {
                    if (driver != null) {
                        driver.quit();
                        //driver.close();
                    }
                    try {
                        handler.close();
                    } catch (TripleHandlerException e) {
                    }
                    if (is != null) {
                        try {
                            is.close();
                        } catch (IOException e) {
                        }
                    }
                }
                if (out != null) {
                    stream = new ByteArrayInputStream(out.toByteArray());
                    Iterator<Node[]> it = new DeltaParser(stream);
                    if (it != null) {
                        SolrCallbackForNXParser callback = new SolrCallbackForNXParser(
                                url);
                        callback.startStory();
                        while (it.hasNext()) {
                            Node[] abc = it.next();
                            callback.processStory(abc);
                        }
                        callback.endStory();
                    }
                }
            }
        }
    } catch (IOException e) {
        return;
    } finally {
        if (stream != null) {
            try {
                stream.close();
            } catch (IOException e) {
            }
        }
        if (out != null) {
            try {
                out.close();
            } catch (IOException e) {
            }
        }
    }
    logger.trace("Exiting the method manageTheCurrentURL ");
}
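
One thing that may be worth checking: in the method above, driver.quit() sits in the inner finally block, so it is reached only when getPageSource() returns non-null content and the earlier driver calls do not throw. On the other paths the PhantomJS process would presumably stay alive, which might account for some of the memory growth. Below is a minimal sketch of moving the cleanup to the outermost finally. FakeDriver, fetch and liveProcesses are hypothetical stand-ins written for illustration only, not part of Selenium or of my real code.

```java
public class DriverCleanupSketch {

    // Counts stand-in "browser processes" that have been started but not quit.
    public static int liveProcesses = 0;

    /** Hypothetical stand-in for a WebDriver that owns an external process. */
    public static final class FakeDriver {
        private boolean open = true;

        public FakeDriver() {
            liveProcesses++;
        }

        /** Simulates navigation: may throw, may return null, may return HTML. */
        public String getPageSource(String url) {
            if (url.contains("fail")) {
                throw new IllegalStateException("navigation failed");
            }
            return url.contains("empty") ? null : "<html></html>";
        }

        /** Idempotent: a second call does nothing. */
        public void quit() {
            if (open) {
                open = false;
                liveProcesses--;
            }
        }
    }

    /** Returns true if content was fetched; always releases the driver. */
    public static boolean fetch(String url) {
        FakeDriver driver = null;
        try {
            driver = new FakeDriver();
            String html = driver.getPageSource(url);
            return html != null;
        } catch (IllegalStateException e) {
            return false;
        } finally {
            // Runs on the success path, the null-content path and the exception path.
            if (driver != null) {
                driver.quit();
            }
        }
    }

    public static void main(String[] args) {
        fetch("http://example.com/ok");
        fetch("http://example.com/empty");
        fetch("http://example.com/fail");
        System.out.println("live processes: " + liveProcesses); // prints "live processes: 0"
    }
}
```

The design point is only that the quit call lives in the same finally block that guards the driver's creation, so no early return or exception can skip it.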