3

Here is a simple case:

let html = `<<some huge html file>>`
var libxmljs = require("libxmljs");

class MyObject{
  constructor(html){
    this.doc = libxmljs.parseHtml(html);
    this.node = this.doc.root()
  }
}

let obj

for(var i = 0; i < 100000; i++){
  obj = new MyObject(html)
  // if I uncomment the next line it works fine
  // obj.node = null
  console.log(i)
}

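When I run it, the script runs out of memory very quickly, apparently because obj.node is not being garbage collected properly. How can I make sure it is collected once I am done with it, without explicitly setting it to null?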

4

2 Answers

1

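The objects that .root() returns seem to be more collectible if you don't store the reference directly on a class instance. Memory usage still looks fairly leaky, though, because the full amount of heap that was allocated is never reclaimed. Node itself also seems to use about double the memory that lives on the heap in order to deal with the native libxml code. It may be worth raising an issue on libxmljs, as this looks like a bug.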

Not storing the node on the class instance, and passing it through a getter instead, works better.

class MyObject{
  constructor(){
    this.doc = libxmljs.parseHtml(html)
  }
  get node(){
    return this.doc.root()
  }
}

Using a plain object also seems to work better.

function myObject(){
  let doc = libxmljs.parseHtml(html)
  let node = doc.root()
  return {
    doc: doc,
    node: node,
  }
}

As an alternative, you could try one of the JS-based parsers.

answered 2017-10-20T08:05:44.993
1

TL;DR: The problem is in the library, not in Node.

Long answer

Here is a slightly modified version of the code:

var heapdump = require('heapdump');
const fs = require('fs');
var libxmljs = require("libxmljs");

const content = fs.readFileSync('./html2.htm');
let id = 0;

class MyObject{
  constructor(){
    this.doc = libxmljs.parseHtml(content);
    this.node = this.doc.root()
  }
}

let obj;

function createObject () {
  obj = new MyObject(content);
};


try {
  for(var i = 0; i < 3000; i++){
    createObject();
    // if I uncomment the next line it works fine
    // obj.node = null
    console.log(i);
    if (i === 50) {
      heapdump.writeSnapshot('/Users/me/3.heapsnapshot');
    }
    if (i === 100) {
      heapdump.writeSnapshot('/Users/me/4.heapsnapshot');
    }
    if (i === 150) {
      heapdump.writeSnapshot('/Users/me/5.heapsnapshot');
    }

  }
  console.log('done');
}
catch(e) {
  console.log(e);
}

Below is the relevant section of the diff between two of the heap snapshots written by the code above (3 and 4):

[Image: heap snapshot diff, snapshots 3 and 4]

It is even clearer when we compare heap snapshots 4 and 5:

[Image: heap snapshot diff, snapshots 4 and 5]

A few things we can conclude from these heap snapshots:

  • There is no memory leak in the JS part.
  • The size of the heap snapshot does not match the size of the process as shown in htop/top/Activity Monitor, depending on your OS (about 12 MB of heap snapshot versus a few GB of RAM).

heapdump only shows memory leaks that live in the JS heap. Since this library contains C code, heapdump will not capture leaks that happen in the native part.
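One way to see that gap without an external tool is to log Node's own counters inside the loop. The following is only a minimal sketch: `memoryInMB` and `logEvery` are hypothetical helper names, and it relies solely on the built-in `process.memoryUsage()`. If `rss` keeps climbing while `heapUsed` stays roughly flat, the growth is most likely happening outside the JS heap.

```javascript
// Sketch: compare V8 heap usage with the footprint of the whole process.
// `memoryInMB` and `logEvery` are hypothetical helpers, not part of any library.
function memoryInMB() {
  // Convert bytes to megabytes, rounded to one decimal place.
  const toMB = (bytes) => Math.round((bytes / 1024 / 1024) * 10) / 10;
  const usage = process.memoryUsage();
  return {
    rss: toMB(usage.rss),           // resident set size of the whole process
    heapUsed: toMB(usage.heapUsed), // live JS objects on the V8 heap
    external: toMB(usage.external), // C++ memory bound to JS objects that V8 knows about
  };
}

// Log the counters on every `interval`-th iteration; returns true when it logged.
function logEvery(i, interval) {
  if (i % interval !== 0) return false;
  const m = memoryInMB();
  console.log(`${i}: rss=${m.rss} MB heapUsed=${m.heapUsed} MB external=${m.external} MB`);
  return true;
}
```

Calling `logEvery(i, 50)` in place of `console.log(i)` in the loop above would print one line per 50 iterations. Memory that a native addon allocates without reporting it to V8 may not show up in `external` either, so `rss` is the number to watch here.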

I am not sure how we can capture a dump from that library, or why setting the property to null allows the memory to be freed, but it should be safe to assume that Node's GC is doing everything it can.

Hope this helps

answered 2017-10-21T19:05:23.573