xml - Processing a large XML file (with relations) using Node.js async

Question

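I have to process a large XML file (about 25 MB) and organize the data into documents to import into MongoDB.

The problem is that the XML document contains about 5-6 types of elements, each with about 10k rows.

After fetching an XML node of type a, I have to fetch its corresponding elements of types b, c, d, and so on.

What I am trying to do in Node:

1. Fetch all rows of type a.
2. For each row, use xpath to find its related rows and build a document.
3. Insert the document into MongoDB.

If there are 10k rows of type a, step 2 runs 10k times. I am trying to run it in parallel so that it does not take forever, so async.forEach seemed like the perfect solution.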

async.forEach(rowsA,fetchA);

My fetchrelations function looks something like this:

var fetchA = function(rowA) {
    // convert the xml row into an object
    var obj = {};
    for(i in rowA.attributes) {
        attribute = rowA.attributes[i];
        if(attribute.value === undefined)
            continue;
        obj[attribute.name] = attribute.value;
    }
    console.log(obj.someattribute);
    // find other related rows,
    // callback inserts the modified object with the subdocuments
    findRelations(obj,function(obj){
        insertA(obj,postInsert);
    });
};

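When I run it, the console.log in the code fires roughly once every 1.5 seconds instead of running in parallel for every row as I expected. I have been scratching my head over this for the past two hours, and I am not sure what I am doing wrong.

I am not very good with Node, so please bear with me.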

score 1 · Accepted Answer

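It looks to me like you are not declaring and calling the callback function that async passes to your iterator function (fetchA). See the forEach documentation for an example.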
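To make that contract concrete, here is a minimal sketch. The helper `forEachParallel` is a hypothetical stand-in written only for illustration; it is not the async library's implementation, and `fetchRow`, `processed`, and `summary` are made-up names. The point it shows is that the final callback can only run once every iterator call has invoked its `cb`.

```javascript
// Hypothetical stand-in for async.forEach, written only to illustrate the
// callback contract; this is NOT the async library's actual code.
function forEachParallel(items, iterator, done) {
    var remaining = items.length;
    var finished = false;
    if (remaining === 0) {
        return done(null);
    }
    items.forEach(function (item) {
        iterator(item, function (err) {
            if (finished) { return; }                    // ignore calls after completion
            if (err) { finished = true; return done(err); }
            remaining -= 1;
            if (remaining === 0) { finished = true; done(null); }
        });
    });
}

var processed = [];
var summary = null;

// Iterator that signals completion through cb.
function fetchRow(row, cb) {
    processed.push(row.id);
    cb();   // without this call, the final callback below would never run
}

forEachParallel([{ id: 1 }, { id: 2 }, { id: 3 }], fetchRow, function (err) {
    summary = err ? 'failed' : 'processed ' + processed.length + ' rows';
});
```

If `fetchRow` never called `cb`, `summary` would stay `null`, because the helper has no other way to learn that a row is finished.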

Your code probably needs to look more like this:

var fetchA = function(rowA, cb) {
    // Convert the XML row into an object.
    var obj = {};
    for (var i in rowA.attributes) {
        var attribute = rowA.attributes[i];
        if (attribute.value === undefined)
            continue;  // Skip this attribute. Do not call cb() here, or it would fire once per skipped attribute.
        obj[attribute.name] = attribute.value;
    }
    console.log(obj.someattribute);
    // Find the other related rows;
    // the callback inserts the modified object with its subdocuments.
    findRelations(obj, function(obj) {
        insertA(obj, postInsert);
        cb();  // You may even need to call this within insertA or postInsert if those are asynchronous functions.
    });
};
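The comment about asynchronous inserts can be sketched as follows. Everything here is an assumption made for illustration: `findRelations` and `insertA` are stubs rather than the asker's real functions, and the insert is simulated by queueing its completion (`pendingWrites`, `completeWrites`) instead of talking to MongoDB. The sketch passes `cb` into `insertA`, so a row counts as finished only once its write has completed.

```javascript
// Hypothetical stand-ins; none of these names come from a real library.
var pendingWrites = [];   // completions of inserts that have not finished yet
var stored = [];          // documents "written" so far

function findRelations(obj, callback) {
    obj.relations = [];   // placeholder for the related b, c, d rows
    callback(obj);
}

function insertA(obj, callback) {
    // Queue the completion instead of running it right away, to imitate a
    // driver that reports the result of a write later.
    pendingWrites.push(function () {
        stored.push(obj);
        callback(null);
    });
}

function completeWrites() {
    while (pendingWrites.length > 0) {
        pendingWrites.shift()();
    }
}

// Iterator: cb is handed to insertA, so it runs only after the write finishes.
function fetchA(rowA, cb) {
    findRelations({ id: rowA.id }, function (obj) {
        insertA(obj, cb);
    });
}

var rows = [{ id: 'a1' }, { id: 'a2' }];
var remaining = rows.length;
var allDone = false;
rows.forEach(function (row) {
    fetchA(row, function (err) {
        if (err) { throw err; }
        remaining -= 1;
        if (remaining === 0) { allDone = true; }
    });
});

var doneBeforeWrites = allDone;   // no write has completed at this point
completeWrites();
var doneAfterWrites = allDone;    // every cb has been called by now
```

Calling `cb` right after `insertA(obj, postInsert)` would instead report a row as finished before its write completes, which may or may not matter depending on what the final callback does.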
