6

我需要在树上递归以使用异步操作对特定节点执行操作。如何控制流量,以便在完成后访问节点?

这是一个示例情况:

data = {
  name: "deven",
  children: [
    { name: "andrew" },
    { name: "donovan" },
    { name: "james",
      children: [
        { name: "donatello" },
        { name: "dan" }
      ]
    },
    { name: "jimmy",
      children: [
        { name: "mike" },
        { name: "dank" }
      ]
    }
  ]
};

我有一个函数,其目标是遍历树并将所有以“d”开头的名称大写。之后,我想将树传递给另一个函数来做更多的工作(可能删除所有名称以“a”开头的节点),但只有在完成初始处理之后:

function capitalize_d(node) {
    if(node.name === "d") {
        node.name = node.name.toUpperCase();
    }

    if(node.children != null) {
        for(var i = 0; i < node.children.length; i++) {
            capitalize_d(node.children[i]);
        }
    }
}

function remove_a(node) {
}

capitalize_d(data);

// Should only get called after all the d's have been capitalized.
remove_a(data);

上面的代码工作正常,因为 capitalize_d 是阻塞的。如果是capitalize_d异步递归,怎么保证remove_a完成后被调用呢?注意中的setTimeout调用capitalize_d

function capitalize_d(node) {
    setTimeout(function() {

        if(node.name === "d") {
            node.name = node.name.toUpperCase();
        }

        if(node.children != null) {
            for(var i = 0; i < node.children.length; i++) {
                capitalize_d(node.children[i]);
            }
        }

    }, 1);
}

function remove_a(node) {
}

capitalize_d(data);

// Should only get called after all the d's have been capitalized.
remove_a(data);

问题是我们同时处理了树的不同分支,并且无法判断它何时最终完成了对树的处理。

我该如何解决这个问题?

4

2 回答 2

3

Let me summarize what I understood of your requirements:

  • you have a data tree where each single node can be altered asynchronously by a set of operations (capitalize_d and remove_a in your example),
  • you want to make sure every single node has been subjected to a given operation before allowing the next one.

I have spent 10 years or more designing real-time embedded software, and believe me, the requirements in this area are meaner and scarrier than anything most network programmers will experience in their entire life. This makes me warn you that you seem to be headed the seriously wrong way here.

As I can imagine it, your problem is to organize a set of individual data in some sort of meaningful structure. Some process collects random bits of information (what you call 'nodes' in your example), and at some point you want to put all those nodes into a consistent, monolithic data structure (a hierarchical tree in your example).

In other words, you have three tasks at hand:

  • a data acquisition process that will collect nodes asynchronously
  • a data production process that will present a consistent, refined data tree
  • a controller process that will synchronize data acquisition and production (possibly directly the user interface if the two above processes are smart enough, but don't count on it too much).

My advice: don't try to do acquisition and production at the same time.

Just to give you an idea of the nightmare you're headed for:

  • depending on how the operations are triggered, there is a possibility that the tree will never be completely processed by a given operation. Let's say the controlling software forgets to call capitalize_d on a few nodes, remove_a will simply never get the green light

  • conversely, if you fire at the tree at random, it is very likely that some nodes will get processed multiple times, unless you keep track of the operation coverage to prevent applying the same transformation twice to a given node

  • if you want remove_a processing to ever start, you might have to prevent the controlling software from sending any more capitalize_d requests, or else the light might stay stuck to red forever. You will end up doing flow control over your requests one way or another (or worse : you will not do any, and your system will be likely to freeze to death should the operation flow wander away from the sweet spot you happened to hit by chance).

  • if an operation alters the structure of the tree (as remove_a obviously does), you have to prevent concurrent accesses. At the very least, you should lock the subtree starting from the node remove_a is working on, or else you would allow processing a subtree that is likely to be asynchronously altered and/or destroyed.

Well it's doable. I've seen fine youg men earning big money doing variations on this theme. They usually spent a couple of evenings each week eating pizzas in front of their computers too, but hey, that's how you can tell hardboiled hackers from the crowd of quiche eaters, right?...

I assume you being posting this question here means you don't really want to do this. Now if you boss does, well, to quote a famous android, I can't lie to you about your chances, but... you have my sympathies.

Now seriously folks.. This is the way I would tackle the problem.

1) take a snapshot of your data at a given point in time

you can weed out raw data using as many criteria as you can (last data acquisition too old, incorrect input, whatever allows you to build the smallest possible tree).

2) build the tree with the snapshot, and then apply whatever capitalize_d, remove_a and camelize_z operations sequentially on this given snapshot.

In parallel, the data acquisition process will continue collecting new nodes or updating existing ones, ready to take the next snapshot.

Besides, you can move some of your processing forward. Obviously capitalize_d does not take any advantage of the tree structure, so you can apply capitalize_d to each node in the snapshot before the tree is even built. You might even be able to apply some of the transformations earlier, i.e. on each collected sample. This can save you a lot of processing time and code complexity.

To end with a bit of theroretical babble,

  • your approach is to consider the data tree a shared object that should support concurrent acces from the data acquisition and data production processes,
  • my approach is to make the data acquisition process serve (asynchronously) consistent sets of data to a data production process, which can then handle sequentially the said set of data.

the data production process could be triggered on demand (say when the end user clicks the "show me something" button), in which case the reactivity would be rather poor: the user would be stuck watching an hourglass or whatever Web2.0 sexy spinning wheel for the time needed to build and process the tree (lets's say 7-8 seconds).

or you could activate the data production process periodically (feed it with a new snapshot every 10 seconds, a period safely above the mean processing time of a data set). The "show me something" button would then just present the last set of completed data. An immediate answer, but with data that could be 10 seconds older than the last recieved samples.

I rarely saw cases where this was not considered acceptable, especially when you produce a bunch of complex data an operator needs a few dozen seconds to digest.

In theory, my approach will lose some reactivity, since the data processed will be slightly outdated, but the concurrent access approach would probably result in an even slower software (and most certainly 5-10 times bigger and buggier).

于 2013-12-29T21:31:14.170 回答
3

我知道这篇文章很旧,但它出现在搜索结果中,并且唯一的回复没有提供一个有效的例子,所以这是我最近做的事情的修改版本......

function processTree(rootNode, onComplete) {

    // Count of outstanding requests.
    // Upon a return of any request,
    // if this count is zero, we know we're done.
    var outstandingRequests = 0;

    // A list of processed nodes,
    // which is used to handle artifacts
    // of non-tree graphs (cycles, etc).
    // Technically, since we're processing a "tree",
    // this logic isn't needed, and could be
    // completely removed.
    //
    // ... but this also gives us something to inspect
    // in the sample test code. :)
    var processedNodes = [];

    function markRequestStart() {
        outstandingRequests++;
    }

    function markRequestComplete() {
        outstandingRequests--;
        // We're done, let's execute the overall callback
        if (outstandingRequests < 1) { onComplete(processedNodes); }
    }

    function processNode(node) {
        // Kickoff request for this node
        markRequestStart();
        // (We use a regular HTTP GET request as a
        // stand-in for any asynchronous action)
        jQuery.get("/?uid="+node.uid, function(data) {
            processedNodes[node.uid] = data;
        }).fail(function() {
            console.log("Request failed!");
        }).always(function() {
            // When the request returns:
            // 1) Mark it as complete in the ref count
            // 2) Execute the overall callback if the ref count hits zero
            markRequestComplete();
        });

        // Recursively process all child nodes (kicking off requests for each)
        node.children.forEach(function (childNode) {
            // Only process nodes not already processed
            // (only happens for non-tree graphs,
            // which could include cycles or multi-parent nodes)
            if (processedNodes.indexOf(childNode.uid) < 0) {
                processNode(childNode);
            }
        });

    }

    processNode(rootNode);
}

这是使用 QUnit 的示例用法:

QUnit.test( "async-example", function( assert ) {
    var done = assert.async();

    var root = {
        uid: "Root",
        children: [{
            uid: "Node A",
            children: [{
                uid: "Node A.A",
                children: []
            }]
        },{
            uid: "Node B",
            children: []
        }]
    };

    processTree(root, function(processedNodes) {
        assert.equal(Object.keys(processedNodes).length, 4);
        assert.ok(processedNodes['Root']);
        assert.ok(processedNodes['Node A']);
        assert.ok(processedNodes['Node A.A']);
        assert.ok(processedNodes['Node B']);
        done();
    });
});
于 2016-05-22T20:07:37.487 回答