52

Question

I've found that there is a limit on the number of Web Workers a browser can spawn.

Example

Main HTML / JavaScript

<script type="text/javascript">
$(document).ready(function(){
    var workers = new Array();
    var worker_index = 0;
    for (var i=0; i < 25; i++) {
        workers[worker_index] = new Worker('test.worker.js');
        workers[worker_index].onmessage = function(event) {
            $("#debug").append('worker.onmessage i = ' + event.data + "<br>");
        };
        workers[worker_index].postMessage(i); // start the worker.      

        worker_index++;
    }   
});
</script>
</head>
<body>
<div id="debug">
</div>

test.worker.js

self.onmessage = function(event) {
    var i = event.data; 

    self.postMessage(i);
};

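With Firefox (version 14.0.1, Windows 7), this generates only 20 lines of output in the container.

Question

Is there a way around this? The only two ideas I can think of are:

1) Daisy-chain the web workers, i.e. have each web worker spawn the next one

Example: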

<script type="text/javascript">
$(document).ready(function(){
    createWorker(0);
});

function createWorker(i) {

    var worker = new Worker('test.worker.js');
    worker.onmessage = function(event) {
        var index = event.data;

        $("#debug").append('worker.onmessage i = ' + index + "<br>");

        if ( index < 25) {
            index++;
            createWorker(index);
        } 
    };
    worker.postMessage(i); // start the worker.
}
</script>
</head>
<body>
<div id="debug"></div>

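2) Limit the number of web workers to a finite number and modify my code to work within that limit (i.e. share the workload across a finite number of web workers), something like this: http://www.smartjava.org/content/html5-easy-parallelize-jobs-using-web-workers-and-threadpool

Unfortunately, #1 doesn't seem to work (only a limited number of web workers get spawned on page load). Are there any other solutions I should consider?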

4 Answers

107

Old question, let's revive it! readies epinephrine

I've been looking into using Web Workers to isolate 3rd party plugins since web workers can't access the host page. I'll help you out with your methods which I'm sure you've solved by now, but this is for teh internetz. Then I'll give some relevant information from my research.

Disclaimer: In the examples where I used your code, I've modified and cleaned it up to provide full source code without jQuery so that you and others can run it easily. I've also added a timer which alerts the time in ms taken to execute the code.

In all examples, we reference the following genericWorker.js file.

genericWorker.js

self.onmessage = function(event) {
    self.postMessage(event.data);
};

Method 1 (Linear Execution)

Your first method is nearly working. The reason why it still fails is that you aren't deleting any workers once you finish with them. This means the same result (crashing) will happen, just slower. All you need to fix it is to add worker.terminate(); before creating a new worker to remove the old one from memory. Note that this will cause the application to run much slower as each worker must be created, run, and be destroyed before the next can run.

Linear.html

<!DOCTYPE html>
<html>
<head>
    <title>Linear</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate();
                if (index < totalWorkers) createWorker(index);
                else alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        createWorker();
    </script>
</body>
</html>

Method 2 (Thread Pool)

Using a thread pool should greatly increase running speed. Instead of using some library with complex lingo, let's simplify it. All the thread pool means is having a set number of workers running simultaneously. We can actually just modify a few lines of code from the linear example to get a multi-threaded example. The code below will find how many cores you have (if your browser supports this), or default to 4. I found that this code ran about 6x faster than the original on my machine with 8 cores.

ThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function createWorker() {
            var worker = new Worker('genericWorker.js');
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                worker.terminate();
                if (index < totalWorkers) createWorker();
                else if(--maxWorkers === 0) alert((new Date).getTime() - start);
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) createWorker();
    </script>
</body>
</html>

Other Methods

Method 3 (Single worker, repeated task)

In your example, you're running the same worker script over and over again. I know you're simplifying a probably more complex use case, but some people viewing this will apply the same method when they could be using just one worker for all the tasks.

Essentially, we'll instantiate a worker, send data, wait for data, then repeat the send/wait steps until all data has been processed.

On my computer, this runs at about twice the speed of the thread pool. That actually surprised me: I expected the thread pool's overhead to leave it even slower than half this speed.

RepeatedWorker.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Worker</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();
        var worker = new Worker('genericWorker.js');

        function runWorker() {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker();
                else {
                    alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        runWorker();
    </script>
</body>
</html>

Method 4 (Repeated Worker w/ Thread Pool)

Now, what if we combine the previous method with the thread pool method? Theoretically, it should run quicker than the previous. Interestingly, it runs at just about the same speed as the previous on my machine.

Maybe it's the extra overhead of sending the worker reference on each time it's called. Maybe it's the extra workers being terminated during execution (only one worker won't be terminated before we get the time). Who knows. Finding this out is a job for another time.

RepeatedThreadPool.html

<!DOCTYPE html>
<html>
<head>
    <title>Repeated Thread Pool</title>
</head>
<body>
    <pre id="debug"></pre>
    <script type="text/javascript">
        var debug = document.getElementById('debug');
        var maxWorkers = navigator.hardwareConcurrency || 4;
        var totalWorkers = 250;
        var index = 0;
        var start = (new Date).getTime();

        function runWorker(worker) {
            worker.onmessage = function(event) {
                debug.appendChild(document.createTextNode('worker.onmessage i = ' + event.data + '\n'));
                if (index < totalWorkers) runWorker(worker);
                else {
                    if(--maxWorkers === 0) alert((new Date).getTime() - start);
                    worker.terminate();
                }
            };
            worker.postMessage(index++); // start the worker.
        }

        for(var i = 0; i < maxWorkers; i++) runWorker(new Worker('genericWorker.js'));
    </script>
</body>
</html>

Now for some real world shtuff

Remember how I said I was using workers to implement 3rd party plugins into my code? These plugins have a state to keep track of. I could start the plugins and hope they don't load too many for the application to crash, or I could keep track of the plugin state within my main thread and send that state back to the plugin if the plugin needs to be reloaded. I like the second one better.

I had written out several more examples of stateful, stateless, and state-restore workers, but I'll spare you the agony and just do some brief explaining and some shorter snippets.

First-off, a simple stateful worker looks like this:

StatefulWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
    }
};

It does some action based on the message it receives and holds data internally. This is great. It allows for mah plugin devs to have full control over their plugins. The main app instantiates their plugin once, then will send messages for them to do some action.

The problem comes in when we want to load several plugins at once. We can't do that, so what can we do?

Let's think about a few solutions.

Solution 1 (Stateless)

Let's make these plugins stateless. Essentially, every time we want to have the plugin do something, our application should instantiate the plugin then send it data based on its old state.

data sent

{
    action: 'increment',
    value: 7
}

StatelessWorker.js

self.onmessage = function(e) {
    switch(e.data.action) {
        case 'increment':
            e.data.value++;
            break;
        case 'decrement':
            e.data.value--;
            break;
    }
    self.postMessage({
        value: e.data.value,
        i: e.data.i
    });
};
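For illustration, the host side of this stateless approach might look roughly like the sketch below. It is only a sketch under assumptions: callPlugin, pluginState and the createWorker factory are hypothetical names, and the factory is passed in so the worker source can be swapped (in a browser it would return new Worker('StatelessWorker.js')). Error handling is left out.

```javascript
// The host owns the plugin state; each worker only lives for a single call.
// (pluginState and callPlugin are hypothetical names for this sketch.)
var pluginState = { value: 7 };

// createWorker is an assumed factory, e.g. in a browser:
//   function () { return new Worker('StatelessWorker.js'); }
function callPlugin(createWorker, action, done) {
    var worker = createWorker();
    worker.onmessage = function (event) {
        worker.terminate();                    // release the worker right away
        pluginState.value = event.data.value;  // remember the new state on the host
        done(pluginState.value);
    };
    // Send the action together with the state the plugin should start from.
    worker.postMessage({ action: action, value: pluginState.value });
}
```

Every call pays the cost of creating and terminating a worker, which is the price of never keeping more than one alive per plugin.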

This could work, but if we're dealing with a good amount of data this will start to seem like a less-than-perfect solution. A similar solution could be to have several smaller workers for each plugin and send only a small amount of data to and from each, but I'm uneasy with that too.

Solution 2 (State Restore)

What if we try to keep the worker in memory as long as possible, but if we do lose it, we can restore its state? We can use some sort of scheduler to see what plugins the user has been using (and maybe some fancy algorithms to guess what the user will use in the future) and keep those in memory.

The cool part about this is that we aren't looking at one worker per core anymore. Since a worker will be idle for most of the time it is loaded, we just need to worry about the memory it takes up. For a good number of workers (10 to 20 or so), this won't be substantial at all. We can keep the primary plugins loaded while the ones not used as often get switched out as needed. All the plugins will still need some sort of state restore.

Let's use the following worker and assume we either send 'increment', 'decrement', or an integer containing the state it's supposed to be at.

StateRestoreWorker.js

var i = 0;

self.onmessage = function(e) {
    switch(e.data) {
        case 'increment':
            self.postMessage(++i);
            break;
        case 'decrement':
            self.postMessage(--i);
            break;
        default:
            i = e.data;
    }
};
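To make the idea concrete, here is a minimal host-side sketch of state restore. It is not the scheduler mentioned above, just a least-recently-used cache under assumptions: PluginHost, getWorker, send and the createWorker(name) factory are hypothetical names, maxLoaded is assumed to be at least 1, and only one command per plugin is assumed to be in flight at a time.

```javascript
// createWorker(name) is an assumed factory, e.g. in a browser:
//   function (name) { return new Worker(name + '.js'); }
function PluginHost(createWorker, maxLoaded) {
    this.createWorker = createWorker;
    this.maxLoaded = maxLoaded;        // how many workers may stay in memory (>= 1)
    this.loaded = Object.create(null); // plugin name -> live worker
    this.order = [];                   // loaded plugin names, least recently used first
    this.saved = Object.create(null);  // plugin name -> last state reported by the worker
}

PluginHost.prototype.getWorker = function (name) {
    var pos = this.order.indexOf(name);
    if (pos !== -1) this.order.splice(pos, 1);  // re-added below as most recently used
    if (!this.loaded[name]) {
        if (this.order.length >= this.maxLoaded) {
            var evicted = this.order.shift();   // drop the least recently used plugin
            this.loaded[evicted].terminate();
            delete this.loaded[evicted];
        }
        var worker = this.createWorker(name);
        // Restore: the worker treats any non-command message as its new state.
        if (name in this.saved) worker.postMessage(this.saved[name]);
        this.loaded[name] = worker;
    }
    this.order.push(name);
    return this.loaded[name];
};

PluginHost.prototype.send = function (name, command, done) {
    var self = this;
    var worker = this.getWorker(name);
    worker.onmessage = function (event) {
        self.saved[name] = event.data;  // keep a copy so the plugin can be rebuilt later
        done(event.data);
    };
    worker.postMessage(command);
};
```

With maxLoaded set to 1, sending 'increment' to plugin a twice, then once to plugin b, then once more to a recreates a from its saved state of 2 and yields 3.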

These are all pretty simple examples, but I hope they helped you understand some ways of using multiple workers efficiently! I'll most likely be writing a scheduler and optimizer for this stuff, but who knows when I'll get to that point.

Good luck, and happy coding!

Answered 2015-04-02T21:32:32.660
14

My experience is that too many workers (> 100) decrease performance. In my case Firefox became very slow and Chrome even crashed. I compared variants with different numbers of workers (1, 2, 4, 8, 16, 32). Each worker encrypted a string. It turned out that 8 was the optimal number of workers, but that may differ depending on the problem the worker has to solve.

I built a small framework to abstract away the number of workers. Calls to the workers are created as tasks. If the maximum allowed number of workers is busy, a new task is queued and executed later.

It turned out that it's very important to recycle the workers in such an approach. You should hold them in a pool when they are idle, and not call new Worker(...) too often. Even if the workers are terminated with worker.terminate(), there seems to be a big performance difference between creating/terminating workers and recycling them.
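The framework itself isn't shown here, so the following is only a rough sketch of what such a recycling pool with a task queue could look like, not the actual implementation. WorkerPool, run, next and the createWorker factory are hypothetical names, and error handling (worker.onerror) is omitted.

```javascript
// createWorker is an assumed factory, e.g. in a browser:
//   function () { return new Worker('genericWorker.js'); }
function WorkerPool(createWorker, size) {
    this.createWorker = createWorker;
    this.size = size;   // maximum number of workers alive at once
    this.created = 0;   // workers created so far
    this.idle = [];     // workers waiting for a task
    this.queue = [];    // tasks waiting for a worker
}

// Queue a message; `done` receives the worker's reply.
WorkerPool.prototype.run = function (message, done) {
    this.queue.push({ message: message, done: done });
    this.next();
};

WorkerPool.prototype.next = function () {
    if (this.queue.length === 0) return;
    var worker = this.idle.pop();
    if (!worker) {
        if (this.created >= this.size) return;  // all workers busy: the task stays queued
        worker = this.createWorker();           // created lazily, never more than `size`
        this.created++;
    }
    var task = this.queue.shift();
    var self = this;
    worker.onmessage = function (event) {
        self.idle.push(worker);  // recycle the worker instead of calling terminate()
        task.done(event.data);
        self.next();             // pick up the next queued task, if any
    };
    worker.postMessage(task.message);
};
```

A pool of 8 could then be used as pool.run(data, callback) for any number of tasks, with at most 8 calls to new Worker(...) over its lifetime.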

Answered 2013-07-14T21:11:09.250
3

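Old question, but it comes up in searches, so... There is a configurable limit in Firefox. If you look in about:config (type it into Firefox's address bar) and search for "worker", you'll see several settings, including this one:

dom.workers.maxPerDomain

It is set to 20 by default. Double-click the line and change the setting. You will need to restart the browser.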

Answered 2015-09-03T02:45:21.157
2

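The way you chain your Workers in solution #1 prevents the garbage collector from terminating the Worker instances, because you still hold a reference to them in the scope of your onmessage callback function.

Give this code a try: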

<script type="text/javascript">
var worker;
$(document).ready(function(){
    createWorker(0);
});
function createWorker(i) {
    worker = new Worker('test.worker.js');
    worker.onmessage = handleMessage;
    worker.postMessage(i); // start the worker.
}
function handleMessage(event) {
    var index = event.data;
    $("#debug").append('worker.onmessage i = ' + index + "<br>");

    if (index < 25) {
        index++;
        createWorker(index);
    }
}
</script>
</head>
<body>
<div id="debug"></div>
Answered 2013-05-19T14:38:25.253