0

I was testing out using IBM Cloud Functions (managed Apache OpenWhisk) to run code in the background after the action finishes, but my callback provided when calling setTimeout didn't run at the right time later, and it never ran at all unless I invoked the function a second time. It ran at that point (late).

Details:

I had two use cases in mind:

  • Accumulating multiple requests' data in memory and then putting a large object to a Cloud Object Storage bucket once a lot has accumulated or a certain amount of time has passed with no requests.
  • Managing a database connection, per container, for APIs, so that I can close the connection in an unused container that IBM Cloud Functions hasn't killed yet.

I assumed this would work because I've used other platforms like Google Cloud Run that I noticed running code in the background (using setTimeout etc), seeing the logs for this code in Stackdriver after the request finished. And, there's even an entire library created by an AWS developer advocate that manages MySQL connections in the background on AWS Lambda (https://www.npmjs.com/package/serverless-mysql).

I tested with the following function:

// from https://stackoverflow.com/questions/105034/how-to-create-guid-uuid
function uuidv4() {
    return 'xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx'.replace(/[xy]/g, function (c) {
        var r = Math.random() * 16 | 0, v = c == 'x' ? r : (r & 0x3 | 0x8);
        return v.toString(16);
    });
}

function main() {
    const runId = uuidv4().slice(31, 36);

    console.log(`function started (runId = ${runId})`);

    setTimeout(() => {
        console.log(`after 5s delay (runId = ${runId})`);
    }, 5000);

    return {
        msg: `ok (runId = ${runId})`,
    };
}

And I deployed it with the command ibmcloud fn action update logging-in-background src/index.js --kind nodejs:10.

I created a LogDNA instance and set it as my platform instance so my function logs would go to it. This is what I see in the logs after invoking the function three times with the command ibmcloud fn action invoke logging-in-background --blocking, each 10 seconds apart (CRN redacted):

May 18 17:26:23 functions REDACTED 2020-05-18T21:26:23.956013Z    stdout: function started (runId = 9be7c)
May 18 17:26:23 functions REDACTED Activation record '3589870e8ce44cc089870e8ce4acc018' for entity 'logging-in-background'
May 18 17:26:34 functions REDACTED 2020-05-18T21:26:34.111745Z    stdout: after 5s delay (runId = 9be7c)
May 18 17:26:34 functions REDACTED 2020-05-18T21:26:34.115043Z    stdout: function started (runId = faba6)
May 18 17:26:34 functions REDACTED Activation record 'ac47c067177648f187c0671776b8f1c2' for entity 'logging-in-background'
May 18 17:26:44 functions REDACTED 2020-05-18T21:26:44.248470Z    stdout: after 5s delay (runId = faba6)
May 18 17:26:44 functions REDACTED 2020-05-18T21:26:44.253822Z    stdout: function started (runId = 0af34)
May 18 17:26:44 functions REDACTED Activation record 'bbad3eabb3d64ab1ad3eabb3d61ab1a7' for entity 'logging-in-background'

You can see how when I first invoked the function, it only logged the "function started" message. It didn't log the "after 5s delay" message 5 seconds later. But then, at the beginning of the second invocation, 10 seconds after the first invocation, it finally logs the "after 5s delay" message associated with run 9be7c. The callback for setTimeout appears to never run until, at earliest, the next time the action is invoked.

Is this how Apache OpenWhisk is meant to work by design, or is there something I'm not doing correctly to run code in the background after the action is finished?

4

2 回答 2

0

Your function is returning before the setTimeout has completed.

Take a look at the docs here:

https://github.com/apache/openwhisk/blob/master/docs/actions-nodejs.md#creating-asynchronous-actions

The correct way to do this:

function main(args) {
    const runId = uuidv4().slice(31, 36);

    console.log(`function started (runId = ${runId})`);
    return new Promise(function(resolve, reject) {
       setTimeout(function() {
         console.log(`after 5s delay (runId = ${runId})`);
         resolve({ msg: `ok (runId = ${runId}) });
       }, 5000);
    })
}

With AWS lambda if when you leave tasks in the event loop, the function doesn't return/terminate the request until the even loop is empty. If you use the special context flag to allow the function to return eagerly, you're not guaranteed that the background processing will actually complete.

To initialize connection pools and reuse them across invocations, you can store the connection in a global variable and make sure it's initialized only.

于 2020-05-18T22:03:08.723 回答
0

Another user helped me understand what was going by explaining that OpenWhisk does not run code in the event loop after the action finishes. Therefore, it is impossible to do what I wanted to do, where I wanted every action invoked to retrieve data stored in a local variable some time after the action finished.

于 2020-06-06T19:35:17.887 回答