
I am evaluating Cadence for implementing our business orchestration. I understand that workers continuously poll the task list for tasks to execute. My concern is whether this causes scaling problems. A worker is always busy polling some database and also has to execute the business logic, so could it run out of resources and then crash or drop tasks?

How does this polling mechanism scale when we have millions of workflows? Will it delay the execution of workflow code when there are millions of tasks in the task list?


2 Answers


Some pointers in addition to Maxim's answer: Cadence/Temporal doesn't have scaling issues as long as it is set up correctly.

When you have millions of tasks in a tasklist (task queue) and need to run thousands of workers to process them, make sure you configure a scalable tasklist with more partitions.

By default, a tasklist uses only one partition, which is mapped to one DB partition (if using Cassandra, multiple SQL databases, or another NoSQL store) and owned by one matching host. That is not scalable enough to serve thousands of worker hosts and millions of tasks, so you need to scale up the tasklist by adding more partitions. Otherwise the matching host will run too hot, and the DB partition will become a hot partition with high latency.
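
To make the hot-partition point concrete, here is a small sketch of how the load on the busiest partition shrinks as partitions are added. It is only an illustration: the function names are hypothetical, and uniform random placement of tasks is an assumption made for the example, not a description of how the matching service actually routes tasks.

```python
import random
from collections import Counter


def load_per_partition(num_tasks, num_partitions, seed=0):
    """Count how many tasks land on each partition.

    Assumption: each task is placed on a uniformly random partition.
    """
    rng = random.Random(seed)
    return Counter(rng.randrange(num_partitions) for _ in range(num_tasks))


def hottest_partition_load(num_tasks, num_partitions, seed=0):
    """Return the task count on the busiest partition."""
    return max(load_per_partition(num_tasks, num_partitions, seed).values())


if __name__ == "__main__":
    for partitions in (1, 4, 16):
        print(partitions, hottest_partition_load(100_000, partitions))
```

With a single partition every task goes through the same matching host and DB partition; with more partitions the busiest one carries only a fraction of the total.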

See the docs on how to enable the scalable tasklist feature: https://cadenceworkflow.io/docs/operation-guide/maintain/#scale-up-a-tasklist-using-scalable-tasklist-feature

Answered 2021-11-19T19:03:26.580

Cadence and Temporal use long polling over gRPC to listen to task queues, so when a queue is empty a poll request just waits and returns once per minute. This way workers don't consume excessive resources due to polling. Also, most poll calls never hit the database, thanks to various optimizations implemented by the matching service.
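
The general long-polling pattern can be sketched as below. This is not the Cadence or Temporal client: an in-process `queue.Queue` stands in for the task queue, and `long_poll` and `worker_loop` are hypothetical names used only for illustration. The point is that a blocked poll sleeps until a task arrives or the timeout expires, instead of spinning.

```python
import queue
import threading


def long_poll(task_queue, timeout_s):
    """Block until a task arrives or the timeout expires.

    Returns the task, or None if the poll timed out empty.
    """
    try:
        return task_queue.get(timeout=timeout_s)
    except queue.Empty:
        return None


def worker_loop(task_queue, handle, stop, timeout_s=60.0):
    """Poll repeatedly until `stop` is set; return the number of polls made."""
    polls = 0
    while not stop.is_set():
        task = long_poll(task_queue, timeout_s)
        polls += 1
        if task is not None:
            handle(task)  # run the business logic for this task
    return polls


if __name__ == "__main__":
    q = queue.Queue()
    stop = threading.Event()
    q.put("task-1")

    def handle(task):
        print("handling", task)
        stop.set()  # stop after the first task, just for the demo

    worker_loop(q, handle, stop, timeout_s=1.0)
```

An idle worker in this sketch makes one cheap blocking call per timeout period, which is why polling by itself does not exhaust worker resources.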

The number of open workflows doesn't affect polling performance at all, as many of these workflows can be passively waiting on a timer or an external event. What determines how many tasks have to be delivered to workers is the number of operations per second that the workflows execute. If the cluster and workers are provisioned correctly, even a high task rate shouldn't cause any issues.
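
As a back-of-the-envelope illustration of that distinction, the sketch below compares two made-up scenarios. All numbers and the `tasks_per_second` helper are hypothetical; they only show that the task rate follows the active workflows, not the total number of open ones.

```python
def tasks_per_second(active_workflows, ops_per_workflow_per_s):
    """Task delivery rate driven only by workflows that are doing work."""
    return active_workflows * ops_per_workflow_per_s


# Scenario A (hypothetical): 5,000,000 open workflows, but only 1,000 are
# active at any moment; the rest wait on timers or external events.
rate_a = tasks_per_second(active_workflows=1_000, ops_per_workflow_per_s=2)

# Scenario B (hypothetical): only 10,000 open workflows, all of them active.
rate_b = tasks_per_second(active_workflows=10_000, ops_per_workflow_per_s=2)

print(rate_a, rate_b)
```

Scenario B has far fewer open workflows yet puts ten times the load on the workers, so it is the operation rate that should drive provisioning.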

Answered 2021-11-18T02:39:36.823