Number of Workers vs. Number of Nodes in a Cluster

Question: Should the number of workers equal the number of nodes in a cluster?
Example: I have a 4-node cluster on gcloud (let's say each node has 4 vCPUs and 16 GiB of memory). In my Helm chart, should I set 3 workers (that is, 3 workers plus the coordinator), or is the number of workers independent of the number of nodes?

When configuring Trino, the only difference between a coordinator and a worker is that you set the property coordinator=true in the etc/ file. That tells the node that it is the central point of contact for the discovery service and that it is responsible for parsing, analyzing, and scheduling queries.

By default, the coordinator is not included as a node that can be scheduled to run queries. So in your case of 4 nodes, you have only 3 workers.

However, if you want the coordinator to take part in data processing, you can set node-scheduler.include-coordinator=true. Tasks are then scheduled on the coordinator too, so technically it becomes a worker node that is also the coordinator node, and your worker count is 4. I don't believe this is recommended for production use. It's likely fine for testing/dev environments, unless you're simulating production.
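To make the role difference concrete, here is a minimal Python sketch that renders only the role-related lines of a node's configuration. The function name render_config is hypothetical, and a real Trino config file needs other settings (ports, discovery URI, memory limits) that are deliberately left out here.

```python
def render_config(is_coordinator: bool, include_coordinator: bool = False) -> str:
    """Return the role-related lines of a Trino node config (hypothetical helper).

    Only the two properties discussed in this thread are emitted; every
    other setting a real config file needs is intentionally omitted.
    """
    lines = [f"coordinator={'true' if is_coordinator else 'false'}"]
    if is_coordinator:
        # Whether the coordinator should also run query tasks like a worker.
        value = "true" if include_coordinator else "false"
        lines.append(f"node-scheduler.include-coordinator={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_config(is_coordinator=True), end="")
    print(render_config(is_coordinator=False), end="")
```

Workers and the coordinator share the same kind of file; only the coordinator line (and, on the coordinator, the scheduling property) changes which role a node plays.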

Here's an example of what I mean:

Default:
4 nodes
1 coordinator
3 workers

With node-scheduler.include-coordinator=true:
4 nodes
1 coordinator
4 workers (one of them is also the coordinator)
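The worker counts in the two configurations above follow from simple arithmetic. A small Python sketch, assuming a cluster with exactly one coordinator (the function name schedulable_workers is hypothetical):

```python
def schedulable_workers(nodes: int, include_coordinator: bool = False) -> int:
    """Count nodes that can run query tasks, assuming exactly one coordinator."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node (the coordinator)")
    dedicated_workers = nodes - 1  # every node except the coordinator
    # With include-coordinator enabled, the coordinator also runs tasks.
    return dedicated_workers + (1 if include_coordinator else 0)


if __name__ == "__main__":
    print(schedulable_workers(4))                            # default case
    print(schedulable_workers(4, include_coordinator=True))  # coordinator also works
```

For the 4-node cluster in the question, this gives 3 workers in the default case and 4 when the coordinator also schedules tasks.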


Just wanted to add some Kubernetes-specific information to all the great info from @bitsondatadev!

When we deploy Trino on Kubernetes in production, we typically run 1 worker pod per node. In general, though, the number of worker pods can be independent of the number of nodes in your Kubernetes cluster.

In gcloud, we like to have 1 node pool for coordinator pods and 1 node pool for worker pods. The worker node pool then typically has autoscaling enabled, and each node in it runs only 1 worker pod.
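One way to express that layout is with standard Kubernetes pod scheduling fields: a nodeSelector on the GKE node pool label pins pods to a pool, and a required pod anti-affinity on kubernetes.io/hostname keeps two worker pods off the same node. The Python sketch below only builds these pod-spec fragments as dictionaries. The pool names, the "app" label values, and the helper names are hypothetical, and how the fragments map onto a particular Helm chart's values is not shown here and should be checked against that chart.

```python
import json

# Label that GKE sets on every node with the name of its node pool.
NODE_POOL_LABEL = "cloud.google.com/gke-nodepool"


def coordinator_scheduling(pool: str) -> dict:
    """Pod-spec fragment pinning coordinator pods to their own node pool."""
    return {"nodeSelector": {NODE_POOL_LABEL: pool}}


def worker_scheduling(pool: str, app_label: str) -> dict:
    """Pod-spec fragment: worker pool only, at most one worker pod per node."""
    return {
        "nodeSelector": {NODE_POOL_LABEL: pool},
        "affinity": {
            "podAntiAffinity": {
                # Refuse to place a worker on a node (hostname) that already
                # runs a pod carrying the same app label.
                "requiredDuringSchedulingIgnoredDuringExecution": [
                    {
                        "labelSelector": {"matchLabels": {"app": app_label}},
                        "topologyKey": "kubernetes.io/hostname",
                    }
                ]
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical pool and label names, for illustration only.
    print(json.dumps(coordinator_scheduling("trino-coordinator-pool"), indent=2))
    print(json.dumps(worker_scheduling("trino-worker-pool", "trino-worker"), indent=2))
```

With the anti-affinity in place, scaling the worker pods beyond the number of nodes in the worker pool leaves the extra pods pending, which is what prompts the autoscaler to add nodes, so the worker count and the worker-pool node count stay equal.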
