What is the difference between a job queue and worker threads?
There are several different architectural patterns and solutions for handling asynchronous code: queues, promises, worker threads, child processes, the cluster module, scaling horizontally with multiple containers, etc.
Whether you're dealing with scaling up or computationally-heavy tasks, certain patterns/solutions will be the right choice over others.
Two of these patterns/methods that can be confusing to tell apart, due to perceived similarities, are job/task queues and worker threads. It's not always obvious when you would use which one. Both are options if you want to handle computationally heavy work without blocking the event loop, and with both you are "outsourcing" work to be executed somewhere other than the event loop.
Getting a better grasp on these will help you know when to use which solution, and help you implement your designs faster and in a more robust way. I've found that once you decide on an architecture choice, it's more difficult to change than a "code-level" change, so it's best to pick the right solution for your problem/use case from the start.
So what are the differences and when would you use which one?
Job queue
Compared to worker threads, a "job queue" (or "task queue") is more of an architectural design pattern. It's not a part of the NodeJS API, but is a pattern you implement (or a third party library implements for you), with a set of common features.
Job queues generally provide features like scheduling, coordinating, and retrying jobs, which helps with scalability as well as "fault-tolerance". If you instead handle all of that in application memory, that state is ephemeral and less able to recover from failure: if the main process dies, it takes down all of its threads (and any in-memory bookkeeping) with it.
With a queue, the jobs are usually run in a separate component/service from the application producing the jobs, and can run in parallel with it. The service that produces/schedules/creates the jobs is the producer, and the service(s) that do the work are consumers.
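The pattern isn't tied to any specific library, but to make the producer/consumer split concrete, here's a minimal sketch assuming BullMQ (a Redis-backed queue library) and a locally running Redis instance; the queue name, payload, and `saveToDatabase` helper are illustrative only:

```js
const { Queue, Worker } = require('bullmq');

// Assumes a Redis instance running locally - BullMQ persists jobs in Redis,
// so jobs survive a restart of either the producer or the consumer.
const connection = { host: 'localhost', port: 6379 };

// Producer - typically lives in your API service.
const jobQueue = new Queue('sensor-data', { connection });

async function enqueueReading(reading) {
  await jobQueue.add('process-reading', reading);
}

// Consumer - in practice this runs in a separate process/service, so it can
// be scaled, restarted, and monitored independently of the producer.
new Worker(
  'sensor-data',
  async (job) => {
    // job.data is whatever the producer passed to add()
    await saveToDatabase(job.data); // hypothetical helper
  },
  { connection }
);
```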
Worker threads
"Worker threads" are more of a primitive as opposed to an architectural or design pattern. This primitive is a building block used to constructu a larger abstraction, and you could even use worker threads to implement the job queue pattern. Of course, you would need a lot of other code and features to implement that pattern - simply "dropping in" worker threads doesn't automatically mean you have a job queue - but it can be done.
Each worker thread has its own V8 instance and event loop, compared to single-threaded NodeJS where there is one event loop that all code uses. This is an important distinction, because having multiple event loops is a big part of what allows worker threads to work as semi-isolated. I say "semi-isolated" because they still run as part of the same process as the "main" NodeJS thread.
To use queue terminology, in this case worker threads are generally the "consumers", and the "main" NodeJS thread is the "producer".
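To make that concrete, here's a minimal, self-contained sketch using Node's built-in worker_threads module. The CPU-heavy "work" is just a placeholder loop:

```js
const { Worker, isMainThread, parentPort, workerData } = require('worker_threads');

if (isMainThread) {
  // "Producer": the main thread hands off the CPU-heavy task and keeps its
  // own event loop free to serve other requests.
  const worker = new Worker(__filename, { workerData: { upTo: 50_000_000 } });
  worker.on('message', (sum) => console.log('result from worker:', sum));
  worker.on('error', (err) => console.error('worker failed:', err));
} else {
  // "Consumer": this runs on a separate thread with its own V8 instance and
  // event loop, so the loop below doesn't block the main thread.
  let sum = 0;
  for (let i = 0; i < workerData.upTo; i++) sum += i;
  parentPort.postMessage(sum);
}
```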
In contrast to a job queue (where the library you're likely using handles this for you), with worker threads you need to implement retry logic, scheduling, and priority levels yourself. There might be some libraries out there that do this that I'm not aware of, but the general idea is that retry-ability is not native to worker threads. And since the "main" Node thread is the orchestrator of the worker threads, it would be something you'd have to implement at that level, including tracking which ones failed, initiating retries, etc.
On the other hand, if you have a CPU-intensive task that only has a single step - where a failure matters less than in a multi-step process, which would likely require rolling back certain things at certain steps - and you can afford to let it fail, a worker thread might be the right choice for your use case.
Another important thing to note - an uncaught exception in the "main" Node thread will kill the whole process, taking every worker thread down with it. If the equivalent happens in a job consumer/processor, you still have all the others running, assuming you have more than one consumer. Obviously you can mitigate this by having a "global" error event listener and handling the exception there, but even when a single worker thread fails (which surfaces as an 'error' event on that worker in the main thread), it won't automatically recover - another worker thread would have to be spun up, and that's logic you would have to implement yourself in the "main" thread.
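As a rough illustration of the kind of orchestration you'd end up writing yourself, here's a sketch of the main thread respawning a failed worker a limited number of times. The worker file name, retry limit, and payload are all hypothetical:

```js
const { Worker } = require('worker_threads');

// Main-thread orchestration sketch: watch each worker and respawn it a
// limited number of times if it fails.
function runWithRetries(workerFile, data, retriesLeft = 3) {
  const worker = new Worker(workerFile, { workerData: data });

  worker.on('message', (result) => console.log('done:', result));

  worker.on('error', (err) => {
    console.error('worker crashed:', err.message);
    if (retriesLeft > 0) {
      // Retry state lives only in this process's memory - if this process
      // dies or the container restarts, the attempt count is lost.
      runWithRetries(workerFile, data, retriesLeft - 1);
    }
  });
}

// Hypothetical worker script and payload
runWithRetries('./resize-image.js', { imagePath: './uploads/item.jpg' });
```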
Questions for use case
Some things to consider:
- Do you already have queue infrastructure set up, a library selected, a pattern for calling it, etc.?
  - If you do, and you're on the fence about which solution to choose for your use case, it may make sense to use queues since you already have them set up.
- Do you need schedulability? (then use a job queue - see the sketch after this list)
  - It would be very difficult to build this feature using a purely worker-threads solution - the main Node thread would have to act as an orchestrator, polling some database where you have the schedules stored, using some lookup to determine what code to run for the job, etc. What you would end up with would likely resemble a job queue anyway.
- Do you need retryability? (then probably use a job queue)
  - You can build this feature with worker threads, but what if the main Node thread goes down or the container is restarted? If you were tracking retries in memory, you would lose the number of attempts, and when the process/container restarts the job wouldn't be retried. Because threads are more "ephemeral", you have to factor in fault tolerance.
- How many threads do you have available vs. how many "nodes" can you distribute for workers, and what is the processing power of each of those?
  - How many threads you have available in your service, and how many individual job consumer processes you can spin up, will likely factor into your decision. If it's too expensive to run that many separate job consumers - which will likely mean a server/container per consumer - and you've determined you need that many consumers given your workload, it may make sense to look at whether worker threads would be more cost-effective. Of course, if your workload is high enough to need lots of consumers, you might need lots of threads too, which may come back around to being too expensive, but it's something to consider.
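For comparison, here's roughly what retryability and schedulability look like when the queue library handles them for you. This sketch again assumes BullMQ and a local Redis instance; other queue libraries expose similar options under different names:

```js
const { Queue } = require('bullmq');

const connection = { host: 'localhost', port: 6379 };
const queue = new Queue('sensor-data', { connection });

async function schedule() {
  // Retryability: if the consumer's processor throws, BullMQ re-runs the job
  // up to 5 times, backing off exponentially between attempts.
  await queue.add(
    'process-reading',
    { deviceId: 'abc-123' }, // illustrative payload
    { attempts: 5, backoff: { type: 'exponential', delay: 1000 } }
  );

  // Schedulability: a repeatable job that runs every 15 minutes.
  await queue.add(
    'aggregate-readings',
    {},
    { repeat: { every: 15 * 60 * 1000 } }
  );
}

schedule().catch(console.error);
```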
Examples of when you would use which
At a certain point if you have enough requirements like being able to set job priority, scheduling, etc. it may make more sense to have a separate architectural piece for this, via a job queue.
A job queue is useful if you have lots of things in your "order of operations" to be done, rather than a single thing. As briefly mentioned before, infrastructure/setup cost is a factor you'll have to consider in setting up a queue.
But looking at some more specific examples...
Lots of sensors / "Internet of Things" (IoT)
In this use case you have many sensors communicating frequently over the network with your server and database, likely calling multiple services and inserting/updating several different tables with sensor data. Handling the data sent in by a sensor is a more involved, multi-step process - it's not just a single task that happens. And you need to return a successful response from the server to your sensors as quickly as possible, since sensors usually can't wait a long time for a response and the server has lots to process.
So, you recognize you need to use an asynchronous pattern for this.
Worker threads are good for CPU-intensive work but don't help much with I/O-intensive work. An IoT use case, where you potentially have hundreds of thousands of devices sending data every X seconds, is very I/O-intensive at the network and database layers, as opposed to CPU-intensive work like machine learning or image processing (generally speaking).
It's also possible that you have too many requests from sensors at once and don't have enough threads to delegate all that work in a timely fashion to the worker threads.
This is a use case where a queue is almost always going to be the better option.
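As a sketch of what the producer side could look like here (assuming Express for the HTTP layer and BullMQ for the queue, both as illustrative choices), the handler just enqueues the reading and responds immediately:

```js
const express = require('express');
const { Queue } = require('bullmq');

const app = express();
app.use(express.json());

const readings = new Queue('sensor-readings', {
  connection: { host: 'localhost', port: 6379 },
});

// The server's only job is to accept the reading and hand it off - the
// multi-step processing happens later, in the consumer service(s).
app.post('/readings', async (req, res) => {
  await readings.add('ingest', req.body);
  res.status(202).end(); // fast response back to the sensor
});

app.listen(3000);
```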
Image processing
Imagine a use case where you have an auction site and a user can upload images of the items they're selling. These images may need to be scaled down to a reasonable resolution so they don't eat a ton of storage space, and the user may need to be able to edit them (rotate, brighten, etc.).
While this can happen in the browser, image processing, resizing, etc. is generally computationally intensive, and so it happens on the server using something like ImageMagick.
This "task" is not complex in that there are multiple steps that need to happen. Instead it's a single step but is compute-heavy.
But it's also a task where a failure likely isn't a big deal. You can show an error message and the user can simply retry. Could you set up a queue for this? Sure, but if you're solely using the queue for this use case, it may not be worth the extra infrastructure overhead and cost.
Because you probably don't need retries, you definitely don't need schedulability, and this is a one-off task, using worker threads is a good solution for this use case.
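If you do go the worker-thread route, the hand-off could look something like the sketch below. It uses sharp as an illustrative stand-in for ImageMagick, and the file names and dimensions are hypothetical:

```js
// main.js - spawn a worker per upload so the resize doesn't block the event loop
const { Worker } = require('worker_threads');

function resizeInWorker(inputPath, outputPath) {
  return new Promise((resolve, reject) => {
    const worker = new Worker('./resize-worker.js', {
      workerData: { inputPath, outputPath },
    });
    worker.on('message', resolve);
    // If it fails, just surface the error - the user can simply retry.
    worker.on('error', reject);
  });
}

// resize-worker.js - runs on its own thread with its own event loop
const { parentPort, workerData } = require('worker_threads');
const sharp = require('sharp'); // stand-in for ImageMagick in this sketch

sharp(workerData.inputPath)
  .resize({ width: 1024 })
  .toFile(workerData.outputPath)
  // On current Node versions, a failed resize (rejected promise) terminates
  // the worker and surfaces as an 'error' event on the Worker in the main thread.
  .then(() => parentPort.postMessage('done'));
```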
Summary
Hopefully this examination of the differences and tradeoffs between worker threads and a job/task queue gives you a better idea of which one to choose for your given use case. There are lots of asynchronous patterns in NodeJS and in software architecture in general, so having a clearer understanding of some of these patterns will make choosing the right solution from the start easier, and remove headaches later on.
When it comes to Node, mastering asynchronous patterns is one thing.... but using that knowledge to build out a full REST API is a whole other animal.
To help you hit the ground running and avoid wasting time on figuring out what code goes where when you're building your next Express project, I have a standard template I use to structure all my REST APIs - sign up below to receive the template repo and a post explaining in detail what logic goes where within that structure. You'll also receive all my future posts directly to your inbox!