Practice System Design interview question walkthrough
Tech job interviews are notoriously difficult. You never know what questions they are going to ask you - are they going to ask about obscure language trivia, have you build some UI component while doing a screenshare, ask you some algorithm puzzles that have no relevance to the job day-to-day, or ask you to design an architecture for some theoretical problem?
There are a million things they could ask you, and unfortunately you don't have much control over that. You can look on Glassdoor to see what the company tends to ask, but even then it's a gamble.
Personally, I'm not a fan of how most technical interviews are done. Trivia questions aren't relevant, and even experienced candidates forget simple things all the time. Screensharing/livecoding a feature is more realistic, but time-consuming and nerve-wracking. And Leetcode-style/algorithm puzzle questions are something you will almost never encounter in the real programming world, not to mention they can require weeks or months of studying - not something I personally have time for.
The one interview question type that I think is actually relevant is the System Design question. It's relevant because discussing different approaches to building out a new feature or system with your future team members is what you do every day on the job. Sometimes these interview questions are very formal, where you are given a list of requirements, and other times they're more informal, like a back-and-forth discussion. Something like "we have this problem we're currently working through, how would you go about solving it?", and you talk through different options.
Although those are two different styles - and sometimes you're not designing a whole system but instead a slice of it, like a feature - for the sake of conciseness I am calling both of them "System Design".
With the job market being as tough as it is right now, knowing that you have practiced these types of questions will go a long way in reducing your pre-interview anxiety.
And if you don't have a ton of experience designing systems/doing architecture work - maybe you have a couple of years of experience and now you're looking to move to a more Senior-level role - learning how to go about designing a feature or system is extremely useful.
Maybe you have helped out Architects or Senior developers on your team before, but didn't quite understand why certain decisions were made, or how they even came up with their approach.
No matter which scenario you find yourself in, the key is being exposed to lots of patterns and company architectures. Once you have seen enough, pattern recognition starts to kick in and you can more easily identify which patterns to apply to which problems. While it's always best to gain this experience in the "real world", there is only so much you can learn at any given company, so seeking out these patterns beyond your current job is important too.
With enough exposure to different patterns and architectures, you can ace the System Design interview and/or learn enough to move from junior-mid to senior.
This is a continuous learning process, but a continuous learning process starts with step 1, which is why I am going to cover a system design interview question I received in the "real world" recently when interviewing.
This post will accomplish two main things:
- get practice with an interview question (if you are actively applying/interviewing)
- learn a few patterns/approaches you can apply in your work (if you are just trying to level up design/architecture knowledge)
The interview question
I picked this particular question because in addition to it being one I was recently asked in a real interview, it also:
- is a common question asked in interviews
- is a problem that allows us to go over a few different interesting architectural topics
The question asked was "how would you go about designing a food delivery service like Grubhub/DoorDash?"
When I mentioned the formal vs. informal distinction before, this one definitely fits in the informal category. That has some challenges, because we don't have a clearly defined scope up front. But for the purposes of this post it's actually helpful, because it gives us an opportunity to go over how to define and narrow scope, which we will do now.
Step 1: narrowing scope and clarifications
The way I broke down this problem was to first identify the core pieces. You get live updates on where your driver is, so a real-time component is needed. A restaurant needs to manage its "inventory". The service needs to efficiently route drivers.
Think about how you use an app like Grubhub, and think about how restaurants and drivers might use it. This process will quickly help you realize there are lots of pieces involved:
- users must be able to register
- users must be able to set favorites/bookmark restaurants
- users must be able to set up payment options
- users must be able to search for restaurants
- the service must determine which drivers to allocate to a food pickup and delivery
- the service must serve data to the front-end containing restaurant details
- the service must prevent orders from being made before or after a restaurant's opening/closing hours
- the service must allow the user to track their order status and delivery in real-time
- the service must send text or push notifications when food has been picked up, when it's on the way, and when it has been delivered
- restaurants must be able to specify menu items and prices
- restaurants must be able to mark items as "sold out" (either manually or using a quantity available) and the service must not allow a user to order those
- the service must provide directions for the driver - to the restaurant and to the customer
... and there are probably many more components/features we could think of.
But think of this as a "whittling down" from open and abstract to something clearer and clearer. It's not necessary to go so deep that you are essentially doing requirements gathering, but go deep enough that you can identify the high-level building blocks.
For a 30-minute (maybe 1-hour) conversation, though, going over all of the above would take way too much time. A lot of these components are important but fairly standard across all apps: things like registration, user settings, billing/payments, etc. So we can assume those aren't really important to go over for the purposes of an interview question.
But these elements do seem important, because they are relatively unique to a food delivery service when compared to some other app/service:
- users must be able to register (Common to most apps)
- users must be able to set favorites/bookmark restaurants (Common to most apps)
- users must be able to set up payment options (Common to most apps)
- **users must be able to search for restaurants**
- **the service must determine which drivers to allocate to a food pickup and delivery**
- the service must serve data to the front-end containing restaurant details (Common to most apps)
- **the service must prevent orders from being made before or after a restaurant's opening/closing hours**
- **the service must allow the user to track their order status and delivery in real-time**
- the service must send text or push notifications when food has been picked up, when it's on the way, and when it has been delivered (Likely less important, given we will discuss real-time updates)
- restaurants must be able to specify menu items and prices (Likely less important)
- restaurants must be able to mark items as "sold out" (either manually or using a quantity available) and the service must not allow a user to order those (Likely less important, and more of an "implementation detail")
- the service must provide directions for the driver - to the restaurant and to the customer (Important but likely too big for current scope)
So in my interview I went over this list (all items) and then asked if the narrowed-down scope (everything in bold) was a correct assumption, and they confirmed. Not only is this useful for limiting the discussion, it also shows the interviewer that you have enough experience to think about all the pieces potentially involved, even though you won't be discussing all of them.
Coming up with a design
Now we start on the design work. In my interview I didn't go through each component in a specific order, unless they were related to each other.
For purposes of this post, the real-time update component is an interesting one to think through so let's start with that.
Feature - the service must allow the user to track their order status and delivery in real-time
Either WebSockets or SSE (Server-Sent Events) are viable options for handling the real-time aspect of updating the user on their order and delivery status.
A discussion comparing and contrasting these two technologies/patterns is an entire post by itself, but generally speaking here is a comparison:
- WebSockets are bidirectional, meaning both client and server can send messages to each other, whereas SSE is unidirectional - only the server can send data to the client.
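- WebSockets are low latency, because the connection stays open and there is no overhead from re-opening a connection for each message. An SSE stream is also a long-lived connection, and unlike WebSockets, SSE connections automatically reconnect when they drop (the browser's EventSource handles this for you).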
- WebSockets can maintain millions of connections, but they need stateful scaling: the server needs to keep track of the connection state of each client connected to it. SSE still holds an open connection per client, but because it runs over plain HTTP, it fits into existing HTTP infrastructure and is generally easier to scale.
- For load balancing, WebSockets require a proxy/load balancer that supports WebSockets, whereas because SSE just uses HTTP, a standard HTTP load balancer will generally work.
- Some ISPs will block WebSocket connections, so some users may not be able to connect depending on their ISP's rules.
With that in mind, you could use:
- WebSockets:
- for the bi-directional capability of a driver being able to send updates to the server, and the server pushing those updates to the user
- SSE:
- you want to account for users having bad network connections that drop frequently (SSE will auto-reconnect)
- the server is the only one that needs to push updates (updates to the user like "Preparing food", "Out for delivery", "Delivered")
In an interview situation, if you have two potential options or solutions, you don't necessarily have to propose just one as the "right" option. For example, you can mention the two options above and discuss the tradeoffs between them. This is actually a good thing to do, because much of the day-to-day job is thinking through and discussing tradeoffs.
In our case, you could probably argue for either option. The bidirectionality of WebSockets is likely to be useful, but it comes with more infrastructure and configuration costs. On the other hand, if you don't need that, SSE is more "lightweight" and you have less to think about when it comes to scaling and handling dropped connections.
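If you go the SSE route, the server side is mostly about writing correctly formatted events to a long-lived HTTP response. Below is a minimal TypeScript sketch, assuming each order keeps an in-memory log of status updates; the `OrderStatusUpdate` shape and the `formatSseEvent`/`eventsAfter` names are hypothetical. The `id:`, `event:` and `data:` fields and the blank line that ends each event come from the SSE (`text/event-stream`) format itself. Giving every event an `id` is what makes auto-reconnect useful: a browser `EventSource` sends the last id it received in a `Last-Event-ID` header when it reconnects, so the server can replay only what the client missed.

```typescript
// Order status values pushed to the customer (hypothetical).
type OrderStatus = "PREPARING_FOOD" | "OUT_FOR_DELIVERY" | "DELIVERED";

interface OrderStatusUpdate {
  id: number; // increases with every update for an order; used as the SSE event id
  orderId: string;
  status: OrderStatus;
}

// Serialize one update in the text/event-stream format:
// "field: value" lines, with a blank line marking the end of the event.
function formatSseEvent(update: OrderStatusUpdate): string {
  // JSON.stringify without indentation produces a single line, so one data: line is enough.
  const payload = JSON.stringify({ orderId: update.orderId, status: update.status });
  return `id: ${update.id}\nevent: order-status\ndata: ${payload}\n\n`;
}

// Replay updates the client has not seen yet. lastEventId comes from the
// Last-Event-ID request header on reconnect (undefined on the first connection).
function eventsAfter(log: OrderStatusUpdate[], lastEventId: number | undefined): string {
  return log
    .filter((update) => lastEventId === undefined || update.id > lastEventId)
    .map(formatSseEvent)
    .join("");
}
```

In a Node HTTP handler you would respond with `Content-Type: text/event-stream`, write `eventsAfter(log, lastId)` once (parsing the `Last-Event-ID` header into a number), and then write `formatSseEvent(update)` each time the order status changes. On the client, `new EventSource(url)` takes care of reconnecting.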
One last thing - in theory you could also use short polling, which is where the client makes HTTP requests for status updates every few seconds. The positives of this approach are:
- the implementation lives mostly in the app/client code; the server only needs a regular REST endpoint that returns the current status
- no connection management to keep track of
However, this approach is inefficient for a service and app that operate at scale, because now your REST API servers have to handle many more requests (say one request every 5 seconds, which is 12 requests/min per client, times potentially hundreds of thousands of clients). Short polling is also not real-time, which may not matter in the food delivery case because a user doesn't need precisely real-time updates. But because of the extra load it puts on the servers, WebSockets or SSE are a better option.
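To put rough numbers on that, here is the same back-of-the-envelope arithmetic as a tiny TypeScript helper. The 100,000 active clients is an assumed figure for illustration, not a measurement.

```typescript
// Back-of-the-envelope load created by short polling. Inputs are assumptions.
function pollingLoad(activeClients: number, intervalSeconds: number) {
  const perClientPerMinute = 60 / intervalSeconds;
  const requestsPerMinute = activeClients * perClientPerMinute;
  return {
    perClientPerMinute,
    requestsPerMinute,
    requestsPerSecond: requestsPerMinute / 60,
  };
}

// Assumed: 100,000 customers tracking an order, each polling every 5 seconds.
const estimate = pollingLoad(100_000, 5);
```

With a 5-second interval that works out to 12 requests per client per minute, 1,200,000 requests per minute overall, or 20,000 requests per second hitting the REST API just to ask "anything new?", most of which will likely return no change.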
Interviewer follow-up question
Let's say that you did choose WebSockets. How would you handle reconnecting disconnected clients - OR - clients whose connection is completely blocked, for example by their ISP?
I got this question in my interview, and my answer was that while their WebSocket connection is reconnecting, you could fall back to polling until the WebSocket is reconnected. The tradeoff is that you sacrifice some REST API server efficiency (as we went over in the short polling section just above), but you get a good fallback experience for the user, so that they can still receive updates.
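One way to sketch that fallback is as a small piece of client-side state, kept separate from the actual socket and HTTP calls so the logic is easy to follow. The `OrderUpdatesConnection` class, its method names and the default numbers (1 second base delay, 30 second cap, 8 attempts) are all hypothetical. While the socket is down the client polls; reconnect attempts back off exponentially; and if the socket keeps failing (for example because the network blocks WebSockets), the client gives up on it and stays on polling.

```typescript
type UpdateMode = "websocket" | "polling";

// Client-side state for order updates, separate from the actual WebSocket and
// HTTP calls. Class name, method names and default numbers are hypothetical.
class OrderUpdatesConnection {
  private mode: UpdateMode = "websocket";
  private failedAttempts = 0;
  private readonly baseDelayMs: number;
  private readonly maxDelayMs: number;
  private readonly maxAttempts: number;

  constructor(baseDelayMs = 1000, maxDelayMs = 30000, maxAttempts = 8) {
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
    this.maxAttempts = maxAttempts;
  }

  // "polling" means the app should hit the order-status REST endpoint on a timer.
  getMode(): UpdateMode {
    return this.mode;
  }

  // Call when the socket closes or fails to open. Returns how long to wait
  // before the next reconnect attempt, or null to stop retrying and keep polling.
  onSocketFailure(): number | null {
    this.mode = "polling";
    this.failedAttempts += 1;
    if (this.failedAttempts > this.maxAttempts) return null;
    const delayMs = this.baseDelayMs * 2 ** (this.failedAttempts - 1);
    return Math.min(delayMs, this.maxDelayMs);
  }

  // Call when the socket opens: stop polling and reset the backoff.
  onSocketOpen(): void {
    this.mode = "websocket";
    this.failedAttempts = 0;
  }
}
```

In a real app you would also add some random jitter to each delay, so that thousands of clients dropped by the same outage don't all reconnect at the same instant.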
Feature - users must be able to search for restaurants
A user is going to want to be able to search by restaurant name, restaurant type, food item, by location, etc.
To support this feature, there are many options out there but Elasticsearch or PostgreSQL "Full Text Search" are good options. Similar to WebSockets vs. SSE, there are tradeoffs with both approaches and neither is the "correct" choice. But there are some clear use cases for when you would want to use one over the other.
Both Postgres and Elasticsearch are very commonly used technologies, so with either choice you won't have trouble finding developers who know them, and you won't be stuck with a poorly maintained solution (no development, no bugfixes, etc.). While Elasticsearch is a dedicated search solution, Postgres also has search functionality built in (despite being a database first and foremost).
Postgres Full Text Search (referred to as "FTS" going forward) gives us some of the features of a search engine, which we can make use of in designing the restaurant search feature of the food delivery service. The topic of Postgres FTS could be covered in a post by itself, but at a high level, here are the benefits it provides:
- Basic search by terms
- Ranking support
- You don't need an additional technology you have to add to your infrastructure and then maintain if you are already using Postgres
The drawbacks are that:
- It doesn't support typo correction
- Fuzzy searching is not as robust as it is with Elasticsearch
- It doesn't support more robust searches (like by location, ratings, delivery time, etc.)
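For a sense of what Postgres FTS looks like in practice, here is a sketch of a ranked restaurant search built as a parameterized query. It assumes a hypothetical `restaurants` table with a `search_vector` `tsvector` column (for example a generated column built with `to_tsvector` from the name, cuisine and menu text, backed by a GIN index). `plainto_tsquery`, the `@@` match operator and `ts_rank` are built-in Postgres features; the `{ text, values }` object is the shape node-postgres's `client.query` accepts.

```typescript
// Parameterized query shape accepted by node-postgres's client.query(config).
interface QueryConfig {
  text: string;
  values: unknown[];
}

// Hypothetical schema: restaurants(id, name, search_vector tsvector, ...).
function buildRestaurantSearchQuery(term: string, limit = 20): QueryConfig {
  return {
    text: `
      SELECT id, name, ts_rank(search_vector, query) AS rank
      FROM restaurants, plainto_tsquery('english', $1) AS query
      WHERE search_vector @@ query
      ORDER BY rank DESC
      LIMIT $2`,
    values: [term, limit],
  };
}
```

Passing the search term as `$1` rather than concatenating it into the SQL keeps user input out of the query text.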
Elasticsearch, on the other hand:
- Requires additional infrastructure to be added to your stack and maintained
- Supports handling typos
- Supports fuzzy searches/matches
- Supports more advanced searching (like by location, ratings, delivery time, etc.)
As with the real-time feature, we again come to tradeoffs. In this case you can propose either. I lean towards Elasticsearch because a user is going to want to search by restaurant name, food, location and get those results back sorted/ranked by restaurant rating, delivery time, etc. Elasticsearch just supports this better than Postgres FTS, which you can do a lot with but is still more basic.
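As an illustration of why Elasticsearch fits this better, here is a sketch of a search request body that combines typo-tolerant text matching, a distance filter and sorting by rating and delivery time in one query. The index fields (`name`, `cuisine`, `menuItems`, `location` as a `geo_point`, `rating`, `estimatedDeliveryMinutes`) are assumptions about how the documents might be mapped; `bool`, `multi_match` with `fuzziness: "AUTO"`, `geo_distance` and `sort` are standard Elasticsearch query DSL.

```typescript
// Search inputs from the customer app (hypothetical shape).
interface SearchParams {
  text: string;
  lat: number;
  lon: number;
  radiusKm: number;
}

// Build an Elasticsearch _search body. Field names are assumptions about the mapping.
function buildRestaurantSearchBody(p: SearchParams) {
  return {
    query: {
      bool: {
        must: {
          multi_match: {
            query: p.text,
            fields: ["name^3", "cuisine^2", "menuItems"], // boost restaurant-name matches
            fuzziness: "AUTO", // tolerate typos such as "piza"
          },
        },
        filter: {
          // Only restaurants within the radius; filters don't affect scoring.
          geo_distance: {
            distance: `${p.radiusKm}km`,
            location: { lat: p.lat, lon: p.lon },
          },
        },
      },
    },
    // Relevance first, then higher-rated and faster-delivering restaurants.
    sort: [
      { _score: { order: "desc" } },
      { rating: { order: "desc" } },
      { estimatedDeliveryMinutes: { order: "asc" } },
    ],
  };
}
```

The object would be sent as the JSON body of a `_search` request against the restaurants index. Sorting on `_score` first and then on rating only uses rating as a tie-breaker; a `function_score` query is an alternative if you want rating to blend into relevance instead.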
NOTE: Look for a post on Postgres FTS coming in the future!
Interviewer follow-up question
Is there a way you could take a hybrid approach and use both?
One approach would be to start with Postgres FTS and, if you run into scaling issues or find there are features you need that it doesn't support very well, migrate to Elasticsearch. Another approach would be to use FTS for search categories that are more basic (like a restaurant manager searching for their own menu items to update via their management dashboard), and Elasticsearch for others that are more complex (like user searches).
How would Elasticsearch ingest restaurant data?
The restaurant details - location, hours, menu items, prices, etc. - are likely going to be stored in the database via some management/dashboard UI the restaurant uses to set these details. Every time there is an update, Elasticsearch could then ingest the new data and store it in a format that is designed for searchability.
This conveniently skips over how Elasticsearch would know there is an update - for example, would this be accomplished via a cron job? Is there something like a webhook or emitted event from the database that lets Elasticsearch know to ingest the new data? Because this post is already getting kind of long I am going to skip over the details, but this is a good question to practice answering in more detail on your own if you are preparing for an interview.
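Whatever ends up triggering the sync, the ingest step itself mostly means turning normalized database rows into one denormalized document per restaurant. Here is a sketch, assuming hypothetical `RestaurantRow`/`MenuItemRow` shapes, that builds a request body for Elasticsearch's `_bulk` API: newline-delimited JSON with an action line followed by a document line for each restaurant, ending in a trailing newline.

```typescript
// Hypothetical rows as they might come out of the restaurants and menu tables.
interface RestaurantRow {
  id: number;
  name: string;
  cuisine: string;
  lat: number;
  lon: number;
  rating: number;
}

interface MenuItemRow {
  restaurantId: number;
  name: string;
  priceCents: number;
}

// Denormalize one restaurant plus its menu into a single search document.
function toSearchDocument(r: RestaurantRow, menu: MenuItemRow[]) {
  return {
    name: r.name,
    cuisine: r.cuisine,
    location: { lat: r.lat, lon: r.lon }, // mapped as a geo_point
    rating: r.rating,
    menuItems: menu.filter((m) => m.restaurantId === r.id).map((m) => m.name),
  };
}

// NDJSON body for the _bulk API: action line + document line per restaurant.
function toBulkBody(index: string, rows: RestaurantRow[], menu: MenuItemRow[]): string {
  return rows
    .map(
      (r) =>
        JSON.stringify({ index: { _index: index, _id: String(r.id) } }) + "\n" +
        JSON.stringify(toSearchDocument(r, menu)) + "\n"
    )
    .join("");
}
```

Using the database id as the document `_id` means re-indexing an updated restaurant replaces its existing document instead of creating a duplicate.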
Feature - the service must determine which drivers to allocate to a food pickup and delivery
In order to determine which driver to allocate to pick up and deliver a customer order, we need to know three things:
- pickup address (restaurant location)
- driver's current location
- delivery address
And then the selection algorithm would need to essentially triangulate between those three.
Because this is a System Design question and not a coding/algorithm question, you probably won't be asked what that algorithm would be. But figuring out how to store those pieces of information is relevant to designing a system and, thus, worth talking through.
Fortunately, the pickup and delivery addresses are static, and are input by the restaurant and by the customer. So we only need to store those in a `restaurants` table and a `customers` table (or have some foreign key relationship to an `addresses` table).
But figuring out and storing the driver's location is more challenging, because of the dynamic nature of their coordinates. Because drivers are pretty much always driving, their location is always changing.
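A potential solution to this is Geohashing. Using this technique, you spatially index drivers' coordinates by encoding each latitude/longitude pair into a short string (the Geohash). Points that are close together usually share a common prefix, and the more characters you keep (the precision), the smaller the area each Geohash covers.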
For example:
| Location | Latitude | Longitude | Geohash (precision 5) |
|---|---|---|---|
| Restaurant 1 | 40.7128 | -74.0060 | dr5re |
| Driver 1 | 40.7139 | -74.0075 | dr5re |
| Driver 2 | 40.7291 | -73.9995 | dr5rs |
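In this example, because the Geohashes of Driver 1 and Restaurant 1 match, we know they fall in the same cell, so Driver 1 is likely closer to the restaurant than Driver 2. In practice you would also check the neighboring cells, because two points that are very close together can still fall on opposite sides of a cell boundary and get different Geohashes.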
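Geohash encoding is short enough to sketch in full. The function below is a plain TypeScript version of the standard algorithm: it repeatedly halves the longitude and latitude ranges, alternating between them starting with longitude, records one bit per halving, and maps each group of 5 bits to a character of the Geohash base-32 alphabet. `encodeGeohash` is a name made up for this example; in practice you would likely use an existing library or let Redis do the encoding.

```typescript
// Geohash base-32 alphabet (omits a, i, l and o).
const GEOHASH_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

// Encode a coordinate as a Geohash of `precision` characters (5 bits per character).
function encodeGeohash(lat: number, lon: number, precision: number): string {
  let latMin = -90, latMax = 90;
  let lonMin = -180, lonMax = 180;
  let isLonBit = true; // bits alternate lon, lat, lon, ... starting with longitude
  let bitsInChar = 0;
  let charValue = 0;
  let hash = "";

  while (hash.length < precision) {
    if (isLonBit) {
      const mid = (lonMin + lonMax) / 2;
      if (lon >= mid) { charValue = charValue * 2 + 1; lonMin = mid; } // upper half: bit 1
      else { charValue = charValue * 2; lonMax = mid; }                // lower half: bit 0
    } else {
      const mid = (latMin + latMax) / 2;
      if (lat >= mid) { charValue = charValue * 2 + 1; latMin = mid; }
      else { charValue = charValue * 2; latMax = mid; }
    }
    isLonBit = !isLonBit;
    bitsInChar++;
    if (bitsInChar === 5) {
      // Every 5 bits become one base-32 character.
      hash += GEOHASH_BASE32.charAt(charValue);
      bitsInChar = 0;
      charValue = 0;
    }
  }
  return hash;
}
```

Because a longer Geohash describes a smaller cell nested inside the shorter one, two points whose hashes share a prefix of length n fall in the same precision-n cell.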
I mentioned above that we already have the restaurant's and customer's addresses. We would need to take those, get the latitude/longitude coordinates (likely using a geocoding service), and then store them as Geohashes. This would likely happen asynchronously, using a queueing system to create a job for this work, since it doesn't need to happen immediately. With each of the three data points stored as a Geohash, we can then triangulate.
We can either store these hashes in a database, or we can use Redis' geospatial feature (which is built on Geohashes) to handle the storage piece. Redis handles the hashing part for you, and provides a nice `GEOSEARCH` command that lets you search within a radius. That's too much to go into in this post, but the Redis documentation on its geospatial commands covers it in detail.
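Here is roughly what the Redis side could look like, written as raw command arguments so it doesn't depend on a particular client library. The key and member names are hypothetical. `GEOADD` and `GEOSEARCH` (Redis 6.2+) are real commands; note that both take longitude before latitude, which is an easy thing to get backwards.

```typescript
// Hypothetical sorted-set key holding every active driver's position.
const DRIVER_LOCATIONS_KEY = "drivers:locations";

// Store (or update) a driver's position. Redis expects LONGITUDE first.
function geoAddDriver(driverId: string, lat: number, lon: number): string[] {
  return ["GEOADD", DRIVER_LOCATIONS_KEY, String(lon), String(lat), driverId];
}

// Closest drivers within radiusKm of a restaurant, nearest first, with distances.
function geoSearchNearRestaurant(lat: number, lon: number, radiusKm: number, count: number): string[] {
  return [
    "GEOSEARCH", DRIVER_LOCATIONS_KEY,
    "FROMLONLAT", String(lon), String(lat),
    "BYRADIUS", String(radiusKm), "km",
    "ASC",
    "COUNT", String(count),
    "WITHDIST",
  ];
}
```

Calling `GEOADD` again for a driver who is already in the set updates their position, which is convenient given how often driver locations change. With node-redis v4, for example, these arrays can be passed to `client.sendCommand(...)`.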
You may be wondering - because the driver's location is dynamic and always updating, how do we handle updating it? The Geohash would therefore also always be changing.
There are a couple of approaches we could take. One would be for the driver's version of the delivery app to update their location every 30 seconds or so (or whatever timeframe would not overwhelm our Geohash storage solution). Another would be for the driver's app to track when it has gone more than X miles since the last Geohash update, or even detect when it's sitting idle and not send updates then. Ultimately, this logic best lives on the client.
The client needs to make sure it's not overwhelming the server with updates, which is fairly easy to accomplish.
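A sketch of that client-side throttling rule, combining the time-based and distance-based ideas above: send immediately if the driver has moved far, send on a timer otherwise, and skip updates while the driver is essentially stationary. The `shouldSendLocationUpdate` name and the thresholds are hypothetical; distance uses the standard haversine formula with a mean Earth radius of 6,371 km.

```typescript
interface LocationFix {
  lat: number;
  lon: number;
  timestampMs: number;
}

// Thresholds are illustrative assumptions, not tuned values.
interface ThrottleOptions {
  maxIntervalMs: number;       // while moving, send at least this often
  distanceThresholdKm: number; // send right away once the driver has moved this far
  idleThresholdKm: number;     // moving less than this counts as sitting idle
}

const EARTH_RADIUS_KM = 6371;

// Great-circle distance between two fixes using the haversine formula.
function distanceKm(a: LocationFix, b: LocationFix): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLon = toRad(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Runs on the driver's device: is the current fix worth sending to the server?
function shouldSendLocationUpdate(lastSent: LocationFix, current: LocationFix, opts: ThrottleOptions): boolean {
  const movedKm = distanceKm(lastSent, current);
  if (movedKm >= opts.distanceThresholdKm) return true; // moved far: send now
  const elapsedMs = current.timestampMs - lastSent.timestampMs;
  // Otherwise send on the timer, but only if the driver isn't sitting idle.
  return elapsedMs >= opts.maxIntervalMs && movedKm >= opts.idleThresholdKm;
}
```

For example, with a 30-second interval, a 1 km distance threshold and a 50 m idle threshold, a driver who has moved about 175 m sends an update only once 30 seconds have passed since the last one, while a driver parked at the restaurant sends nothing.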
This is another potential use case for WebSockets - in this case the driver app would open a WebSocket connection to our delivery service and pipe updates through it. This would be closer to "real-time" than the suggestions above, but then again, we get back to tradeoffs. How precisely real-time does this need to be, and is handling open connections from hundreds of thousands of drivers (or however many are active) worth the extra load?
Interviewer follow-up question
What if a driver is not active or offline? How can we be sure they aren't selected for a pickup/delivery?
Rather than go over this, I'm going to leave this as a good practice exercise.
Feature - the service must prevent orders from being made before or after a restaurant's opening/closing hours
While in the context of a System Design interview, this leans more towards the "implementation detail"/programming side of things, I think in the context of a food delivery service it's a fairly important part of the system. In my interview I briefly discussed this with the interviewers, so I'm going to go over that here.
An approach you could take here is to store each restaurant’s operating hours in a table, and then validate each order against the stored hours before confirming it. Because this table would be hit often, it's best to index it on the column you look hours up by (most likely the restaurant ID), in order to not slow down performance.
The REST endpoint for placing an order would:
- get the opening/closing hours of the restaurant from the table
- check whether the time the order was placed is later than the closing time (or earlier than the opening time)
- reject the order if so, and allow it otherwise
On the front-end you would do this same check, and ideally you would already have the restaurant's operating hours fetched so you don't need to make an additional server-side call. But you still need this validation on the back-end too, as someone could be hitting the endpoint directly rather than through the front-end.
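The hours check itself is small, but the simple "after closing or before opening" comparison breaks for restaurants whose hours cross midnight (say 18:00 to 02:00). Here is a sketch that handles both cases, assuming hours are stored as minutes since midnight in the restaurant's local time and the order timestamp has already been converted to that time zone. The `OperatingHours` shape and function names are hypothetical, and a full per-day schedule would also need to check the previous day's overnight window.

```typescript
// One day's hours as minutes since midnight, restaurant-local time (assumed).
interface OperatingHours {
  opensAt: number;
  closesAt: number;
}

function toMinuteOfDay(hours: number, minutes: number): number {
  return hours * 60 + minutes;
}

function isOpenAt(hours: OperatingHours, minuteOfDay: number): boolean {
  if (hours.opensAt < hours.closesAt) {
    // Same-day window, e.g. 11:00-22:00. The closing minute itself is not orderable.
    return minuteOfDay >= hours.opensAt && minuteOfDay < hours.closesAt;
  }
  // Window that crosses midnight, e.g. 18:00-02:00 (equal values mean open 24 hours).
  return minuteOfDay >= hours.opensAt || minuteOfDay < hours.closesAt;
}

type OrderCheck = { ok: true } | { ok: false; reason: string };

// What the order endpoint (and the front-end) would call before accepting an order.
function validateOrderTime(hours: OperatingHours, minuteOfDay: number): OrderCheck {
  return isOpenAt(hours, minuteOfDay)
    ? { ok: true }
    : { ok: false, reason: "Restaurant is closed at the requested time" };
}
```

Because it's plain TypeScript with no dependencies, the same function could be shared by the front-end and the REST endpoint, so both sides apply identical rules.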
An alternative to checking the database would be to store the restaurant hours in a cache; that way you reduce the load on your database (which is going to be serving lots of other queries).
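A minimal cache-aside sketch for the restaurant hours: read from the cache, fall back to the database on a miss or after a time-to-live expires, and invalidate the entry when the restaurant edits its hours. The `TtlCache` class is hypothetical and in-process to keep it self-contained; in a multi-server deployment you would more likely put the cached hours in a shared cache such as Redis. The loader is synchronous here for brevity, whereas a real database call would make `getOrLoad` async.

```typescript
// Minimal in-process cache-aside with a time-to-live (hypothetical class).
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or load it from the source of truth (e.g. the
  // operating-hours table) on a miss or after the entry has expired.
  getOrLoad(key: string, load: () => V): V {
    const cached = this.entries.get(key);
    if (cached !== undefined && cached.expiresAt > this.now()) {
      return cached.value;
    }
    const value = load();
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }

  // Call when a restaurant edits its hours so the next read reloads them.
  invalidate(key: string): void {
    this.entries.delete(key);
  }
}
```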
Interviewer follow-up question
What if you had multiple orders placed around the same time for an item that is out of stock, or that has only one remaining? How would you handle that?
I'm going to leave this as a practice question you can work through yourself!
Remaining things we didn't cover
Other things you could be asked about are:
- scaling/load balancing
- infrastructure setup
- security
- caching
- what are things you would cache?
- what are things you wouldn't?
These things tend to come up in interviews but are fairly standard across all architectures, so once you feel comfortable discussing approaches for one particular service/company, you should feel comfortable applying them to others.
Summary
If you have a technical interview coming up, use the scope defined in this post to practice how you would go about designing it, using the patterns and approaches I discussed, and/or come up with your own design.
If you are instead looking to learn more about architecture, apply the patterns and design discussed here to the next project you are working on, or use them as a springboard for learning more about any of the given topics!
There isn’t much content out there that dives into advanced topics for Node.js developers, or for front-end developers transitioning to back-end development (a popular path, given that front-end devs already know JavaScript). Most resources focus on beginner-level material or generic concepts, leaving a gap for those looking to advance their knowledge and/or career.
This post is part of my effort to fill that gap. While it’s not yet part of a formal series, I plan to write more articles like this one in the future, because architecture concepts are part of that "advanced" knowledge.
If you’ve ever felt the pain of not having enough material to guide you through these advanced topics, you’re not alone. My goal is to provide actionable insights and examples that you can apply directly to your work or interview prep.
Stay tuned for more posts like this, and if you’re interested in learning about these topics as I publish them, make sure to subscribe below! I will also send you a repo and article going over how I structure all my Node REST APIs - where the code goes, what goes in controllers, what goes elsewhere - because this is something I see people struggle with a lot (which is understandable considering Node and its popular frameworks are not as prescriptive as other languages and their frameworks).