How do you design Uber Eats?

That's good. Let me think about Uber Eats; there can be many things, so let me start with the requirements. I'm just brainstorming what can go into Uber Eats right now. There are restaurants; there are customers who place orders; there are the orders themselves; and there are delivery people, like DoorDash's Dashers. So in terms of requirements: for restaurants, we can give them a mechanism to add and update their listing (removal we would probably handle ourselves). Customers can view the restaurant profiles we create, and they can search for them in a variety of ways. One of the common ways is sorting by distance, which restaurants are near, but I think delivery time matters here too, because if you're hungry right now you want the food quickly, whether the distance is small or big. So for our V1...

I like how you're considering what's valuable to all the different kinds of users: the delivery people, the restaurants, and the consumer who's actually ordering the food. For our exercise here, let's go with adding a restaurant to the system, the consumer being able to view a restaurant, and the consumer being able to search for restaurants, both by time to deliver and by distance.

Okay, sounds good.
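As a concrete aside for readers following along: the V1 scope agreed on above could map to a small API surface. The endpoint paths, parameters, and field names below are purely illustrative assumptions, not anything specified in the conversation.

```python
# Illustrative V1 API surface for the scoped requirements: add a restaurant,
# view a restaurant, and search by distance or estimated delivery time.
# All names here are assumptions made for the sake of the sketch.
V1_ENDPOINTS = {
    # Restaurant onboarding (restaurant personnel, authenticated)
    "POST /restaurants": {"body": ["name", "address", "lat", "lon", "menu_items[]"]},
    # Consumer: view one restaurant's profile and menu
    "GET /restaurants/{id}": {"returns": ["profile", "menu", "image_urls"]},
    # Consumer: search nearby, sorted by distance or estimated delivery time
    "GET /restaurants/search": {"params": ["lat", "lon", "radius_km", "sort=distance|eta"]},
}

for route, shape in V1_ENDPOINTS.items():
    print(route, shape)
```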
Yeah, I'll skip the other parts; customers can create orders too, and Dashers can get notifications, but I'll focus on the parts you mentioned. Taking that into account, for the non-functional requirements... actually, I have a quick question first: how does a restaurant get added to Uber Eats?

Let's assume the restaurant has personnel and employees, and they go through some process themselves to add the restaurant to Uber Eats.

Okay, sounds good. I'm assuming restaurants will then have some kind of menu options.

Yes, let's assume there are standards we give all the restaurants: if you want to get added to Uber Eats, you have to follow these standards and upload these assets, like images, plus menu items and prices as JSON or CSV. The restaurants have to do all of that on their own.

Okay, sounds good. So, the non-functional requirements. Thinking about scalability in general (I'll get to the back-of-the-envelope estimation), Uber Eats is pretty big and spread across the world, so I'm assuming we want a scalable solution. From a consistency perspective, when a restaurant is added or anything changes, it doesn't need to be reflected that very second, so I'd consider it eventually consistent; please correct me if that's wrong. Then availability: when you're searching and viewing restaurants, you want the system to be available, you don't want the page to crash, so we'll aim for very high availability. And in terms of security, I think we can restrict things to logged-in users with some kind of IAM, an identity and access management system, that we
will provide, with some login mechanism; and the standard protections, rate limiting and DDoS-attack prevention, I can add to the system as well. One more thing I think is important is latency, because a customer viewing a restaurant page wants it to be pretty fast. In my experience, pages like this load in as little as 150 milliseconds. Searching should be pretty fast too; even Google serves most queries in under 400 milliseconds, maybe faster, so I'm assuming search should be in that range. Adding a restaurant can be slower: a menu will have a lot of images, some text or title for each menu item, and a price, and uploading images can take a while, so a reasonable number is maybe 10 to 15 seconds, or up to a minute, to upload the images. We can optimize that later. What do you think about these non-functional requirements? Any concerns, or are we good?

No concerns here; these non-functional requirements sound good to me. Let's move on to the actual design, and maybe we can come back to these if I have any questions.

One thing I'd like to do first is the back-of-the-envelope estimation, which is super critical to how we come up with the design. For restaurants, say we start with about one million, and roughly 100 restaurants are added every day: 100 × 365 days is about 36.5K per year, and over 20 years that's about 0.73 million more, so around 1.73 million restaurants even after 20 years. I don't think that's huge as such. Customers are different: there will be a lot of customers ordering from even one restaurant every day, and some customers just view without ordering, so 100 million is a rough estimate to start with. And say we add 10K customers per day: that's about 3.65 million per year, times 20 years is 73 million, so around 173 million customers total, which is quite a lot. One thing to notice is that the restaurants are fewer in number, but the number of view calls will be pretty high: maybe around 100 million views per day, and I'd say search volume is around the same. So let me assume these numbers and move on to the actual design.

Yeah, that makes sense; the scale makes sense. I'm curious to see what the design looks like to support these estimates.

Sure. So from the data perspective, we have about 1.
73 million restaurant records. Let me do some data modeling and create a table for it. If we have a menu item, it has a price, a currency, and one or maybe multiple image URLs. So the main menu-item fields are: item ID, item title, currency, price, and image URLs. For a restaurant, the main fields are ID, name, and address, so zip code and country; and since we're talking about ordering from the restaurant, we care about the distance from the restaurant to our location, so we'd also need geo coordinates, latitude and longitude. There are ways to optimize around those, such as geohash; let me get to geohash in a bit.

Maybe at a high level, what is a geohash?

If you think about latitude and longitude across the whole world, there are a huge number of combinations, something like 2 to the power 64 if you count the precision bits, and with decimals a latitude can be 41.0001 and so on. A geohash is a way to divide the world map into grids of a decent size, so that it can be used for optimization. Let me grab a world map and show what I mean. The way geohash works is that it divides the world into grids and assigns binary values to the different sections. Say I divide it into four sections: this one is 00, this is 01, this is 11, and this is 10. I can continue subdividing; dividing a quadrant again gives 0100, 0101, and so on. You just keep making smaller and smaller grids. Why does this help? Think about a shopping complex: whether we're delivering from the first shop or the tenth shop in the same row, it takes almost the same time. So a geohash is a way to group nearby locations together for optimization purposes. We continuously subdivide, taking another chunk of bits at each level: level 1 divides the whole world into 32 cells, level 2 divides each of those into 32 again, one thirty-second of the previous cell, and each group of 5 bits is encoded as one base-32 character. For the base-32 mapping, let me add a table: 01000, for example, encodes to 8. Base 32 is basically 32 characters: the digits 0 through 9 give 10 values, plus 22 letters of the alphabet; geohash's standard alphabet excludes the letters a, i, l, and o.
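The bisection-and-encode scheme described above can be sketched in a few lines. This is a minimal illustrative encoder assuming the standard geohash base-32 alphabet, not production code:

```python
# Minimal geohash encoder illustrating the subdivision idea described above.
# Standard geohash alphabet: digits 0-9 plus letters, excluding a, i, l, o.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat, lon, precision=5):
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, chars = [], []
    even = True  # geohash interleaves bits: even positions refine longitude
    while len(chars) < precision:
        rng, val = (lon_range, lon) if even else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        even = not even
        if len(bits) == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[int("".join(map(str, bits)), 2)])
            bits = []
    return "".join(chars)

sf = geohash_encode(37.7749, -122.4194)       # San Francisco
oakland = geohash_encode(37.8044, -122.2712)  # ~13 km away
print(sf, oakland)  # sf is "9q8yy"; both start with the prefix "9q"
```

Nearby points share a prefix, which is exactly the proximity-comparison property mentioned in the discussion.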
In fact, I don't remember exactly which value geohash maps to which character, but essentially those are the 32 variants.

Interesting.

So it encodes into base 32. At level 1 a cell might be '8', and if we continue subdividing, the entire city of San Francisco comes out at maybe level 5; I'm not sure, we'd have to check. So there are optimizations you get by subdividing the world into smaller and smaller grids.

And you can compare these values to understand proximity, right?

That's right.

Okay, cool. I don't want to spend too much time on this; I understand we could go deeper here, so let's make the assumption that we're using geohashing. I love that we're able to deep dive if we want to, but let's zoom out and think about the actual system. Let's pick up where we said we'll use geohashing.

Yeah, we might use it, and I'll share the pros and cons, but let me first complete the data modeling. Let me add customers too: a customer will have the same kind of fields, ID, name, address, zip, country, latitude, and longitude. And for restaurants we'll have a menu: the menu can be a list of menu items, or, if we're using a relational database, we can create a separate table keyed by restaurant ID and menu item ID.

So let me think about relational versus NoSQL here. Restaurants are about 1.73 million rows, which is really not a big number, and although the number of calls is pretty high, I think we're better off using a relational database with multiple read replicas. Read replicas means whatever we write to the primary is replicated to other database machines that serve reads, which helps with scaling. But I don't think that works for customers: customers are huge, 173 million, and for that we'll need sharding. With sharding by customer ID it should be fine, and for the database choice I'd use a NoSQL database like Cassandra that supports automatic sharding.

Now let me think about the user experience part. Say a restaurant admin or owner has personnel who are going to add the restaurant. The way I design services is to create some kind of experience layer first, because we might want to expand to other platforms, native iOS, Android, mobile web, desktop web, and having a centralized place to orchestrate different services matters. Right now we're focusing on a simple service, but later on orchestration between services becomes very important, along with a shared backend and some central UI
elements that we can share. So I'd create that experience layer, and then a service for restaurants. One thing to note: whenever I talk about any service, I'm assuming a load balancer sits on top of it, because we're talking about a cluster of machines, and the load balancer decides which machine gets the request, whether by least traffic or round robin, whatever policy it follows.

Now, before adding the restaurant data, we're assuming there's a menu with text and images. Each image could be around 4K, and for images it's good to use S3 or similar object storage, which can store a lot of data efficiently. For that I'd create an image service that the user calls to upload the images, and we can parallelize there, uploading multiple images at once. Since we want to make sure we're storing quality images, they should go through some moderation policy as well. There are machine-learning models we can use to moderate such images, so I'd add an image-moderation ML API backed by already-trained Python classification models with high precision and recall; they classify whether an image is really bad, not safe for work, or contains profanity that we don't want to show to the public.

How do you define precision and recall?

For bad images, precision means: of the total number of images we detected as bad, how many are really bad? Say we detected 10 images as bad and 9 of them really are; that's a precision of 0.9. Recall means: out of all the truly bad images, how many did the system detect? So precision is true positives divided by true positives plus false positives, all the detected positives against how many were really bad, and recall is true positives divided by true positives plus false negatives. Does that answer your question?

It does. And for all the bad images, how would you know how many were detected?

That we measure against the models' test data. Model training takes a while: you gather samples, from Kaggle, from external sources, or you can even train on your own internal images; you create a classification model, train it on the training data, and then run it against held-out test data. Using the test data, we can figure out what the precision and recall values are and how the model performs.

Okay. And what if we run our tests against the test data and find that our current machine-learning infrastructure is just too slow? What changes would you make here?

Training definitely takes a while, and people often use GPUs instead of CPUs because they're more efficient for that. For serving, when we provide a new image for moderation, there are optimizations in that moderation path we can think about:
scaling it up by adding more machines, using multiple cores, or moving to GPUs; it really depends on what the bottleneck is. But assuming we're not able to make those changes, I wouldn't let users suffer due to the slowness of the moderation API. One option is to decouple it: the image service uploads the image to S3 anyway, S3 returns a GUID for the stored object, and we store that GUID in the restaurant table; I'd store image GUIDs, IDs for the images, rather than image URLs. (Let me move this down on the diagram, it's taking some space.) If the machine-learning infrastructure is slow, we just store the images rather than running them through moderation inline, and after they're stored, the restaurant service drops an event onto an offline queue, say a Kafka queue: it publishes the event, and a Kafka consumer listening on that queue picks it up and runs the moderation API. When the classification result comes back saying whether the image is bad or not, the consumer updates a flag in the restaurant table. And instead of just image GUIDs, I could store some image metadata, a JSON blob on the menu item.

Okay, so it stores the metadata as well, and it can hide the image with flags until moderation passes?

Exactly; that's what we'd do if we're not able to optimize the image moderation. I think this kind of decoupling will be needed even for search; I can go over the search design as well.

Sure, let's do that real quick.

First, the viewing part. Viewing a restaurant goes through the experience layer to a view service. One thing that's important for viewing is that the SLA is pretty low, 150 milliseconds, so I think we want to leverage a cache. The menu doesn't change that often, so we can store the menu and other details in the cache with LRU or some similar eviction mechanism, even compress it if storage is an issue, and fall back to the restaurant database if the data isn't found in the cache.

From the search perspective, the experience layer calls a search service. If we think about pure distance-based search, which restaurants are nearby, I think Elasticsearch is a good approach: it provides different geo queries. I'd store all the restaurant coordinates in Elasticsearch, so whenever something is to be delivered to me, I have my own coordinates and can run a geo query against Elasticsearch to figure out which restaurants are nearby. Let me go over the query in a bit.

And is this Elasticsearch search based on the geohash?

Elasticsearch used to be based on geohash; now what it uses is a kind of
BKD tree, a block k-d tree. Let me explain further. Geohash with Elasticsearch is one of the approaches we talked about; the alternative, if we hadn't used Elasticsearch, would be to query the stored latitude and longitude directly: something like select * from restaurant where latitude is greater than the input latitude minus some delta and less than the input latitude plus that delta, with the delta corresponding to, say, 20 miles or 20 kilometers.
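That naive bounding-box query can be sketched as follows. Note that a kilometer radius has to be converted into degrees first; the conversion factors and the table and column names here are illustrative assumptions:

```python
import math

# Naive bounding-box prefilter for "restaurants within ~20 km", as described
# above. This selects a square, not a circle, so an exact distance check
# (e.g. haversine) should follow; table/column names are assumptions.
def bounding_box(lat, lon, radius_km):
    lat_delta = radius_km / 111.0  # ~111 km per degree of latitude
    # Degrees of longitude shrink with the cosine of the latitude.
    lon_delta = radius_km / (111.0 * math.cos(math.radians(lat)))
    return (lat - lat_delta, lat + lat_delta, lon - lon_delta, lon + lon_delta)

lat_min, lat_max, lon_min, lon_max = bounding_box(37.7749, -122.4194, 20)
query = (
    "SELECT * FROM restaurant "
    f"WHERE latitude BETWEEN {lat_min:.4f} AND {lat_max:.4f} "
    f"AND longitude BETWEEN {lon_min:.4f} AND {lon_max:.4f}"
)
print(query)
```

Even with a B-tree index on each column, this two-range predicate scans far more rows than a spatial structure would, which is the motivation for geohash prefixes or BKD trees.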
Something like that, and the same condition on longitude. This would have been pretty slow even if we indexed it. The way geohash would help here is that we store each restaurant's geohash and do a prefix match; for San Francisco, after those five subdivisions, it might be something like where geohash like '8c8s1%', and whatever restaurants lie in that grid come back. Elasticsearch, though, now uses the BKD tree, an enhanced multi-dimensional version of a binary search tree, and that's how it optimizes.

Got it, okay, cool. Thanks for walking through that and answering my follow-up questions. Let's say we have five minutes left in this mock interview. How would you wrap things up, and can you give me a quick summary of how the user interacts with all the services to get a food order?

I'll just quickly cover the isochrones part. One thing I didn't mention is getting a list of restaurants based on their delivery time, and isochrones really help there.

Maybe just talk through it at a high level; we don't need a diagram for this part.

An isochrone is, for a given point, say where I'm staying, the region covering any restaurants that can deliver to me within, say, the next 30 minutes. It's not a grid but a curve, a polygon that I can create based on the time to deliver to my particular location. We can cache that polygon, in DynamoDB or somewhere, so that the search service can fetch the isochrone and compare it with the restaurant coordinates: the Elasticsearch query becomes a query for that particular polygon, returning all the restaurants that can deliver to me within, say, the next hour. That's the basic idea of isochrones. For caching purposes, since we can't cache an isochrone for each and every coordinate, what I'd do is compute them per level-7 geohash cell; level 7 is roughly 150 meters by 150 meters, which will cover an entire block. That's probably around a billion entries, but we can optimize it further by removing water, any cells that fall in the ocean, and hills and so on. With level-7 geohash cells mapped to the different isochrones, we can get the deliverable restaurants pretty easily and pretty fast.

In short, if I have to summarize: adding a restaurant has its own challenges in terms of scaling up the system; the customer table, which I didn't add to the diagram, is sharded and scaled that way with Cassandra; then we have a relational database with read replicas for restaurants, another store for Elasticsearch, some caches for optimization here and there, and multiple services: search, a viewing service, and a restaurant service. I could clean this diagram up a little, but in short, that's the summary.

Awesome, great. Thank you, Niraj; this was a very in-depth interview. I loved how from the beginning you gave an overview of different ways we could design Uber Eats: you didn't focus purely on one user, you considered the restaurants and the delivery people too, which was excellent, and we were able to get a broad overview of how you would design this. I also loved how you were able to deep dive into very specific parts of your design when needed; that's also very valuable to be able to show. So this wraps up our mock interview. I think this was an excellent example of the typical system design interview that software engineers and engineering managers face in technical rounds, and perhaps even product managers applying for more technical roles. Hopefully this was helpful for you as an audience, and if you have an upcoming interview, good luck, and stay tuned to Exponent for more mock interviews. Thanks so much for watching; don't forget to hit the like and subscribe buttons below to let us know this video is valuable to you, and of course check out hundreds more videos just like this at Try Exponent.