over the last two years more than 400,000 people have been laid off and data scientist roles were affected so why should you become a data scientist in 2024 this is what somebody would tell you to discourage you from data science yes data scientist roles were affected by the layoffs but when you zoom out and look at the bigger picture you will see less than 10% of the roles that were laid off belonged to data scientists 28% belonged to HR and recruiting and 22% to software engineering yes there were more software engineers laid
off than data scientists yes the job market is still soft and it will take some time to catch up but this also makes it a good time to start learning data science in this video I'm going to be sharing a complete road map that you can use to learn data science and land a job so what does a data scientist do a data scientist takes the data and turns it into powerful information that helps businesses succeed now this is a very simple definition and after watching this video you will actually have a really good
understanding of what this definition means and if somebody were to come to you and ask you what does a data scientist do you'll be able to explain it a typical data science project involves a few steps it typically starts with data collection then data processing exploratory data analysis or EDA for short feature engineering model development model evaluation model deployment and iteration so before we jump into the road map there's one exercise that I want all of you to do I want you to research the data science field and look at all the roles that are under the data science
umbrella there are typically a few roles that stand out under the data science job family data scientist data analyst machine learning engineer AI specialist data engineer analytics manager research scientist and many more one of the mistakes that I made when I started learning data science is I only focused on the data scientist role and back then I didn't know that so many other job roles exist in the data science space and I could have considered them if I had known about them so what I would like you to do is become familiar with all the
roles that are available and figure out what you would enjoy doing and what your target looks like before you actually jump into the data scientist role itself because it will take time and you don't want to waste your time you want to make sure you are fully dedicated to learning data science before you commit to it because it will take a lot of hard work let's say you have figured out you want to become a generalist data scientist so the next thing I want you to do is narrow down
the companies and the roles that you would be interested in then I want you to go to their career website let's say you want to work at Meta as a data scientist I want you to go to Meta's career website and look at the data scientist role look at the job description and understand what type of candidate they are looking for what skill set they are asking for what hard skills and soft skills they require is it a lot of experimentation is it a lot of machine learning because that will give you a better idea
of what you need to learn and it will help you define your road map and customize it take notes on all of this then I want you to go on LinkedIn and search for people who are already working as data scientists at Meta look at their profiles look at their education history look at the type of work that they're doing and take notes again because we're going to be using this throughout the road map so let's say you have narrowed down that the data scientist role is what you want to pursue you have also done
your research and you're ready to go a lot of data scientist road maps tell you to start learning Python as the first thing but I completely disagree with this advice here's why coding languages including Python and SQL are tools to apply data science they're not data science by themselves I'm going to say this again coding is a way to apply data science it's not data science by itself what data science actually is is statistics and machine learning so that's the first skill I want you to learn because let's say you start learning statistics and
machine learning and you realize this is not for you then I would want you to stop and not pursue it further because in order to be successful as a data scientist you need to have the basic fundamentals coding is a tool to apply statistics and machine learning so what I want you to do is start building your foundation with statistics and machine learning for statistics I would recommend that you learn descriptive and inferential statistics so descriptive statistics is basically understanding what the data looks like and there are several methods that you need
to understand in order to learn descriptive statistics it includes measures of central tendency basically mean median and mode measures of variability such as standard deviation and variance frequency distributions measures of shape meaning how skewed the data is and graphical summaries such as bar plots and so on next I want you to learn inferential statistics and this is where it starts becoming fun learn probability distributions hypothesis testing which is one of my favorite subjects Bayesian versus frequentist statistics sampling time series analysis experiment design multivariate analysis ANOVA survival analysis and bootstrap resampling so these are some of the concepts that you
need to know in order to have your basics done in statistics next we're going to focus on machine learning machine learning divides into two focus areas one is supervised machine learning and the other is unsupervised machine learning so first we're going to focus on supervised learning for supervised learning we're going to focus on linear regression and logistic regression make sure you know these inside and out because these two models coming up in an interview is very very common continuing on with supervised learning then there are decision trees random forests neural
networks and deep learning now again if you have done the research and looked at the role you would understand how much machine learning is required so you might not need to go as in depth into these machine learning models depending on the job family or depending on the company and the role that you're targeting but again if you want to do a lot of machine learning then you definitely should learn all of these things under unsupervised learning focus on k-means clustering hierarchical clustering and principal component analysis this one specifically and clustering I've seen quite a bit
come up in interviews along with these topics understand these concepts well there are several online sources that you can use to learn statistics and machine learning for statistics I specifically love using Khan Academy as well as a YouTube channel called StatQuest where you can watch videos and learn statistics there are several books that you can read and also use ChatGPT and Google where you don't understand something one of my favorite things with ChatGPT is that I ask it to explain complex subjects as if it was explaining to a child and it does a really really good
job some of these subjects can be difficult to understand so use all the resources at hand to get a solid understanding of statistics and machine learning so let's say you figured out your fundamentals now it's time to apply these skills remember I said coding is a tool to apply statistics and machine learning so now that you know statistics and machine learning you're going to apply it and this is where you need to learn coding so you can apply the skills that you have learned so for coding hands down learn
Python and SQL when I started learning data science I started with R I worked with R for two years and then I switched to Python one of the things my team realized early on well not early on after 2 years of doing this was that it was really hard to communicate with the engineering team taking our R code and then deploying it was very difficult and it increased our machine learning project lifespan anyway I have covered this topic in a different video you can watch that video to see why I prefer Python over R for Python you
should understand how to do data analysis in Python know libraries such as pandas NumPy and Matplotlib for machine learning you should know Python libraries such as scikit-learn TensorFlow and PyTorch depending on the model that you're using there are so many resources available online where you can learn Python the next coding language that you should learn is SQL you will be using SQL quite a bit when you are accessing the data because most of the time the data is sitting in a database so understand how to write SQL know how to join the
data sets you should know all the joins left right outer and inner the select statement how to filter your data how to do aggregate functions how to do subqueries and how to do advanced analytic functions so by now you have statistics and machine learning covered you have coding you know Python and SQL the next thing you're going to focus on is learning the tools so for tools what I would recommend is that you understand how to work with notebooks such as Jupyter Notebook and Google Colab notebooks because chances are that you're going to be using those
for doing your data science work writing your code there and doing most of your data science projects there understand how the code review process works because the chances are that at your job you will be following the code review process so learn the Git commands and understand fully how it works how you can push the code how you can pull the code how you can deploy the code and things like that and obviously these tools would vary depending on the company that you end up at so I'm not trying to focus too
much on this right now because you will eventually end up learning more tools as you start doing your work as a data scientist let's say you don't want to become a data scientist and you want to become a data analyst in that case I wanted to share this data analytics [...] so you have your statistics and machine learning fundamentals done you know coding you know the tools now I want you to spend some time learning the business fundamentals and product fundamentals this step is especially important because what differentiates a good data scientist from
a great data scientist is having a solid business understanding and a solid product understanding so you can build solutions that help the bottom line of the business and the product understand how the product team thinks how the product development life cycle works and how you will plug in as a data scientist into these products and business areas to help them grow and succeed because having a good business and product understanding will make you a superstar data scientist and on the topic of superstars you also need to have solid communication skills because the chances
are that when you do your project at the start and at the end of the project you will be communicating with business you will be communicating with engineering you'll be communicating with product and many other job families so for you to be a great data scientist you need to have really good communication skills where you can communicate to a tech audience and a non-tech audience so get really good practice whether that is in a presentation form or in a written form having good communication skills will help you grow in your
career way further than somebody who does not have these communication skills and we're going to practice this when we build the project portfolio so let's say you have the basics done you know statistics and machine learning you can solve problems by applying these fundamentals using the tools that we mentioned you have good communication skills and you understand how product and business think now I want you to apply these skills by doing projects this is going to help you in two ways first it's going to help solidify your knowledge and second it's going to help you stand
out on your resume when recruiters are looking at it five is typically a good number of projects to focus on you can do more or less just make sure that they represent the skills that you know remember we took notes at the start of the video where you looked at the job description and you looked at the people who are already working there this will give you a good idea of what type of work they are looking for and what type of work these people have done to help them get a job at your target
company which in this case is Meta so now you'll figure out what kind of projects you're going to do for example one of the things that stood out to you was probably a lot of experimentation and A/B testing so one of the projects that you will do here is around hypothesis testing including experiment design doing the analysis and then presenting insights and recommendations from that hypothesis testing for the second project let's say you can focus on time series forecasting the third project could be customer fraud detection which is a classification problem for the fourth project let's
say you build a recommender system and in the fifth project you can do some EDA exploratory data analysis and do some feature engineering so build your project portfolio in a way that shows a unique skill set in each project so a recruiter reading your resume can tell that okay you know all of these things if you're looking for data sets there are so many free data sets available on websites such as Kaggle and Google Dataset Search you can find free data sets there and start doing your projects
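To make the hypothesis testing project a little more concrete, here is a minimal sketch of how the analysis step of an A/B test could look in Python. It assumes a two-proportion z-test on conversion rates, which is one common choice and not something the video prescribes. The function name `two_proportion_z_test` and all the numbers are made up for illustration, and only the standard library is used.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates.

    conv_a / n_a: conversions and sample size of the control group (A)
    conv_b / n_b: conversions and sample size of the variant group (B)
    Returns the z statistic and the two-sided p-value.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both groups convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution:
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment data, invented for this sketch.
z_stat, p_val = two_proportion_z_test(conv_a=200, n_a=2000, conv_b=250, n_b=2000)
print(f"z = {z_stat:.3f}, p-value = {p_val:.4f}")
if p_val < 0.05:
    print("The difference is statistically significant at the 5% level.")
else:
    print("No statistically significant difference at the 5% level.")
```

In a portfolio project you would wrap this with the experiment design (how the sample size and the significance level were chosen before looking at the data) and a short write-up of the recommendation, since presenting the insight is part of the project.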
we're not done yet in order to land the job let's say you've done the projects and they're helping you get calls for interviews now getting the call for the interviews is not enough you actually need to sit down in the interview and pass it in order to get the job so this is why I want you to do the interview prep responding in an interview setting in front of a stranger in a time-constrained setting can be nerve-racking I have personally blanked out in interviews on concepts that I knew very very well
so this is why getting practice on these interview questions is very important for the data scientist interview I typically like to divide it into three segments one is behavioral second is coding and third is fundamentals of statistics and machine learning under statistics and machine learning do a lot of case study focused interview prep for example they can give you a problem in the interview they can say we want to improve our customer churn rate tell me what exactly you will do they obviously will not give you a very clear problem like I just gave they
will give you a much vaguer problem and they would expect you to define the problem and then solve it using a solution that you would have to pick and you will have to tell the pros and cons and show why you actually picked that solution in my opinion a lot of the time people end up focusing on coding which is fine but you also need to have a solid understanding of how to solve these case studies for these specific data science concepts that are the core skill set of a data scientist notice in this video
I did not give you a timeline I didn't say that you can learn these skills in 3 months or 6 months every person has unique needs and unique abilities it might take you 6 months to do this but for someone else it might take five months or it might take 12 months so take your time and learn these skills