Over 80% of companies worldwide are unable to find the data scientists and AI professionals they need to bring their ideas to market and stay competitive. In the next decade the demand for data science and ML professionals is only going to increase, as the data science and AI market is projected to pass a 400 billion dollar valuation, and about 95% of employees worldwide are asking their employers for training in the field of generative AI and machine learning. And if that didn't convince you to get into this lucrative and highly demanded industry, then let me tell you that the salaries of data science and AI professionals are currently about $150,000 to $200,000 in the US, and in some cases the salaries of the most in-demand AI and machine learning professionals can pass $500,000. Welcome to this involved crash course in machine learning and data science. In these 11 hours you will get a comprehensive overview of machine learning and data science from different perspectives: the theory, the practice and implementation, career insights, and what you can expect from this career. This will be a great course for anyone
who wants to become a machine learning engineer or AI engineer. Here's what we are going to cover as part of this comprehensive crash course in machine learning. We are going to start with the machine learning roadmap for 2024, where we provide a structured overview of the machine learning landscape, helping you understand what it's like to become a machine learning engineer, what you can expect from this career, what exactly you need to learn, what kind of skill sets and industries are involved, and you are also going to see what kind of career directions you can take in the field of machine learning: how you can get into machine learning, how you can kickstart a career here, and what it is that you need to learn. After that we are going to get into the machine learning algorithms, where you will learn the most important machine learning algorithms, from linear regression to advanced algorithms like the boosting algorithms. Of course this won't be your comprehensive machine learning course, because it is aimed at providing you the basics and the fundamentals, but it will be a great
starting point for you to get a taste of what it's like to learn the theory of machine learning. You will learn the theory, the definitions, and the pros and cons of these algorithms, along with the practical Python implementation, so this will be a great way to learn the basics and also learn how to implement them in Python. As a prerequisite it is required that you know the basics of Python, like how to create lists, how to work with scikit-learn, or how to create variables, so this is important. Next up we have the hands-on case studies. After learning the basics of machine learning in terms of theory and implementation in Python with real examples, you are ready to get into hands-on machine learning work, and these won't be just quick case studies that you can complete in 30 minutes, but rather three involved case studies. We will start with the basics, like performing a behavior analysis and data analytics, which is always a must when it comes to becoming a machine learning or AI engineer, so you will learn
how to perform data analytics, how to perform customer segmentation using Python, how to perform data wrangling, how to do exploratory data analysis, all in Python, and then how to make those important conclusions and tell your data story. It is really important for an AI professional to know data science and data analytics. This first case study, the superstore customer behavior analysis, will be conducted and presented to you by Vahe Asan, co-founder of LunarTech, and it will give you good insights into the basics of machine learning and how to do data analytics and data science in a real-life case study. In the second case study we will get more hands-on with machine learning, and we will be predicting Californian house prices. We will do exploratory data analysis, use Python to clean the data, use statistics to perform outlier detection and data visualization, perform causal analysis, and use linear regression to perform the predictions, leveraging practical data analytics but this time also data science skills and combining them with Python libraries like scikit-learn. The third case study will be about
building a movie recommender system. Here we will explore NLP, natural language processing, another very important topic in the field of AI and machine learning these days, and we will use NLP, machine learning, and data science to develop a recommender algorithm. This project will enhance your skills in text data analysis, how to process text data, how to use Python for doing that, as well as practical machine learning applications like building a recommender system. Keep in mind that you can also put these case studies on your resume to showcase your experience. After we are done with these three end-to-end involved case studies, we are going to provide you career insights. As a data science and AI professional you have two choices: you can either decide to get into the corporate world and become a data scientist or AI professional, or you can decide to build your own startup, and we will provide you information on both of these directions. In the first conversation you will join me and Cornelius, a data science manager at Allianz, where you can learn from him
how to break into the field of data science and machine learning, especially from a nontraditional background. Here you can get a lot of tips on succeeding in this field, how to get promoted, what to expect from interviews, what the selection process is like, and much more about a corporate career in data science and AI. Once we are done with that conversation, we will cover the next choice, which is about building a startup as a machine learning or AI professional. Here you can listen to the conversation between the co-founder of LunarTech, Vahe Asan, and serial entrepreneur and successful investor Adam Coffee, where Adam will provide a lot of insights on how to launch a startup, how to raise funds, and what to expect from this type of career. Once we are done with this career insight as well, we will get into the final part of this course, which is about interview preparation. We will conclude by providing you the most popular machine learning interview questions with corresponding detailed answers. This will be great for anyone who wants to ace their interviews and who is now preparing for machine learning or AI interviews. This 11-hour crash course is more than just a short introduction; it is an involved, comprehensive overview of everything that you can expect from the world of machine learning and AI. If you want to become more hands-on, get the entire comprehensive overview, and learn everything in one place to become a job-ready machine learning and AI professional, then make sure to check out our data science boot camps at lunartech.ai and the many other courses that provide you that all-in-one approach to become a job-ready professional. If you like this video, make sure to like, subscribe, and comment. So if you're ready, I'm really excited, let's get started. Hi there, in this video we are going to talk about how you can get into machine learning in 2024. First we are going to start with all the skills that you need in order to get into machine learning, step by step: what are the topics that you need to cover and study in order to get into machine learning. We are
going to talk about what machine learning is, then we are going to cover step by step the exact topics and skills that you need in order to become a machine learning researcher or just get into machine learning. Then we're going to cover the types of exact projects you can complete, so examples of portfolio projects to put on your resume and start applying for machine learning related jobs, and then we are also going to talk about the types of industries that you can get into once you have all the skills and you want to get into machine learning, so the exact career path and what kind of business titles are usually related to machine learning. We are also going to talk about the average salary that you can expect for each of those different machine learning related positions. At the end of this video you are going to know what exactly machine learning is, where it is used, what kind of skills you need in order to get into machine learning in 2024, and what kind of career path, with what kind of compensation, you
can expect with the corresponding business titles when you want to start your career in machine learning. I'm the co-founder of LunarTech, and I come from an econometrics and statistics background. I have been in the tech field, and specifically in data science and AI, for the last five years, working across different data science and AI projects across the globe, and now I'm going to tell you what exactly machine learning is and what the skill sets are that you need in order to get into machine learning in 2024. So without further ado, let's get started. So what is machine learning? Machine learning is a branch of artificial intelligence, or AI, that helps build models based on data and then learn from this data in order to make different decisions. We will first start with the definition of machine learning, what machine learning is, and the different sorts of applications of machine learning that you most likely have heard of but didn't know were based on machine learning. So machine learning is a branch of artificial intelligence that uses data, learning from this data by using different sorts of algorithms, and it's being used across different industries, from healthcare to entertainment, in order to improve the customer experience, identify customer behavior, and improve sales for businesses, and it also helps governments make decisions. So it really has a wide range of applications. Let's start with healthcare, for instance. Machine learning is being used in healthcare to help with the diagnosis of diseases; it can help to diagnose cancer, and during the COVID pandemic it helped many hospitals to identify
whether people are getting more severe side effects or pneumonia based on those images, and that was all based on machine learning and specifically computer vision. In healthcare it is also being used for drug discovery, for personalized medicine and personalized treatment plans, and to improve the operations of hospitals: to understand the number of patients a hospital can expect on each day of the week, to estimate the number of doctors that need to be available, and the number of people the hospital can expect in the emergency room based on the day or the time of day, and this is basically another machine learning application. Then we have machine learning in finance. Machine learning is being used heavily in finance for different applications, starting from fraud detection in credit cards or in other sorts of banking operations. It's also being used in trading, specifically in combination with quantitative finance, to help traders make decisions on whether they need to go short or long on different stocks or
bonds or different assets, and in general to estimate, in real time and in the most accurate way, the price those stocks and assets will have. It's also being used in retail, where it helps to understand and estimate demand for certain products in certain warehouses; it also helps to understand which is the most appropriate or closest warehouse from which the items for a corresponding customer should be shipped, so it's optimizing the operations. It's also being used to build different recommender systems and search engines, like the famous ones Amazon is using. So every time you go to Amazon and you are searching for a product, you will most likely see many item recommendations, and that's based on machine learning, because Amazon is gathering the data and comparing your behavior, based on what you have bought and what you are searching, to other customers, and those items to other items, in order to understand which items you will most likely be interested in and eventually buy, and that's exactly based on machine learning and specifically different sorts of recommender system
algorithms. Then we have marketing, where machine learning is being heavily used because it can help to understand the different tactics and the specific targeting groups you belong to, and how retailers can target you in order to reduce their marketing costs and achieve higher conversion rates, so to ensure that you buy their product. Then we have machine learning in autonomous vehicles, which are based on machine learning and specifically deep learning applications. Then we also have natural language processing, which is highly related to the famous ChatGPT. I'm sure you are using it, and that's based on machine learning and specifically on large language models, so the Transformers and large language models where you go and provide your text and question and ChatGPT provides an answer to you, or in fact any other virtual assistant or chatbot; those are all based on machine learning. Then we also have smart home devices, so Alexa is based on machine learning. Also in agriculture, machine learning is being used heavily these days to estimate what the weather conditions
will be, to understand what the production of different plants will be and what the outcome of this will be, to understand and make decisions about how they can optimize crop yields, to monitor soil health, and for different sorts of applications that can in general improve the revenue for farmers. Then of course we have entertainment, where the vivid example is Netflix, which uses the data that you are providing related to the movies, and based on what kind of movies you are watching, Netflix is building this super smart recommender system to recommend movies that you will most likely be interested in and also like. So in all of this, machine learning is being used, and it's actually a super powerful topic and a super powerful field to get into, and in the upcoming 10 years this is only going to grow. So if you have made the decision, or you are about to make the decision, to get into machine learning, continue watching this video, because I'm going to tell you exactly what kind
of skills you need and what kind of practice projects you can complete in order to get into machine learning in 2024. You first need to start with mathematics, you also need to know Python, you need to know statistics, you will need to know machine learning, and you will need to know some NLP to get into machine learning. So let's now unpack each of those skill sets. Independent of the type of machine learning you are going to do, you need to know mathematics, and specifically linear algebra: you need to know what matrix multiplication is, what vectors and matrices are, the dot product, how you can multiply different matrices and a matrix with a vector, what the rules for the dimensions are, what it means to transpose a matrix, the inverse of a matrix, the identity matrix, and the diagonal matrix. Those are all concepts from linear algebra that you need to know as part of your mathematical skill set in order to understand the different machine learning algorithms. Then, as part of your mathematics, you also need to know calculus, and specifically differentiation: you need to know theorems such as the chain rule, the rules for differentiating a sum of terms, a constant multiplied by a term, and a difference, product, or quotient of two terms, the idea of a derivative, the idea of a partial derivative, and the idea of the Hessian, so first-order and second-order derivatives. It would also be great to know basic integration theory; we have differentiation, and the opposite of it is integration. This is kind of basic; you don't need to know too much when it comes to calculus, but those are the basic things you need to know in order to succeed in machine learning. A small NumPy sketch of a few of these linear algebra and calculus ideas is shown below.
Then there are concepts from discrete mathematics: you need to know the idea of graph theory, combinations and combinatorics, and the idea of complexity, which is important when you want to become a machine learning engineer because you need to understand Big O notation, so you need to understand the complexity of n squared, the complexity of n, and the complexity of n log n. Beyond that, you need to know some basic mathematics that usually comes from high school: multiplication and division, multiplying expressions inside parentheses, the different symbols that represent mathematical values, the idea of using x's and y's, and what x squared, y squared, and x to the power of 3 are, so exponents of different variables. Then you need to know what a logarithm is, the logarithm at base 2, at base e, and at base 10, what the idea of e is and what the idea of pi is, the idea of exponents and logarithms, and how those transform when you take the derivative of the logarithm or the derivative of the exponential. Those are all topics that are actually quite basic; they might sound complicated, but they are not, so if someone explains them to you clearly, you will definitely understand them on the first go. To understand all these different mathematical concepts, so linear algebra, calculus and differentiation, and then discrete mathematics and the different symbols, you can look for courses or YouTube tutorials about basic mathematics for machine learning and AI. Don't look further than, for instance, Khan Academy, which is quite a favorite when it comes to learning math, both for university students and for people who just want to learn mathematics, and this will be your guide. Or you can check our resources at lunartech.ai, because we are also going to provide these resources for you in case you want to learn mathematics for your machine learning journey. The next skill set that you need to gain in order to break into machine learning is statistics. This is a must if you want to get into machine learning
and AI in general. There are a few topics that you must study when it comes to statistics, and those are descriptive statistics, multivariate statistics, inferential statistics, probability distributions, and some Bayesian thinking. Let's start with descriptive statistics. When it comes to descriptive statistics you need to know the ideas of the mean, median, standard deviation, and variance, and in general how you can analyze data using these descriptive measures, so measures of central tendency but also measures of variation. The next topic area that you need to know as part of your statistical journey is inferential statistics. You need to know famous theorems such as the central limit theorem and the law of large numbers, how they relate to the ideas of a population, a sample, and an unbiased sample, and also hypothesis testing, confidence intervals, and statistical significance, how you can test different theories using this idea of statistical significance, what the power of a test is, what a type I error is, and what a type II error is. This is super important for understanding different sorts of machine learning applications if you want to get into machine learning. A small hypothesis-testing sketch is shown below.
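As a quick illustration of hypothesis testing and statistical significance, here is a minimal sketch using SciPy; the sample data and the 0.05 significance level are arbitrary choices for the example:

```python
import numpy as np
from scipy import stats

# Hypothetical sample: daily conversion rates (illustrative numbers only)
sample = np.array([0.12, 0.15, 0.11, 0.14, 0.13, 0.16, 0.12, 0.15])

# Null hypothesis: the true mean conversion rate is 0.10
t_stat, p_value = stats.ttest_1samp(sample, popmean=0.10)

alpha = 0.05  # significance level (probability of a type I error)
print(f"t-statistic: {t_stat:.3f}, p-value: {p_value:.4f}")
if p_value < alpha:
    print("Reject the null hypothesis at the 5% significance level.")
else:
    print("Fail to reject the null hypothesis.")
```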
Then you have probability distributions and the idea of probabilities. To understand the different machine learning concepts you need to know what probabilities are: the idea of probability, the idea of a sample versus a population, what it means to estimate a probability, and the different rules of probability, such as conditional probability and the rules you can apply when you have the probability of a product of events or the probability of a sum of events. Then you need to know the popular probability distribution functions, and those are the Bernoulli distribution, the binomial distribution, the normal distribution, the uniform distribution, and the exponential distribution. Those are all super important distributions that you need to know in order to understand the ideas of normality and normalization, the idea of Bernoulli trials, and how to relate different probability distributions to higher-level statistical concepts, for example rolling a die, the probability of it, and how that relates to the Bernoulli or binomial distribution, which is super important when it comes to hypothesis testing but also for many other machine learning applications. Then we have Bayesian thinking. This is super important for more advanced machine learning but also for some basic machine learning: you need to know Bayes' theorem, which is arguably one of the most popular statistical theorems out there, comparable to the central limit theorem, you need to know what conditional probability is, how Bayes' theorem relates to conditional probability, and the idea of Bayesian statistics
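Here is a tiny worked example of Bayes' theorem in Python; the disease prevalence and test accuracy numbers are invented purely for illustration:

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Hypothetical medical-test example with made-up numbers.
p_disease = 0.01            # prior: 1% of the population has the disease
p_pos_given_disease = 0.95  # sensitivity: P(positive test | disease)
p_pos_given_healthy = 0.05  # false positive rate: P(positive test | no disease)

# Total probability of a positive test (law of total probability)
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Posterior: probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(disease | positive test) = {p_disease_given_pos:.3f}")  # about 0.161
```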
at a very high level. You don't need to know everything in super detail, but you need to know these concepts at least at a high level in order to understand machine learning. To learn statistics and the fundamental concepts of statistics you can check out the fundamentals of statistics course at lunartech.ai, where you can learn all the required concepts and topics and practice them in order to get into machine learning and gain the statistical skills. The next skill set that you must know is the fundamentals of machine learning. This covers
not only the basics of machine learning but also the most popular machine learning algorithms. You need to know the mathematical side of these algorithms, step by step how they work, what their benefits and drawbacks are, and which one to use for what type of application. You need to know the categorization of supervised versus unsupervised versus semi-supervised learning, then the ideas of classification, regression, and clustering, and you also need to know about time series analysis. You also need to know the different popular algorithms, including linear regression, logistic regression, LDA (linear discriminant analysis), K-nearest neighbors, decision trees for both the classification and regression case, random forest, bagging, but also boosting, so popular boosting algorithms like LightGBM and GBM (gradient boosting models), and you need to know XGBoost. You also need to know some unsupervised learning algorithms, such as K-means, usually used for clustering, and DBSCAN, which is becoming more
and more popular among clustering algorithms, and you also need to know hierarchical clustering. For all these types of models you need to understand the idea behind them, what the advantages and disadvantages are, whether they can be applied for unsupervised versus supervised versus semi-supervised learning, and whether they are for regression, classification, or clustering. Beside these popular algorithms and models, you also need to know the basics of training a machine learning model, so the process behind training, validating, and testing your machine learning algorithms. You need to know what it means to perform hyperparameter tuning, and what the different optimization algorithms are that can be used to optimize your parameters, such as GD, SGD, SGD with momentum, Adam, and AdamW. You also need to know the testing process, the idea of splitting the data into training, validation, and test sets, and you need to know resampling techniques and why they are used, including bootstrapping and cross-validation, and the different sorts of cross-validation techniques such as leave-one-out cross-validation, k-fold cross-validation, and the validation set approach. You also need to know the idea of metrics and how you can use different metrics to evaluate your machine learning models, such as classification metrics like the F1 score, F-beta, precision, recall, and cross-entropy, and you also need to know metrics that can be used to evaluate regression type problems, like the mean squared error (MSE), the root mean squared error (RMSE), the mean absolute error (MAE), and the residual sum of squares (RSS). A short scikit-learn sketch of this train, validation, and test workflow with hyperparameter tuning is shown below.
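As a minimal sketch of this workflow, splitting the data, tuning hyperparameters with k-fold cross-validation, and evaluating on a held-out test set, here is an example using scikit-learn on a synthetic dataset; the dataset and the parameter grid are made up for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, accuracy_score

# Synthetic classification data (illustrative only)
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Hold out a test set; GridSearchCV handles the validation splits internally
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Hyperparameter tuning with 5-fold cross-validation
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

# Evaluate the best model on the unseen test set
best_model = search.best_estimator_
y_pred = best_model.predict(X_test)
print("Best params:", search.best_params_)
print("Test accuracy:", accuracy_score(y_test, y_pred))
print("Test F1 score:", f1_score(y_test, y_pred))
```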
For all these cases you not only need to know at a high level what those algorithms, topics, or concepts are doing, but you actually need to know the mathematics behind them, their benefits, and their disadvantages, because during interviews you can definitely expect questions that will test not only your high-level understanding but also this background knowledge. If you want to learn machine learning and you want to gain those skills, then feel free to check out my fundamentals of machine learning course, which is part of the ultimate data science boot camp at lunartech.ai, or you can also check out and download for free the fundamentals of machine learning handbook that I published with freeCodeCamp. The next skill set that you definitely need to gain is knowledge of Python. Python is actually one of the most popular programming languages out there, and it's being used across software engineers, AI engineers, machine learning engineers, and data scientists, so it is, I would say, the universal language when it comes to programming. If you're considering getting into machine learning in 2024, then
Python will be your friend. Knowing the theory is one thing; implementing it in the actual job is another, and that's exactly where Python comes in handy. You need to know Python in order to perform descriptive statistics, to train machine learning models, more advanced machine learning models, or deep learning models, to use it for training, validation, and testing of your models, and also for building different sorts of applications. Python is super powerful, and that is also why it's gaining such high popularity across the globe: it has so many libraries. It has TensorFlow and PyTorch, both of which are a must if you want to get into not only machine learning but also the advanced levels of machine learning. So if you are considering AI engineering or machine learning engineering jobs and you want to train, for instance, deep learning models, or you want to build large language models or generative AI models, then you definitely need to learn PyTorch and TensorFlow, which are frameworks used to implement deep learning, that is, advanced machine learning models. Here are a few libraries that you need to know in order to get into machine learning: you definitely need to know pandas and NumPy, you need to know scikit-learn and SciPy, you also need to know NLTK for text data, and you need to know TensorFlow and PyTorch for more advanced machine learning. Beside these, there are also data visualization libraries that I would definitely suggest you practice with, which are matplotlib, and specifically pyplot, and also
seaborn. When it comes to Python, beside knowing how to use libraries you also need to know some basic data structures: what variables are and how you can create them, what matrices and arrays are, how indexing works, what lists are, what sets are (so lists of unique elements), what the different operations are that you can perform, and how sorting works, for instance. I would also definitely suggest knowing some basic data structures and algorithms, such as an efficient way to sort or search your arrays. You also need to know data processing in Python, so you need to understand how to identify missing data, how to identify duplicates in your data, how to clean the data, and how to perform feature engineering, so how to combine multiple variables or perform operations to create new variables. You also need to know how you can aggregate your data, how you can filter your data, and how you can sort your data, and of course you also need to know how you can perform A/B testing in Python, how you can train machine learning models, how you can test and evaluate them, and how you can visualize their performance. If you want to learn Python, the easiest thing you can do is just Google for Python for data science or Python for machine learning tutorials or blogs, or you can try out the Python for data science course at lunartech.ai in order to learn all these basics, the usage of these libraries, and some practical examples when it comes to Python for machine learning. A small pandas sketch of these data-processing steps is shown below.
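Here is a minimal pandas sketch of those data-processing steps: spotting missing values and duplicates, a simple feature-engineering step, and aggregating, filtering, and sorting. The tiny DataFrame is invented for illustration:

```python
import pandas as pd
import numpy as np

# Small made-up dataset of customer orders
df = pd.DataFrame({
    "customer": ["a", "b", "a", "c", "c", "c"],
    "price":    [10.0, np.nan, 15.0, 8.0, 8.0, 12.0],
    "quantity": [1, 2, 1, 3, 3, 2],
})

print(df.isna().sum())            # count missing values per column
print(df.duplicated().sum())      # count fully duplicated rows
df = df.drop_duplicates()
df["price"] = df["price"].fillna(df["price"].median())  # simple imputation

# Feature engineering: combine two columns into a new one
df["revenue"] = df["price"] * df["quantity"]

# Aggregate, filter, and sort
per_customer = df.groupby("customer")["revenue"].sum()
print(per_customer[per_customer > 20].sort_values(ascending=False))
```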
The next skill set that you need to gain in order to get into machine learning is a basic introduction to NLP, natural language processing. You need to know how to work with text data, given that these days text data is the cornerstone of all the different advanced algorithms such as GPTs, Transformers, and the attention mechanisms; the applications you see, like chatbots or personalized applications based on text data, are all built on NLP. Therefore you need to know the basics of NLP to just get started with machine learning. You need to know the idea of text data, what strings are, how you can clean text data, so how you can clean the dirty data that you get and what the steps involved are, such as lowercasing, removing punctuation, and tokenization, and also the ideas of stemming, lemmatization, and stop words, and how you can use NLTK in Python in order to perform this cleaning. You also need to know the idea of embeddings, and you can also learn the idea of TF-IDF, which is a basic NLP algorithm, as well as the ideas of word embeddings, subword embeddings, and character embeddings. If you want to learn the basics of NLP, you can check out those concepts and learn them from blogs, there are many tutorials on YouTube, and you can also try the introduction to NLP course at lunartech.ai in order to learn the different basics that form NLP. A short NLTK and TF-IDF sketch is shown below.
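Here is a minimal sketch of that text-cleaning pipeline with NLTK and a TF-IDF representation with scikit-learn; the example sentences are made up, and the NLTK resource downloads are only needed once:

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("punkt")      # tokenizer models (one-time download)
nltk.download("stopwords")  # stop-word lists (one-time download)

docs = ["The movie was surprisingly good!",
        "A good movie with a surprisingly bad ending."]

stemmer = PorterStemmer()
stop_words = set(stopwords.words("english"))

cleaned = []
for doc in docs:
    tokens = word_tokenize(doc.lower())                  # lowercase + tokenize
    tokens = [t for t in tokens if t.isalpha()]          # drop punctuation
    tokens = [stemmer.stem(t) for t in tokens
              if t not in stop_words]                    # stop words + stemming
    cleaned.append(" ".join(tokens))

# TF-IDF turns the cleaned documents into numeric vectors
vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(cleaned)
print(vectorizer.get_feature_names_out())
print(tfidf.toarray().round(2))
```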
If you want to go beyond this intro-to-medium level of machine learning and you also want to learn a bit more advanced machine learning, which is something to pick up after you have gained all the previous skills I mentioned, then you can gain this knowledge and skill set by learning deep learning, and you can also consider getting into generative AI topics. You can, for instance, learn what RNNs, ANNs, and CNNs are, you can learn the autoencoder concept, what variational autoencoders are, and what generative adversarial networks, so GANs, are. You can understand the idea of reconstruction error, the different sorts of neural networks, the idea of backpropagation, and the optimization of these algorithms using the different optimization algorithms such as GD, SGD, SGD with momentum, Adam, AdamW, and RMSprop. You can also go one step beyond and get into generative AI topics such as the variational autoencoders I just mentioned, but also the large language models. If you want to move towards the NLP side of generative AI and you want to know how ChatGPT was invented, how GPTs work, or how the BERT model works, then you will definitely need to get into the topic of language models: what n-grams are, what the attention mechanism is, what the difference between self-attention and attention is, what a one-head self-attention mechanism is, and what a multi-head self-attention mechanism is. You also need to know, at a high level, the encoder-decoder architecture of Transformers; a tiny sketch of scaled dot-product attention is shown below.
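To make the attention idea concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation inside Transformers. The query, key, and value matrices are random placeholders, and this is a simplified single-head version, not a full Transformer:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # weighted sum of the values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                  # 4 tokens, 8-dimensional embeddings
Q = rng.normal(size=(seq_len, d_k))  # queries
K = rng.normal(size=(seq_len, d_k))  # keys
V = rng.normal(size=(seq_len, d_k))  # values

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one context-aware vector per token
```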
You also need to know how Transformers solve different problems of recurrent neural networks, so RNNs and LSTMs, and you can look into encoder-based or decoder-based algorithms such as GPTs or the BERT model. All of this will help you not only get into machine learning but also stand out from the other candidates by having this advanced knowledge. Let's now talk about the different sorts of projects that you can complete in order to train the machine learning skill set you just learned. There are a few projects that I suggest you complete,
and you can put these on your resume to start applying for machine learning roles. The first application and project that I would suggest is building a basic recommender system, whether it's a job recommender system or a movie recommender system. In this way you can showcase how you can use, for instance, text data from job advertisements, or numeric data such as movie ratings, in order to build a top-N recommender system. This will showcase your understanding of distance measures such as cosine similarity and the idea behind the KNN algorithm, and it will help you tackle this specific area of data science and machine learning. A small cosine-similarity sketch is shown below.
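Here is a minimal sketch of the core of such a recommender: representing items as vectors and ranking them by cosine similarity. The toy ratings matrix and item names are invented for illustration:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy item vectors: rows are movies, columns are made-up rating/feature values
items = np.array([
    [5.0, 1.0, 0.0],   # "action_movie"
    [4.0, 2.0, 0.0],   # "action_sequel"
    [0.0, 1.0, 5.0],   # "romance_movie"
])
names = ["action_movie", "action_sequel", "romance_movie"]

# Pairwise cosine similarity between all items
sim = cosine_similarity(items)

# Recommend the item most similar to "action_movie" (excluding itself)
query = 0
ranked = np.argsort(sim[query])[::-1]
top = [names[i] for i in ranked if i != query]
print("Most similar to", names[query], "->", top[0])  # expected: action_sequel
```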
The next project I would suggest is building a regression-based model. In this way you will showcase that you understand the idea of regression and how to work with predictive analytics, a predictive model whose dependent variable, the response variable, is in numeric format. Here, for instance, you can estimate the salaries of jobs based on the characteristics of the job, using data which you can get, for instance, from open-source pages such as Kaggle, and you can then use different sorts of regression algorithms to perform your salary predictions, evaluate the models, and compare the performance of these different machine learning regression algorithms. For instance, you can use linear regression, the regression version of decision trees, random forest, GBM, and XGBoost, and then in one graph compare the performance of these different algorithms using a single regression metric, for instance the RMSE. This project will showcase that you understand how to train, test, and validate a regression model, it will showcase your understanding of the optimization of these regression algorithms, and it will show that you understand the concept of hyperparameter tuning. A short sketch comparing a few regression models is shown below.
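Here is a minimal sketch of that comparison using scikit-learn models on a synthetic salary-like dataset and RMSE as the single metric; the data is generated, and XGBoost is left out to keep the example dependency-free:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for a salary dataset (illustrative only)
X, y = make_regression(n_samples=800, n_features=10, noise=15.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=0),
    "random_forest": RandomForestRegressor(n_estimators=200, random_state=0),
    "gbm": GradientBoostingRegressor(random_state=0),
}

# Train each model and compare them with one metric: RMSE
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    rmse = np.sqrt(mean_squared_error(y_test, preds))
    print(f"{name}: RMSE = {rmse:.2f}")
```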
The next project that I would suggest, in order to showcase your classification knowledge, so when it comes to predicting a class for an observation given the feature space, would be to build a classification model that classifies emails as spam or not spam. You can use publicly available data describing each email, where you will have multiple emails, and the idea is to build a machine learning model that classifies each email into class zero or class one, where class zero can for instance be not spam and class one spam. With this binary classification you will showcase that you know how to train a machine learning model for classification purposes, and here you can use, for instance, logistic regression, decision trees for the classification case, random forest, XGBoost for classification, and GBM for classification, and with all these models you can then obtain performance metrics such as the F1 score, or you can plot the ROC curve or compute the area under the curve (AUC) metric, and you can also compare these different classification models. In this way you will tackle yet another area of expertise when it comes to machine learning. A short spam-classification sketch is shown below.
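Here is a minimal sketch of such a spam classifier using TF-IDF features and logistic regression; the handful of example emails and labels are invented, so in practice you would use a real public dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, roc_auc_score

# Tiny made-up corpus: 1 = spam, 0 = not spam
emails = ["win a free prize now", "meeting at 10am tomorrow",
          "claim your free money", "project report attached",
          "free offer click now", "lunch with the team today",
          "urgent prize waiting for you", "notes from today's call"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

X_train, X_test, y_train, y_test = train_test_split(
    emails, labels, test_size=0.25, random_state=0, stratify=labels)

vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = LogisticRegression()
clf.fit(X_train_vec, y_train)

probs = clf.predict_proba(X_test_vec)[:, 1]
preds = clf.predict(X_test_vec)
print("F1:", f1_score(y_test, preds))
print("ROC AUC:", roc_auc_score(y_test, probs))
```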
Then a final project that I would suggest, to showcase yet another area of expertise, would be from unsupervised learning. Here you can, for instance, use data to segment your customers into good, better, and best customers based on their transaction history and the amount of money they are spending in the store. In this case you can use, for instance, K-means, DBSCAN, and hierarchical clustering, and then you can evaluate your clustering algorithms and select the one that performs best, so you will cover yet another area of machine learning, which is super important to showcase: that you can handle not only recommender systems or supervised learning but also unsupervised learning. The reason I suggest you cover all these different areas and complete these four different projects is that in this way you will be covering different expertise and areas of machine learning, you will be putting projects on your resume that cover different sorts of algorithms, metrics, and approaches, and it will showcase that you actually know a lot about machine learning. A small clustering sketch is shown below.
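Here is a minimal sketch of that customer segmentation with K-means and the silhouette score as the evaluation metric; the two-feature customer data is generated synthetically for illustration:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic customers: total spend and number of transactions (made-up data)
rng = np.random.default_rng(1)
low = rng.normal([200, 5], [50, 2], size=(100, 2))
mid = rng.normal([800, 20], [100, 5], size=(100, 2))
high = rng.normal([2000, 60], [200, 10], size=(100, 2))
X = StandardScaler().fit_transform(np.vstack([low, mid, high]))

# Try a few values of k and keep the one with the best silhouette score
best_k, best_score = None, -1.0
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    print(f"k={k}: silhouette={score:.3f}")
    if score > best_score:
        best_k, best_score = k, score

print("Best k:", best_k)  # expected to be around 3 for this synthetic data
```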
Now, if you want to go beyond the basic or medium level and you want to be considered for medium or advanced machine learning levels and positions, you also need to know a bit more, which means that you need to complete a bit more advanced projects. For instance, if you want to apply for generative AI or large language model related positions, I would suggest you complete a project where you are building a very basic large language model, and specifically the pre-training process, which is the most difficult one. In this case, for instance, you can build a baby GPT, and I'll put a link here that you can follow where I'm building a baby GPT, a basic pre-trained GPT algorithm, where I am using publicly available text data in order to process the data in the same way GPT does, along with the encoder part of the Transformer architecture. In this way you will showcase to hiring managers that you understand the architecture behind Transformers, the architecture behind large language models and GPTs, and you understand how you can use PyTorch
in Python in order to do this advanced NLP and generative AI task. And finally, let's now talk about the common career paths and the business titles that you can expect from a career in machine learning. Assuming that you have gained all the skills that are a must for breaking into machine learning, there are different sorts of business titles that you can apply for in order to get into machine learning, and there are different fields that are covered as part of
this. So first we have the general machine learning researcher. A machine learning researcher is basically doing research, so training, testing, and evaluating different machine learning algorithms. These are usually people who come from an academic background, but that doesn't mean you cannot get into machine learning research without getting a degree in statistics, mathematics, or machine learning specifically, not at all. If you have the desire and passion for doing research and you don't mind reading research papers, then a machine learning researcher job would be a good fit for you; machine learning combined with research sets you up for the machine learning researcher role. Then we have the machine learning engineer. The machine learning engineer is the engineering version of the machine learning expertise, which means combining machine learning skills with engineering skills, such as building pipelines, so end-to-end robust pipelines, and the scalability of the model, considering all the different aspects of the model not only from the performance side, the quality of the algorithm, but also its scalability when putting it in front of many users. When it comes to combining engineering with machine learning, you get machine learning engineering, so if you're someone who is a software engineer and you want to get into machine learning, then machine learning engineering would be the best fit for you. For machine learning engineering you not only need to have all the different skills I already mentioned, but you also need to have a good grasp of the scalability of algorithms, the data structures and algorithms type of skill set, the complexity of the model, and
also system design. So this role converges more towards, and is similar to, a software engineering position combined with machine learning, rather than a pure machine learning or AI role. Then we have the AI researcher versus AI engineer positions. The AI research position is similar to the machine learning research position, and the AI engineer position is similar to the machine learning engineer position, with only a single difference: when it comes to machine learning we are specifically talking about traditional machine learning, so linear regression, logistic regression, and also random forest, XGBoost, and bagging, while for the AI researcher and AI engineer positions we are tackling more advanced machine learning. Here we are talking about deep learning models such as RNNs, LSTMs, GRUs, and CNNs, or computer vision applications, and we are also talking about generative AI models and large language models, so Transformers, the implementation of Transformers, the GPTs, T5, all the different algorithms that come from more advanced AI topics rather than traditional machine learning. For those, you will be applying for AI research and AI engineering positions. And finally, you have different sorts of specializations and niches within AI, for instance NLP researcher, NLP engineer, or even data science positions, for which you will need to know machine learning, and knowing machine learning will set you apart for these sorts of positions. So also for business titles such as data scientist or technical data science positions, NLP researcher, and NLP engineer, you will need to know machine learning, and knowing it will help you break into those positions and those career paths.
If you want to prepare for your deep learning interviews, for instance, and you want to get into AI engineering or AI research, then I have recently published for free a full course with 100 interview questions and answers, spanning 7.5 hours, that will help you prepare for your deep learning interviews. For your machine learning interviews you can check out my fundamentals of machine learning course at lunartech.ai, or you can download the machine learning fundamentals handbook from freeCodeCamp and check out my blogs and other free resources at lunartech.ai in order to prepare for your interviews and get into machine learning. Let's now talk about the list of resources that you can use in order to get into machine learning in 2024. To learn statistics and the fundamental concepts of statistics you can check out the fundamentals of statistics course at lunartech.ai, where you can learn all the required concepts and topics and practice them in order to get into machine learning and gain the statistical skills. Then, when you want to learn machine learning, you can check the fundamentals of machine learning course at lunartech.ai to get all the
basic concepts, the fundamentals of machine learning, and the most comprehensive list of machine learning algorithms out there as part of this course. You can also check out the introduction to NLP course at lunartech.ai in order to learn the basic concepts behind natural language processing, and finally, if you want to learn Python, and specifically Python for machine learning, you can check out the Python for data science course at lunartech.ai. If you want to get access to the different projects where you can practice the machine learning skills that you just learned, you can either check out the ultimate data science boot camp, which includes a specific course, the data science project portfolio course, covering multiple of these projects that let you train your machine learning skills and put them on your resume, or you can also check my GitHub account or my LinkedIn account, where I cover many case studies including the baby GPT; I will also put the link to this course and to this case study in the link below. Once you have gained all these skills, you are ready to get
into machine learning in 2024. In this lecture we will go through the basic concepts in machine learning that are needed to understand and follow conversations and solve common problems using machine learning. A strong understanding of machine learning basics is an important step for anyone looking to learn more about or work with machine learning. We'll be looking at three concepts in this tutorial: we will define and look into the difference between supervised and unsupervised machine learning models, then we will look into the difference between regression and classification type machine learning models, and after this we will look into the process of training machine learning models from scratch and how to evaluate them by introducing the performance metrics you can use depending on the type of machine learning model or problem you are dealing with, so whether it's supervised or unsupervised, and whether it's a regression or a classification type of problem. Machine learning methods are categorized into two types depending on the existence of labeled data in the training data set, which is especially important in the training process; we are talking about the so-called dependent variable that we saw in the section on
the fundamentals of statistics. Supervised and unsupervised machine learning models are the two main types of machine learning algorithms. One key difference between the two is the level of supervision during the training phase: supervised machine learning algorithms are guided by labeled examples, while unsupervised algorithms are not. A supervised learning model is more reliable, but it also requires a larger amount of labeled data, which can be time-consuming and quite expensive to obtain. Examples of supervised machine learning models include regression and classification type models. On the other hand, unsupervised machine learning algorithms are trained on unlabeled data: the model must find patterns and relationships in the data without the guidance of correct outputs, so we no longer have a dependent variable. Unsupervised ML models require training data that consists only of independent variables, or the features, and there is no dependent variable or labeled data that can supervise the algorithm while it learns from the data. Examples of unsupervised models are clustering models and outlier detection techniques. Supervised machine learning methods are categorized into two types depending on the type of dependent variable they are predicting: we have the regression type and we have the classification type.
Some key differences between regression and classification include the output type, the evaluation metrics, and their applications. With regard to the output type, regression algorithms predict continuous values, while classification algorithms predict categorical values. With regard to the evaluation metrics, different evaluation metrics are used for regression and classification tasks: for example, the mean squared error is commonly used to evaluate regression models, while accuracy is commonly used to evaluate classification models. When it comes to applications, regression and classification models are used in entirely different types of applications: regression models are often used for prediction tasks, while classification models are used for decision-making tasks. Regression algorithms are used to predict a continuous value such as a price or a probability; for example, a regression model might be used to predict the price of a house based on its size, location, or other features. Examples of regression type machine learning models are linear regression, fixed effects regression, XGBoost regression, etc. Classification algorithms, on the other hand, are used to predict a categorical value: these algorithms take an input and classify it into one of several predetermined categories. For example, a classification model might be used to classify emails as spam or not spam, or to identify the type of animal in an image. Examples of classification type machine learning models are logistic regression, XGBoost classification, and random forest classification. Let us now look into the different types of performance metrics we can use in order to evaluate different types of machine learning models.
For regression models, common evaluation metrics include the residual sum of squares (RSS), the mean squared error (MSE), the root mean squared error (RMSE), and the mean absolute error (MAE). These metrics measure the difference between the predicted values and the true values, with a lower value indicating a better fit for the model. Let's go through these metrics one by one. The first one is the RSS, or the residual sum of squares. This is a metric commonly used in the setting of linear regression when we are evaluating the performance of the model in estimating the different coefficients; here $\beta$ is a coefficient, $y_i$ is the value of our dependent variable, and $\hat{y}_i$ is the predicted value. The RSS as a function of $\beta$ is $\mathrm{RSS}(\beta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, where $i$ is the index of each row, individual, or observation included in the data. The second metric is the MSE, or the mean squared error, which is the average of the squared differences between the predicted values and the true values: $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$. As you can see, the RSS and the MSE are quite similar in terms of their formulas; the only difference is the factor $\frac{1}{n}$, which makes the MSE the average of the squared differences between the predicted and the actual true values. A lower value of MSE indicates a better fit. The RMSE, the root mean squared error, is the square root of the MSE, $\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$, so it has the same formula as the MSE, only with a square root added on top. A lower value of RMSE indicates a better fit. And finally the MAE, or the mean absolute error, is the average absolute difference between the predicted values $\hat{y}_i$ and the true values $y_i$: $\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$. A lower value of this also indicates a better fit. The choice of a regression metric depends on the specific problem you are trying to solve and the nature of your data. For instance, the MSE is commonly used when you want to penalize large errors more than small ones; MSE is sensitive to outliers, which means it may not be the best choice when your data contains many outliers or extreme values. RMSE, on the other hand, which is the square root of the MSE, is easier to interpret because it's in the same units as the target variable; it is commonly used when you want to compare the performance of different models or when you want to report the error in a way that is easier to understand and explain. The MAE is commonly used when you want to penalize all errors equally regardless of their magnitude, and MAE is less sensitive to outliers compared to MSE.
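Here is a minimal sketch computing these regression metrics with scikit-learn and NumPy on a few made-up predictions:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error

# Made-up true values and model predictions
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 8.0])

rss = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
mse = mean_squared_error(y_true, y_pred)     # mean squared error
rmse = np.sqrt(mse)                          # root mean squared error
mae = mean_absolute_error(y_true, y_pred)    # mean absolute error

print(f"RSS={rss:.3f}, MSE={mse:.3f}, RMSE={rmse:.3f}, MAE={mae:.3f}")
```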
For classification models, common evaluation metrics include accuracy, precision, recall, and the F1 score. These metrics measure the ability of the machine learning model to correctly classify instances into the correct categories. Let's briefly look into these metrics individually. The accuracy is the proportion of correct predictions made by the model; it's calculated by taking the number of correct predictions and dividing it by the total number of predictions, which means correct predictions plus incorrect predictions. Next we will look into the precision. Precision is the proportion of true positive predictions among all positive predictions made by the model, and it's equal to true positives divided by true positives plus false positives, so all the predicted positives. True positives are cases where the model correctly predicts a positive outcome, while false positives are cases where the model incorrectly predicts a positive outcome. The next metric is recall. Recall is the proportion of true positive predictions among all actual positive instances; it's calculated as the number of true positive predictions divided by the total number of actual positive instances, which means dividing the true positives by true positives plus false negatives. So, for example, if we are looking at a medical test, a true positive would be a case where the test correctly identifies a patient as having a disease, while a false positive would be a case where the test incorrectly identifies a healthy patient as having the disease. The final metric is the F1 score. The F1 score is the harmonic mean of precision and recall, with a higher value indicating a better balance between precision and recall, and it's calculated as $F_1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$.
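And here is a small sketch computing accuracy, precision, recall, and F1 with scikit-learn on made-up labels:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up true labels and predicted labels (1 = positive class)
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))   # correct / all predictions
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
```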
For unsupervised models, such as clustering models, performance is typically evaluated using metrics that measure the similarity of the data points within a cluster and the similarity of the data points between different clusters. We have three types of metrics that we can use. Homogeneity is a measure of the degree to which all of the data points within a single cluster belong to the same class; a higher value indicates a more homogeneous cluster. Homogeneity, written as $h$ for short, is equal to $h = 1 - \frac{H(C \mid K)}{H(C)}$, that is, one minus the conditional entropy of the class labels given the cluster assignments divided by the entropy of the class labels. If you're wondering what this entropy is, stay tuned, as we are going to discuss entropy when we discuss clustering as well as decision trees. The next metric is the silhouette score. The silhouette score is a measure of the similarity of a data point to its own cluster compared to the other clusters; a higher silhouette score indicates that the data point is well matched to its own cluster. This is usually used for DBSCAN or k-means. The silhouette score can be represented by the formula $s(o) = \frac{b(o) - a(o)}{\max(a(o), b(o))}$, where $s(o)$ is the silhouette coefficient of the data point $o$, $a(o)$ is the average distance between $o$ and all the other data points in the cluster to which $o$ belongs, and $b(o)$ is the minimum average distance from $o$ to all the clusters to which $o$ does not belong. The final metric we will look into is completeness. Completeness is another measure, of the degree to which all of the data points that belong to a particular class are assigned to the same cluster; a higher value indicates a more complete cluster. Let's conclude this lecture by going through the step-by-step process of evaluating a machine learning model, in a very simplified version, since there are many additional considerations and techniques that may be needed depending on the specific task and the characteristics of the data. Knowing how to
properly train a machine learning model is really important, since this defines the accuracy of the results and the conclusions you will make. The training process starts with preparing the data. This includes splitting the data into training and test sets, or, if you are using more advanced resampling techniques that we will talk about later, splitting your data into multiple sets. The training set of your data is used to fit the model; if you also have a validation set, then this validation set is used to optimize your hyperparameters and to pick the best model, while the test set is used to evaluate the model performance. In the coming lectures in this section we will talk in detail about these different techniques, as well as what training, testing, validation, and hyperparameter tuning mean. Secondly, we need to choose an algorithm or a set of algorithms, train the model on the training data, and save the fitted model. There are many different algorithms to choose from, and the appropriate algorithm will depend on the specific task and the characteristics of the data. As a
third step we need to adjust the model parameters to minimize the error on the training set by performing hyperparameter tuning for this we need to use validation data and then we can select the best model that results in the least possible validation error rate in this step we want to look for the optimal set of parameters that are included as part of our model to end up With a model that has the least possible error so it performs in the best possible way in the final two steps we need to evaluate the model we are
always interested in a test error rate and not the training or the validation error rates because we have not used a test set but we have used training and validation sets so this test a rate will give you an idea of how well the model will generalize to the new unseen data we need to use the optimal set of Parameters from hyperparameter tuning stage and the training data to train the model again with this hyperparameters and with the best model so we can use the best fitted model to get the predictions on the test data
and this will help us to calculate our test error rate once we have calculated the test a rate and we have also obtained our best model we are ready to save the predictions so once we are satisfied With the model performance and we have tuned the parameters we can use it to make predictions on a new data on the test data and compute the performance metrics for the model using the predictions and the real values of the target variable from the test data and this complete this lecture so in this lecture we have spoken about
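To tie these steps together, here is a minimal sketch of the workflow with scikit-learn: splitting the data, tuning a hyperparameter on a validation set, refitting the best model, and computing the test error. The synthetic dataset, the decision tree model and the grid of max_depth values are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Step 1: prepare the data and split it into training, validation and test sets
X, y = make_regression(n_samples=500, n_features=5, noise=10, random_state=0)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=0)

# Steps 2-3: train candidate models on the training data and tune the
# hyperparameter (here max_depth, as an example) on the validation set
best_depth, best_val_error = None, float("inf")
for depth in [2, 4, 6, 8, 10]:
    model = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    val_error = mean_squared_error(y_val, model.predict(X_val))
    if val_error < best_val_error:
        best_depth, best_val_error = depth, val_error

# Steps 4-5: refit with the best hyperparameter and evaluate once on the test set
best_model = DecisionTreeRegressor(max_depth=best_depth, random_state=0).fit(X_train, y_train)
test_error = mean_squared_error(y_test, best_model.predict(X_test))
print("Best max_depth:", best_depth, "| test MSE:", test_error)
```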
And this completes this lecture. In this lecture we have spoken about the basics of machine learning, we have discussed the difference between unsupervised and supervised learning models, as well as regression versus classification, we have discussed in detail the different types of performance metrics we can use to evaluate different types of machine learning models, and we have looked into a simplified version of the step-by-step process to train a machine learning model. This video was sponsored by LunarTech. At LunarTech we are all about making you ready for your dream job in tech, making data science and AI accessible to everyone. Whether it's data science, artificial intelligence or engineering, at LunarTech Academy we have courses and boot camps to help you become a job-ready professional. We also help businesses, schools and universities with top-notch training and modernization, with data science and AI corporate training that includes the latest topics like generative AI. With LunarTech, learning is easy, fun and super practical. We care about providing an end-to-end learning experience that is both practical and grounded in fundamental knowledge, and our community is all about supporting each other and making sure you get where you want to go. Ready to start your tech journey? LunarTech is where you begin. Students and aspiring data science and AI professionals can visit the LunarTech Academy section to explore our courses, boot camps and programs in general. Businesses in need of employee training, upskilling or data science and AI solutions should head to the technology section on the lunartech.ai page. Enterprises looking for corporate training, curriculum modernization and customized AI tools to enhance education can visit the LunarTech Enterprises section for a free consultation and a customized estimate. Join LunarTech and start building your future, one data point at a time.
In this lecture, lecture number two, we will discuss very important concepts which you need to know before considering and applying any statistical or machine learning model. Here I'm talking about the bias of the model, the variance of the model, and the trade-off between the two, which we call the bias-variance trade-off. Whenever you are using a statistical, econometrical or machine learning model, no matter how simple the model is, you should always evaluate your model and check its error rate. In all these cases it comes down to the trade-off you make between the variance of the model and the bias of your model, because there is always a catch when it comes to the model choice and the performance. Let us first define what the bias and the variance of a machine learning model are. The inability of the model to capture the true relationship in the data is called bias; hence, machine learning models that are able to detect the true relationship in the data have low bias. Usually, complex models, or more flexible models, tend to have a lower bias than simpler models. Mathematically, the bias of the model can be expressed as the expectation of the difference between the estimate and the true value. Let us also define the variance of the model. The variance of the model is the inconsistency level, or the variability, of the model performance when applying the model to different data sets. When the same model that is trained using training data performs entirely differently on the test data, this means that there is a large variation, or variance, in the model. Complex models, or more flexible models, tend to have a higher variance than simpler models.
In order to evaluate the performance of the model, we need to look at the amount of error that the model is making. For simplicity, let's assume we have the following simple regression model, which uses a single independent variable X to model the numeric dependent variable Y. We fit our model on our training observations, where we have pairs of independent and dependent variables (x1, y1), (x2, y2), up to (xn, yn), and we obtain an estimate f-hat from these training observations. We can then compute f-hat(x1), f-hat(x2), up to f-hat(xn), which are the estimates for our dependent variable values y1, y2, up to yn, and if these are approximately equal to the actual values, so y1-hat is approximately equal to y1, y2-hat is approximately equal to y2, and so on, then the training error rate will be small. However, if we are really interested in whether our model predicts the dependent variable appropriately, then instead of looking at the training error rate we want to look at the test error rate. The error rate of the model is the expected squared difference between the real test values and their predictions, where the predictions are made using the machine learning model. We can rewrite this error rate as the sum of two quantities: the left part is the quantity (f(x) − f-hat(x))², and the second part is the variance of the error term. So the accuracy of y-hat as a prediction for y depends on two quantities, which we call the reducible error and the irreducible error: the reducible error is (f(x) − f-hat(x))², and the irreducible error is the variance of epsilon. In general, f-hat will not be a perfect estimate for f, and this inaccuracy will introduce some error. This error is reducible, since we can potentially improve the accuracy of f-hat by using the most appropriate machine learning model, and the best version of it, to estimate f. However, even if it were possible to find a model that estimates f perfectly, so that the estimated response takes the form y-hat = f(x), our prediction would still have some error in it. This happens because y is also a function of the error term epsilon, which by definition cannot be predicted using our feature X, so there will always be some error that is not predictable. The variability associated with the error epsilon also affects the accuracy of the predictions, and this is known as the irreducible error, because no matter how well we estimate f, we cannot reduce the error introduced by epsilon. This error contains all the features that are not included in our model, so all the unknown factors that have an influence on our dependent variable but are not included as part of our data. But we can reduce the reducible error, which is based on two values: the variance of the estimates and the bias of the model. If we simplify the mathematical expression describing the error rate, it is equal to the variance of our model, plus the squared bias of our model, plus the irreducible error. So even if we cannot reduce the irreducible error, we can reduce the reducible error, which is based on those two values, the variance and the squared bias. Though the mathematical derivation is out of the scope of this course, just keep in mind that the reducible error of the model can be described as the sum of the variance of the model and the squared bias of the model; so, mathematically, the error of a supervised machine learning model is equal to the squared bias of the model, plus the variance of the model, plus the irreducible error. Therefore, in order to minimize the expected test error rate, so the error on unseen data, we need to select the machine learning method that simultaneously achieves low variance and low bias, and that is exactly what we call the bias-variance trade-off. The problem is that there is a negative relationship between the variance and the bias of the model. Another thing that is highly related to the bias and the variance of the model is the flexibility of the machine learning model, so the flexibility of the machine learning model has a direct impact on its variance and on its bias. Let's look at these relationships one by one. Complex models, or more flexible models, tend to have a lower bias, but at the same time they tend to have a higher variance than simpler models. So as the flexibility of the model increases, the model finds the true patterns in the data more easily, which reduces the bias of the model, but at the same time the variance of such models increases. Conversely, as the flexibility of the model decreases, the model finds it more difficult to find the true patterns in the data, which increases the bias of the model but also decreases the variance of the model.
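As a small illustration of this relationship, here is a sketch that fits a very rigid and a very flexible polynomial model to the same noisy data and compares their training and test errors; the data-generating function and the polynomial degrees are assumptions chosen only for this example.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical data: a nonlinear function of x plus noise
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 100)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.3, size=100)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# A rigid model (degree 1: high bias) versus a very flexible one (degree 15: high variance)
for degree in [1, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```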
Keep this topic in mind, as we will continue it in the next lecture, where we will discuss overfitting and how to solve the overfitting problem by using regularization. In this lecture, lecture number three, we will talk about a very important concept called overfitting and how we can solve it using different techniques, including regularization. This topic is related to the previous lecture and to the topics of the error of the model, the training error rate, the test error rate, and the bias and variance of the machine learning model. Overfitting is important to know, and also how to solve it with regularization, because it can lead to inaccurate predictions and a lack of generalization of the model to new data. Knowing how to detect and prevent overfitting is crucial in building effective machine learning models, and questions about this topic are almost guaranteed to appear during every single data science interview. In the previous lecture we discussed the relationship between model flexibility and the variance as well as the bias of the model. We saw that as the flexibility of the model increases, the model finds the true patterns in the data more easily, which reduces the bias of the model, but at the same time the variance of such models increases; and as the flexibility of the model decreases, the model finds it more difficult to find the true patterns in the data, which increases the bias of the model and decreases its variance. Let's first formally define what the overfitting problem is, as well as what underfitting is. Overfitting occurs when the model performs well on the training data while performing worse on the test data, so you end up having a low training error rate but a high test error rate; in an ideal world we want our test error rate to be low, or at least close to the training error rate. Overfitting is a common problem in machine learning where a model learns the detail and the noise in the training data to the point where it negatively impacts the performance of the model on new data, so the model follows the data too closely, closer than it should. This means that the noise, the random fluctuations of the training data, is picked up and learned as concepts by the model, which it should actually ignore. The problem is that the noise, or random component, of the training data will be very different from the noise in the new data, and the model will therefore be less effective in making predictions on new data. Overfitting is caused by having too many features, too complex a model, or too little data. When the model is overfitting, the model also has high variance and low bias; usually, the higher the model flexibility, the higher the risk of overfitting, because then there is a higher risk of the model following the data, and the noise in it, too closely. Underfitting is the other way around: it occurs when the model is too simple to capture the true patterns in the data, so it performs poorly already on the training data and the training error rate itself is large. Given that overfitting is a much bigger problem in practice, and that we ideally want to fix the case where our test error rate is large, we will only focus on overfitting; this is also a topic that you can expect during your data science interviews, as well as something that you need to be aware of whenever you are training a machine learning model.
All right, so now that we know what overfitting is, we should talk about how we can fix this problem. There are several ways of fixing or preventing overfitting. First, you can reduce the complexity of the model: we saw that the higher the complexity of the model, the higher the chance of following the data, including the noise, too closely, resulting in overfitting, so reducing the flexibility of the model will reduce overfitting as well. This can be done by using a simpler model with fewer parameters, or by applying a regularization technique such as L1 or L2 regularization, which we will talk about in a bit. A second solution is to collect more data: the more data you have, the less likely your model is to overfit. A third solution is to use resampling techniques, one of which is cross-validation; this is a technique that allows you to train and test your model on different subsets of your data, which can help you identify whether your model is overfitting (see the short sketch that follows this overview). We will discuss cross-validation as well as other resampling techniques later in this section. Another solution is to apply early stopping: early stopping is a technique where you monitor the performance of the model on a validation set during the training process and stop the training when that performance starts to decrease. Another solution is to use ensemble methods: by combining multiple models, such as decision trees, overfitting can be reduced, and we will be covering many popular ensemble techniques in this course as well. Finally, you can use what we call dropout: dropout is a regularization technique for reducing overfitting in neural networks by dropping out, or setting to zero, some of the neurons during the training process. Because dropout-related questions do appear during data science interviews, even for people with no experience, if someone asks you about dropout, at least you will remember that it is a technique used to solve overfitting in the setting of deep learning. It is worth noting that there is no single solution that works for all types of overfitting, and often a group of the techniques we just talked about should be used together to address the problem. We saw that when the model is overfitting, the model has high variance and low bias.
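Here is the short sketch referenced above: comparing a model's training score with its cross-validation score is a simple way to spot overfitting. The dataset and the fully grown decision tree are assumptions used only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Hypothetical classification data
X, y = make_classification(n_samples=400, n_features=20, random_state=1)

# A very deep tree can follow the training data (and its noise) too closely
model = DecisionTreeClassifier(max_depth=None, random_state=1)
train_score = model.fit(X, y).score(X, y)
cv_score = cross_val_score(model, X, y, cv=5).mean()

# A large gap between the training score and the cross-validation score
# is a typical sign of overfitting
print("Training accuracy        :", train_score)
print("Cross-validation accuracy:", cv_score)
```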
By definition, regularization, or what we also call shrinkage, is a method that shrinks some of the estimated coefficients towards zero, to penalize unimportant variables for increasing the variance of the model. This is a technique used to solve the overfitting problem by introducing a little bias in the model while significantly decreasing its variance. There are three types of regularization techniques that are widely known in the industry: the first one is ridge regression, or L2 regularization, the second one is lasso regression, or L1 regularization, and the third one is dropout, which is a regularization technique used in deep learning. We will cover the first two types in this lecture. Let's now talk about ridge regression, or L2 regularization. Ridge regression is a shrinkage technique that aims to solve overfitting by shrinking some of the model coefficients towards zero; it introduces a little bias into the model while significantly reducing the model variance. Ridge regression is a variation of linear regression, but instead of only minimizing the sum of squared residuals, as linear regression does, it aims to minimize the sum of squared residuals plus the sum of the squared coefficients, what we call the L2 regularization term. Let's look at a multiple linear regression example with p independent variables, or predictors, that are used to model the dependent variable y. If you have followed the statistical section of this course, you might recall that the most popular estimation technique for the parameters of the linear regression, assuming its assumptions are satisfied, is ordinary least squares, or OLS, which finds the optimal coefficients by minimizing the sum of squared residuals, or the RSS. Ridge regression is pretty similar to OLS, except that the coefficients are estimated by minimizing a slightly different cost, or loss, function. This is the loss function of ridge regression: the sum over the observations i from 1 to n of (y_i − beta_0 − the sum over j from 1 to p of beta_j * x_ij)², plus lambda times the sum over j of beta_j², where beta_j is the coefficient of the model for variable j, beta_0 is the intercept, x_ij is the input value for variable j and observation i, y_i is the target, or dependent, variable for observation i, n is the number of samples, and lambda is what we call the regularization parameter of ridge regression. So this is the loss function of OLS with a penalization term added. It combines what we call the RSS: if you check the very first lecture in this section, where we spoke about the different metrics that can be used to evaluate regression-type models, you can see the definition of the RSS, and if you compare the expressions you can easily see that the left part is exactly the formula for the RSS, including the intercept. The right term is what we call the penalty amount, which is lambda times the sum of the squares of the coefficients included in our model. Here lambda, which is always non-negative, so always larger than or equal to zero, is the tuning parameter, or the penalty parameter. The expression of the sum of squared coefficients is called the L2 norm, which is why we call this L2-penalty-based regression, or L2 regularization. In this way, ridge regression assigns a penalty by shrinking the coefficients towards zero, which reduces the overall model variance, but these coefficients will never become exactly zero; the model parameters are never set to exactly zero, which means that all p predictors of the model are still intact. This is a key property of ridge regression to keep in mind: it shrinks the parameters towards zero but never sets them exactly equal to zero. The L2 norm is a mathematical term coming from linear algebra, and it stands for the Euclidean norm. We spoke about the penalty parameter lambda, what we also call the tuning parameter, which serves to control the relative impact of the penalty on the regression coefficient estimates. When lambda is equal to zero, the penalty term has no effect and ridge regression will produce the ordinary least squares estimates, but as lambda increases, the impact of the shrinkage penalty grows and the ridge regression coefficient estimates approach zero. What is important to keep in mind, which you can also see from this graph, is that in ridge regression a large lambda will assign a penalty to some variables by shrinking their coefficients towards zero, but they will never become exactly zero, which becomes a problem when you are dealing with a model that has a large number of features, because the model then has low interpretability. Ridge regression's advantage over ordinary least squares comes from the earlier introduced bias-variance trade-off phenomenon: as lambda, the penalty parameter, increases, the flexibility of the ridge regression fit decreases, leading to decreased variance but increased bias. The main advantages of ridge regression are that it solves overfitting, it can shrink the regression coefficients of less important predictors towards zero, it can improve the prediction accuracy by reducing the variance and increasing the bias of the model, it is less sensitive to outliers in the data compared to linear regression, and it is computationally less expensive compared to lasso regression. The main disadvantage of ridge regression is the low model interpretability when p, the number of features in your model, is large.
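As a minimal sketch of ridge regression in Python, here is how it can be fitted with scikit-learn; the synthetic data and the alpha value (scikit-learn's name for the penalty parameter lambda) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

# Hypothetical regression data with 10 features
X, y = make_regression(n_samples=200, n_features=10, noise=15, random_state=0)

# Ordinary least squares versus ridge regression (alpha plays the role of lambda)
ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

# Ridge shrinks the coefficients towards zero but does not set them exactly to zero
print("OLS coefficients  :", np.round(ols.coef_, 2))
print("Ridge coefficients:", np.round(ridge.coef_, 2))
```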
Let's now look into another regularization technique, called lasso regression, or L1 regularization. By definition, lasso regression is a shrinkage technique that aims to solve overfitting by shrinking some of the model coefficients towards zero and setting some of them to exactly zero. Lasso regression, like ridge regression, introduces a little bias into the model while significantly reducing the model variance. There is, however, a small difference between the two regression techniques that makes a huge difference in their results. We saw that one of the biggest disadvantages of ridge regression is that it will always include all the predictors, all p predictors, in the final model, whereas lasso overcomes this disadvantage: a large lambda, or penalty parameter, will assign a penalty to some variables by shrinking their coefficients towards zero, but in the case of ridge regression they will never become exactly zero, which becomes a problem when your model has a large number of features and therefore low interpretability, and lasso regression overcomes this disadvantage of ridge regression. Let's have a look at the loss function of L1 regularization. This is the loss function of OLS, the left part of the formula, called the RSS, combined with a penalty amount, the right-hand side of the expression, which is lambda times the sum of the absolute values of the coefficients beta_j. As you can see, the first part is the RSS that we just saw, which is exactly the same as the loss function of OLS, and then we add the second term, which is lambda, the penalization parameter, multiplied by the sum of the absolute values of the coefficients beta_j, where j goes from one to p and p is the number of predictors included in our model. Here, once again, lambda, which is always non-negative, so larger than or equal to zero, is the tuning parameter, or the penalty parameter. The expression of the sum of the absolute values of the coefficients is called the L1 norm, which is why we call this L1-penalty-based regression, or L1 regularization. In this way, lasso regression assigns a penalty to some of the variables by shrinking their coefficients towards zero and setting some of these parameters to exactly zero. This means that some of the coefficients will end up being exactly equal to zero, which is the key difference between lasso regression and ridge regression. The L1 norm is a mathematical term coming from linear algebra, and it stands for the Manhattan norm, or Manhattan distance. You can see a key difference when comparing the visual representation of lasso regression with the visual representation of ridge regression: if you look at this point, you can see that there are cases where the coefficients are set to exactly zero, which is where we have an intersection, whereas in the case of ridge regression you can recall that there was not a single intersection; the circle came close to the intersection points, but there was not a single point where there was an intersection and the coefficients were set to zero, and that is the key difference between these two regression models, between these two regularization techniques. The main advantages of lasso regression are the following: it solves overfitting, since lasso can shrink the regression coefficients of less important predictors towards zero and set some exactly to zero; as the model filters some variables out, lasso indirectly performs what we call feature selection, so that the resulting model has fewer features and is much more interpretable compared to ridge regression; and lasso can also improve the prediction accuracy of the model by reducing the variance and increasing the bias of the model, though not as much as ridge regression.
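And here is the matching sketch for lasso; again the synthetic data and the alpha value are assumptions, chosen only to show that some coefficients are driven exactly to zero, which acts as a form of feature selection.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Hypothetical data where only a few of the 10 features are truly informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3, noise=15, random_state=0)

# Lasso with a moderately large penalty (alpha corresponds to lambda)
lasso = Lasso(alpha=5.0).fit(X, y)

# Some coefficients are set exactly to zero; the remaining ones are the selected features
print("Lasso coefficients:", np.round(lasso.coef_, 2))
print("Selected features :", np.where(lasso.coef_ != 0)[0])
```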
Earlier, when speaking about correlation, we also briefly discussed the concept of causation. We discussed that correlation is not causation, and we also briefly spoke about a method used to determine whether there is causation or not; that model is the famous linear regression. Even though this model is recognized as a simple approach, it is one of the few methods that allows us to identify features that have a statistically significant impact on a variable that we are interested in and want to explain, and it also helps to identify how, and by how much, the target variable changes when the values of the independent variables change. To understand the concept of linear regression, you should also know and understand the concepts of a dependent variable, an independent variable, linearity, and a statistically significant effect. Dependent variables are often referred to as response variables or explained variables; by definition, the dependent variable is the variable that is being measured or tested. It is called the dependent variable because it is thought to depend on the independent variables; you can have one or multiple independent variables, but you can have only one dependent variable that you are interested in, and that is your target variable. Let's now look into the independent variable definition. Independent variables are often referred to as regressors or explanatory variables, and by definition an independent variable is a variable that is being manipulated or controlled in the experiment and is believed to have an effect on the dependent variable. Put differently, the value of the dependent variable is thought to depend on the value of the independent variable. For example, in an experiment to test the effect of having a degree on the wage, the degree variable would be your independent variable and the wage would be your dependent variable. Finally, let's look into the very important concept of statistical significance. We call an effect statistically significant if it is unlikely to have occurred by random chance; in other words, a statistically significant effect is one that is likely to be real and not due to random chance. Let's now define the linear regression model formally, and then we will dive deep into the theoretical and practical details. By definition, linear regression is a statistical or machine learning method that can help to model the impact of a unit change in one variable, the independent variable, on the values of another target variable, the dependent variable, when the relationship between the two variables is assumed to be linear. When the linear regression model is based on a single independent variable, we call the model simple linear regression; when the model is based on multiple independent variables, we call it multiple linear regression. Let's look at the mathematical expression describing linear regression.
You can recall that when the linear regression model is based on a single independent variable, we just call it simple linear regression. The expression you see here is the most common mathematical expression describing simple linear regression: Yi = beta_0 + beta_1 * Xi + Ui. In this expression, Yi is the dependent variable, and the i you see here is the index corresponding to the i-th row: whenever you get data that you want to analyze, you will have multiple rows, and those rows describe the observations that you have in your data, so they can be people or any other observations describing your data; the index i characterizes the specific row, and Yi is then the dependent variable's value corresponding to that row. The same holds for Xi: Xi is the independent variable, or the explanatory variable, or the regressor, in your model, which is the variable that we are testing; we want to manipulate it to see whether this variable has a statistically significant impact on the dependent variable Y, so we want to see whether a unit change in X results in a specific change in Y, and what kind of change that is. Beta_0 is not a variable; it is called the intercept, or constant, and it is unknown: we do not have it in our data, and it is one of the parameters of the linear regression, an unknown number which the linear regression model should estimate. So we want to use the linear regression model to find out this unknown value, as well as the second unknown value, which is beta_1, and we can also estimate the error terms, which are represented by Ui. Beta_1, next to Xi, so next to the independent variable, is also not a variable; like beta_0 it is an unknown parameter of the linear regression model, an unknown number which the model should estimate. Beta_1 is often referred to as the slope coefficient of variable X, and it is the number that quantifies how much the dependent variable Y will change if the independent variable X changes by one unit. That is exactly what we are most interested in, the beta_1, because this is the coefficient, the unknown number, that will help us to understand and answer the question of whether our independent variable X has a statistically significant impact on our dependent variable Y. Finally, the U you see here, or the Ui in the expression, is the error term, or the amount of mistake that the model makes when explaining the target variable. We add this value since we know that we can never exactly and accurately estimate the target variable; we will always make some amount of estimation error and we can never estimate the exact value of Y, hence we need to account for this mistake, which we know in advance we are going to make, by adding an error term to our model. Let's also have a brief look at how multiple linear regression is usually expressed in mathematical terms.
You might recall that the difference between simple linear regression and multiple linear regression is that the first one has a single independent variable in it, whereas the latter, the multiple linear regression, as the name suggests, has multiple independent variables in it, so more than one. Knowing this type of expression is critical, since they not only appear a lot in interviews, but in general you will see them in data science blogs, in presentations, in books and also in papers, so being able to quickly identify them and say, ah, I remember seeing this once, will help you to more easily understand and follow the process and the story line. What you see here you can read as Yi = beta_0 + beta_1 * X1i + beta_2 * X2i + beta_3 * X3i + Ui. This is the most common mathematical expression describing multiple linear regression, in this case with three independent variables; if you had more independent variables, you would add them with their corresponding indices and coefficients. In this case the method aims to estimate the model parameters, which are beta_0, beta_1, beta_2 and beta_3. So, like before, Yi is our dependent variable, which is always a single one, so we only have one dependent variable; then we have beta_0, which is our intercept, or constant; then we have our first slope coefficient, beta_1, corresponding to our first independent variable X1; and then we have X1i, which stands for the first independent variable, with index 1, where i stands for the index corresponding to the row. Whenever we have multiple linear regression, we always need to specify two indices, and not only one like we had in simple linear regression: the first index characterizes which independent variable we are referring to, so whether it is independent variable one, two or three, and then we need to specify which row we are referring to, which is the index i. You might notice that in this case all the row indices are the same, because we are looking into one specific row and we are representing this row using the independent variables, the error term and the dependent variable. Then we add the next term, beta_2 * X2i, where beta_2 is our third unknown parameter in the model and the second slope coefficient, corresponding to our second independent variable; then we have our third independent variable with its corresponding slope coefficient beta_3; and, as always, we also add an error term to account for the error that we know we are going to make. So now that we know what linear regression is and how to express it in mathematical terms, you might be asking the next logical question: how do we find those unknown parameters in the model, in order to find out how the independent variables impact the dependent variable? Finding these unknown parameters is called estimation, in data science and in general, so we are interested in finding the values that best approximate the unknown values in our model, and we call this process estimation. One technique used to estimate linear regression's parameters is called OLS, or ordinary least squares.
The main idea behind this approach, the OLS, is to find the best-fitting straight line, the regression line, through a set of paired X and Y values, so our independent and dependent variables' values, by minimizing the sum of squared errors, that is, by minimizing the sum of the squares of the differences between the observed dependent variable and the values predicted by our model as a linear function of the independent variables. That is a lot of information, so let's go through it step by step. In linear regression, as we just saw when we were expressing our simple linear regression, we have this error term, and we can never know what the actual error term is; what we can do is estimate the value of the error term, which we call the residual. We want to minimize the sum of squared residuals, because we do not know the errors: we want to find a line that best fits our data in such a way that the error we are making, the sum of squared errors, is as small as possible, and since we do not know the errors, we estimate them by each time looking at the value predicted by our model and the true value, subtracting one from the other, and seeing how well our model is estimating the values that we have, so how well our model is estimating the unknown parameters. To minimize the sum of squares of the differences between the observed dependent variable and its values predicted by the linear function of the independent variables, we minimize the sum of squared residuals. We denote the estimates of parameters and variables by adding a hat on top of the variables or parameters, so in this case you can see that Yi-hat = beta_0-hat + beta_1-hat * Xi. You can see that we no longer have an error term in this expression, and we say that Yi-hat is the estimated value of Yi, beta_0-hat is the estimated value of beta_0, beta_1-hat is the estimated value of beta_1, and Xi is still our data, the values that we have in our data, and therefore it does not get a hat, since it does not need to be estimated. What we want to do is estimate our dependent variable and compare our estimated value, which we got using OLS, with the actual, real value, so that we can calculate our errors, or rather the estimates of the errors, which are represented by Ui-hat. So Ui-hat = Yi − Yi-hat, where Ui-hat is simply the estimate of the error term, or the residual. This predicted error is always referred to as the residual, so make sure that you do not confuse the error with the residual: the error can never be observed, you can never calculate it and you will never know it, but what you can do is predict the error, and when you predict the error you get a residual. What OLS is trying to do is to minimize the amount of error that it is making; therefore it looks at the sum of squared residuals across all the observations and tries to find the line that minimizes this quantity. So we say that OLS tries to find the best-fitting straight line such that it minimizes the sum of squared residuals.
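To see OLS in action, here is a minimal sketch that fits a simple regression line and computes the residuals and their sum of squares; the paired observations are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical paired observations (x_i, y_i)
x = np.array([1, 2, 3, 4, 5, 6, 7, 8]).reshape(-1, 1)
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1])

# OLS finds the intercept (beta_0 hat) and slope (beta_1 hat) that minimize
# the sum of squared residuals
ols = LinearRegression().fit(x, y)
y_hat = ols.predict(x)          # fitted values, Yi-hat
residuals = y - y_hat           # Ui-hat = Yi - Yi-hat
print("Intercept:", ols.intercept_, "Slope:", ols.coef_[0])
print("Sum of squared residuals:", np.sum(residuals ** 2))
```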
We have discussed this model mainly from the perspective of causal analysis, in order to identify features that have a statistically significant impact on the response variable, but linear regression can also be used as a prediction model for modeling linear relationships. So let's refresh our memory with the definition of the linear regression model. By definition, linear regression is a statistical or machine learning method that can help to model the impact of a unit change in one variable, the independent variable, on the values of another target variable, the dependent variable, when the relationship between the two variables is linear. In lecture number six of the statistical section we also discussed how we can mathematically express what we call simple linear regression and multiple linear regression. This is how simple linear regression can be represented: in the case of simple linear regression you might recall that we are dealing with just a single independent variable, and we always have just one dependent variable, both in simple linear regression and in multiple linear regression. Here you can see that Yi = beta_0 + beta_1 * Xi + Ui, where Y is the dependent variable and i is the index of each observation, or row; beta_0 is the intercept, which is also known as the constant; beta_1 is the slope coefficient, the parameter corresponding to the independent variable X, which is an unknown constant that we want to estimate along with beta_0; Xi is the independent variable corresponding to observation i; and finally Ui is the error term corresponding to observation i. Do keep in mind that we add this error term because we know that we are always going to make a mistake and can never perfectly estimate the dependent variable; to account for this mistake we add the Ui. Let's also recall the estimation technique that we use to estimate the parameters of the linear regression model, so beta_0 and beta_1, and to predict the response variable. We call this estimation technique OLS, or ordinary least squares. OLS is an estimation technique for estimating the unknown parameters in a linear regression model in order to predict the response, or dependent, variable. We need to estimate beta_0, so we need to get beta_0-hat, and we need to estimate beta_1, or beta_1-hat, in order to obtain Yi-hat, where Yi-hat = beta_0-hat + beta_1-hat * Xi. The difference between Yi, the true value of the dependent variable, and Yi-hat, the predicted value, then produces our estimate of the error, or what we also call the residual. The main idea behind this approach is to find the best-fitting straight line, the regression line, through the set of paired X and Y values by minimizing the sum of squared residuals: we want to minimize our errors as much as possible, therefore we take their squared versions, sum them up, and minimize this entire quantity. So, to minimize the sum of squared residuals, the differences between the observed dependent variable and its values predicted by the linear function of the independent variables, we use OLS. One of the most common questions related to linear regression that comes up time and time again in data science interviews is the topic of the assumptions of the
linear regression model. You need to know each of the five fundamental assumptions of linear regression and OLS, and you also need to know how to test whether each of these assumptions is satisfied. The first assumption is the linearity assumption, which states that the relationship between the independent variables and the dependent variable is linear; we also say that the model is linear in parameters. We can check whether the linearity assumption is satisfied by plotting the residuals against the fitted values: if the pattern is not linear, the estimates will be biased, and in that case we say that the linearity assumption is violated and we need to use more flexible models, such as the tree-based models that we will discuss in a bit, which are able to model these nonlinear relationships. The second assumption in linear regression is the assumption about the randomness of the sample, which means that the data is randomly sampled, and which basically means that the errors, or the residuals, of the different observations in the data are independent of each other. You can check whether this second assumption, the random sample assumption, is satisfied by plotting the residuals: you can check whether the mean of the residuals is around zero, and if not, the OLS estimates will be biased and the second assumption is violated, which means that you are systematically over- or under-predicting the dependent variable. The third assumption is the exogeneity assumption, which is a really important assumption often asked about during data science interviews. Exogeneity means that each independent variable is uncorrelated with the error terms; it refers to the assumption that the independent variables are not affected by the error term in the model, or, in other words, that the independent variables are determined independently of the errors in the model. Exogeneity is a key assumption of the linear regression model, as it allows us to interpret the estimated coefficients as representing the true causal effect of the independent variables on the dependent variable. If the independent variables are not exogenous, then the estimated coefficients may be biased and the interpretation of the results may be invalid; in this case we call this an endogeneity problem, and we say that the independent variable is not exogenous but endogenous. It is important to carefully consider the exogeneity assumption when building a linear regression model, as a violation of this assumption can lead to invalid or misleading results. If this assumption is satisfied for an independent variable in the linear model, we call this independent variable exogenous; otherwise we call it endogenous and we say that we have a problem of endogeneity. Endogeneity refers to the situation in which the independent variables in the linear regression model are correlated with the error terms in the model; in other words, the errors are not independent of the independent variables. Endogeneity is a violation of one of the key assumptions of the linear regression model, namely that the independent variables are exogenous, or not affected by the errors in the model.
Endogeneity can arise in a number of ways. For example, it can be caused by omitted variable bias, in which an important predictor of the dependent variable is not included in the model. It can also be caused by reverse causality, in which the dependent variable affects the independent variable. Those two are very popular examples of cases where you can get an endogeneity problem, and they are things you should know whenever you are interviewing for data science roles, especially roles related to machine learning, because such questions are asked in order to test whether you understand the concept of exogeneity versus endogeneity, in which cases you can get endogeneity, and how you can solve it. In the case of omitted variable bias, let's say you are estimating a person's salary and you are using as independent variables their education, their number of years of experience and some other factors, but you are not including in your model a feature that would describe the intelligence of the person, for instance their IQ. Given that this is a very important indicator of how a person performs in their field, it can definitely have an indirect impact on their salary, so not including this variable will result in omitted variable bias, because it will then be incorporated into your error term, and it also relates to the other independent variables: IQ is related to the education you have, since usually the higher your IQ, the higher your education. In this way you end up with an error term that includes an important, omitted variable, which is correlated with one or more of the independent variables included in your model. The other example, the other cause of the endogeneity problem, is reverse causality. What reverse causality means is that not only does the independent variable have an impact on the dependent variable, but the dependent variable also has an impact on the independent variable, so there is a reverse relationship, which is something that we want to avoid. We want the features included in our model to have an impact only on the dependent variable, so they explain the dependent variable and not the other way around, because if the dependent variable impacts your independent variable, then the error term will be related to that independent variable, since it contains components that also define your dependent variable. Knowing a few examples such as these, which can cause endogeneity and thus violate the exogeneity assumption, is really important. You can also check the exogeneity assumption by conducting a formal statistical test, the Hausman test; this is an econometric test that helps to understand whether you have an exogeneity violation or not, but it is out of the scope of this course. I will, however, include many resources related to exogeneity, endogeneity, omitted variable bias and reverse causality, as well as how the Hausman test can be conducted; for that, check out the interview preparation
guide, where you can also find the corresponding free resources. The fourth assumption in linear regression is the assumption of homoskedasticity. Homoskedasticity refers to the assumption that the variance of the errors is constant across all predicted values; this assumption is also known as the homogeneity of variance. Homoskedasticity is an important assumption of the linear regression model, as it allows us to use certain statistical techniques and make inferences about the parameters of the model; if the errors are not homoskedastic, then the results of these techniques may be invalid or misleading. If this assumption is violated, we say that we have heteroskedasticity. Heteroskedasticity refers to the situation in which the variance of the error terms in a linear regression model is not constant across all the predicted values, so we have a varying variance; in other words, the assumption of homoskedasticity is violated and we say we have a problem of heteroskedasticity. Heteroskedasticity can be a real problem in linear regression analyses, because it can lead to invalid or misleading results: for example, the standard error estimates and the confidence intervals for the parameters may be incorrect, which also means that the statistical tests may have incorrect type I error rates. You might recall, from when we were discussing linear regression as part of the fundamental statistics section of the course, that we looked into the output that comes from Python: we get coefficient estimates as part of the output, as well as standard errors, the Student's t-test, the corresponding p-values and the 95% confidence intervals. Whenever there is a heteroskedasticity problem, the coefficients might still be accurate, but the corresponding standard errors, the Student's t-test, which is based on the standard errors, the p-values and the confidence intervals may not be accurate. So you might get good and reasonable coefficients, but you do not know how to correctly evaluate them: you might end up stating that certain independent variables are statistically significant, because their p-values are small, while in reality those p-values are misleading, because they are based on the wrong statistical test and on the wrong standard errors.
You can check this assumption by plotting the residuals against the fitted values and seeing whether there is a funnel-like shape. If the spread of the residuals is roughly constant, the homoskedasticity assumption holds; but if you see a funnel-like shape, where the spread of the residuals grows or shrinks with the fitted values, then the variance is not constant and we say we have a problem of heteroskedasticity. If you have heteroskedasticity, you can no longer rely on OLS and plain linear regression, and instead you need to look for other, more advanced econometric regression techniques that do not make such a strong assumption regarding the variance of your residuals: you can, for instance, use GLS, FGLS or GMM, and these types of solutions will help to solve the heteroskedasticity problem, since they do not make such strong assumptions regarding the variance in your model.
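Here is a minimal sketch of that residual check; the data are made up so that the noise grows with x, which is exactly the kind of pattern that produces a funnel-shaped residual plot.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Hypothetical data where the noise grows with x, so the error variance is not constant
rng = np.random.RandomState(0)
x = rng.uniform(1, 10, 200).reshape(-1, 1)
y = 3 + 2 * x.ravel() + rng.normal(scale=x.ravel(), size=200)

model = LinearRegression().fit(x, y)
fitted = model.predict(x)
residuals = y - fitted

# A funnel-like spread of the residuals around zero suggests heteroskedasticity;
# a roughly constant band suggests the homoskedasticity assumption holds
plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="red")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.show()
```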
The fifth and final assumption in linear regression is the assumption of no perfect multicollinearity. This assumption states that there are no exact linear relationships between the independent variables. Multicollinearity refers to the case when two or more independent variables in your linear regression model are highly correlated with each other. This can be a problem because it can lead to unstable and unreliable estimates of the parameters in the model. Perfect multicollinearity happens when the independent variables are perfectly correlated with each other, meaning that one variable can be perfectly predicted from the others; this can cause the estimated coefficients in your linear regression model to be infinite or undefined, and can make your errors entirely misleading when making new predictions with this model. If perfect multicollinearity is detected, it may be necessary to remove one or more of the problematic variables, so that you avoid having correlated variables in your model. Even if perfect multicollinearity is not present, multicollinearity at a high level can still be a problem: if the correlations between the independent variables are high, the estimates of the parameters may be imprecise, the model may be entirely misleading, and it will result in less reliable predictions. To test the multicollinearity assumption you have different options. The first is to use a formal statistical and econometrical test for multicollinearity, which helps you identify which variables cause the problem and whether you have perfect multicollinearity in your linear regression model. The second is to plot a heat map based on the correlation matrix of your features: you will then have the correlations per pair of independent variables plotted as part of your heat map, and you can identify all the pairs of features that are highly correlated with each other; those are problematic features, one of which should be removed from your model. By showing the heat map you can also demonstrate to your stakeholders why you have removed certain variables from your model, whereas explaining the formal econometric test is much more complex, because it involves more advanced econometrics and linear regression theory.
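Here is a minimal sketch of that heat map check, assuming your independent variables sit in a pandas DataFrame called features; the column names and values are hypothetical, with x3 constructed to be highly correlated with x1.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical feature matrix; x3 is almost a copy of x1, so the two are highly correlated
rng = np.random.RandomState(0)
features = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
})
features["x3"] = features["x1"] * 0.95 + rng.normal(scale=0.1, size=200)

# Pairwise correlations between the independent variables, shown as a heat map
corr = features.corr()
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation matrix of the independent variables")
plt.show()
```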
If you are wondering how such a formal test can be performed, and you want to prepare for questions related to perfect multicollinearity, as well as how you can solve the perfect multicollinearity problem in your linear regression model, then head towards the interview preparation guide included in this part of the course, in order to answer such questions and also to see the 30 most popular interview questions you can expect from this section. Now let's look into an example of linear regression, in order to see how all these pieces of the puzzle come together. Let's say we have collected data on class sizes and test scores for a sample of students, and we want to model the linear relationship between the class size and the test score using a linear regression model. As we have just one independent variable, we are dealing with a simple linear regression, and the model equation would be as follows: test_score = beta_0 + beta_1 * class_size + epsilon. Here, class_size is the single independent variable in our model, test_score is the dependent variable, beta_0 is the intercept, or the constant, and beta_1 is the coefficient of interest, as it is the coefficient corresponding to our independent variable; it will help us understand what the impact of a unit change in the class size is on the test score. Finally, we include in our model an error term, to account for the mistakes that we are definitely going to make when estimating the dependent variable, the test score. The goal is to estimate the coefficients beta_0 and beta_1 from the data and use the estimated model to predict the test score based on a class size. Once we have the estimates, we can interpret them as follows: the intercept, beta_0, represents the expected test score when the class size is zero, so it represents the base score a student would have obtained if the class size were zero; the coefficient of the class size, beta_1, represents the change in the test score associated with a one-unit change in the class size. A positive coefficient would imply that a one-unit increase in the class size increases the test score, whereas a negative coefficient would imply that a one-unit increase in the class size decreases the test score. Correspondingly, we can then use this model, with the OLS estimates, to predict the test score for any given class size. So let's go ahead and implement that in Python. If you are wondering how this can be done, head towards the resources section, as well as the Python for data science part of the course, where you can learn more about how to work with pandas data frames, how to import the data, and how to fit a linear regression model.
and we have this as our independent variable: as you can see, we have the students_data, which contains the class size feature, and we want to estimate y, which is the test score. Here is a sample code that fits a linear regression model. We are keeping everything very simple: we are not splitting our data into training and test sets, fitting the model on the training data and making predictions on the test data; we just want to see how we can interpret the coefficients. You can see that we are getting an intercept equal to 63.7 and a coefficient for our single independent variable, class size, equal to minus 0.4. What this means is that each increase of the class size by one unit will result in a decrease of the test score by 0.4, so there is a negative relationship between the two. A minimal sketch of this kind of fit is shown below.
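Here is a minimal sketch of that kind of fit with scikit-learn; the students_data values are made up for illustration, so the intercept and slope it prints will not match the 63.7 and -0.4 quoted above.

```python
# Minimal sketch: simple linear regression of test score on class size.
# The data below is hypothetical and only illustrates how to read the estimates.
import pandas as pd
from sklearn.linear_model import LinearRegression

students_data = pd.DataFrame({
    "class_size": [15, 18, 20, 22, 25, 28, 30, 33, 35, 40],
    "test_score": [68, 66, 65, 63, 62, 60, 58, 57, 55, 52],
})

X = students_data[["class_size"]]     # 2D feature matrix with a single column
y = students_data["test_score"]       # dependent variable

model = LinearRegression().fit(X, y)

print("intercept (beta 0):", round(model.intercept_, 2))
print("class size coefficient (beta 1):", round(model.coef_[0], 2))

# Predicted test score for a hypothetical class of 24 students
new_class = pd.DataFrame({"class_size": [24]})
print("predicted score for class size 24:", round(model.predict(new_class)[0], 2))
```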
The next question is whether there is statistical significance: whether the coefficient is actually significant and whether the class size has a statistically significant impact on the dependent variable. All those are things we have discussed as part of the fundamental statistics section of this course, and we are also going to look into a linear regression example when we discuss hypothesis testing. So I would highly suggest that you stop here and revisit the fundamental statistics section of this course to refresh your memory on linear regression, and then also check the hypothesis testing section of the course for a specific example of linear regression, where we discuss the standard errors, how you can evaluate your OLS estimation results, and how you can use the Student's t-test, the p-value and the confidence intervals and how you can estimate them. For now you will learn only the theory related to the coefficients, and then you can add on top of this theory once you have learned the other sections and topics in this course. Let's finally discuss the advantages and disadvantages of the linear regression model.
Some of the advantages of the linear regression model are the following. Linear regression is relatively simple and easy to understand and to implement. Linear regression models are well suited for understanding the relationship between a single independent variable and the dependent variable. Linear regression can also handle multiple independent variables and can estimate the unique relationship between each independent variable and the dependent variable. The linear regression model can also be extended to handle more complex models, such as polynomials and interaction terms, allowing for more flexibility in modeling the data. Linear regression can also be easily regularized to prevent overfitting, which is a common problem in modeling, as we saw in the beginning of this section: you can use, for instance, ridge regression, which is an extension of linear regression, or lasso regression, which is also an extension of the linear regression model. Finally, linear regression models are widely supported by software packages and libraries, making them easy to implement and to analyze. A small sketch of the ridge and lasso variants is shown below.
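As a hedged illustration of that regularization point, here is a minimal sketch comparing plain OLS with ridge and lasso in scikit-learn; the synthetic data and the alpha values are arbitrary choices for demonstration.

```python
# Minimal sketch: regularized linear regression (ridge and lasso) vs plain OLS.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))                  # five independent variables
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)             # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)             # L1 penalty can set some to zero

print("OLS coefficients:  ", ols.coef_.round(2))
print("Ridge coefficients:", ridge.coef_.round(2))
print("Lasso coefficients:", lasso.coef_.round(2))
```

Ridge shrinks all coefficients towards zero, while lasso can push some of them exactly to zero, which is why it is also often used for feature selection.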
Some of the disadvantages of linear regression are the following. Linear regression models make a lot of strong assumptions, for instance the linearity between the independent variables and the dependent variable, while the true relationship can actually be nonlinear; the model will then not be able to capture the complexity of the data, the nonlinearity, and the predictions will be inaccurate. Therefore it's really important to have data with a linear relationship for linear regression to work. Linear regression also assumes that the error terms are normally distributed, homoskedastic and independent across observations; violations of these strong assumptions will lead to biased and inefficient estimates. Linear regression is also sensitive to outliers, which can have a disproportionate effect on the estimates of the regression coefficients. Linear regression does not easily handle categorical independent variables, which often require additional data preparation, such as indicator variables or other encodings. Finally, linear regression also assumes that the independent variables are exogenous and not affected by the error terms; if this assumption is violated, the results of the model may be misleading. Imagine you have a friend Alex who collects stamps. Every month Alex buys a certain number of stamps, and you notice that the amount Alex spends seems to depend on the number of stamps bought.
Now you want to create a little tool that can predict how much Alex will spend next month based on the number of stamps bought. This is where linear regression comes into play: in technical terms, we're trying to predict the dependent variable (amount spent) based on the independent variable (number of stamps bought). Below is some simple Python code using scikit-learn to perform linear regression on a created data set. The linear regression analysis was carried out through a structured process using Python's numpy and matplotlib libraries as well as scikit-learn's LinearRegression class. Initially the libraries were imported to facilitate numerical computations and data visualization; this foundational step ensures that all necessary functions and methods are available for executing the analysis. Subsequently the data was organized, with stamps bought serving as the independent variable and amount spent as the dependent variable. The stamps array was reshaped into a two-dimensional array using reshape; this modification was necessary because the scikit-learn library requires input features in a specific format. Once the data was appropriately formatted, a linear regression model was instantiated from the LinearRegression class and then trained with the fit method.
By passing the reshaped stamps bought and amount spent arrays to this method, the model learned the relationship between the number of stamps bought and the corresponding amount spent. The trained model was then used to predict the expenditure for a hypothetical future scenario where 10 stamps are bought; this was accomplished using the predict method, which was called with an input array representing 10 stamps, and the model used its learned parameters to estimate the outcome based on this input. For visualization, the original data points and the regression line were plotted using matplotlib: the scatter function was used to plot the data points in blue, illustrating the actual amount spent for different quantities of stamps bought, while the regression line, plotted in red using the plot function, demonstrated the predicted relationship as learned by the model. Finally, the prediction for 10 stamps was displayed using a print statement, demonstrating how the model's predictions can be interpreted and used by providing a specific numerical estimate for the amount likely to be spent on stamps in the given scenario. This complete process, from data preparation through training to prediction, showcases how linear regression can be applied to derive insights from real-world data effectively. A minimal sketch of this workflow is shown below.
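Here is one way that workflow could look in code; the stamps_bought and amount_spent values are hypothetical stand-ins for Alex's records, so treat this as a sketch rather than the course's exact script.

```python
# Minimal sketch of the stamps example: fit, predict for 10 stamps, and plot.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

stamps_bought = np.array([1, 3, 5, 7, 9, 11, 13, 15]).reshape(-1, 1)  # 2D feature array
amount_spent = np.array([2, 6, 8, 12, 18, 24, 30, 36])                # dependent variable

model = LinearRegression()
model.fit(stamps_bought, amount_spent)

# Predict spending for a month in which Alex buys 10 stamps
predicted = model.predict([[10]])
print(f"Predicted spending for 10 stamps: {predicted[0]:.2f}")

# Actual data in blue, fitted regression line in red
plt.scatter(stamps_bought, amount_spent, color="blue", label="actual spending")
plt.plot(stamps_bought, model.predict(stamps_bought), color="red", label="regression line")
plt.xlabel("Stamps bought")
plt.ylabel("Amount spent")
plt.legend()
plt.show()
```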
Let's go through some of the concepts and variables we are using in this chapter. Sample data: the term stamps bought refers to the number of stamps Alex bought each month, and amount spent represents the corresponding money spent. Creating and training the model: we use LinearRegression from scikit-learn to create and train our model using fit. Predictions: the trained model is then used to predict the amount Alex will spend for a given number of stamps; in the code we predict the amount for 10 stamps. Plotting: we plot the original data points in blue and the predicted line in red to visually understand our model's prediction capability. Displaying the prediction: finally, we print out the predicted spending for a specific number of stamps, 10 in this case. This graph illustrates the outcome of a simple linear regression analysis, which seeks to capture the relationship between two variables: the number of stamps purchased and the total expenditure incurred. The red line depicted in the graph is known as the regression line, representing the best fit through the plotted green data points. The slope of this regression line is particularly telling: it quantifies the increase in
total cost associated with each additional stamp purchased such insights are invaluable for budgeting purposes or for forecasting future expenses based on past purchasing Behavior each of the green data points on the graph corresponds to an actual purchase event where both the quantity of stamps bought and the precise amount Spent are known the close clustering of these points around the regression line strongly supports the presence of a linear relationship between the number of stamps purchased and the total expenditure this alignment suggests that the model has effectively captured the underlying Trend in the data offering a reliable
basis for predictions. Employing simple linear regression in scenarios like this provides a clear and quantifiable understanding of how one variable affects another. In the realm of data analysis this method serves as a powerful tool, enabling analysts to draw significant conclusions and make informed decisions based on the observed relationships between variables; the simplicity yet robustness of linear regression makes it an indispensable technique in the toolkit of anyone seeking to extrapolate future behavior from historical data. Now let's go through some other examples where linear regression is used. One, real estate pricing: linear regression is widely used in real estate to predict house prices based on various features such as square footage, number of bedrooms, number of bathrooms, age of the house and location; for instance, a regression model can help determine how much an additional bathroom adds to a house's value. For example, a real estate company uses linear regression to understand how proximity to the city center affects the market price of properties, and finds that for each mile closer to the city center, house prices increase by an average of $10,000.
Two, credit scoring: financial institutions often employ linear regression to predict the creditworthiness of individuals based on historical financial data, including income levels, existing debts and past repayment histories. For example, a bank may use linear regression to determine how an applicant's credit score changes with variations in their debt-to-income ratio, which helps in deciding whether to approve a loan application. Three, supply chain costs: linear regression can analyze and predict costs associated with different components of the supply chain, such as transportation, labor and materials, based on factors like distance, fuel prices and labor rates. For example, a manufacturing company uses linear regression to predict logistics costs as a function of fuel price fluctuations and shipping distance, to better manage their budget and set product prices. Four, health care: in healthcare, linear regression could be used to predict patient outcomes based on treatment methods, dosage levels and patient demographics. For example, a medical research team applies linear regression to study the relationship between the dosage of a new drug and patient recovery rate, and the findings indicate that increasing the drug dose by one unit enhances recovery rates by 5%. Five, academic performance: educational institutions might use linear regression to predict student performance based on study habits, attendance rates and previous grades. For example, a university conducts a study using linear regression to understand how the number of hours spent studying per week impacts students' GPA, and the analysis reveals that every additional hour of study per week correlates with an increase of 0.05 in GPA. Six, energy consumption: energy companies can use linear regression to forecast consumption levels based on factors like temperature, time of the year and economic activity. For example, an energy utility uses linear regression to model how electricity usage increases with rising temperatures during summer months; the model helps the company prepare for periods of peak demand.
to use logistic regression to predict Jenny's book preferences we have a data set where each entry records the number of pages in the books Jenny read and whether she liked them Step One Import libraries we start by reporting numpy to manage our data matte plot lib for visualization and Scar's logistic regression and Accuracy score for building and evaluating our model step two prepare the data We Begin by setting up our data set Pages holds the number of pages in each book as an independent variable and likes records Jenny's reaction as a dependent binary variable one
for like and zero for dislike it's essential to reshape pages into a 2d array since scikit learns model expects features in this format this Preparation ensures our model can interpret the data correctly step three create and train the model we Define a logistic regression model and train it with our data set using the F method this method optimizes the model's parameters to best explain the relationship between the number of pages and Jenny's preferences training involves finding the statistical parameters that minimize prediction error effectively learning From the data step four make predictions after training we use
the predict method to estimate Jenny's reaction to a book with 260 Pages this part of the code shows how the trained model applies what it has learned to new data step five plotting we visually present the data and model predictions we plot the data points in green displaying actual likes and dislikes the model's predicted probabilities are shown in red giving a visual representation of the likelihood Of liking books of various page lengths specific markers highlight the query point at 260 Pages helping to contextualize the prediction visually step six displaying prediction finally we we display the
prediction result directly from our model using a print statement this tells us if Jenny is predicted to like or dislike the book based on the number of pages illustrating the practical application of our logistic regression model Conclusion this process illustrates how logistic regression can be used to make predictions based on historical data providing insights that can be applied in various Fields Beyond just reading preferences now let's go over the results produced by our code the slope of the red line not linear change the slope of the red line is positive indicating an upward curve as
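The following sketch mirrors those six steps; the page counts and like/dislike labels are hypothetical, and the increased max_iter setting is just a precaution added here.

```python
# Minimal sketch of the book-preference example with logistic regression.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

pages = np.array([100, 150, 180, 200, 220, 240, 280, 320, 380, 450]).reshape(-1, 1)
likes = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])   # 1 = liked, 0 = disliked

model = LogisticRegression(max_iter=1000)
model.fit(pages, likes)
print("Training accuracy:", accuracy_score(likes, model.predict(pages)))

# Predicted reaction and probability for a 260-page book
new_book = np.array([[260]])
print("Predicted like (1) / dislike (0):", model.predict(new_book)[0])
print("Probability of liking:", model.predict_proba(new_book)[0, 1].round(2))

# Green points for actual reactions, red curve for predicted probabilities,
# a dashed line for the 0.5 threshold and a marker for the 260-page query
grid = np.linspace(50, 500, 200).reshape(-1, 1)
plt.scatter(pages, likes, color="green", label="actual likes/dislikes")
plt.plot(grid, model.predict_proba(grid)[:, 1], color="red", label="P(like)")
plt.axhline(0.5, linestyle="--", color="gray", label="0.5 threshold")
plt.scatter([260], model.predict_proba(new_book)[:, 1], marker="x", s=100,
            color="black", label="query: 260 pages")
plt.xlabel("Number of pages")
plt.ylabel("Like probability")
plt.legend()
plt.show()
```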
Now let's go over the results produced by our code. The slope of the red line, a nonlinear change: the slope of the red line is positive, indicating an upward curve as we move along the x axis representing the number of pages. This is crucial because it means the relationship between page count and Jenny's liking isn't a simple straight-line increase; in other words, each additional page doesn't increase her enjoyment by the same amount. The sigmoid's effect: the S-shaped curve typical of logistic regression means that the change in probability is more pronounced for certain ranges of page length. There might be zones where adding more pages drastically increases the likelihood of liking the book, while in contrast other zones might show very little increase in probability despite many more pages; this is different from linear regression, where the slope remains constant and the change is directly proportional. Reasoning behind the slope: this curve's shape could suggest some underlying patterns in Jenny's preferences. Short is not sweet: very short books might not be her cup of tea, as the stories could feel underdeveloped. The sweet spot: there might be a middle range of page counts where the probability jumps significantly, indicating her favorite type of book length. Too long, too much: perhaps extremely long books are thought to have a diminishing return on her enjoyment. Green dots and model accuracy: proximity
matters the fact that the green dots representing actual books Jenny has read are clustered tightly around the red line is a very good sign It signifies that the model's predictions closely align with Jenny's real world preferences data validation this clustering demonstrates the model has successfully picked up on the underlying pattern between page number and Jenny's like dislike reactions consequently we can have more confidence in its predictions for new books exceptions if some green dots were far away from the line that would be a cause for concern it would mean the Model is consistently mispredicting in
those regions and might need refinement. The threshold line at 0.5, decision time: the line at 0.5 is where we translate the continuous probability values into the practical like or dislike recommendations for Jenny. Not set in stone: while 0.5 is a common threshold, it's not mandatory; depending on how much we prioritize avoiding false positives (like predictions that turn out to be dislikes) or false negatives (dislike predictions when Jenny would have actually enjoyed the book), we might move this threshold higher or lower. Customizing for Jenny: if Jenny indicates she generally likes to try books even if she's not super sure, we might lower the threshold; this gives her more recommendations even if the certainty of her liking them is slightly lower. This logistic regression model reveals that Jenny's book preferences are influenced by page count in a nonlinear way, and it has successfully learned the underlying pattern to provide more informed recommendations. Let's explore in what other areas logistic regression is used: we'll look at how logistic regression can be used to predict customer churn for a subscription-based company. Churn, the act of customers cancelling their subscriptions, poses a significant challenge for such companies, and logistic regression,
with its binary outcome prediction capabilities, is an ideal choice for this scenario. The problem at hand is straightforward: the company wants to proactively identify customers who are at high risk of cancelling their subscriptions, also known as churning. Logistic regression is particularly suitable for this task because it deals with binary outcomes, where a customer either churns (one) or continues their subscription (zero). How logistic regression is used: data gathering, the company starts by collecting historical data on various aspects of customer behavior and demographics; this includes usage patterns such as frequency and feature engagement, support interactions such as tickets opened and types of complaints, plan types distinguishing between basic and premium subscriptions, and demographic information like age, location, etc. Model training: the collected data is divided into training and testing sets, and a logistic regression model is then trained using the training data; during training the model learns how different factors, such as usage patterns and demographics, relate to the probability of churn. Scoring new customers: once the model is trained it can be used to score new customers; each customer's data is fed into the trained model, which generates a probability score between 0 and 1 indicating their likelihood of churning. Proactive action: customers with high churn probabilities are identified and receive targeted attention; interventions may include offering special deals, personalized outreach, or addressing common pain points known to lead to churn. A minimal sketch of this pipeline is shown below.
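Here is a hedged, end-to-end sketch of that churn pipeline; the feature names (logins_per_month, support_tickets, is_premium_plan, tenure_months), the synthetic labels and the 0.7 risk cut-off are all hypothetical choices made for this illustration.

```python
# Minimal sketch of churn scoring with logistic regression on synthetic data.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 500
customers = pd.DataFrame({
    "logins_per_month": rng.poisson(12, n),
    "support_tickets": rng.poisson(1, n),
    "is_premium_plan": rng.integers(0, 2, n),
    "tenure_months": rng.integers(1, 48, n),
})
# Synthetic churn labels: less usage and more tickets -> higher churn risk
score = (-0.15 * customers["logins_per_month"] + 0.8 * customers["support_tickets"]
         - 0.5 * customers["is_premium_plan"] - 0.03 * customers["tenure_months"] + 1.0)
churned = (rng.random(n) < 1 / (1 + np.exp(-score))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    customers, churned, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Score unseen customers with a churn probability between 0 and 1
churn_prob = model.predict_proba(X_test)[:, 1]
at_risk = X_test[churn_prob > 0.7]          # customers to target with retention offers
print("Customers flagged as high churn risk:", len(at_risk))
```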
Imagine Sarah, who loves cooking and trying various fruits. She's noticed that the fruits she enjoys tend to fall into certain size and sweetness ranges: could she predict whether she'll like a new fruit based on these characteristics? Linear discriminant analysis (LDA) is the perfect tool to help her out. LDA is a powerful technique for classifying things based on several features; think about how facial recognition software can identify individuals. In Sarah's case we can use LDA to find patterns in the size and sweetness of fruits she's liked or disliked in the past. LDA will search for a way to create a sort of like-versus-dislike boundary based on these features: imagine each fruit as a point on a graph where the x axis is size and the y axis is sweetness; LDA tries to draw the best possible line to separate the liked and disliked fruits. Of course some overlap might happen, maybe there are some small fruits she surprisingly loves; LDA aims to find the line that does the best possible job of separating the groups overall. LDA is great when you have multiple features to consider at once: Sarah could try looking at size or sweetness alone, but LDA lets her combine this information for potentially better predictions. So let's go through the code now. First we need to ensure that we have all the necessary tools at our disposal: we'll be using Python for our code tasks and will import the powerful libraries numpy for numerical operations and matplotlib.pyplot for data visualization; additionally we'll utilize scikit-learn's
LinearDiscriminantAnalysis module for implementing LDA. Now let's create a sample data set: we'll curate a data set consisting of eight fruits, each characterized by two features, size and sweetness. These features will serve as our input, while the corresponding labels will indicate whether each fruit is liked (one) or disliked (zero); this data set will form the foundation for training our predictive model. With our data set ready, it's time to build our predictive model using linear discriminant analysis: we'll create an instance of an LDA model object and proceed to train it using the features, size and sweetness, and the corresponding labels from our sample data set. We'll select a new fruit with a size of 2.5 and a sweetness of 6 as our test case; using these feature values, the model will predict whether Sarah would like this fruit or not, and this prediction will provide valuable insights into the model's decision-making process and its ability to generalize to unseen instances. To visualize the results of our analysis we'll plot the sample data set on a scatter plot: fruits that are liked by Sarah will be represented by blue markers while disliked fruits will be marked in yellow; additionally we'll highlight the new fruit being predicted with a distinct red X marker, providing a clear visual representation of its classification. After making the prediction we'll display the outcome: we'll print a statement indicating whether Sarah is likely to enjoy the new fruit based on the model's classification decision, and this insight will offer valuable information on how the model interprets the given features and arrives at its prediction. A minimal sketch of this example is shown below.
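A sketch of that fruit example might look like the following; the eight (size, sweetness) pairs and their labels are hypothetical, chosen only to mirror the description above.

```python
# Minimal sketch of the fruit example with linear discriminant analysis.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Each row is (size, sweetness); label 1 = liked, 0 = disliked
fruits = np.array([
    [1.0, 2.0], [1.5, 3.0], [2.0, 4.0], [2.2, 7.0],
    [3.0, 6.5], [3.5, 7.5], [4.0, 5.0], [4.5, 8.0],
])
liked = np.array([0, 0, 0, 1, 1, 1, 0, 1])

lda = LinearDiscriminantAnalysis()
lda.fit(fruits, liked)

# Predict the new fruit with size 2.5 and sweetness 6
new_fruit = np.array([[2.5, 6.0]])
prediction = lda.predict(new_fruit)[0]
print("Sarah is predicted to", "like" if prediction == 1 else "dislike", "the new fruit")

# Liked fruits in blue, disliked in yellow, the new fruit as a red X
plt.scatter(fruits[liked == 1, 0], fruits[liked == 1, 1], color="blue", label="liked")
plt.scatter(fruits[liked == 0, 0], fruits[liked == 0, 1], color="gold", label="disliked")
plt.scatter(new_fruit[:, 0], new_fruit[:, 1], color="red", marker="x", s=120, label="new fruit")
plt.xlabel("Size")
plt.ylabel("Sweetness")
plt.legend()
plt.show()
```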
Here are several considerations we should make about the resulting graph, which depicts fruit enjoyment based on size and sweetness: the x axis represents size, the y axis represents sweetness, and the data points are colored orange (like) and blue (dislike). Here are some observations you can make. Class separation: there appears to be some separation between the orange (like) and blue (dislike) data points, which suggests that size and sweetness may be useful factors in predicting whether Sarah will enjoy a particular fruit. Overlap between classes: there's also some overlap between the two classes, particularly in the region of larger and sweeter fruits, which indicates that size and sweetness alone may not perfectly predict Sarah's preferences; other factors not considered here might also influence her enjoyment. Here are additional considerations we should make. Sample size: the data set here is very small, and a larger data set might provide a clearer picture of the class separation and improve the model's generalizability. Nonlinear relationships: the graph assumes a linear relationship between size, sweetness and enjoyment; if the true relationship is more complex, a model like LDA might not capture it perfectly. Overall, the data suggests a potential link between fruit size, sweetness and Sarah's preferences; LDA is a promising technique to explore for building a classification model, but it's important to consider
potential limitations and the need for more data, if available. Logistic regression is a popular approach for performing classification when there are two classes, but when the classes are well separated, or the number of classes exceeds two,
the parameter estimates for the logistic regression model are surprisingly unstable. Unlike logistic regression, LDA does not suffer from this instability problem when the number of classes is more than two, and if n is small and the distribution of the predictors X is approximately normal in each of the classes, LDA is again more stable than the logistic regression model. Noah is a botanist
who has collected data about various plant species and their characteristics, such as leaf size and flower color. Noah is curious whether he could predict a plant's species based on these features; here we'll utilize random forest, an ensemble learning method, to help him classify plants. Technically, we aim to classify plant species based on certain predictive variables using a random forest model, sketched below.
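Since the random forest walkthrough itself is not shown here, the following is only a hypothetical sketch of the idea; the features (leaf size, a numeric flower-color code), the synthetic species labels and all parameter values are assumptions.

```python
# Minimal, hypothetical sketch of plant-species classification with a random forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(3)
n = 150
leaf_size = rng.uniform(2, 12, n)            # leaf size in cm (hypothetical)
flower_color = rng.integers(0, 3, n)         # numeric flower-color code (hypothetical)
X = np.column_stack([leaf_size, flower_color])
# Synthetic species labels loosely tied to the features, for illustration only
species = (leaf_size > 7).astype(int) + (flower_color == 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, species, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

print("Test accuracy:", round(model.score(X_test, y_test), 2))
print("Predicted species code for a new plant:", model.predict([[8.5, 2]])[0])
```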
The bias-variance tradeoff: the key challenge in machine learning lies in finding the right balance between bias and variance. Generally, reducing bias increases variance and vice versa: complex models tend towards low bias and high variance, while simpler models tend towards the opposite. A high-bias model struggles to capture complex patterns, which is like trying to fit a curved data set with a straight line; in contrast, a low-bias model is more flexible, allowing it to potentially match intricate trends in the data. The ideal model finds a sweet spot between underfitting (high bias) and overfitting (high variance), a balance that depends heavily on the nature of your specific problem and the trade-off you're willing to make between flexibility and stability. Naive Bayes versus logistic regression: Naive Bayes is known for high bias, since it assumes features are independent, which often isn't true; however, its simplicity makes it less prone to overfitting (low variance) and computationally fast to train. On the other hand, logistic regression is more flexible (low bias) and can model complex decision boundaries, but this comes at the risk of overfitting (high variance), especially with many features or little regularization. When to choose which: if you prioritize speed and simplicity, Naive Bayes might be a good starting point; when your data relationships are unlikely to be simple and independent, logistic regression's flexibility becomes valuable. However, if you choose logistic regression you need to actively manage overfitting, potentially using techniques like regularization. Step one, importing libraries: we import numpy for numerical operations, matplotlib for plotting purposes, and scikit-learn's logistic regression and linear discriminant analysis classes for classification tasks.
Step two, generating synthetic data: next we define a function called generate_data to create synthetic data for our classification experiment. The function generates data points from three initial classes, each centered at (0, 0), (3, 0) and (6, 0); for each class, random data points are generated around the respective center using a Gaussian distribution. Step three, data generation and model fitting: we generate a data set with 40 samples per class using the generate_data function, then we fit logistic regression and LDA models to this data set. Step four, analyzing the results: after fitting the models we print the coefficients for both logistic regression and LDA for the initial three classes; these coefficients provide insights into the decision boundaries learned by each model. Step five, adding an extra class: we then introduce a new class to our data set by generating additional data points centered at (9, 0), and we append this new class to our data set and update the corresponding labels accordingly. Step six, refitting the models: following the addition of the new class, we refit both the logistic regression and LDA models to the updated data set with four classes. Step seven, analyzing the results for four classes: finally, we print the coefficients for both models after fitting them to the data set with four classes. This allows us to observe how the decision boundaries change with the inclusion of the new class; a minimal sketch of this experiment is shown below.
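Here is one way those steps could be coded; the spread of the Gaussian clusters, the random seed and the reading of the fourth centre as (9, 0) are assumptions made for this sketch.

```python
# Minimal sketch: logistic regression vs LDA coefficients with 3 and then 4 classes.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

def generate_data(centers, n_per_class=40, scale=1.0):
    """Draw Gaussian clusters around each centre and return features and labels."""
    X = np.vstack([rng.normal(loc=c, scale=scale, size=(n_per_class, 2)) for c in centers])
    y = np.repeat(np.arange(len(centers)), n_per_class)
    return X, y

# Three initial classes centred at (0, 0), (3, 0) and (6, 0)
X3, y3 = generate_data([(0, 0), (3, 0), (6, 0)])
logreg3 = LogisticRegression(max_iter=1000).fit(X3, y3)
lda3 = LinearDiscriminantAnalysis().fit(X3, y3)
print("Logistic regression coefficients (3 classes):\n", logreg3.coef_.round(2))
print("LDA coefficients (3 classes):\n", lda3.coef_.round(2))

# Add a fourth class centred at (9, 0) and refit both models
X4, y4 = generate_data([(0, 0), (3, 0), (6, 0), (9, 0)])
logreg4 = LogisticRegression(max_iter=1000).fit(X4, y4)
lda4 = LinearDiscriminantAnalysis().fit(X4, y4)
print("Logistic regression coefficients (4 classes):\n", logreg4.coef_.round(2))
print("LDA coefficients (4 classes):\n", lda4.coef_.round(2))
```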
The provided code serves to elucidate the intricate concepts of bias, variance, and the bias-variance tradeoff by juxtaposing Naive Bayes and logistic regression classifiers on a synthetic data set. Here's a cohesive explanation of the code's functionality. To begin, the script generates a synthetic data set comprising two classes arranged in circular patterns; the make_circles function from scikit-learn is employed for this purpose, creating data that challenges the assumptions of Naive Bayes due to its nonlinear separability. Following data generation, the script proceeds to train two distinct classifiers. Firstly, a Gaussian Naive Bayes model is trained on the data set; this choice aligns with the high bias, low variance characteristics of Naive Bayes, given its simplicity and assumption of feature independence. Secondly, a logistic regression model is trained with regularization (C = 1.0); regularization is introduced to combat overfitting, a concern for logistic regression due to its flexibility potentially leading to higher variance. Once the models are trained, the script visualizes their decision boundaries through plots: showcasing the decision boundaries of both models, it becomes evident that Naive Bayes delineates a simpler linear boundary while logistic regression captures the data set's nonlinearity. Furthermore, the script calculates the accuracy of each model on a held-out test set, facilitating a comparative analysis of their performance. A deeper understanding of the bias-variance tradeoff emerges from this comparison: Naive Bayes exhibits higher bias due to its simplifying assumptions, resulting in a less complex decision boundary; on the contrary, logistic regression's flexibility enables it to learn the nonlinear pattern with lower bias potential, albeit at the risk of overfitting. The importance of contextual considerations becomes apparent: while logistic regression often boasts lower bias, Naive Bayes' simplicity and computational efficiency may render it preferable in certain contexts. Model selection hinges on factors such as data set characteristics, computational constraints, and the relative significance of interpretability versus raw predictive performance. It's crucial to acknowledge the limitations and caveats of the presented example: the observed results may vary on different data sets or with alternative hyperparameter configurations, and while decision boundary visualization aids comprehension, accuracy metrics are equally essential for comprehensive model evaluation. Finally, the script underscores the significance of continuous learning in machine learning: it advocates for a methodical approach involving experimentation with diverse models, rigorous evaluation, and judicious selection based on problem-specific requirements and performance metrics. A minimal sketch of this comparison is shown below.
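A sketch of that comparison, under the assumptions just mentioned (make_circles data, a Gaussian Naive Bayes model, and logistic regression with C = 1.0), could look like this; the sample size, noise level and train/test split are arbitrary choices.

```python
# Minimal sketch: Gaussian Naive Bayes vs regularized logistic regression on make_circles.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

nb = GaussianNB().fit(X_train, y_train)                   # high bias, low variance
logreg = LogisticRegression(C=1.0).fit(X_train, y_train)  # more flexible, regularized

print("Naive Bayes accuracy:        ", accuracy_score(y_test, nb.predict(X_test)))
print("Logistic regression accuracy:", accuracy_score(y_test, logreg.predict(X_test)))
```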
Tom is a movie enthusiast who watches films across different genres and records his feedback, whether he liked them or not. He has noticed that whether he likes a film might depend on two aspects: the movie's length and its genre. Can we predict whether Tom will like a movie based on these two characteristics using Naive Bayes? Technically, we want to predict a binary outcome (like or dislike) based on the independent variables movie length and genre. Let's delve into the code's functionality. The initial step involves importing the essential libraries necessary for the code's operations: numpy facilitates numerical operations while matplotlib aids in visualization; additionally, the Gaussian Naive Bayes implementation from scikit-learn is imported to utilize its functionality. Following the imports, the script defines sample data representing movie features and corresponding likes: each movie is characterized by its length in minutes and a genre code; notably, genres are numerically encoded, with zero signifying action, one representing romance, and so forth. This structured representation prepares the data for subsequent analysis. Moving forward, a Gaussian Naive Bayes model is instantiated; this model serves as the predictive engine, leveraging its inherent assumptions about feature independence to classify movie likes. Subsequently, the model is trained using the provided movie features and their associated likes. Once the model is trained, it is ready to make predictions: a new movie is defined with its length and genre code, and leveraging the trained Naive Bayes model, predictions are made regarding whether Tom would like this movie based on its features; this step demonstrates the practical application of the trained model in making real-world predictions. The script then proceeds to visualize the data set through a scatter plot: each existing movie is plotted based on its length and genre code, with liked movies depicted in one color and disliked movies in another; additionally, the new movie is plotted with a distinct marker, providing visual context to aid interpretation. A sketch of this code is shown below.
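Here is a minimal sketch of the movie example; the lengths, genre codes and likes are hypothetical, and the new movie's values (100 minutes, genre code 1) are chosen only for illustration.

```python
# Minimal sketch of the movie example with Gaussian Naive Bayes.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.naive_bayes import GaussianNB

movies_features = np.array([
    [90, 0], [110, 0], [120, 1], [95, 1], [150, 0], [140, 1], [100, 0], [130, 1],
])  # [length in minutes, genre code], e.g. 0 = action, 1 = romance
movies_likes = np.array([0, 1, 1, 0, 1, 1, 0, 1])   # 1 = liked, 0 = disliked

model = GaussianNB()
model.fit(movies_features, movies_likes)

# Predict Tom's reaction to a new 100-minute movie with genre code 1
new_movie = np.array([[100, 1]])
prediction = model.predict(new_movie)[0]
print("Tom will", "like" if prediction == 1 else "not like", "the new movie")

# Existing movies as circles, colored by like/dislike; the new movie as a red X
colors = np.where(movies_likes == 1, "green", "orange")
plt.scatter(movies_features[:, 0], movies_features[:, 1], c=colors, marker="o")
plt.scatter(new_movie[:, 0], new_movie[:, 1], color="red", marker="x", s=120)
plt.xlabel("Movie length (minutes)")
plt.ylabel("Genre code")
plt.show()
```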
Here are a few observations and conclusions we can make. Clear separation: we can see that there is a clear separation between the like and dislike clusters, which indicates that the Naive Bayes model has successfully found a distinct boundary based on the movie length and genre features; the strong separation suggests these features are powerful predictors of preference. Prediction confidence: since the new movie (the red X) falls squarely within a cluster, we can have a high degree of confidence in the model's prediction. Further exploration: even with good separation, movies lying close to the boundary deserve closer examination; they may offer insights into less predictable cases and help refine the model further. Overlapping clusters: if the like and dislike groups overlap significantly, it suggests that movie length and genre alone may not be enough to accurately predict preferences in all situations. Model limitations: in cases of overlap, Naive Bayes might be too simplistic; exploring more sophisticated models, like decision trees or support vector machines, could improve accuracy. Need more data: a larger and more diverse data set, or the inclusion of additional features (e.g. star ratings, director), could help uncover clearer patterns in situations where the current features aren't sufficient. Genre significance: if distinct clusters form based on genre codes, it means the Naive Bayes model recognizes genre as a strong indicator of preference. Personalized recommendations: this genre-based insight can be used to tailor recommendations; if a user consistently enjoys a particular genre, movies within that genre should be prioritized even if their length deviates from the usual pattern. Caveats: it's important to remember that genre preferences are subjective, and reactions within a given genre might naturally be mixed.
Alex is intrigued by the relationship between the number of hours studied and the scores obtained by students. Alex collected data from his peers about their study hours and respective test scores, and he wonders: can we predict a student's score based on the number of hours they study? Let's leverage decision tree regression to uncover this; technically, we're predicting a continuous outcome (test score) based on an independent variable (study hours). Let's dissect the code to understand its functionality. Importing libraries: we begin by importing the necessary libraries; numpy assists in numerical operations, matplotlib facilitates visualization, and DecisionTreeRegressor from scikit-learn is utilized for decision tree regression. Sample data definition: following the library imports, the code defines sample data; this data includes the number of hours studied and the corresponding test scores achieved, and each entry pairs a study hour with its corresponding test score, forming the data set for training the regression model. Creating and training the model: a decision tree regression model is instantiated with a maximum depth set to three; this parameter controls the maximum number of levels within the decision tree. Subsequently, the model is trained using the provided study hours and their corresponding test scores. Prediction: after training, the model is capable of making predictions; an example study hour of 5.5 hours is chosen, and the model predicts the test score corresponding to this input based on its training. Plotting the decision tree: the code then generates a visualization of the decision tree regression model, which elucidates the decision-making process of the model utilizing the provided feature, study hours. Plotting study hours versus test scores: another plot is created to illustrate the relationship between study hours and test scores; this scatter plot displays the actual data points, while the regression line
portrays the predictions made by the decision tree regression model; additionally, the predicted test score for the new study hour is highlighted. Displaying the prediction: finally, the code prints out the predicted test score for the specified study hour. Now here are some key features. Sample data: study_hours contains the hours studied and test_scores contains the corresponding test scores. Creating and training the model: we create a decision tree regressor with the specified maximum depth to prevent overfitting and train it with .fit using our data. Plotting the decision tree: plot_tree helps visualize the decision-making process of the model, representing the splits based on study hours. Prediction and plotting: we predict the test score for a new study hour value (5.5 in this example) and visualize the original data points, the decision tree's predicted scores, and the new prediction. A sketch of these steps is shown below.
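Here is a minimal sketch of those steps; max_depth=3 and the 5.5-hour query follow the description above, while the sample study-hour and score values are hypothetical.

```python
# Minimal sketch of decision tree regression on study hours vs test scores.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor, plot_tree

study_hours = np.array([1, 2, 3, 3.5, 4, 5, 6, 6.5, 7, 8]).reshape(-1, 1)
test_scores = np.array([50, 55, 60, 62, 65, 70, 78, 80, 85, 90])

model = DecisionTreeRegressor(max_depth=3)   # limit depth to keep the tree simple
model.fit(study_hours, test_scores)

predicted_score = model.predict([[5.5]])[0]
print(f"Predicted test score for 5.5 hours of study: {predicted_score:.1f}")

# Visualize the tree's decision rules
plt.figure(figsize=(10, 5))
plot_tree(model, feature_names=["study_hours"], filled=True)
plt.show()

# Actual data in red, the stepwise predictions in orange, the new point as a green X
grid = np.linspace(1, 8, 200).reshape(-1, 1)
plt.scatter(study_hours, test_scores, color="red", label="actual scores")
plt.plot(grid, model.predict(grid), color="orange", label="tree predictions")
plt.scatter([5.5], [predicted_score], color="green", marker="x", s=120, label="5.5 hours")
plt.xlabel("Study hours")
plt.ylabel("Test score")
plt.legend()
plt.show()
```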
Now here are a few conclusions we can make from the decision tree regressor visualization and the study hours versus test scores plot. Observations from the plot: linear progression with a step function: the orange line representing the predicted scores demonstrates a clear step function, a typical characteristic of decision tree regressors; this indicates that the model provides constant predictions within certain ranges of study hours, changing abruptly at specific thresholds where new rules (splits) apply. Prediction accuracy: the red dots (actual scores) mostly align with the orange step function (predicted scores), suggesting that the decision tree model does a good job of capturing the general trend in the data; the close alignment also indicates that the model handles the nonlinear relationship between study hours and test scores well, within the constraints of its maximum depth. Specific prediction: the plot marks a prediction for 5.5 hours of study (green X); this specific prediction falls on a step increase, suggesting that additional study hours beyond a certain threshold significantly improve the predicted test score according to the model's training data. Conclusions: model fit: the decision tree appears well fitted to the range of data presented, without signs of overfitting or underfitting; the choice of maximum depth seems appropriate, balancing model complexity and generalization. Utility of decision trees for educational data: decision trees are useful for educational data like study hours and test scores because they can easily model thresholds (e.g. the minimum hours needed to achieve a certain score) that are intuitive for educational planning and interventions. Implications for students and educators: this model can help in setting realistic study goals based on expected outcomes; for instance, educators can advise students about the probable benefit of studying an additional hour based on the model's predictions. Potential for refinement: while the current model provides valuable insights, further refinement with additional features (like type of study material, individual student baseline performance, etc.) could enhance prediction accuracy; testing the model on more diverse data sets or incorporating ensemble methods like random forests could provide a more robust analysis and mitigate any variance not captured by a single decision tree. Visualization and interpretation: the stepwise visualization aids in understanding how additional study hours could lead to increments in test scores, which is valuable for explaining model behavior to non-technical stakeholders. Where decision trees take root: decision trees, with their intuitive branching structure, find use across various industries and problem domains; let's dive into some key areas where they prove particularly effective. Business and finance: customer segmentation, analyze customer data to identify groups with similar behaviors or purchasing patterns for targeted marketing strategies; fraud detection, identify patterns in transactions that may indicate fraudulent activity; credit risk assessment, evaluate the creditworthiness of loan applicants
based on their financial history and other factors operations management optimize decision making in areas like inventory management Logistics and resource allocation Health Care medical diagnosis Support assist in diagnosing diseases by guiding clinicians through a series of questions and tests based on patient symptoms and medical history treatment planning help determine the most suitable treatment options based on patient characteristics and disease severity disease risk prediction identify individuals at high risk of developing certain health conditions based on factors like lifestyle family history and medical data science and Engineering fault diagnosis isolate the cause of malfunctions or failures in
complex systems by analyzing sensor data and system logs; classification in biology, categorize species based on their characteristics or DNA sequences; remote sensing, analyze satellite imagery to classify land cover types or identify areas affected by natural disasters. Customer service: troubleshooting guides, create interactive decision trees to guide customers through troubleshooting steps for products or services; chatbots, power automated chatbots that can categorize customer inquiries and provide appropriate responses, reducing wait times and improving support efficiency. Other applications: game playing, design AI opponents in games that can make strategic decisions based on the state of the game; e-commerce, personalized product recommendations based on user browsing behavior and past purchases; human resources, identify key factors influencing employee retention and make informed decisions. Why decision trees thrive here: decision trees excel in these scenarios due to several factors; interpretability, the decision-making process is transparent, allowing humans to understand the reasoning behind the model's predictions; handles diverse data, accommodates both numerical and categorical features; nonlinear relationships, can capture complex nonlinear patterns within data; versatility, applicable for both classification (predicting a class label) and regression. Meet Lucy, a fitness coach who is curious about predicting her clients' weight loss based on
their daily calorie intake and workout duration. Lucy has data from past clients but recognizes that individual predictions might be prone to errors, so let's utilize bagging to create a more stable prediction model. Technically, we'll predict a continuous outcome (weight loss) based on two independent variables (daily calorie intake and workout duration), using bagging to reduce the variance in predictions. Let's now go through the code. Here are several key features. Clients data: contains the daily calorie intake and workout duration, and weight loss contains the corresponding weight loss values.
Back to Lucy's bagging example, here are the key pieces: clients_data contains daily calorie intake and workout duration, and weight_loss contains the corresponding weight loss. Train/test split: we split the data into training and test sets to validate the model's predictive performance. Creating and training the model: we instantiate BaggingRegressor with DecisionTreeRegressor as the base estimator and train it using .fit with our training data. Prediction and evaluation: we predict weight loss for the test data, evaluating prediction quality with the mean squared error (MSE). Visualizing one of the base estimators: optionally, visualize one tree from the ensemble to understand individual decision-making processes, keeping in mind that an individual tree may not perform well, but collectively they produce stable predictions. A breakdown of the key points: true weight loss, with a range of 2 to 4.5 lb, represents the actual weight loss experienced by the clients in the test set; predicted weight loss, with a range of 3.1 to 3.96 lb, represents the model's predictions for weight loss in the test set; and the mean squared error (MSE) of 0.75 measures the average squared difference between the predicted weight loss and the true weight loss, where a lower MSE generally indicates better model performance. In simpler terms, the model predicts weight loss somewhat accurately, but there are deviations between the predictions and the actual weight loss experienced by the clients; on average, the squared error of the model's predictions was 0.75.
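Here is a minimal sketch of the bagging workflow just described for Lucy's example; the clients_data and weight_loss values are invented for illustration, and note that older scikit-learn versions call the estimator argument base_estimator.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor, plot_tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Illustrative data: [daily calorie intake, workout duration in minutes] -> weight loss in lb
clients_data = np.array([[2000, 60], [2500, 30], [1800, 90], [2200, 45],
                         [1900, 75], [2400, 20], [2100, 50], [1700, 80]])
weight_loss = np.array([3.5, 2.0, 4.5, 3.0, 4.0, 1.8, 2.8, 4.2])

X_train, X_test, y_train, y_test = train_test_split(clients_data, weight_loss, test_size=0.25, random_state=42)

# Bagging with decision trees as the base estimator (older scikit-learn uses base_estimator=)
model = BaggingRegressor(estimator=DecisionTreeRegressor(), n_estimators=50, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
print("True weight loss:", y_test)
print("Predicted weight loss:", predictions)
print("MSE:", mean_squared_error(y_test, predictions))

# Optionally inspect one of the base estimators to see an individual tree's decisions
plot_tree(model.estimators_[0], feature_names=["calorie_intake", "workout_duration"], filled=True)
plt.show()
```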
In our previous explorations of machine learning we've come to recognize the inherent strengths and limitations of individual models: some models are masters of simplicity, offering quick interpretable results, while others thrive in handling complex high-dimensional data sets, yet all models are susceptible to the twin challenges of bias and variance. This is where bagging enters as a powerful ally, leveraging the wisdom of crowds to forge better predictive frameworks. Applications of bagging. Regression problems: imagine you're attempting to predict housing prices within a bustling city; factors like square footage, location, number of bedrooms and countless others collectively influence the price, and a single linear regression model might struggle to capture the intricate relationships between these features; bagging comes to the rescue by training multiple regressors, like decision trees, on diverse samples of the data, and averaging their predictions reduces variance and improves accuracy. Classification quests: perhaps you're tasked with classifying customer reviews as positive or negative; a lone Naive Bayes classifier might make oversimplified assumptions about word independence, resulting in subpar performance; bagging empowers us to assemble an ensemble of classifiers, where each member of the ensemble casts a vote, and the majority vote often yields a superior classification decision, mitigating the shortcomings of any single model. Image recognition: the vast world of image recognition presents unique challenges with high-dimensional data; convolutional neural networks (CNNs), while remarkably powerful, can fall prey to overfitting; with bagging at our disposal we can create a council of independently trained CNNs, where each network focuses on distinct subsets of the image data, and aggregating their predictions instills robustness and can significantly improve classification results. Harnessing the power of diversity: the cornerstone of bagging lies in cultivating diversity within its
ensemble: by constructing models on varying bootstrapped samples of the original data set, each model develops slightly different biases; this clever strategy combats bias through its collective approach; moreover, the random nature of sampling reduces variance, particularly when working with unstable algorithms like decision trees. Real-world examples: healthcare, where in medical diagnosis precision is paramount, bagging is widely used, and ensembles of models trained on patient data often lead to enhanced accuracy in identifying diseases, contributing to better healthcare decision-making; finance, where financial institutions employ bagging for critical tasks such as fraud detection and risk assessment, and aggregated models built
with bagging techniques are frequently more efficient at detecting anomalies and spotting fraudulent patterns, aiding in the protection of valuable assets; and environmental science, where bagged models are leveraged in tasks ranging from land cover classification to climate modeling, and the ability to create more stable and reliable predictions from diverse data sets and models proves invaluable when tackling complex environmental challenges. Remember, while bagging empowers us with stronger predictive prowess, it's not a magic bullet. This video was sponsored by LunarTech. At LunarTech we are all about making you ready for your dream job in tech, making data science and AI accessible to everyone, whether it is data science, artificial intelligence or engineering. At LunarTech Academy we have courses and boot camps to help you become a job-ready professional. We are also here to help businesses, schools and universities with top-notch training modernization with data science and AI, and corporate training including the latest topics like generative AI. With LunarTech, learning is easy, fun and super practical; we care about providing an end-to-end learning experience that is both practical and grounded in fundamental knowledge, and our community is all about supporting each other, making sure you get where you want to go. Ready to start your tech journey? LunarTech is where you begin. Students or aspiring data science and AI professionals can visit the LunarTech Academy section to explore our courses, boot camps and programs in general; businesses in need of employee training, upskilling or data science and AI solutions should head to the technology section of the LunarTech page; and enterprises looking for corporate training, curriculum modernization and customized AI tools to enhance education can visit the LunarTech Enterprises section for a free consultation and a customized estimate. Join LunarTech and start building your future, one data point at a time.
Random forests provide an improvement over bagged trees by way of a small tweak that decorrelates the trees. As in bagging, we build a number of decision trees on bootstrapped training samples, but when building these decision trees, each time a split in a tree is considered, a random sample of m predictors is chosen as split candidates from the full set of p predictors, and the split is allowed to use only one of those m predictors. A fresh random sample of m predictors is taken at each split, and typically we choose m ≈ √p; that is, the number of predictors considered at each split is approximately equal to the square root of the total number of predictors. This is also the reason why random forest is called random. The main difference between bagging and random forests is the choice of predictor subset size m, which decorrelates the trees; using a small value of m when building a random forest will typically be helpful when we have a large number of correlated predictors, so if you have a problem of multicollinearity, random forest is a good method to fix that problem. So unlike in bagging, in the case of random forest not all p predictors are considered at each tree split, but only m randomly selected predictors; this results in dissimilar, decorrelated trees, and because averaging decorrelated trees results in smaller variance, random forest is more accurate than bagging. Noah is a botanist who has collected data about various plant species and their characteristics, such as leaf size and flower color. Noah is curious whether he could predict a plant species based on these features, so here we'll utilize random forest, an ensemble learning method, to
help him classify plants. Technically, we aim to classify plant species based on certain predictive variables using a random forest model. Let's walk through the provided code step by step. Importing libraries: the code starts by importing necessary libraries such as numpy for numerical operations, matplotlib for visualization, RandomForestClassifier from scikit-learn for random forest classification, train_test_split for splitting the data and classification_report for evaluating classification performance. Data preparation: the code defines two numpy arrays, plants_features, containing the features of the plants (leaf size and flower color), and plants_species, containing the corresponding species labels; each row in plants_features represents a plant, and the corresponding entry in plants_species denotes its species, zero or one. Train/test split: the data set is split into training and testing sets using the train_test_split function; this ensures that the model is trained on a portion of the data and evaluated on an unseen portion, with the split ratio set to 75% training data and 25% testing data. Model initialization and training: a random forest classifier (model) is initialized with 10 estimators (trees) and a random state of 42 for reproducibility, and the initialized model is then trained using the training data X_train and y_train; random forests build multiple decision trees and combine their predictions to improve generalization performance. Prediction and evaluation: the trained model is used to make predictions (y_pred) on the test data X_test, and these predictions are then evaluated using the classification_report function, which generates a detailed report including precision, recall, F1 score and support for each class. Displaying prediction and evaluation: the classification report containing the evaluation metrics is printed to the console, providing insights into the model's performance.
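Here is a minimal sketch of the random forest classification workflow just described; the plants_features and plants_species values are illustrative placeholders, and max_features="sqrt" is added to reflect the m ≈ √p rule discussed earlier (the exact settings in the video may differ).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Illustrative data: [leaf size, flower color code] -> species label (0 or 1)
plants_features = np.array([[3.5, 0], [4.0, 1], [1.5, 0], [3.8, 1],
                            [1.2, 1], [4.2, 0], [1.8, 0], [3.9, 1]])
plants_species = np.array([1, 1, 0, 1, 0, 1, 0, 1])

# 75% training / 25% testing split
X_train, X_test, y_train, y_test = train_test_split(
    plants_features, plants_species, test_size=0.25, random_state=42)

# 10 trees; max_features="sqrt" mirrors the m ~ sqrt(p) rule from the discussion above
model = RandomForestClassifier(n_estimators=10, max_features="sqrt", random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred, zero_division=0))

# Feature importances, which feed the bar chart discussed below
print(dict(zip(["leaf size", "flower color"], model.feature_importances_)))
```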
Visualization: two visualizations are generated to gain insights into the data and the model. Scatter plot of species: this plot visualizes the distribution of plant features (leaf size and flower color) for each species; different marker shapes and colors represent different species, making it easier to distinguish between them. Feature importance: this horizontal bar plot visualizes the importance of each feature (leaf size and flower color) in predicting plant species; features with higher importance values contribute more to the model's decision-making process. Interpreting the scatter plot: the scatter plot shows the distribution of plant data points based on their leaf size and flower color, where the different marker colors (green and red) represent the two plant species, 0 and 1. Partial separation: there appears to be some separation between the green and red data points, suggesting that leaf size and flower color might be partially effective in distinguishing between the two plant species. Overlap: we can also observe some overlap between the green and red clusters, particularly towards the center of the plot; this overlap indicates that some plants might have similar leaf size and flower color features regardless of species, and it may lead to some classification errors by the random forest model. Potential challenges: the overlap between the two species in the scatter plot suggests that the model might struggle to accurately classify plants that fall in these overlapping areas. Overall, the scatter plot provides a visual representation of the data that can be helpful in interpreting the performance of the random forest model. Key points from the bar chart: the bar chart depicts the feature importances for leaf size and flower color, where the height of the bar represents the feature's importance; in this case the bar for flower color is considerably higher than the one for leaf size. Interpretation: this visualization confirms what we might have inferred from the scatter plot, namely that flower color carries more weight in the model's decision-making process for plant species classification, likely because the flower color data seems to have a clearer separation between the two species (green and red dots in the scatter plot). Overall conclusions: the random forest model partially leverages both leaf size and flower color for classification, but flower color appears to be the more dominant feature, and the model's performance might be limited by the overlap between the two species in the data, particularly for plants with similar
leaf size and flower color values. Random forest, a versatile machine learning algorithm, finds applications across various domains, including finance and banking, healthcare, e-commerce, marketing and more. Finance and banking: in fraud detection, random forests excel at spotting irregular patterns in transactions, leveraging features such as transaction amount, location, frequency and merchant type to flag potentially fraudulent activities; for credit risk assessment, these models evaluate borrowers' creditworthiness by analyzing factors like income, debt-to-income ratio, credit history and employment status, predicting the likelihood of default accurately; additionally, in stock market prediction, random forests leverage historical stock prices, company fundamentals, news sentiment and market trends to forecast future prices, though this remains challenging. Healthcare: in medical diagnosis, random forests classify patients based on various medical data like test results, symptoms and patient history, aiding healthcare providers in making informed decisions; for drug discovery and development, researchers utilize these models to identify potential drug candidates by analyzing molecular structures, gene expression data and existing drug information; furthermore, in personalized medicine, random forests help tailor treatments to individual patients by considering factors like genetics, medical history and lifestyle, enabling predictions of patient response to specific therapies or medication dosages. E-commerce and marketing: customer segmentation benefits from random forests as they group customers based on purchase history, browsing behavior, demographics, etc., facilitating targeted marketing and personalization efforts; for product recommendations, these models analyze customer purchase patterns, product ratings and search history to offer relevant product suggestions, thereby enhancing user experience and boosting sales; moreover, in churn prediction, random forests identify customers at risk of leaving by examining usage patterns, service interactions and demographic data, allowing for proactive retention strategies. Other notable areas: random forests find applications in environmental science, aiding tasks like land cover classification using satellite imagery, monitoring deforestation and assessing climate change impact; in image analysis they assist in image classification tasks such as facial recognition, object detection in self-driving cars and analyzing medical imaging scans; furthermore, in network intrusion detection, random forests help identify suspicious network traffic patterns by analyzing features like source and destination IP addresses, protocols used and packet sizes, contributing to cybersecurity efforts. It's important to note that while random forests are generally robust to overfitting due to their ensemble nature, careful feature selection and hyperparameter tuning are crucial for optimal performance; additionally, random forests work well with both categorical and numerical features, even
with missing values or outliers. Like bagging, which averages correlated decision trees, and random forest, which averages uncorrelated decision trees, boosting aims to improve the predictions resulting from a decision tree. Boosting is a supervised machine learning method that can be used for both regression and classification problems. Unlike bagging or random forest, where the trees are built independently from each other using one of the bootstrapped samples (copies) of the initial training data, in boosting the trees are built sequentially and depend on each other: each tree is grown using information from previously grown trees. Boosting does not involve bootstrap sampling; instead, each tree is fit on a modified version of the original data set. It's a method of converting weak learners into strong learners. So unlike fitting a single large decision tree to the data, which amounts to fitting the data hard and potentially overfitting, the boosting approach instead learns slowly: given the current model, we fit a decision tree to the residuals from the model, that is, we fit a tree using the current residuals rather than
the outcome y as the response. We then add this new decision tree into the fitted function in order to update the residuals. Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm. Now let's have a look at the three most popular boosting models in machine learning. The first ensemble algorithm we will look into today is AdaBoost. Like in all boosting techniques, in the case of AdaBoost the trees are built using information from the previous tree, and more specifically from the part of the tree which didn't perform well; this is called the weak learner, a decision stump. This decision stump is built using only a single predictor, and not all predictors, to perform the prediction. So AdaBoost combines weak learners to make classifications, and each stump is made by using the previous stump's errors. Here is the step-by-step plan for building an AdaBoost model. Step one, initial weight assignment: assign equal weight to all observations in the sample (1/N), where this weight represents the importance of the observation being correctly classified, so all samples are equally important at this stage. Step two, optimal predictor selection: the first stump is built by obtaining the RSS (in case of regression) or the Gini index or entropy (in case of classification) for each predictor and picking the stump that does the best job in terms of prediction accuracy; the stump with the smallest RSS or Gini index or entropy is selected as the next tree. Step three, computing the stump's weight. Step four, updating observation weights: based on the stump's errors, we increase the weight of the observations which have been incorrectly predicted and decrease the weight of the remaining observations, which had higher accuracy or have been correctly classified, so that the next stump places higher importance on correctly predicting those observations. Step five, building the next stump: based on the updated weights, use the weighted Gini index to choose the next stump. Step six, combining the stumps: all the stumps are combined while taking into account their importance, as a weighted sum. Imagine a scenario where we aim to predict house prices based on certain features like the number of rooms and the age of the house. For this example, let's generate synthetic data where num_rooms is the number of rooms in the house, house_age is the age of the house in years, and price is the price of the
house in $1,000s. Importing libraries: the code starts by importing the necessary libraries, i.e., numpy for numerical operations, pandas for data manipulation, matplotlib for visualization, and specific modules from scikit-learn for machine learning tasks like model selection, ensemble learning and evaluation metrics. Data generation: synthetic data is generated to mimic a real-world scenario; random numbers are generated to represent the number of rooms in a house (num_rooms), the age of the house (house_age) and noise, and the price of the house (price) is then calculated based on a linear relationship with the number of rooms and the age of the house, plus the added noise. Data visualization: the generated data is visualized using scatter plots; two scatter plots are created, one showing the relationship between the number of rooms and the house price, and the other showing the relationship between the age of the house and the house price; this visualization helps in understanding the distribution and relationships between the features and the target variable (price). Data splitting: the data is split into training and testing sets using the train_test_split function from scikit-learn; this step is essential for training the model on one subset of data (the training set) and evaluating its performance on another subset (the testing set). Model initialization and training: an AdaBoost regressor model is initialized with specific parameters, like the number of estimators (decision trees) set to 100 and a random seed for reproducibility; the model is then trained using the training data X_train and y_train. Model evaluation: once trained, the model makes predictions on the test data X_test, and the mean squared error (MSE) and root mean squared error (RMSE) metrics are calculated to evaluate the model's performance against the actual house prices (y_test).
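Here is a minimal sketch of the AdaBoost regression example just described; the data-generating coefficients and noise level are assumptions chosen only to produce plausible synthetic house prices.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import AdaBoostRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic housing data: price (in $1,000s) depends on rooms and age, plus noise
rng = np.random.RandomState(42)
num_rooms = rng.randint(2, 8, 200)
house_age = rng.randint(1, 50, 200)
price = 50 * num_rooms - 2 * house_age + 150 + rng.normal(0, 20, 200)

X = np.column_stack([num_rooms, house_age])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=42)

# AdaBoost regressor with 100 estimators, as in the walkthrough
model = AdaBoostRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"MSE: {mse:.2f}, RMSE: {np.sqrt(mse):.2f}")

# Actual vs. predicted prices, with a dashed diagonal marking perfect predictions
plt.scatter(y_test, predictions)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], "k--")
plt.xlabel("Actual price ($1,000s)")
plt.ylabel("Predicted price ($1,000s)")
plt.show()
```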
Result visualization: the actual house prices (y_test) and the predicted prices are visualized using a scatter plot; the plot also includes a diagonal line representing perfect predictions, and this visualization aids in assessing how closely the model's predictions align with the actual prices. The scatter plots provided show two key relationships. Number of rooms versus price: this is represented by the green data points; there appears to be a positive correlation, meaning that as the number of rooms increases, the price of the house also tends to increase, likely because houses with more rooms are generally larger and more expensive. House age versus price: this
is represented by the red data points; the relationship here is less clear; there might be a slight negative correlation, where newer houses (lower house age) tend to be more expensive, however the data points are scattered, making it difficult to draw a definitive conclusion. Additional points to consider: the data points themselves show some variation around the general trends, which indicates that there might be other factors influencing house price besides the number of rooms and house age (e.g., location, amenities), and it's important to note that this is simulated data, and real estate prices can be influenced by many
complex factors. Overall, the scatter plot suggests that the number of rooms has a positive correlation with house price, while the relationship between house age and price is less clear-cut. The scatter plots you saw provide valuable insights into the data, but AdaBoost plays a crucial role in uncovering the underlying relationship between the features (number of rooms, house age) and price. Here's how. Capturing complex relationships: the data exhibits some scatter around the general trends, suggesting the price isn't perfectly explained by just the number of rooms and house age; AdaBoost excels in handling such scenarios, as it's an ensemble method that combines multiple weak decision trees into a stronger final model, and these decision trees can effectively capture nonlinear patterns in the data, providing a more nuanced understanding of the price-feature relationship than a single linear model. Focus on informative features: while both scatter plots offer clues, AdaBoost goes beyond simply visualizing correlations; it analyzes the data to determine which features (number of rooms, house age) are most informative for predicting price, and by focusing on these features during the decision tree creation process, AdaBoost prioritizes the factors that have the strongest influence on price. Iterative refinement: AdaBoost works in a stagewise manner; it trains a series of weak decision trees, each focusing on correcting the errors of the previous one; by visualizing the data we can get a general sense of the trends, but AdaBoost iteratively refines its understanding through these multiple stages, ultimately leading to a more accurate prediction model. In essence, the scatter plots provide a starting point for understanding the data, but AdaBoost acts as a powerful tool to leverage that initial understanding and build a more robust model that captures the complexities of the price-feature relationship. AdaBoost and gradient boosting are
very similar to each other, but compared to AdaBoost, which starts the process by selecting a stump and continues to build by using the weak learners from the previous stump, gradient boosting starts with a single leaf instead of a tree or a stump. The outcome corresponding to this chosen leaf is then an initial guess for the outcome variable. Like in the case of AdaBoost, gradient boosting uses the previous tree's errors to build the next tree, but unlike in AdaBoost, the trees that gradient boosting builds are larger than a stump; there is a parameter where we set a maximum number of leaves to make sure the tree is not overfitting. Gradient boosting uses the learning rate to scale the gradient contributions, and it is based on the idea that taking lots of small steps in the right direction (gradients) will result in lower variance for the testing data. The major difference between the AdaBoost and gradient boosting algorithms is how the two identify the shortcomings of weak learners (for example decision trees): while the AdaBoost model identifies the shortcomings by using high-weight data points, gradient boosting does the same by using gradients in the loss function.
The loss function needs a special mention here, as it is the error term: it is a measure indicating how good a model's coefficients are at fitting the underlying data, and a logical understanding of the loss function depends on what we are trying to optimize. Early stopping: the special process of tuning the number of iterations for an algorithm such as GBM or random forest is called early stopping, a phenomenon we touched upon when discussing decision trees. Early stopping performs model optimization by monitoring the model's performance on a separate test data set and stopping the training procedure once the performance on the test data stops improving beyond a certain number of iterations. It avoids overfitting by attempting to automatically select the inflection point where performance on the test data set starts to decrease while performance on the training data set continues to improve as the model starts to overfit. In the context of GBM, early stopping can be based either on an out-of-bag sample set (OOB) or on cross-validation (CV); as mentioned earlier, the ideal time to stop training the model is when the validation error has decreased and started to stabilize, before it starts increasing due to overfitting. To build a GBM, follow this step-by-step process. Step one: train the model on the existing data to predict the outcome variable. Step two: compute the error rate using the predictions and the real values (the pseudo-residuals). Step three: use the existing features with the pseudo-residuals as the outcome variable to predict the residuals again. Step four: use the predicted residuals to update the predictions from step one, while scaling this contribution to the tree with a learning rate hyperparameter. Step five: repeat steps one to four, the process of updating the pseudo-residuals
and the tree, while scaling with the learning rate, to move slowly in the right direction until there is no longer an improvement or we come to our stopping rule. The idea is that each time we add a new scaled tree to the model, the residuals should get smaller. Let's break down the provided code step by step. Model initialization and training: the code initializes a gradient boosting regressor model (model_gbm) with specific parameters, such as the number of estimators (trees) set to 100, the learning rate set to 0.1, the maximum depth of each tree set to one, and a random seed for reproducibility; this model is then trained using the training data X_train and y_train; gradient boosting builds an ensemble of weak learners (decision trees in this case) sequentially, with each tree learning from the errors made by the previous ones. Predictions: after training, the model is used to make predictions on the test data X_test; the predict method is applied to model_gbm, and the predicted house prices are stored in the variable predictions. Model evaluation: to assess the model's performance, the mean squared error (MSE) and root mean squared error (RMSE) metrics are calculated; these metrics quantify the average squared difference between the actual house prices (y_test) and the predicted prices (predictions), and lower values indicate better model performance; the calculated MSE and RMSE are then printed to the console using formatted strings. Result visualization: finally, the code generates a scatter plot to visualize the relationship between the actual house prices (y_test) and the predicted prices (predictions); the scatter plot displays the actual prices on the x-axis and the predicted prices on the y-axis, and additionally a dashed diagonal line is drawn to represent perfect predictions, where actual prices equal predicted prices.
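Here is a minimal sketch of the gradient boosting step just described, regenerating the same kind of synthetic housing data as in the AdaBoost sketch above so that it stays self-contained.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same kind of synthetic housing data as in the AdaBoost sketch (coefficients are assumptions)
rng = np.random.RandomState(42)
num_rooms = rng.randint(2, 8, 200)
house_age = rng.randint(1, 50, 200)
price = 50 * num_rooms - 2 * house_age + 150 + rng.normal(0, 20, 200)
X = np.column_stack([num_rooms, house_age])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=42)

# 100 shallow trees (max_depth=1), learning rate 0.1, as described above
model_gbm = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=1, random_state=42)
model_gbm.fit(X_train, y_train)

predictions = model_gbm.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"GBM MSE: {mse:.2f}, RMSE: {np.sqrt(mse):.2f}")
```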
This visualization helps in assessing how closely the model's predictions align with the actual prices, providing insights into the model's accuracy and potential areas for improvement. Now, if we compare this result with the result that we got before with AdaBoost, we can say the following. Scatter plot characteristics: both plots follow a similar structure, with predicted prices on the y-axis and actual prices on the x-axis; points represent individual predictions, with ideal predictions lying on a dashed diagonal line indicating where predicted prices equal actual prices. Performance indication: for AdaBoost, the points are spread around the diagonal but show a trend of underestimating higher values, as seen from the concentration of points below the line as actual prices increase; for GBM, the points are more tightly clustered around the diagonal line throughout the range of values, suggesting that GBM predicts both low and high prices with better accuracy than AdaBoost. Algorithm effectiveness: the GBM model generally appears to perform better, given the closer clustering of points around the identity line, indicating more accurate prediction across the range of house prices, while the AdaBoost plot shows greater deviation from the line, especially at higher price points, suggesting less consistency in prediction accuracy across the price spectrum. Data distribution: both models handled the full range of data, from about 150 to 500 in units, consistently, but GBM seems to manage the upper range more effectively. Overall, from these plots we can infer that GBM provides a more accurate and consistent prediction for house prices compared to AdaBoost, particularly at higher price points, where AdaBoost tends to underestimate values. One of the most popular boosting or ensemble algorithms is extreme gradient boosting (XGBoost). The difference between GBM
and XGBoost is that in the case of XGBoost the second-order derivatives (second gradients) are calculated; this provides more information about the direction of the gradients and how to get to the minimum of the loss function. Remember that this is needed to identify the weak learner and improve the model by improving the weak learners. The idea behind XGBoost is that the second-order derivative tends to be more precise in terms of finding the accurate direction. Like AdaBoost, XGBoost applies advanced regularization in the form of L1 or L2 norms to address overfitting. Unlike AdaBoost, XGBoost is parallelizable due to its special caching mechanism, making it convenient for handling large and complex data sets. Also, to speed up the training, XGBoost uses an approximate greedy algorithm to consider only a limited number of thresholds for splitting the nodes of the trees. To build an XGBoost model, follow this step-by-step process. Step one: fit a single decision tree; in this step the loss function is calculated (for example NDCG) to evaluate the model. Step two: add the second tree; this is done such that when this second tree is added to the model, it lowers the loss function, based on first- and second-order derivatives, compared to the previous tree, where we also use the learning rate eta. Step three: find the direction of the next move; using the first-degree and second-degree derivatives we can find the direction in which the loss function decreases the most, which is basically the gradient of the loss function with regard to the output of the previous model. Step four: split the nodes; to split the observations, XGBoost uses an approximate greedy algorithm with approximate weighted quantiles, usually quantiles that have a similar sum of weights, for finding the split value of the nodes; it doesn't consider all the candidate thresholds but instead uses the quantiles of that predictor only. The optimal learning rate can be determined by using cross-validation and grid search. Imagine you have a data set containing information about various houses and their prices; the data set includes features like the number of bedrooms, bathrooms, the total area, the year built and so on, and you want to predict the price of a house based on these features. Let's dissect the provided code step
by step. Model initialization and training: the code starts by importing the XGBoost library (import xgboost as xgb); XGBoost is a powerful implementation of gradient boosting machines. Next, an XGBoost regressor model (model_xgb) is initialized with specific parameters: the objective is set to reg:squarederror, indicating that the model aims to minimize the mean squared error loss function; additionally, the number of estimators (trees) is set to 100 and a seed value of 42 is specified for reproducibility. The initialized model is then trained using the training data X_train and y_train; XGBoost builds an ensemble of decision trees sequentially, optimizing the specified loss function. Predictions: after training, the trained model (model_xgb) is used to make predictions on the test data X_test; the predict method is applied to the model and the predicted house prices are stored in the variable predictions. Model evaluation: the code proceeds to evaluate the model's performance using two common metrics, mean squared error (MSE) and root mean squared error (RMSE); these metrics quantify the average squared difference between the actual house prices (y_test) and the predicted prices (predictions), and lower values indicate better model performance; the calculated MSE and RMSE are then printed to the console using formatted strings. Result visualization: lastly, the code generates a scatter plot to visually compare the actual house prices (y_test) with the predicted prices (predictions); the scatter plot displays the actual prices on the x-axis and the predicted prices on the y-axis, and additionally a dashed diagonal line is drawn to represent perfect predictions, where actual prices equal predicted prices; this visualization aids in assessing the model's accuracy by examining how closely the predicted prices align with the actual prices, providing insights into the model's performance.
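Here is a minimal sketch of the XGBoost version, again regenerating the same kind of synthetic housing data; it assumes the xgboost package is installed.

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same kind of synthetic housing data as in the earlier boosting sketches
rng = np.random.RandomState(42)
num_rooms = rng.randint(2, 8, 200)
house_age = rng.randint(1, 50, 200)
price = 50 * num_rooms - 2 * house_age + 150 + rng.normal(0, 20, 200)
X = np.column_stack([num_rooms, house_age])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=42)

# XGBoost regressor minimizing squared error, 100 trees, fixed seed for reproducibility
model_xgb = xgb.XGBRegressor(objective="reg:squarederror", n_estimators=100, random_state=42)
model_xgb.fit(X_train, y_train)

predictions = model_xgb.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print(f"XGBoost MSE: {mse:.2f}, RMSE: {np.sqrt(mse):.2f}")
```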
Now let's compare AdaBoost, GBM and XGBoost; here's how they compare. AdaBoost: the AdaBoost plot shows predictions that tend to underestimate the actual prices, especially as the values increase; this is evident from the larger number of points lying below the diagonal line in the higher price ranges, and the overall fit to the diagonal is less tight, suggesting higher prediction errors or bias, particularly for higher-priced houses. GBM: the GBM model produces predictions that are generally closer to the diagonal across all price ranges; this indicates better accuracy and consistency in predicting both lower- and higher-priced houses compared to AdaBoost, and the points in the GBM plot are more tightly clustered around the diagonal, indicating lower variance in prediction errors. XGBoost: the XGBoost plot also shows a tight clustering of points around the diagonal, similar to GBM, which suggests a high level of accuracy; unlike GBM, the XGBoost plot seems to slightly overestimate the lowest-priced houses while matching or slightly underestimating the highest-priced houses, but it still maintains a close adherence to the diagonal line. Summary of the comparison. Accuracy and consistency: both GBM and XGBoost exhibit high accuracy, with their predictions closely clustered around the diagonal line, showing their effectiveness in both lower and higher price predictions; AdaBoost, however, shows more variance and a tendency to underestimate, particularly at higher price points. Performance at different price ranges: GBM and XGBoost handle extreme prices better than AdaBoost, which struggles with underestimation as prices increase. General predictive performance: XGBoost and GBM are quite comparable, with slight differences in how they handle the very low and very high ends of the price spectrum, while AdaBoost appears to be less reliable, especially for higher-priced properties. From these observations, GBM and XGBoost seem more suitable for scenarios where precise and consistent predictions across a wide range of house prices are critical, while AdaBoost might be more prone to prediction errors, particularly in higher price brackets. Hi, I'm Vah, and in this project we will learn how to understand your customers better, track sales patterns and show
those results. If you like working with data or own a store, this video will show you how to use information to make better choices and get better results. You will split your customers into smaller groups based on how they shop; this helps you send the right messages to the right people and give them offers they would like. Loyal customers are the best: you will use data to find your biggest supporters and those who are ready to spend more, and then you can reward your best customers with programs that fit their shopping habits; this makes them happy and stops them from going to other stores. We will use data to guess what people will buy and when they will buy it; we will find sales patterns among different items and figure out what cool new products people will want. This lets you always have the right stuff at the right time: you won't have too many items, everything will sell, and customers will be surprised by how well you know what they need. We'll look at how sales change throughout the year; this helps you plan for busy times, spot slowdowns early and know exactly when to have big sales. We will use location data and what people say about you to find places where sales are going well and where you could grow, and you will even show it all on a map; this helps you spend your advertising money wisely, find great spots for your stores and even choose the perfect things to sell in each place. So let's get started. All right, let's now go over the data I will be using: we are using the Superstore sales data set, and it has 9,800 rows and the columns order ID,
order date, ship date, shipping mode (standard class, second class or other classes), and the customer ID with the customer name. There is also the segment, meaning who bought the product, whether a consumer, a corporate or a home office, and the clients mainly come from the United States; it's also specified from which city of the United States they come. So we shall import this into our notebook and start working on it. Okay, so let's now import the necessary Python libraries: we import pandas as pd, we also import numpy as np, import matplotlib.pyplot as plt and import seaborn as sns. Let's also import the data, and we will be using a copy of it. Perfect, so this is how it looked on Kaggle, and this is also how it looks when we have imported it. So let's now look at the data frame's info. Everything seems to be consistent, but for the postal code it seems that 11 postal codes are missing. Okay, so what we can do is fill in those null values. As you can see, we have replaced the null postal codes, the customers that didn't have any postal code, and we have filled zero in for them. All right, so let's now move on to checking for duplicates: we count the duplicated rows, and if there are duplicates we print "duplicates exist", and if there are not we print "no duplicates found". As you can see, there exist no duplicates.
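Here is a minimal sketch of the setup just described; the file name superstore_sales.csv and the column name Postal Code are assumptions, so adjust them to match your copy of the dataset.

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical file name; point this at wherever your copy of the Superstore CSV lives
df = pd.read_csv("superstore_sales.csv")
df.info()

# Fill the missing postal codes with zero (column name assumed to be "Postal Code")
df["Postal Code"] = df["Postal Code"].fillna(0)

# Check for duplicate rows
if df.duplicated().sum() > 0:
    print("Duplicates exist")
else:
    print("No duplicates found")
```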
So let's move on to customer segmentation. Let's first create a variable named types_of_customers and extract the column called segment from our data frame. As you can see from our data frame, we have a segment column within our data; this segment includes a list of the types of customers, and in our data frame we have both consumer and corporate customers. So let's get started with customer segmentation. The main problem is that many large businesses struggle to understand the contribution and importance of their various customer segments; they often lack precise information about their main buyers, relying on intuition rather than data, and this leads to misallocation of resources, resulting in revenue loss and decreased customer satisfaction. For example, if your store primarily caters to consumers, it's crucial to tailor your marketing and customer satisfaction efforts to resonate with their needs and preferences. By focusing your resources on understanding and catering to your consumer base, you can avoid misallocating resources to large corporates; this ensures you're providing a satisfying customer experience for your primary demographic, ultimately leading to increased customer loyalty and revenue growth. We can also create a pie chart or a bar chart from it to clearly illustrate the revenue contribution of each customer segment, and this will allow us to direct more of our marketing and customer satisfaction resources towards the right group. Once you've completed customer segmentation, the next step depends on your strategic goals; here are a few ways to proceed. Focus on your most valuable segment: if your existing customer segmentation reveals a particularly profitable segment, such as consumers, tailor your marketing, product offerings and customer service to deepen your engagement with that group. Target new segments: if you want to attract more corporates or home offices, you'll need to understand their unique needs and pain points; start by researching these segments (what are their challenges, what solutions would appeal to them), develop tailored messaging, and consider offering specialized products or services to attract these new customer types.
All right, so let's get started. This will extract the types of customers from the data frame. Perfect, so it's consumer, corporate and home office; those are all the values that are in our data frame. All right, so let's count the unique values in our segment column, and we will do this with value_counts. What this method does is count the unique values in our segment column, and then we reset the index to turn them into a column, and then we can proceed with renaming the columns. We want to give our segment column a name like total customer or type of customer; I will go with type of customer, so we will say number_of_customers is equal to number_of_customers.rename, and we want to rename the column which is named segment to type of customer. Now if you print that (print number_of_customers): there are 5,101 consumers, for corporate there are 2,953 corporate buyers, and there are 1,746 home offices. If you want to create a pie chart out of this, we can plot it by calling plt.pie on the number of customers, basing the pie chart on the count and labeling it with the type of customer. Perfect. All right, so as you can see we have the renamed type of customer column, and you can see from this pie chart that our main consumer segment is 52%, 30% of our orders come from corporates and 18% from home offices. You can see who we have to focus on exactly, which is consumers, but while consumers hold the majority, focusing solely on them overlooks significant potential.
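Here is a minimal sketch of this segmentation step, continuing from the data frame df loaded above; it assumes the segment column in the dataset is named Segment.

```python
import matplotlib.pyplot as plt

# Count the unique values in the segment column (column name assumed to be "Segment")
number_of_customers = (
    df["Segment"].value_counts()
      .rename_axis("Type of Customer")
      .reset_index(name="Total Customers")
)
print(number_of_customers)

# Pie chart of the share of orders per customer segment
plt.pie(number_of_customers["Total Customers"],
        labels=number_of_customers["Type of Customer"], autopct="%1.0f%%")
plt.title("Orders by customer segment")
plt.show()
```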
There is also real potential within the corporate and home office segments, so let's explore how to balance resource allocation for all three segments to maximize growth. To gain even deeper insights, we should integrate our customer data with sales figures; this analysis will help us identify which segments generate the most revenue per customer (average order value) and overall profitability (customer lifetime value). Additionally, we can segment customers by purchase frequency and basket size to understand their buying behavior within each segment. Here are some additional questions to consider for a more comprehensive analysis. Customer acquisition cost (CAC): how much does it cost to acquire a customer in each segment? Customer satisfaction: how satisfied are customers in each segment? Churn rate: what is the rate at which customers leave in each segment? By analyzing these factors alongside revenue and customer lifetime value, we can create a customer segmentation model that prioritizes segments based on their overall value and growth potential. We can also plot a bar graph for the total sales for each customer type: we group the data by the segment column and calculate the total sales for each segment. Right now you don't see the exact sales numbers in the bar chart, so let's print them, and then you can see the exact sales numbers for each customer type: there are around 1.2 million from our consumers and we have around 600 or 700,000 from our corporates. Now we can also plot a bar chart from this, with the type of customer on one axis and the sales per segment on the other. This bar chart effectively illustrates the distribution of sales across our customer segments: consumers account for the largest portion of sales (about 1.2 million), followed by corporates (about 1.0 million) and home offices (about 0.8 million).
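Here is a minimal sketch of the sales-by-segment step; the column names Segment and Sales are assumed to match the Kaggle Superstore dataset.

```python
import matplotlib.pyplot as plt

# Total sales for each customer segment (column names "Segment" and "Sales" assumed)
sales_per_segment = df.groupby("Segment")["Sales"].sum().reset_index()
print(sales_per_segment)

# Bar chart of total sales per segment
plt.bar(sales_per_segment["Segment"], sales_per_segment["Sales"])
plt.xlabel("Type of customer")
plt.ylabel("Total sales")
plt.show()

# The same totals as a pie chart, showing each segment's share of revenue
plt.pie(sales_per_segment["Sales"], labels=sales_per_segment["Segment"], autopct="%1.0f%%")
plt.show()
```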
While the chart is clear, a deeper analysis can help us optimize our marketing efforts. Customer lifetime value (CLTV): calculate the CLTV of each segment to identify which segments generate the most revenue over time; this will help prioritize customer segments for marketing efforts; for example, if you find that the home office segment has a higher CLTV than the consumer segment, you may want to invest more resources in marketing campaigns targeting home office customers. Market research: conduct market research to understand the specific needs and preferences of each customer segment; this will inform the
development of targeted marketing campaigns; for instance, you might discover that consumers in your data are price-sensitive while corporate customers are more interested in bulk discounts and reliable service, and you can use this knowledge to tailor your marketing messages to each segment. Average order value: analyze the average order value by segment to identify opportunities to increase revenue per customer; let's say your analysis reveals that corporate customers have a higher average order value than consumers, then you could develop marketing campaigns that encourage consumers to purchase bundles or higher-priced products to increase their average order value. Customer acquisition cost
(CAC): how much does it cost to acquire a customer in each segment? Knowing CAC can help determine the return on investment (ROI) for marketing efforts. Here's an example: let's say it costs $100 to acquire a new corporate customer but only $20 to acquire a new consumer customer; if the CLTV (customer lifetime value) of a corporate customer is significantly higher than the CLTV of a consumer customer, then spending $100 to acquire a corporate customer may still be profitable; however, if the CLTV of the corporate customer is only slightly higher than the CLTV of the consumer customer, you may want to focus your marketing efforts on acquiring more consumers, because the cost of acquisition is much lower. Customer satisfaction: how satisfied are customers in each segment? Understanding satisfaction levels can help identify areas for improvement and reduce churn. Here's an example: you can conduct surveys or collect customer feedback to understand satisfaction levels; if you find that corporate customers are less satisfied than consumer customers, you may want to investigate the reasons for their dissatisfaction and make changes to improve their experience; this could involve improving your customer service, offering more competitive pricing for
corporate customers, or developing products or services that better meet the needs of corporate customers. We can also create a pie chart for our sales, which you can do by calling plt.pie on the sales per segment, with the labels set to the type of customer: 51% of our sales come from our consumers, 30% from our corporates and 19% from home offices. All right, so let's move on to customer loyalty. As a business you want to make sure that your most loyal customers stay happy; this will make sure that those customers keep coming back, keep bringing new people and also keep placing new orders, so you will decrease the cost of acquiring new customers because there will be already existing customers, and you will also be able to make sure that your revenue either stays at the same level or increases by keeping your most loyal customers happy, and you want to do that as a business. Now we can do this in either of the following ways: we can rank the most loyal customers by the number of orders they have placed or by the total amount they have spent. Say you have analyzed your data, pinpointing your 30 most loyal customers; this represents a significant opportunity to strengthen these relationships and maximize their lifetime value. Here's a powerful approach: design a targeted email specifically for those high-value segments and proactively offer personalized support, with inquiries such as "how can we assist you today?"; this demonstrates your commitment to their success, proactively addressing potential issues and fostering a deep sense of loyalty. Loyalty programs: consider a tiered loyalty program that offers exclusive rewards tailored to your most valuable customers; this can include early access to new products, personalized discounts or even point-based reward systems. Personalized experiences: leverage your data insights to go beyond email; consider personalized website recommendations, targeted promotions based on past purchase history, or even handwritten thank-you notes for high-value customers. Customer feedback loops: make sure your top customers feel heard; implement surveys or invite them to participate in exclusive focus groups; this demonstrates you value their input and are actively using feedback to improve the customer experience. Community building: depending on your business model, fostering a community among your most loyal customers can create a sense of belonging; this could
involve access to online forums, exclusive events or opportunities to network with like-minded individuals. Now this strategy extends beyond customer satisfaction: prioritizing the experience of your top customers directly correlates with increased retention, positive referrals and ultimately improved revenue. Now let's dive deeper and see who our most loyal customers are. All right, so let's get started with that. Let's first display the first three rows of our data frame; as you can see there is a column called sales, and each customer has a specific ID with a specific name, so if you count the number of times a customer shows up, then you also have their total number of orders, which you can use later however you want. So let's start with doing that. Now let's rename the columns: we want the column order ID, which holds the count, to be named total orders, so we rename the column that is equal to order ID to total orders, with inplace equal to True. Okay, so now let's identify the repeat customers, the customers with an order frequency greater than one; so repeat customers are equal to the customer order frequency rows where total orders is greater than one. Perfect. Now we want to organize this by sorting, and we can do that by saying repeat_customers.sort_values. Perfect, now let's print this out: print repeat_customers_sorted.head(12), because we want to display our top 12 customers, and reset the index. So the customer with the name William Brown, who is a consumer, has placed a total of 35 orders. So this is the list of your top customers, and as a business or as a superstore you can identify exactly the number of total orders a person or a business has to place in order to be considered a loyal customer, and then according to that you can tailor your services.
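Here is a minimal sketch of the repeat-customer ranking just described; the column names Customer ID, Customer Name, Segment and Order ID are assumed from the standard Superstore dataset.

```python
# Orders per customer: count Order IDs per customer and rename the count to "Total Orders"
customer_order_frequency = (
    df.groupby(["Customer ID", "Customer Name", "Segment"])["Order ID"]
      .count()
      .reset_index()
)
customer_order_frequency.rename(columns={"Order ID": "Total Orders"}, inplace=True)

# Repeat customers: more than one order, ranked by order count
repeat_customers = customer_order_frequency[customer_order_frequency["Total Orders"] > 1]
repeat_customers_sorted = repeat_customers.sort_values(by="Total Orders", ascending=False)
print(repeat_customers_sorted.head(12).reset_index(drop=True))
```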
Now the data clearly reveals that a small group of customers place orders with considerably higher frequency (30 plus): we have William Brown with 35 orders, other home office customers with 34, and many consumers and one corporate with 32, so it shows clearly that we have a loyal group of customers. There's also significant potential for our home office segment: several of our most loyal customers belong to the home office segment, which implies that the home office segment has strong potential for customer loyalty and deserves targeted marketing efforts. It also shows that we don't just have one dominant group of loyal customers; we have home offices,
consumers and corporates. While there are many consumers, it doesn't mean that we have to focus on one segment; it means that we still have to devise a plan that caters to our multiple segments. So, some recommendations: we can prioritize loyal customers, segment customers by order frequency, and develop exclusive offers, rewards or early-access programs tailored to our most loyal customers; for example, we can provide them exclusive discounts, tiered reward programs and earlier access. We can also target more home offices, because we see that home offices keep coming back and we are able to satisfy several of the home offices; that means we have catered to their needs and provided a good enough service for them to keep coming back, which means our product is great for home offices, and that means we can target more home offices using content marketing, social media ads or other types of marketing strategies. We can also analyze the way we provide service to these customers, because it worked out pretty well, and if we provide this kind of service to our newly arriving customers, then we increase the chance that they also become loyal customers. So those are several conclusions we can make. Now, we can also identify loyal customers by sales: so far we identified them by the total number of orders they have placed, but we can also use the amount of sales, the total amount spent, to identify them, because a person can come and place 35 orders, but if they place 35 one-dollar orders then obviously that's just 35 bucks, and the order count doesn't say anything about the sales amount. So
ideally, you want to organize it by the sales amount to be able to identify the actual top-spending and loyal customers. That said, when there is a significant customer, let's say someone has spent like 25,000, that can also be done in one order, so that doesn't mean it's a repeat customer; it's just a top spender. Now let's start with identifying our top-spending customers. Let's first create a variable customer_sales: we go to the data frame and group by customer ID, because we want the customer ID, we also want to see the name and what type of customer they are (the segment), and we want to do it by sales and sum those all up, and we don't want to reset the index here. Now let's identify our top spenders by having them ranked in descending order, meaning our top spenders will be ranked all the way at the top, using customer_sales.sort_values. So Sean Miller has spent the most; he is from home office, with a total amount of 25,000 USD.
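Here is a minimal sketch of the top-spender ranking; again the column names are assumed from the standard Superstore dataset, and reset_index is used here only to get a tidy table.

```python
# Total sales per customer, ranked from the biggest spender down
customer_sales = (
    df.groupby(["Customer ID", "Customer Name", "Segment"])["Sales"]
      .sum()
      .reset_index()
)
top_spenders = customer_sales.sort_values(by="Sales", ascending=False)
print(top_spenders.head(10))
```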
Willi Brown is nowhere to be found here the same as Sean Miller he has spent he's the he's a customer who has spent the most in our Superstore but is also nowhere to be found here meaning that the repeated customers doesn't really Define their spending habits so if you it depending on the way you you run a super store now obviously I would want to I would want our customer to come back but I would dedicate my Resources to the customers who spend the most because those are the customers who BR bring the most business
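Here is a minimal sketch of the sales-based ranking just described, using the same assumed Superstore column names:

```python
# Total sales per customer, ranked from the highest spender down
customer_sales = (
    df.groupby(["Customer ID", "Customer Name", "Segment"])["Sales"]
      .sum()
      .reset_index()
      .sort_values(by="Sales", ascending=False)
)

print(customer_sales.head(10))  # e.g. Sean Miller (Home Office) at roughly $25,000
```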
So the total number of orders is useful, but it doesn't say much about spending habits or about a customer's value to your store. All right, let's now go over to the next chapter, which is shipping. As a superstore you also want to know which shipping methods customers prefer and which are the most cost-effective and reliable; knowing this impacts customer satisfaction, which in turn has a big impact on your revenue. Amazon, for example, offers several shipping methods but pushes its most popular one, which keeps the most customers happy and also makes Amazon the most money. So as a superstore you want to know which of your shipping methods is the most used and the most reliable. We create a variable for the shipping mode by taking the Ship Mode column of the data frame, counting the values and resetting the index. Standard Class is the most popular, almost four times more popular than the next option, and Same Day is used the least. Let's create a pie chart of this with plt.pie: the most popular method is Standard Class, with about 60% of orders using it, and the remaining 40% is split across the other methods.
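A short sketch of the shipping-method breakdown and pie chart; 'Ship Mode' is the assumed column name:

```python
import matplotlib.pyplot as plt

# Count how often each shipping method is used
ship_mode_counts = df["Ship Mode"].value_counts()
print(ship_mode_counts)

# Pie chart of the shipping-method split
plt.pie(ship_mode_counts.values, labels=ship_mode_counts.index, autopct="%1.0f%%")
plt.title("Orders per shipping method")
plt.show()
```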
As a superstore, or any store, you invest in your shipping: you end up striking deals with delivery companies like DHL, UPS and others, and sometimes you end up recommending the wrong option to your customer. Say Second Class is fast but ends up costing the customer too much; the customer decides not to buy your product, and that decreases your sales. But if you know that Standard Class is the most popular option, you can have a button saying "this is our most popular option", and most of the time people choose the most popular option. This helps the superstore save on investment in the other methods, or dedicate resources according to the value each shipping class brings, and it lets the store recommend its most popular option, which is Standard Class. Now on to a problem many superstores have: many stores have branches in many locations and many states, but they don't know how well each one is performing. On a dashboard you could see that, but without a dashboard they often have no idea how well each store in each state is performing.
That leaves them clueless about where they are underperforming or, for example, where there is a high-potential area in which they could open a new store. So let's move on to this chapter, which is geographical analysis. Many stores have a hard time identifying high-potential areas or spotting stores that are underperforming. Chains like Walmart or Target have many branches, and they want to know how well each branch is doing; the straightforward way to do this is to count the number of orders and sales for each city and each state. This lets you see which states or cities perform the best, which perform the worst, and dedicate your resources accordingly. If the store in one city has simply been losing money for years, you want to adjust your strategy: maybe close that store, or change it in a way that it starts bringing in more profit or revenue. So let's get started. As you can see, the most popular state is California, and the least popular of our top 20 is New Jersey; Washington, where there is still high potential for a profitable store, comes in fourth. From this you can conclude that you might have to work on New Jersey to increase its order count, which also lets you increase revenue, and that California is your strongest state, so you want to keep California happy. You can do the same per city: take the City column, count the values, reset the index and print the top 15. The most popular city is New York with an order count of 891, then Los Angeles, and Jackson is the least popular of our top 15; you can also extend this to the top 25. So not only can you focus on states, but within each state you can also focus on the cities that are underperforming or overperforming, and dedicate resources to the city where you want to raise revenue, or to a city like Long Beach where there is high potential but you aren't using any of your resources yet. We can also organize it by sales per state: previously we used the order count, and now we create state_sales by grouping on state, summing the Sales column, resetting the index and sorting. By sales amount our most popular state is still California, then New York, and after that the ranking changes slightly, with Texas following. Sorting sales by city gives New York, Los Angeles, Seattle and San Francisco at the top, which is basically the same picture as the earlier per-city analysis; nothing really changes.
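A sketch of the geographical breakdown just described, assuming the columns 'State', 'City' and 'Sales':

```python
# Order counts per state and per city
state_counts = df["State"].value_counts()
city_counts = df["City"].value_counts()
print(state_counts.head(20))   # e.g. California at the top
print(city_counts.head(15))    # e.g. New York City with the highest order count

# Total sales per state, ranked from highest to lowest
state_sales = (
    df.groupby("State")["Sales"].sum()
      .reset_index()
      .sort_values("Sales", ascending=False)
)
print(state_sales.head(20))
```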
All right, so as a store you also want to be able to track down your most popular product category, your best-selling products, and sales performance across categories and sub-categories: find the sweet spots where strong categories also have top-selling sub-categories, and spot weaker sub-categories inside otherwise strong categories that might need improvement. You also want to watch product popularity fluctuations, to see whether popularity is seasonal, turning up and down, which helps to forecast future demand, and you can group it by location, because each location might have a different popular product and you want to place products where they maximize your revenue. So let's get started with finding our top-performing products and categories. First we extract the product categories from our data frame with unique(). Right now we have three categories, as you can see: Furniture, Office Supplies and Technology, and each one has sub-categories, for example bookcases and chairs. Now let's look at the types of sub-category by printing the unique values of the Sub-Category column: chairs and a whole bunch of others. Next we group the data by product category and count how many sub-categories each one has, so that we can say, for example, how many sub-categories Office Supplies or Furniture contains. It turns out there are nine sub-categories for Office Supplies and four each for Furniture and Technology, so Office Supplies is the more fine-grained category.
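In code, this exploration looks roughly like the following (column names 'Category' and 'Sub-Category' assumed):

```python
# Unique categories and sub-categories
print(df["Category"].unique())       # Furniture, Office Supplies, Technology
print(df["Sub-Category"].unique())

# Number of distinct sub-categories per category
print(df.groupby("Category")["Sub-Category"].nunique())
```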
Now we can also look at our top-performing sub-categories: we group by category and sub-category and sum the sales. Our most popular sub-category sits in Technology, specifically phones, which has the largest sales amount; within Furniture it is chairs, and within Office Supplies it is storage. From this you can see your most popular sub-categories and which ones you might want to recommend on a front page or put forward in the store. Now let's see which of the main categories has the most sales by grouping the product categories and summing sales: as expected, Technology is the most popular one, followed by Furniture and Office Supplies. So maybe inside your store you give that category a somewhat larger department, or place it in the first row right in front of the customers, so your most popular option is presented immediately; this of course helps you increase revenue and sales. If you want, you can create a pie chart for this, organizing the product categories by sales with labels. It shows that Technology performs a little better than the other two, but honestly the three categories are not that different from each other. All right, let's now see which of our sub-categories is the most popular. Remember we already saw which sub-category had the most sales; now let's turn that into a bar chart by sorting the sales and plotting the sub-categories against their sales totals.
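A short sketch of the per-sub-category and per-category sales aggregation, plus the bar chart:

```python
# Sales per sub-category and per category
subcat_sales = (
    df.groupby(["Category", "Sub-Category"])["Sales"].sum()
      .reset_index()
      .sort_values("Sales", ascending=False)
)
category_sales = df.groupby("Category")["Sales"].sum().sort_values(ascending=False)
print(subcat_sales.head())
print(category_sales)

# Bar chart of total sales per sub-category
plt.figure(figsize=(10, 5))
plt.bar(subcat_sales["Sub-Category"], subcat_sales["Sales"])
plt.xticks(rotation=45, ha="right")
plt.ylabel("Total sales")
plt.title("Sales per sub-category")
plt.tight_layout()
plt.show()
```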
This shows very clearly that our most popular options are phones and chairs. Since these generate the most sales, customers are evidently more willing to spend money on them, so you can put more of your marketing resources behind phones and chairs: the modest resources already spent on them clearly work, so if you increase the amount you spend on phones and chairs, your sales should increase accordingly. You can also conclude that art, envelopes and labels aren't that popular, so maybe you discount them now to clear the stock and buy fewer of them in the future, freeing budget to buy more of the popular options such as phones and chairs. Alternatively, you can investigate why they are not popular: maybe these are simply poor envelopes, or it's not the right kind of art people want.
If you stocked a completely different type of art, customers might end up buying it. This shows exactly how stores can use this data to optimize their sales and how their resources are allocated, so that they end up making more money. Now, businesses love making sales; they love seeing revenue and profit increase. That's all lovely, but you also need to be able to track your sales, so you can see what situation you are in and adjust to it, and what better way to do that than a chart, whether a bar graph or a plain line graph, showing how much growth or decline you are experiencing. For example, if you are a business and revenue is declining year over year or month over month, you can see that there is a problem and allocate resources towards fixing it, whether that means investing more in customer experience, putting more money into marketing or into things that make your customers more satisfied, or adopting new technologies. Those are all things you can do whenever you see declining revenue, but first you must be able to see it coming. Businesses also struggle with unstable growth: they grow one month, and the next month there is no growth or even a decline, and you want to see that so you can stabilize the growth and keep growing continuously. There are also missed seasonal opportunities: if a business isn't aware of how sales change throughout the year, it can miss out on maximizing profits during the big seasons. Maybe in some seasons a certain product is in high demand but you don't have enough stock to cover it, so you end up not being able to meet the demand and you lose out on revenue and profit. Those problems relate to yearly sales, and there are also problems that show up in monthly sales. Take cash flow issues: many businesses one day look at their bank account and see they are out of money and cannot invest any further, because drastic dips in sales during specific quarters or months lead to cash crunches that make it hard to pay suppliers, employees or ongoing expenses. There is inventory imbalance: in some periods you are overstocked and practically have to give items away, in others you are understocked and cannot meet demand. There is ineffective marketing: if you spend a significant amount on marketing and don't reach the desired outcome, there is a major issue with your campaign, and you can see it in the sales you are making; if you increase your marketing spend and there is no significant increase in sales, you are doing something wrong. And there is the lagging response to emerging trends: monthly sales data highlights new trends or drops in demand much more quickly than yearly overviews, so you can react faster. For example, if a certain product released in 2024 is suddenly in high demand in many countries, you want to adjust to that demand and secure supply, but you can't do that if you only track yearly sales or do no tracking at all. Those are all the problems that exist when you are not able to track your sales, be it monthly, quarterly or yearly, and we intend to address them by graphing the sales and drawing conclusions from the graphs.
All right, so let's get started. First let's convert the Order Date column to datetime format with pd.to_datetime, passing dayfirst=True. Then we group the data by year and calculate the total sales amount for each year: we create a yearly_sales variable by grouping on the year of the order date, summing the Sales column and resetting the index. We then give the columns appropriate names, because right now in the data frame the first column is still called Order Date when it should be Year, and the Sales column should be named Total Sales. Printing this out shows the total sales amount for each year, and we can also draw a bar chart of year versus total sales. From this bar graph a few conclusions can be made; for example, there is steady growth from 2016 to 2018.
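As a sketch, the yearly aggregation described above can be written like this (assuming the 'Order Date' and 'Sales' columns):

```python
# Yearly sales: convert the order date, group by year, rename columns, and plot
df["Order Date"] = pd.to_datetime(df["Order Date"], dayfirst=True)

yearly_sales = (
    df.groupby(df["Order Date"].dt.year)["Sales"].sum()
      .reset_index()
)
yearly_sales.columns = ["Year", "Total Sales"]
print(yearly_sales)

plt.bar(yearly_sales["Year"], yearly_sales["Total Sales"])
plt.xlabel("Year")
plt.ylabel("Total sales")
plt.title("Total sales per year")
plt.show()
```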
That growth might be explained, for example, by new product launches that turned out to be effective, by economic factors or by marketing efforts. Those are all explanations a person can offer, but you can only really confirm them when you have richer data available, and in this data frame we don't have the marketing cost or any other cost information, so our conclusions remain fairly limited. Still, this bar graph combined with another one, for example marketing cost over the same years, would let a business draw a pretty good set of conclusions.
We can also plot the same numbers as a normal line graph, simply reusing the data with a line plot of year against total sales, which looks a little different. I actually prefer this kind of graph to a bar chart for tracking yearly sales, because it shows the amount of increase or decrease much more clearly. We can also focus on the quarterly sales, like I said, to be able to react quickly to emerging trends or to any kind of change. We again convert the order date to date format and aggregate per quarter. From the quarterly chart we can see a steady increase in quarterly sales, and then around July it jumps to new heights, so Q3 and Q4 perform very well while Q1 and Q2 did not.
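One way to produce this quarterly view, as a sketch, is to resample the sales on the (already converted) order-date index:

```python
# Quarterly sales: resample total sales per quarter and plot the trend as a line
quarterly_sales = (
    df.set_index("Order Date")["Sales"]
      .resample("Q")   # quarter-end frequency
      .sum()
)
quarterly_sales.plot(marker="o", title="Quarterly sales")
plt.ylabel("Total sales")
plt.show()
```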
Something might have changed there: a seasonal trend, increased marketing, a newly introduced product, or targeting a specific customer segment, and as a business it's really important to know which. You can also expect that if you follow a similar line of action you will again see higher demand for your products in Q3 and Q4, so you might want to overstock for those quarters, or analyze this further and replicate the successful strategies in future quarters, which helps the business grow steadily and increase its revenue. For Q1 you can see that the year starts out pretty slowly, and businesses usually want to start the year more quickly, so they can investigate whether this is seasonal for the industry, maybe certain products are simply not in high demand in certain seasons, whether a competitor ran some marketing campaign or used some strategy to pull customers towards them, or whether our own marketing changed, for example if the marketing efforts behind Q3 worked while the ones behind Q1 and Q2 were not productive; maybe that's the explanation.
So maybe you want to investigate this even more deeply, not quarterly but monthly. We start off the same way, converting the Order Date column to datetime and grouping it by monthly periods. From this chart you can see that sales grow month over month, apart from the first month of 2019 and the third month of 2018, so overall this is an upward trend, which suggests healthy sales.
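A minimal sketch of the month-over-month view, grouping by monthly period:

```python
# Monthly sales: group total sales by monthly period and plot the month-over-month trend
monthly_sales = (
    df.groupby(df["Order Date"].dt.to_period("M"))["Sales"].sum()
)
monthly_sales.plot(kind="line", marker="o", title="Monthly sales")
plt.ylabel("Total sales")
plt.show()
```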
It also looks like August and December might be your seasonal peaks, so you may want to overstock products for those months, and you can see seasonal dips in the third month of 2018 and 2019 and in November 2018. For the dips you might consider seasonal promotions to stimulate off-season sales, or diversify your product and service offerings to reduce the reliance on seasonal demand; you could also start deploying new marketing strategies there, or try to target new customer segments by introducing new products, so that you offset the seasonal swings. Overall the picture is fairly consistent, which points to a healthy sales funnel, so you might want to invest more in your proven strategies, whether that is a certain marketing tactic, a promotion or a product offering for a given month, and like I said, for the dips the store can try new marketing strategies or new products aimed at new customer segments to offset them. That said, it is important to consider the time frame: a single year certainly contains a good amount of data, but it does not reliably reveal seasonal patterns, because what holds in one year might be the complete opposite in another. So you might want to look at a longer sales line graph, for example over five years; if the same pattern shows up again, that suggests a genuine seasonal trend, and if not, you act accordingly. All right, we have covered the sales trends.
Let's move on to the next chapter, which is mapping. We want to create a map of sales per state: each state is colored according to its sales amount, yellow where sales are high and blue where sales are low. Why would someone want to do this? Companies looking to expand into new geographic areas face the challenge of identifying the most promising states and regions for their products or services. How do you know, for example, whether your product will sell in a certain state? One tactic people often use is to check whether a similar store already operates in that state or city; if there is and the market is not saturated, meaning there are still plenty of people who might buy your product, then it can be a good place to go. For example, a company that manufactures athletic apparel and is considering expanding its retail footprint can analyze total sales data by US state and see that states with a high concentration of fitness centers and an active population, like California, Texas or Florida, might be good candidates for new stores, and if there are currently no comparable sports stores there, even better. Or suppose you are a business that wants to allocate its marketing budget and sales team strategically: you have stores all over the states and you want to optimize for each one, because one state performs well and another does not. From the map you can see which states are underperforming and allocate your resources accordingly to maximize your return on investment; if you don't know which state is doing well and which is not, you have no information about where you need to optimize. For example, a national pizza chain wants to optimize its marketing spend, and sales data reveals that its locations in the Midwest consistently outperform those on the West Coast, which suggests it may need to allocate more marketing budget to increase brand awareness and sales in the western states. You might also do this for competitive analysis, because staying ahead of the competition means understanding where your competitors are having the most success, and analyzing sales patterns across states can reveal their geographic strengths and weaknesses. For example, a coffee roasting company notices that a competing coffee brand is experiencing high sales in the Pacific Northwest states.
This could indicate that the competitor has established strong partnerships with local grocery stores or has run successful marketing campaigns in that region, and the company can use this information to target similar grocery stores or develop competitive marketing strategies for the Pacific Northwest. So without further ado, let's get started. This time I will walk you through the code instead of writing it. First we import the Plotly graphing library and initialize plotting for the Jupyter notebook. We then prepare a map for all 50 states and add a state-abbreviation column to the data frame, because there isn't one yet; we create a variable that calculates the total sales for each state by grouping on the state, which is exactly what we need, then attach the abbreviations to those sums and finally plot it. This is how the map looks: the blue areas are the ones with a low amount of sales, and the yellow areas, such as California, are the ones with high sales.
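The video only walks through the code, so here is a hedged sketch of how such a state-level choropleth can be built with Plotly Express; the state-name-to-abbreviation mapping is abridged here and would need to cover all states:

```python
import plotly.express as px

# Map full state names to the two-letter codes that Plotly's USA map expects
# (only a few entries shown; extend this to all states)
state_abbrev = {"California": "CA", "Texas": "TX", "New York": "NY", "Florida": "FL"}

state_sales = df.groupby("State")["Sales"].sum().reset_index()
state_sales["Abbrev"] = state_sales["State"].map(state_abbrev)

fig = px.choropleth(
    state_sales,
    locations="Abbrev",
    locationmode="USA-states",
    color="Sales",
    scope="usa",
    color_continuous_scale="Viridis",   # low sales in blue/purple, high sales in yellow
    title="Total sales per state",
)
fig.show()
```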
From this map you can see which areas your main sales come from and optimize accordingly. Say you have a chain of pizza stores and you want to see which of your states performs best and which performs worst, so you can spend most of your energy optimizing what doesn't work: from the map you can see California is doing great, so you can leave it alone, but Texas, for example, is not performing that well, so you might allocate more marketing budget or more resources there to start getting more sales, because in Texas there are still plenty of people who eat pizza but are not buying from you, and you want to know why. You can also read the map the other way around. Imagine the map instead relates to a completely different business, say a retail store for sporting goods, and these are the states where such stores already operate: you could conclude that California is performing really well, so it's probably not a good idea to enter there since the market may already be saturated, but you might go to Florida, for example, and start selling similar sporting goods there, because that market is still new and not saturated. All right, that was the map. We can also create a bar graph out of the same numbers, and from it you can see that, in terms of total sales per state, California is doing the best.
The state at the bottom of the chart is doing the worst. Now, you remember that we previously showed how large each of our categories is, and we did the same for our sub-categories, but we never showed them in the same plot. Here we display our main product categories, Furniture, Office Supplies and Technology, and inside each category its sub-categories appear sized according to their sales. In Furniture, Chairs seems to be the largest, so it sells the most within that category, followed by Tables, with smaller amounts for Bookcases and Furnishings. In Office Supplies you can see that Storage performs the best while Envelopes and Labels perform the worst, and in Technology, Phones perform the best, followed by Machines, Accessories and Copiers. From this you can also see that Phones is overall the strongest sub-category, even slightly larger than Chairs.
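The transcript doesn't name the exact chart type used here; a treemap built with Plotly Express is one way to reproduce this kind of nested, size-proportional view of categories and sub-categories (a sketch, not necessarily the original plot):

```python
# Nested, size-proportional view of categories and their sub-categories
fig = px.treemap(
    df,
    path=["Category", "Sub-Category"],  # nest sub-categories inside their category
    values="Sales",                     # rectangle size proportional to total sales
    title="Sales by category and sub-category",
)
fig.show()
```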
If you're trying to make an argument, this nested view is a much better way to display the plot, and of course you can also present it in the other ways we used before. All right, I hope you enjoyed this project, I definitely did, and I will see you in the next video. This video was sponsored by LunarTech. At LunarTech we are all about making you ready for your dream job in tech and making data science and AI accessible to everyone, whether in data science, artificial intelligence or engineering. At LunarTech Academy we have courses and boot camps to help you become a job-ready professional, and we are also here to help businesses, schools and universities with top-notch training, curriculum modernization with data science and AI, and corporate training including the latest topics like generative AI. With LunarTech, learning is easy, fun and super practical; we care about providing an end-to-end learning experience that is both practical and grounded in fundamental knowledge, and our community is all about supporting each other and making sure you get where you want to go. Ready to start your tech journey? LunarTech is where you begin. Students and aspiring data science and AI professionals can visit the LunarTech Academy section to explore our courses, boot camps and programs in general; businesses in need of employee training, upskilling or data science and AI solutions should head to the technology section on the LunarTech page; and enterprises looking for corporate training, curriculum modernization and customized AI tools to enhance education should visit the LunarTech Enterprises section for a free consultation and a customized estimate. Join LunarTech and start building your future, one data point at a time.
In this part we are going to talk about a case study in the field of predictive analytics and causal analysis. We are going to use a simple yet powerful regression technique called linear regression in order to perform causal analysis and predictive analytics. By causal analysis I mean that we are going to look into correlations and try to figure out which features have an impact on the housing price, on the house value: which of the features describing a house define and cause the variation in house prices.
The goal of this case study is to practice the linear regression model and to get a first feeling for how you can use a simple machine learning model to perform model training, model evaluation, and also causal analysis, where you try to identify features that have a statistically significant impact on your response variable, your dependent variable. Here is the step-by-step process we are going to follow in order to find out which features define Californian house values. First, we are going to understand the set of independent variables and the response variable for our multiple linear regression model, as well as the techniques we need and the Python libraries we have to load in order to conduct this case study, so we will load all of these libraries and understand why we need them. Then we will do data loading and data preprocessing. This is a very important step, and I deliberately didn't want to give you clean data, because in a normal, real hands-on data science job you won't get clean data: you will get dirty data containing missing values and outliers, and those are things you need to handle before you proceed to the actual fun part, which is the modeling and the analysis. Therefore we are going to do missing data analysis and remove the missing data from our Californian house price data, and we are going to conduct outlier detection, learning different visualization techniques in Python that you can use to identify outliers and then remove them from your data. After that we are going to perform data visualization, exploring the data with different plots to learn more about it and about the outliers, combining statistical techniques with Python. Then we will do correlation analysis to identify potentially problematic features, something I would suggest you do independent of the nature of your case study, so that you understand what kind of variables you have, what the relationships between them are, and whether you are dealing with some potentially problematic variables. After that we move to the fun part, performing the multiple linear regression in order to do the causal analysis, which means identifying the features of the Californian house blocks that define the value of the Californian houses. Finally, we will very quickly do another implementation of the same multiple linear regression, to give you not one but two different ways of running it, because linear regression can be used not only for causal analysis but also as a standalone, common machine learning regression model; so I will also show you how to use scikit-learn as a second way of training the model and then predicting the Californian house values. Without further ado, let's get started.
Once you become a data scientist, machine learning researcher or machine learning engineer, there will be cases, hands-on data science projects, where the business comes to you and says: here is our data, and we want to understand which features have the biggest influence on this outcome factor. In our case study, let's assume we have a client who is interested in identifying the features that define the house price. Maybe it is someone who wants to invest in houses, someone interested in buying houses, perhaps renovating them and then reselling them at a profit, or maybe it is the long-term investment market, where people buy real estate as an investment, hold it for a long time and sell it later, or it could be for some other purpose. The end goal for this person is to identify which features of a house make it priced at a certain level, so which features of the house are causing its price and value. We are going to make use of a very popular data set that is available on Kaggle and originally comes from scikit-learn, called California Housing Prices. I will make sure to put the link to this data set in my GitHub account, under the repository dedicated to this case study, together with some additional links you can use to learn more about it.
This data set is derived from the 1990 US Census, using one row per census block group. A block group is the smallest geographical unit for which the US Census Bureau publishes sample data, and a block group typically has a population of 600 to 3,000 people. A household is a group of people residing within a single home, and since the average numbers of rooms and bedrooms in this data set are provided per household, these columns may take surprisingly large values for block groups with few households and many empty houses, such as vacation resorts. Let's now look at the variables available in this data set. We have MedInc, the median income in the block group, which captures the financial level of that block of households; HouseAge, the median house age in the block group; AveRooms, the average number of rooms per household; AveBedrms, the average number of bedrooms per household; Population, the block group population, so the number of people living in that block; AveOccup, the average number of household members; and finally Latitude and Longitude, the latitude and longitude of the block group we are looking at.
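As a side note, the same data ships with scikit-learn, so if you just want to peek at the documented feature names before working with the Kaggle CSV used later in this tutorial, a quick sketch is the following; note that the Kaggle CSV has slightly different column names, an extra ocean_proximity column and some missing values:

```python
from sklearn.datasets import fetch_california_housing

# Load the scikit-learn copy of the data as a pandas DataFrame
california = fetch_california_housing(as_frame=True)
print(california.feature_names)  # MedInc, HouseAge, AveRooms, AveBedrms, Population, AveOccup, Latitude, Longitude
print(california.frame.head())
```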
As you can see, we are dealing with aggregated data: we don't have the data per household; instead it is calculated and aggregated per block. This is very common in data science when we want to reduce the dimension of the data, work with sensible numbers and create cross-sectional data. Cross-sectional data means we have multiple observations all measured over a single time period, and here the aggregation unit is the block. We have also already learned, as part of the theory lectures, the idea of the median: there are different descriptive measures we can use to aggregate data, one of them being the mean and another the median, and often, especially when we are dealing with a skewed distribution, so a distribution that is not symmetric but right-skewed or left-skewed, we use the median, because the median is then a better representation of the scale of the data than the mean. In this case we will soon see, when visualizing the data, that we are indeed dealing with skewed data. So this is a very simple, very basic data set without too many features, which makes it a great way to get your hands on an actual machine learning use case: we will keep it simple, yet we will learn the basics and fundamentals properly, so that learning more difficult and more advanced models later will come much more easily.
So let's now get into the actual coding part. Here I will be using Google Colab, and I will share the link to this notebook, together with the data, in my Python for Data Science repository, so you can use it to follow this tutorial with me. We always start by importing libraries. We could run a linear regression manually, without libraries, using matrix multiplication, but I would suggest not doing that; you can do it for fun, or to understand the matrix multiplication and the linear algebra behind linear regression, but if you want to get hands-on and use linear regression the way you would on your day-to-day job, then expect to use a library such as scikit-learn, or the statsmodels.api library. I decided to showcase this example not only in one library, scikit-learn, but also in statsmodels, and the reason is that many people use linear regression just for predictive analytics, and for that scikit-learn is the go-to option; but if you want to use linear regression for causal analysis, to identify and interpret the features, the independent variables, that have a statistically significant impact on your response variable, then you will need another library, a very handy one for linear regression called statsmodels.api, from which you import the sm functionality, and that will help you do exactly that. Later on we will see how nicely this library presents the output, exactly like you would learn in a traditional econometrics or introduction-to-linear-regression class. I'm going to give you all of this background information, and we are going to interpret and learn everything, so that you start your machine learning journey in a proper, high-quality way.
The first thing we are going to import is the pandas library, which we import as pd, and then the numpy library as np. We need pandas to create a data frame, to read the data and then to perform data wrangling, to identify the missing data and the outliers, so the common data wrangling and data preprocessing steps; numpy is what we commonly use when visualizing data or when dealing with matrices and arrays, and pandas and numpy are used hand in hand. Then we are going to use matplotlib, and specifically pyplot, a library that is very important when you want to visualize data, and seaborn, which is another handy data visualization library in Python. Whenever you want to visualize data in Python, matplotlib and seaborn are two very handy visualization libraries you should know; if you like the cooler undertones of color, seaborn will be your go-to option, because the visualizations you create with it are more appealing than plain matplotlib, but the underlying way of working, so plotting scatter plots, lines or heat maps, is the same. Then we have statsmodels.api, the library from which we import sm, which provides the linear regression model we will use for our causal analysis, and here I'm also importing, from scikit-learn's linear_model, the LinearRegression model. This one is basically similar to the statsmodels one, and you can use either, but it reflects a common way of working with machine learning models: whenever you are doing predictive analytics, so you are using the data not to identify features that have a statistically significant impact on the response variable, features that influence and cause the dependent variable, but simply to train the model on this data and then test it on unseen data, you can use scikit-learn. And scikit-learn is something you will use not only for linear regression but also for other machine learning models: think of KNN, logistic regression, random forest, decision trees, boosting techniques such as LightGBM and GBM, and also clustering techniques like K-means and DBSCAN; anything you can think of that fits in the category of traditional machine learning models you will be able to find in scikit-learn. Therefore I didn't want to limit this tutorial only to statsmodels, which we could do if we wanted this case study to be purely about linear regression; instead I wanted to also showcase the usage of scikit-learn, because scikit-learn is something you can use beyond linear regression, for all these other types of machine learning models, and given that this course is designed to introduce you to the world of machine learning, I thought we would combine the two, since scikit-learn is something you are going to see time and time again when you use Python for machine learning. I'm also importing train_test_split from sklearn.model_selection, so that we can split our data into train and test sets.
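Put together, the import block described above looks roughly like this:

```python
import pandas as pd                 # data frames, data wrangling, missing-data handling
import numpy as np                  # arrays and matrices, used alongside pandas
import matplotlib.pyplot as plt     # base plotting library
import seaborn as sns               # statistical visualizations on top of matplotlib

import statsmodels.api as sm                        # linear regression with a full statistical summary (causal analysis)
from sklearn.linear_model import LinearRegression   # linear regression for the predictive-analytics workflow
from sklearn.model_selection import train_test_split
```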
Now, before we move on to the actual training and testing, we first need to load our data, so what I did was to put the housing.csv file into the sample data folder in Google Colab; that's the file you can download when you go to the data set's page, roughly 409 KB of housing data, and that's exactly what I downloaded and then uploaded into Google Colab. I copy its path and create a variable that holds it, so file_path is the string variable holding the path of the data. Then I take this file_path and put it into pd.read_csv, which is the function we use to load the data: pd stands for pandas, the short alias for the library, read_csv is the function we take from the pandas library, and within the parentheses we put the file_path. If you want to learn more about these basics, variables and different data structures, some basic Python for data science, then, to keep this specific tutorial structured, I won't be covering that here; feel free to check the Python for Data Science course, I will put the link in the comments below so you can learn that first and then come back to this tutorial to learn how to use Python in combination with linear regression. The first thing I tend to do, before moving on to the actual execution stage, is to look into the data and perform data exploration. I start by looking at the data fields, so the names of the variables available in the data, which you can do with data.columns; this lists the columns, the names of your data fields. Let's run it with Command-Enter: we see we have longitude, latitude, housing_median_age, total_rooms, total_bedrooms, population, which is basically the number of people living in those houses, then households, median_income, median_house_value and ocean_proximity. You might notice that the names of these variables are a bit different from the official documentation of the California housing data: the naming differs, but the underlying explanation is the same; the documentation just presents the names in a slightly nicer form.
It is a common thing in Python, when dealing with data, to see underscores in the variable names, so here we have housing_median_age, which in the documentation appears as house age: slightly different naming, but the meaning is the same, it is still the median house age in the block group. One thing you can also notice is that the official documentation does not list one extra variable that we do have here, ocean_proximity, which basically describes the closeness of the house to the ocean, and that of course can mean an increase or a decrease in the house price for some people. So we have all these variables, and the next thing I tend to do is look at the actual data. One simple option is to look at the top 10 rows instead of printing the entire data frame, so when we execute that part of the code you can see the top 10 rows of our data: the longitude, the latitude, the housing median age, where we see values like 41, 21 and 52 years, basically the median age of the houses per block; then the total number of rooms, where we see that in one of these blocks the houses have 7,099 rooms in total, so we are already seeing data with large numbers, which is something to take into account when dealing with machine learning models and especially with linear regression; then we have total bedrooms, population, households, median income, median house value and ocean proximity. One thing you can see right off the bat is that longitude and latitude have some unique characteristics, longitude being negative and latitude positive, but that's fine for linear regression, because what it is basically looking at is whether variation in certain independent variables, in this case longitude and latitude among the others, causes a change in the dependent variable.
Just to refresh our memory on what the linear regression will do in this case: we are dealing with multiple linear regression because we have more than one independent variable. The independent variables are the different features that describe the house, except for the house price, because median_house_value is the dependent variable. That is exactly what we are trying to figure out: we want to see which features of the house define the house price, to identify which features cause a change in our dependent variable, and specifically what the change in the median house value is if we apply a one-unit change to an independent feature. For a multiple linear regression, we learned during the theory lectures that what it does during causal analysis is hold all the other independent variables constant and then investigate, for a specific independent variable, what change in the dependent variable results from a one-unit increase in that variable.
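In the usual textbook notation (standard form, not something written out in the video), the model and the interpretation of its coefficients look like this:

```latex
\text{median\_house\_value}_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i
```

where each coefficient beta_j is the expected change in the median house value when the feature x_j increases by one unit, holding all the other features fixed.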
will result in what kind of change in our dependent variable so if we for Instance change by one unit our uh housing median age um then what will be the corresponding change in our median houseold value keeping everything else conent so that's basically the idea behind multi multiple linear regression and using that for this specific use case and in here um what we also want to do is to find out what are the uh data types and whether we can learn bit more about our data before proceeding to the next step and for that I
tend to use This uh info uh function in pandas so given that the data is a pandas data frame I will just do data. info and then parenthesis and then this will uh show us what is the data type and what is the number of new values per variable so um as we have already noticed from this header which we can also see here being confirmed that ocean proximity is a variable that is not a numeric value so here you can see nearby um also a value for that variable which Unlike all the other values
is represented by a string so this is something that we need to take into account because later on when we uh will be doing the data prop processing and we will actually uh actually run this model we will need to do something with this specific variable we need to process it so um for the rest we are dealing with numeric variables so you can see here that longitude latitude all the other variables including our dependent Variable variable is a numeric variable so float 64 the only variable that needs to be taken care of is this
ocean underscore proximity uh which um we can actually later on also see that is um categorical string variable and what this basically means is that it has these different categories so um for instance uh let us actually do that in here very quickly so let's see what are all the unique values for this variable so if we take the name Of this variable so we copy it from this overview in here and we do unique then this should give us the unique values for this categorical variable so here we go so we have actually five
different unique values for this categorical string variable so this means that this ocean proximity can take uh five different values and it can be either nearby it can be less than 1 hour from the ocean it can be Inland it can be near Ocean and it can be uh in The Iceland what this means is that we are dealing with a feature that describes the distance uh of the block from the ocean and here the underlying idea is that maybe this specific feature has a statistically significant impact on the house value meaning that it might
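The two quick checks just described, as a short sketch:

```python
# Data types and non-null counts per column
data.info()   # everything is float64 except ocean_proximity, which is an object (string) column

# Categories of the one non-numeric column
print(data["ocean_proximity"].unique())
# e.g. ['NEAR BAY', '<1H OCEAN', 'INLAND', 'NEAR OCEAN', 'ISLAND']
```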
The underlying idea here is that maybe this specific feature has a statistically significant impact on the house value: it might be that for some people, in certain areas or in certain countries, living near the ocean increases the value of a house. If there is a strong demand for houses near the ocean, so people prefer to live close to it, then most likely we will see a positive relationship; if there is a negative relationship, it means that in that area, in California for instance, people do not prefer to live near the ocean, and we would then see that houses in areas further away from the ocean have higher values. This is exactly what we want to figure out with the regression: we want to understand which features define the value of the house, so that we can say that if a house has certain characteristics, its price will most likely be higher or lower, and linear regression helps us not only to understand which features those are, but also to quantify how much higher or lower the value of the house will be when it has a certain characteristic or when we increase that characteristic by one unit. Next, we are going to look into the missing data in our data set.
To have a proper machine learning model we need to do some data preprocessing, and for that we need to check for missing values in our data and understand the number of NaN values per data field. This helps us decide whether we can simply remove those missing values or whether we need imputation; depending on the amount of missing data, we can choose between those solutions. Here we can see that we don't have any NaN values for longitude, latitude, housing median age and all the other variables except one independent variable, total_bedrooms: out of all the observations we have, the total_bedrooms variable has 207 cases where the corresponding information is missing. When we express this as a percentage, which is something you should do as your next step, we see that only about 1% of the total_bedrooms values are missing out of the entire data set. This matters, because simply looking at the raw count of missing observations per data field doesn't tell you how much of the data is missing in relative terms. If 50% or 80% of a variable is missing, then for the majority of your house blocks you don't have that information; including the variable would not be beneficial or accurate, and it would result in a biased model, because having information only for a minority of observations automatically skews your results. So if a variable is missing for the majority of your data set, I would suggest dropping that independent variable. In our case only about 1% of the house blocks are missing this information, which gives me confidence to keep the independent variable and simply drop the observations that do not have a total_bedrooms value. Another solution, instead of dropping, is to use some sort of imputation technique, meaning we systematically find a replacement for the missing value: we can use mean imputation, median imputation, or a more advanced, model-based or econometric approach. For now that is out of the scope of this problem, but as a rule of thumb, look at the percentage of observations for which the variable is missing; if it is low, say less than 10%, and you have a large data set, you can be comfortable dropping those observations, but if you have a small data set, say only 100 observations, and 20% or 40% of them are missing, then consider imputation, so try to find values that can be used to replace the missing ones. Once we have identified the missing values, the next step is to clean the data. Here I use the dropna function, which drops the observations where the value is missing: I drop all the observations for which total_bedrooms has a null value, so I get rid of my missing observations, and after doing that I check whether they are really gone.
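A compact sketch of this missing-data check and cleanup:

```python
# How many missing values does each column have?
print(data.isnull().sum())                            # only total_bedrooms has NaNs (207 of them)

# Express the missing values as a share of all rows (roughly 1% here)
print(data["total_bedrooms"].isnull().mean() * 100)

# Drop the rows with a missing total_bedrooms and verify nothing is missing anymore
data = data.dropna(subset=["total_bedrooms"])
print(data.isnull().sum())                            # all zeros now
```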
number of uh Missing observations no values pa uh variable then uh now I no longer have any missing observations so I successfully deleted all the missing observations now the next state is to describe the data uh through some descriptive statistics and through data visualization so before moving on Towards the causal analysis or predictive analysis in any sort of machine learning traditional machine learning approach try to First Look Into the data try to understand the data and see whether you are seeing some patterns uh what is the mean of different um numeric data fields uh do
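Below is a minimal sketch of this missing-value check and clean-up, assuming the data has already been loaded into a pandas DataFrame called data and the column is total_bedrooms as in this data set:

```python
import pandas as pd

# Count missing values per column and express them as a share of all rows
missing_counts = data.isnull().sum()
missing_share = data.isnull().mean() * 100  # percentage missing per column
print(missing_counts)
print(missing_share.round(2))

# Only total_bedrooms has missing values (roughly 1%), so drop those rows
data = data.dropna(subset=["total_bedrooms"])

# Verify that no missing values remain
print(data.isnull().sum())
```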
Now the next step is to describe the data through some descriptive statistics and through data visualization. Before moving on to causal analysis or predictive analysis in any traditional machine learning approach, try to first look into the data and understand it: see whether you spot patterns, what the mean of the different numeric data fields is, and whether you have certain categorical values that cause unbalanced data. Those are things you can discover early on, before moving on to model training and testing and blindly believing the numbers. Data visualization and data exploration are great ways to understand the data you have before using it to train and test a machine learning model. Here I am using the standard describe function of pandas, data.describe(), which gives me the descriptive statistics of my data. What we can see is that in total we have 20,433 observations, and per variable we have the same count, which means every variable has the same number of rows. We also have the mean of each variable, the standard deviation (the square root of the variance), the minimum and the maximum, and the 25th, 50th and 75th percentiles. Percentiles and quartiles are statistical terms we use very often: the 25th percentile is the first quartile, the 50th percentile is the second quartile, or the median, and the 75th percentile is the third quartile. These percentiles help us understand the threshold below which 25% of the observations fall and above which the remaining 75% fall.
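As a quick sketch, the descriptive-statistics step is a single pandas call; the transpose at the end is only an optional readability tweak I am adding here:

```python
# Summary statistics per numeric column: count, mean, std, min,
# 25th/50th/75th percentiles and max
summary = data.describe()
print(summary)

# Optional: transpose so each variable is a row, which is easier to scan
print(summary.T[["mean", "std", "min", "25%", "50%", "75%", "max"]])
```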
Next, the standard deviation: it helps us interpret the variation in the data in the units, so on the scale, of that variable. In this case, for the median house value the mean is approximately 206K and the standard deviation is about 115K. This means that in the data set we will find blocks whose median house value is around 206K plus 115K, which is roughly 321K, and there will also be blocks whose median house value is around 91K, so 206K minus 115K. That is the idea behind the standard deviation: the variation in your data. Next we can interpret the minimum and the maximum of the data fields. The minimum helps you understand the smallest value per numeric data field and the maximum the largest, so together they give the range of values you are looking at. In the case of the median house value, the minimum is the lowest median house value per block and the maximum is the highest, which helps us understand, for this aggregated data, which blocks have the cheapest houses in terms of valuation and which have the most expensive ones. We can see that in the cheapest block the median house value is about 15K, precisely 14,999, and the block with the highest valuation has a median house value of $500,001, which means that in the most expensive blocks the median house value is at most about 500K.
The next thing I tend to do is visualize the data, and I like to start with the dependent variable, the variable of interest, also called the target or the response variable, which in our case is the median house value. What I want to do is plot a histogram in order to understand the distribution of median house values: which median house values appear most frequently in the data, and which types of blocks have unusual, less frequently appearing median house values. By plotting this kind of plot you can spot frequently appearing values as well as values that lie outside the usual range, which helps you learn more about your data and identify outliers. Here I am using the seaborn library; since I already imported the libraries earlier, there is no need to import them again. First I set the style, which basically says the background should be white and I also want a grid behind the plot. Then I initialize the size of the figure: plt comes from matplotlib.pyplot, and I set the figure size to 10 by 6. Then comes the main plot: I use the histplot function from seaborn, take the cleaned data from which we removed the missing values, pick the variable of interest, the median house value, and plot the histogram using a forest green color. Then I set the title of the figure, "Distribution of Median House Values", the x label, which is the name of the variable on the x-axis, the median house value, the y label, the name of the variable on the y-axis, and finally plt.show(), which displays the figure. That is basically how visualization works in Python: you set the figure size, call the plotting function and provide the data to visualize, add the title and the x and y labels, and then ask it to show the figure.
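A minimal sketch of that plot; the style and color follow the walkthrough, while the column name median_house_value is the standard name in the California housing data set:

```python
import matplotlib.pyplot as plt
import seaborn as sns

# White background with a grid behind the plot
sns.set_style("whitegrid")

# Figure size of 10 by 6 inches
plt.figure(figsize=(10, 6))

# Histogram of the target variable on the cleaned data
sns.histplot(data["median_house_value"], color="forestgreen")

plt.title("Distribution of Median House Values")
plt.xlabel("Median House Value")
plt.ylabel("Frequency")
plt.show()
```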
If you want to learn more about these visualization techniques, make sure to check the Python for Data Science course, because that one will walk you slowly and in detail through how to visualize your data. What we are visualizing here is the frequency of the median house values in the entire data set: we are looking at the number of times each median house value appears, because we want to understand whether certain median house values appear very often and whether others hardly appear at all. The rare ones can potentially be considered outliers, and we want to keep only the most relevant and representative data points, because we want to derive conclusions that hold for the majority of our observations and not for outliers; it is that representative data we will then use to run our linear regression and draw conclusions. Looking at this graph, we can see a cluster of median house values that appear quite often, the cases where the frequency is high. For instance, a median house value of about 160 to 170K appears very frequently, with a frequency above 5,000; those are the most frequently appearing median house values. There are also blocks, on both sides of the plot, whose median house value does not appear very often, so their frequency is low. Roughly speaking, those are unusual blocks and can be considered outliers, and the same holds for the very expensive ones: in our population of California house blocks you will most likely see blocks whose median house value is between, let's say, roughly 70K and 300 or 350K, but anything below or above that is unusual. You don't often see house blocks with a median house value below 60 or 70K, or above 370 or 400K. Do keep in mind that we are dealing with data from 1990 and not with current prices; nowadays Californian houses are much more expensive, but this data comes from 1990, so take that into account when interpreting these visualizations.
So what we can do next is use the idea of the interquartile range to remove these outliers. What this basically means is that we look at the 25th percentile, the first quartile (Q1), and at the 75th percentile, the third quartile (Q3), and use them to identify the observations, the blocks, whose median house value lies far below the 25th percentile or far above the 75th percentile. Essentially we want to keep the middle part of our data, the so-called normal, representative blocks, removing the blocks with very small and very large median house values. The statistical term we are using is the interquartile range; you don't strictly need to know the name, but it is worth understanding, because it is a very popular way of doing a data-driven removal of outliers. I select the 25th percentile using the quantile function from pandas, which finds the value that splits the observations so that the smallest 25% of median house values lie below it and the largest 75% above it, and I do the same for the 75th percentile; I will use this Q1 and Q3 to remove the very small and very large median house values. To compute the interquartile range we take Q3 and subtract Q1 from it. To understand the idea of Q1 and Q3, the quartiles, a bit better, let's actually print them. What we find is that Q1, the 25th percentile or first quartile, is equal to $119,500, which means that the smallest 25% of the observations have a median house value below $119,500 and the remaining 75% of the observations have a median house value above it. Q3, the third quartile or 75th percentile, is the threshold that separates the lowest 75% of median house values from the most expensive upper 25%; we see that this distinction lies at $264,700, which means the top 25% of blocks by median house value are above $264,700. Those are the observations we want to remove: the ones with the smallest and the largest median house values. It is common practice with the interquartile range approach to multiply the IQR by 1.5 in order to obtain the lower bound and the upper bound, the thresholds we use to remove blocks where the median house value is very small or very large. So we multiply the IQR by 1.5; subtracting this value from Q1 gives the lower bound, and adding it to Q3 gives the upper bound. After we clean these outliers from our data we end up with a smaller data set: previously we had 20,433 observations and now we have 19,369, so we have removed a bit over 1,000 observations.
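Here is a compact sketch of that interquartile-range filter, assuming the column is median_house_value; the printed Q1 and Q3 are the values quoted above:

```python
# First and third quartiles of the target variable
q1 = data["median_house_value"].quantile(0.25)
q3 = data["median_house_value"].quantile(0.75)
print(q1, q3)  # roughly 119,500 and 264,700 in this data

# Interquartile range and the usual 1.5 * IQR bounds
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Keep only the blocks whose median house value lies within the bounds
data = data[(data["median_house_value"] >= lower_bound) &
            (data["median_house_value"] <= upper_bound)]
print(len(data))  # noticeably fewer observations after removing outliers
```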
Next let's look into some other variables, for instance the median income. Another technique we can use to identify outliers in the data is the box plot; I wanted to showcase different approaches for visualizing the data and identifying outliers so that you become familiar with several techniques. So let's go ahead and plot the box plot. A box plot is a statistical way to represent your data: the central box represents the interquartile range, the IQR, and its bottom and top edges indicate the 25th percentile (the first quartile) and the 75th percentile (the third quartile) respectively. The length of this box, the dark part you see here, covers the middle 50% of the data for the median income, and the line inside the box, the one with the contrasting color, represents the median of the data set, the middle value when the data is sorted in ascending order. Then we have the whiskers: these lines extend from the top and the bottom of the box and indicate the range of the rest of the data, excluding the outliers; they typically reach up to 1.5 times the IQR above the third quartile and 1.5 times the IQR below the first quartile, which is exactly what we used previously when removing outliers from the median house value. To identify the outliers you can quickly see all the points that lie more than 1.5 times the IQR above the third quartile, the 75th percentile; those are blocks of houses with an unusually high median income, and that is something we want to remove from our data.
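A small sketch of that box plot, again assuming the seaborn and matplotlib imports from earlier and the standard median_income column name:

```python
plt.figure(figsize=(10, 6))

# Box plot of median income: the box spans Q1 to Q3, the inner line is the
# median, whiskers reach 1.5 * IQR, and points beyond them are outliers
sns.boxplot(x=data["median_income"], color="forestgreen")

plt.title("Box Plot of Median Income")
plt.xlabel("Median Income")
plt.show()
```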
Therefore we can use exactly the same approach that we used previously for the median house value: we identify the 25th percentile, the first quartile Q1, and the 75th percentile, the third quartile Q3, we compute the IQR, and we obtain the lower bound and the upper bound using 1.5 as the scale. We then use this lower bound and upper bound as filters to keep in the data only the observations whose median income is above the lower bound and below the upper bound. So we are performing double filtering, two filters in the same row as you can see, and we use the parentheses and the & operator to tell Python: first check that this condition is satisfied, that the observation has a median income above the lower bound, and at the same time it should hold that the observation, the block, has a median income below the upper bound. If a block satisfies both of these criteria then we are dealing with a good, normal point and we can keep it, and that becomes our new data. Let's go ahead and execute this code; in this case all our outliers lie on the high side, in the upper part of the box plot, and after filtering we end up with clean data.
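The same filter pattern for median income, shown mostly to highlight the parentheses around each condition and the & between them:

```python
q1 = data["median_income"].quantile(0.25)
q3 = data["median_income"].quantile(0.75)
iqr = q3 - q1

lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Both conditions must hold for a row to be kept, hence the parentheses and &
clean_data = data[(data["median_income"] >= lower_bound) &
                  (data["median_income"] <= upper_bound)]
```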
I am taking this clean data and assigning it back to data, just for simplicity. This data is now much cleaner and a better representation of the population, which is exactly what we want: we want to find out which features describe and define the house value, not based on unique and rare houses that are too expensive or located in blocks with very high-income people, but based on the true representation, the most frequently appearing data, so that we learn what defines the house value for common houses, in common areas, for people with average or normal income. The next thing I tend to do, especially for regression analyses and causal analyses, is to plot the correlation heat map: we compute the correlation matrix, the pairwise correlation score for each pair of variables in our data.
When it comes to linear regression, one of the assumptions we learned during the theory part is that we should not have perfect multicollinearity, which means there should not be a high correlation between pairs of independent variables: knowing one should not automatically tell us the value of the other independent variable. If the correlation between two independent variables is very high, we might be dealing with multicollinearity, which is something we want to avoid, and the heat map is a great way to identify whether we have this type of problematic independent variables and whether we need to drop one of them, or maybe several, to make sure we end up with a proper linear regression model whose assumptions are satisfied. Looking at this correlation heat map, which we plot with seaborn, the colors range from very light, almost white, to very dark green, where light means a strong negative correlation and very dark green means a very strong positive correlation. We know that a correlation value, the Pearson correlation, can take values between minus one and one: minus one means a very strong negative correlation and one a very strong positive correlation. The correlation of a variable with itself, for instance between longitude and longitude, is equal to one, which is why the diagonal consists entirely of ones: those are the pairwise correlations of the variables with themselves. The values below the diagonal mirror the ones above it, because the correlation between the same two variables is the same regardless of which one you put first: the correlation between longitude and latitude is the same as the correlation between latitude and longitude.
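A sketch of that heat map; the green colormap and the annotations are my assumptions about how the figure in the walkthrough was produced:

```python
plt.figure(figsize=(10, 8))

# Pairwise Pearson correlations between all numeric columns
corr_matrix = data.corr(numeric_only=True)

# Light cells = strong negative correlation, dark green = strong positive
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="Greens")

plt.title("Correlation Heat Map")
plt.show()
```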
Now that we have refreshed our memory on this, let's look at the actual numbers in this heat map. We can see a section where independent variables have a low positive correlation with the remaining independent variables: the light green cells indicate a weak positive relationship between those pairs of variables. What is very interesting is the middle part of the heat map, where we have dark cells; the numbers below the diagonal are the ones we interpret, remembering that below and above the diagonal are mirror images. Here we already see a problem, because we are dealing with variables that are going to be independent variables in our model and that have a high correlation with each other. Why is this a problem? Because one of the assumptions of linear regression, as we saw during the theory section, is that we should not have multicollinearity. When we have perfect multicollinearity we are dealing with independent variables that are so highly correlated that knowing the value of one automatically tells us the value of the other, and a correlation of 0.93, which is very high, or 0.98 means those two independent variables have an extremely strong positive relationship. This is a problem because it can cause our model to produce very large standard errors and an inaccurate, non-generalizable model, which is something we want to avoid; we want the assumptions of our model to be satisfied. In this case the independent variables total_bedrooms and households, the total number of bedrooms per block and the number of households, are highly positively correlated, and that is a problem. Ideally we want to drop one of those two independent variables, and the reason we can do that is that, given they are highly correlated, they already explain a similar type of information, they contain a similar type of variation, so including both doesn't make sense: on one hand it potentially violates the model assumptions, and on the other hand it doesn't add much value, because the other variable already carries similar variation. So total_bedrooms basically contains the same kind of information as households, and we might as well drop one of the two. The question is which one, and that is something we can decide by looking at the other correlations: total_bedrooms has a high correlation with households, but we can also see that total_rooms has a very high correlation with households, so there is yet another independent variable highly correlated with households, and total_rooms is also highly correlated with total_bedrooms. This means we can check which variable most frequently shows high correlations with the rest of the independent variables, and in this case the two largest numbers are these: total_bedrooms has a correlation of 0.93 with total_rooms and at the same time a correlation of 0.98 with households, which means total_bedrooms has the highest correlation with the remaining independent variables, so we might as well drop it. Before you do that, though, I would suggest one more quick visual check: look at the correlation of total_bedrooms with the dependent variable, to understand how strong a relationship it has with the response variable we are looking into. We see that total_bedrooms has only a 0.05 correlation with the response variable, the median house value, while total_rooms has a much higher one, so I already feel comfortable excluding and dropping total_bedrooms from our data in order to ensure we are not dealing with perfect multicollinearity. That is exactly what I am doing here: I am dropping total_bedrooms, and after doing that we no longer have total_bedrooms as a column.
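That drop is a one-liner; the sketch below also prints the columns to confirm the variable is gone:

```python
# Remove the highly collinear feature and confirm it is gone
data = data.drop("total_bedrooms", axis=1)
print(data.columns)
```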
Before moving on to the actual causal analysis, there is one more step I want to show you, which is super important for causal analysis and is introductory econometrics material: what to do when you have a string categorical variable. There are a few ways to deal with it. One easy way you will see on the web is to simply encode the strings as numbers (label encoding), transforming values such as NEAR BAY, <1H OCEAN, INLAND, NEAR OCEAN and ISLAND so that the ocean proximity variable takes values like 1, 2, 3, 4, 5. That is one way of doing it, but a better way when using this type of variable in linear regression is to transform the string categorical variable into what we call dummy variables. A dummy variable takes two possible values; usually it is a binary, Boolean variable taking the values zero and one, where one means the condition is satisfied and zero means it is not. Let me give you an example. In this specific case ocean proximity is a single variable with five different values, so we will use the get_dummies function from pandas to go from this one variable to five different variables, one per category: a new variable for whether the block is near the bay, one for whether it is less than one hour from the ocean, one for whether it is inland, one for whether it is near the ocean, and one for whether it is on an island. Each of these is a separate binary dummy variable taking the values zero and one, which means we go from one string categorical variable to five dummy variables. We then combine these five dummy variables with our data and drop the original ocean proximity column. So on one hand we get rid of the string variable, which is problematic for linear regression because the libraries we use cannot handle this type of data directly, and on the other hand we make our job easier when it comes to interpreting the results: interpreting a linear regression for causal analysis is much easier with dummy variables than with a single string categorical variable. To give you an example with the five dummy variables we create from this string variable: if we look at one category, say ocean_proximity_INLAND, then for all the rows where the value equals zero the criterion is not satisfied, meaning the house block we are dealing with is not inland, and for all the rows where ocean_proximity_INLAND equals one the criterion is satisfied and we are dealing with house blocks that are indeed inland. One thing to keep in mind when transforming a categorical variable into a set of dummies is that you always need to drop at least one of the categories, and the reason is that, as we learned during the theory, we should have no perfect multicollinearity.
We cannot include five dummy variables that are perfectly correlated: if we include all of them, then as soon as we know that a block of houses is not near the bay, not less than one hour from the ocean, not inland and not near the ocean, we automatically know it must belong to the remaining category, island, so ocean_proximity_ISLAND must be equal to one. That is the definition of perfect multicollinearity, and to avoid violating one of the OLS assumptions we need to drop one of those categories. That is exactly what I am doing here. First let's see the full set of dummy variables we got: less than one hour ocean, inland, island, near bay and near ocean. Let's drop one of them, say the island category. We can do that very simply; let me add a code cell here: data equals data.drop, then the name of the variable within quotation marks, and then axis equal to one. In this way I am dropping one of the dummy variables I created, to avoid violating the no-perfect-multicollinearity assumption. Once I print the columns, that column should have disappeared, and here we go: we have successfully deleted that variable. Let's also get the head of the data: you can see that we no longer have a string in our data; instead we have four additional binary variables built out of a string categorical variable with five categories.
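A sketch of the whole dummy-variable step; the exact dummy column names depend on how pandas renders the categories, so treat names like ocean_proximity_ISLAND here as illustrative:

```python
# One dummy (0/1) column per ocean_proximity category
dummies = pd.get_dummies(data["ocean_proximity"], prefix="ocean_proximity")

# Attach the dummies and drop the original string column
data = pd.concat([data, dummies], axis=1)
data = data.drop("ocean_proximity", axis=1)

# Drop one category (the reference) to avoid perfect multicollinearity
data = data.drop("ocean_proximity_ISLAND", axis=1)
print(data.columns)
print(data.head())
```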
All right, now we are ready to do the actual work. When it comes to training a machine learning or statistical model, we learned during the theory that at a minimum we always need to split the data into a train and a test set; in some cases we also do a train/validation/test split, so that we can train the model on the training data, optimize it on the validation data to find the optimal set of hyperparameters, and then apply the fitted and optimized model to unseen test data. We are going to skip the validation set for simplicity, especially given that we are dealing with a very simple machine learning model, linear regression, so we will split our data into train and test. The first thing I do is create the list of the names of the variables we are going to use to train our machine learning model. We have a set of independent variables and a dependent variable: in our multiple linear regression the independent variables are longitude, latitude, housing median age, total rooms, population, households, median income, and the four dummy variables we built from the categorical variable. Then I specify that the target variable, the response or dependent variable, is the median house value: this is the variable we are targeting, because we want to find out which features, which independent variables out of the whole set, have a statistically significant impact on the dependent variable, the median house value; we want to know which characteristics of the houses in a block cause a change, cause variation, in the target variable. So we set X equal to the data with all the features that have those names, and y equal to the target, the median house value column that we select from the data; this is simply data selection and filtering.
Here I am done selecting, and what I am using is the train_test_split function from scikit-learn. You might recall that in the beginning we imported the model_selection module, and from sklearn.model_selection we imported the train_test_split function. This is a function you are going to need a lot in machine learning, because it is a very easy way to split your data. The first argument of this function is the matrix or data frame that contains the independent variables, in our case X, so you fill in X; the second argument is the dependent variable, y. Then we have test_size, which is the proportion of observations you want to put in the test set, and implicitly the proportion you do not put in the training set: if you pass 0.2, it means you want your test set to be 20% of the entire data, and the remaining 80% will be your training data, so the function automatically understands you want an 80/20 division, 80% training and 20% test. Finally you can also add the random_state, because the splitting is random, the data is randomly sampled from the entire data set; to ensure your results are reproducible, so that the next time you run this notebook you get the same results, and also so that you and I get the same results, we use a random state, and a random state of 111 is just a number I liked and decided to use here. When we run this command, you can see that we have a training set of roughly 15K observations and a test set of roughly 3.9K, and looking at those numbers verifies that we are indeed dealing with the 80/20 split.
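A sketch of the selection and the split, with the feature list written out explicitly; the dummy column names follow the assumption made above:

```python
from sklearn.model_selection import train_test_split

# Independent variables and the target
features = ["longitude", "latitude", "housing_median_age", "total_rooms",
            "population", "households", "median_income",
            "ocean_proximity_<1H OCEAN", "ocean_proximity_INLAND",
            "ocean_proximity_NEAR BAY", "ocean_proximity_NEAR OCEAN"]
target = "median_house_value"

X = data[features]
y = data[target]

# 80% training data, 20% test data, reproducible split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=111)

print(len(X_train), len(X_test))
```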
One thing to keep in mind is that here we are using the sm module that we imported from statsmodels.api; this is the library we will use to conduct our analysis and to train our linear regression model. This library does not automatically add the first column of ones to your set of independent variables: it only looks at the features you have provided, and those are all the independent variables. But we learned from the theory that in linear regression we always add an intercept, the beta zero; if you go back to the theory lectures you will see this beta zero added both to the simple and to the multiple linear regression. The intercept tells us the average value, in this case the average median house value, when all the other features are equal to zero. Therefore, given that statsmodels.api does not add this constant column for the intercept, we need to add it manually, so we write sm.add_constant on X_train, which means that our X table, our X data frame, now has a column of ones added to the features. Let me actually show you before doing the training, because this is something you should be aware of: I will print X_train with the constant and I will also print the same feature data frame before adding the constant, so you can see what I mean. As you can see, this is the same set of columns that form the independent variables, the features, and once we add the constant we have this initial column of ones. This is done so that we can have a beta zero, the intercept, and perform a valid multiple linear regression; otherwise you don't have an intercept, and that is just not what you are looking for. The scikit-learn library does this automatically, so it is when you use statsmodels.api that you should add this constant, while with scikit-learn you don't need to. If you are wondering why we use this specific library, we already discussed it, but just to refresh your memory: we use statsmodels.api because it has the nice property of showing a summary of your results, your p-values, your t-tests, your standard errors, which is exactly what you are looking for when you are performing a proper causal analysis and want to identify the features that have a statistically significant impact on your dependent variable. If you are using a machine learning model, including linear regression, only for predictive analytics, then you can use scikit-learn without worrying about statsmodels.api. So that is the story of adding a constant; now we are ready to actually fit, or train, our model.
Therefore, what we need to do is use sm.OLS, where OLS is the ordinary least squares estimation technique that we also discussed as part of the theory; we provide first the dependent variable, y_train, and then the feature set, which is X_train with the constant added. Then we call .fit(), which means: take the OLS model, use y_train as my dependent variable and X_train with the constant as my set of independent variables, and fit the OLS linear regression on this specific data. If you are wondering why y_train and X_train, and what the difference between train and test is, make sure to go back and revisit the training theory lectures, because there I go into this concept of training and testing in detail. The y and X, as we have already discussed during this tutorial, is simply the distinction between the independent variables, defined by X, and the dependent variable, defined by y: y_train and y_test are the dependent variable for the training data and the test data, and X_train and X_test are simply the training data features and the test data features. We use X_train and y_train to fit our model, to learn from the data, and once it comes to evaluating the model we take that fitted model, which has learned from both the dependent variable and the independent variable set, apply it to unseen data, X_test, obtain the predictions, and compare them to the true y, y_test, to see how different y_test is from the predictions for this unseen data; this tells us how well the model manages to identify and predict median house values on data it has not seen, based on the fitted model. That was just background information and a quick refresher; in this case we are simply fitting the model on the training dependent variable and the training independent variables with the constant added, and then we are ready to print the summary.
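A compact sketch of the fitting step with statsmodels; variable names such as X_train_constant and model_fitted mirror the walkthrough:

```python
import statsmodels.api as sm

# statsmodels does not add the intercept column itself, so add it manually
X_train_constant = sm.add_constant(X_train)

# Fit ordinary least squares on the training data
model_fitted = sm.OLS(y_train, X_train_constant).fit()

# Coefficients, standard errors, t-statistics, p-values, R-squared, etc.
print(model_fitted.summary())
```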
Now let's interpret those results. The first thing we can see is that all the coefficients, all the independent variables, are statistically significant. How can I say this? If we look at the column of p-values, which is the first thing you should look at when you get the results of a causal analysis with linear regression, we see that the p-values are very small. Just to refresh our memory: the p-value is the probability of obtaining a test statistic this extreme purely by random chance, so of seeing a statistically significant result just by chance and not because the null hypothesis is actually false and should be rejected. That is one thing you can see here, and there is much more in this table. The first thing you can do is verify that you have used the correct dependent variable: you can see that the dependent variable is the median house value. The model used to estimate the coefficients is OLS and the method is least squares; least squares is simply the underlying technique of minimizing the sum of squared residuals. The date on which we are running this analysis is the 26th of January 2024, and we have the number of observations, which is the number of training observations, 80% of our original data. Then we have the R-squared, which is the metric that shows the goodness of fit of your model; R-squared is commonly used in linear regression specifically to measure how well your model is able to fit the data with the linear regression line. The maximum of R-squared is one and the minimum is zero; here it is roughly 0.59, which means that all the independent variables you have included are able to explain about 59% of the entire variation in your response variable, so 59% of the variation in the median house value is explained by the set of independent variables you provided to the model.
What does this mean? On one hand it means you have a reasonable amount of information: anything above 0.5 is quite good, meaning you explain more than half of the entire variation in the median house value. On the other hand, it also means there is roughly 40% of the variation, information about the house values, that your data does not capture, so you might consider looking for additional independent variables to add on top of the existing ones, in order to increase the amount of information and variation you are able to explain with your model. The R-squared is really the best single way to describe the quality of your regression model. Another thing we have is the adjusted R-squared; in this specific case the adjusted R-squared and the R-squared are the same, about 0.59, which usually means you are fine in terms of the number of features you are using. Once you overwhelm your model with too many features you will notice that the adjusted R-squared starts to differ from the R-squared: the adjusted R-squared helps you understand whether your model performs well only because you are adding so many variables, or because they really contain useful information, since the R-squared will automatically increase just because you are adding more independent variables, even when those variables are not useful and only add to the complexity of the model, possibly overfitting it, without providing any added information. Then we have the F-statistic, which corresponds to the F-test. The F-test comes from statistics; you don't strictly need to know it, but check out the Fundamentals of Statistics course if you want to, because it tests whether all the independent variables jointly help to explain your dependent variable, the median house value. If the F-statistic is very large, or the p-value of the F-statistic is very small, essentially zero, it means all your independent variables jointly are statistically significant: together they help explain the median house value and have a statistically significant impact on it, which means you have a good set of independent variables. Then we have the log-likelihood, not super relevant in this case, and the AIC and BIC, which stand for Akaike information criterion and Bayesian information criterion. Those are also not necessary to know for now, but once you advance in your machine learning career it may be useful to understand them at a high level; for now, think of them as values that help quantify the information you gain by adding this set of independent variables to your model. This is optional, so ignore it if you don't know it for now.
Okay, let's now get to the fun part. In this middle part of the summary table we first have the set of independent variables: our constant, which is the intercept, then longitude, latitude, housing median age, total rooms, population, households, median income, and the four dummy variables we created. Then we have the coefficients corresponding to those independent variables; these are basically beta 0 hat, beta 1 hat, beta 2 hat, and so on, the parameters of the linear regression model that our OLS method has estimated based on the data we provided. Before interpreting these independent variables, the first thing you need to do, as I mentioned in the beginning, is look at the p-value column: it shows which independent variables are statistically significant, and the table you get from the statsmodels API is usually read at the 5% significance level, so alpha, the threshold of statistical significance, is 0.05, and any p-value smaller than 0.05 means you are dealing with a statistically significant independent variable. The next thing you can see, just to the left, is the t-statistic: this p-value is based on a t-test, which, as we saw during the theory (and you can also check the Fundamentals of Statistics course from LunarTech for a more detailed understanding of this test), states a hypothesis about whether each independent variable individually has a statistically significant impact on the dependent variable. Whenever this test has a p-value smaller than 0.05, you are dealing with a statistically significant independent variable, and in this case we are super lucky: all our independent variables are statistically significant.
Then the question is whether we have a positive or a negative statistically significant effect, and that is something you can see from the signs of these numbers: longitude has a negative coefficient, latitude a negative coefficient, housing median age a positive coefficient, and so on. A negative coefficient means that this independent variable causes a negative change in the dependent variable. More specifically, let's take total_rooms, whose coefficient is minus 2.67: if we increase the total number of rooms by one additional unit, one more room added to the total rooms, then the median house value decreases by 2.67. Now you might wonder how this is possible. First of all, the coefficient is quite small, so the relationship between them is not very strong; on the other hand, you can argue that at some point adding more rooms simply does not add any value and in some cases even decreases the value of the house; that may be the explanation, and at least this is what the data says. So if there is a negative coefficient, a one-unit increase in that specific independent variable, all else constant, results in a decrease in the median house value, in the case of total rooms a decrease of $2.67. We also refer to this "everything else constant" idea as the ceteris paribus assumption in econometrics. One more time, to make sure we are clear on this: if we add one more room to the total number of rooms, then the median house value decreases by $2.67, and this holds when the longitude, latitude, housing median age, population, households, median income and all the other characteristics stay the same. Now let's look at the opposite case, when the coefficient is positive and large, which is the housing median age. This means that if we have two house blocks with exactly the same characteristics, the same longitude and latitude, the same total number of rooms, population, households and median income, the same distance from the ocean, and one of them has one additional year added to the housing median age, so it is one year older, then the median house value of that specific block is higher by $846. The block with one more year of median age has an $846 higher median house value compared to the one that has all the same characteristics except a median age that is one year less; so one additional year in the median age results in an $846 increase in the median house value, everything else constant. So much for the idea of negative and positive coefficients and their magnitude.
Now let's look at one dummy variable, explain the idea behind it and how we can interpret it, because it is a good way to understand how dummy variables are interpreted in the context of linear regression. One of the independent variables is ocean proximity inland, and its coefficient is -2.108e+05, which simply means approximately minus 210K. What this means is the following: imagine two blocks of houses with exactly the same characteristics, the same longitude and latitude, the same housing median age, the same total number of rooms, population, households and median income, with a single difference: one block is located inland in terms of ocean proximity and the other block is not. In this case the reference, the category we removed, was the island category. So if a block of houses is inland, its median house value is on average about 210K lower than that of a block with exactly the same characteristics that is not inland, for instance one that is on the island. With dummy variables there is always an underlying reference category, the one you deleted from the string categorical variable, and you have to interpret the dummy relative to that specific category. This may sound complex, but it really is not; it is just a matter of practice and of understanding the approach behind dummy variables: either the criterion holds or it does not. In this specific case it means that if you have two blocks of houses with exactly the same characteristics, and one is inland while the other is not, for instance it is on the island, then the inland block will on average have a 210,000 lower median house value compared to the island block in terms of ocean proximity, which kind of makes sense, because in California people might prefer living at the island locations, so those houses may be in higher demand compared to the inland locations. So, to summarize: longitude has a statistically significant impact on the median house value; latitude and housing median age have an impact and cause a statistically significant difference in the median house value; the total number of rooms has an impact, and so do the population, the households, the median income and the proximity to the ocean; this is because all their p-values are essentially zero, smaller than 0.05, which means they all have a statistically significant impact on the median house value in the California housing market. As for interpretation, we have covered only a few coefficients for the sake of simplicity and to keep this case study from becoming too long, but what I would suggest you do is interpret all of the coefficients: we covered the housing median age and the total number of rooms, but you can also interpret the population and the median income, and we interpreted one of the dummy variables, so feel free to interpret the other ones as well. By doing this you can even build an entire case study paper in which you explain, in one or two pages, the results you obtained, which will showcase that you understand how to interpret linear regression results.
Another thing I would suggest you do is add a comment on the standard errors, so let's look at them now. We can see that we are making a huge standard error, and this is the direct result of the fourth assumption being violated. This case study is really useful in that it showcases what happens when some of your assumptions are satisfied and some are violated: in this specific case the assumption that the errors have a constant variance is violated, so we have a heteroskedasticity issue, and we see that reflected in our results. This is a very good example of the fact that even without formally checking the assumptions you can already see that the standard errors are very large, and that alone hints that heteroskedasticity is most likely present and that the homoskedasticity assumption is violated. Keep this idea of large standard errors in mind, because we are going to see that it also becomes a problem for the performance of the model, and we will obtain a large prediction error because of it. One more comment regarding the total rooms and the housing median age: in some cases the linear regression results might not seem logical, but sometimes there actually is an underlying explanation, or maybe your model is simply overfitting or biased, which is also possible, and that is something you can investigate by checking your OLS assumptions.
Before going to that stage, I wanted to briefly showcase the idea of predictions. We have now fitted our model on the training data and we are ready to perform predictions: we can take our fitted model and use the test data, X_test, to predict, that is, to get new median house values for the blocks of houses for which we are not providing the corresponding median house price. On this unseen data we apply the model that we have already fitted, and we want to see the predicted median house values; we can then compare these predictions to the true median house values, which we have but are not yet exposing, and see how good a job our model does of estimating these unknown median house values for the test data, for all the blocks of houses for which we provided the characteristics in X_test but not the y_test. As usual, just like in training, we add a constant with this library, and then we call the fitted model, model_fitted, with .predict, providing the test data, and these are the test predictions. Once we do this we obtain the test predictions, and if we print them you can see that we get a list of house values: those are the predicted house values for the blocks of houses that were included in the test data, the 20% of our entire data set.
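A short sketch of that prediction step, reusing the fitted statsmodels model from above:

```python
# The test features need the same constant column as the training features
X_test_constant = sm.add_constant(X_test)

# Predicted median house values for the unseen test blocks
test_predictions = model_fitted.predict(X_test_constant)
print(test_predictions.head())
```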
Like I mentioned just before, in order to ensure that your model is performing well you need to check the OLS assumptions. During the theory section we learned that there are a couple of assumptions your model and your data should satisfy for OLS to provide unbiased and efficient estimates, which means the estimates are accurate and their standard errors are low, something we also see as part of the summary results. The standard error is a measure that shows how efficient your estimates are: if there is high variation, if the coefficients shown in this table could vary a lot, then you do not have accurate coefficients, they could land anywhere across a wide range, your standard error will be very large, and that is a bad sign; if instead you are dealing with an accurate, more precise estimation, the standard error will be low. An unbiased estimate means that your estimates are a true representation of the pattern between each independent variable and the response variable. If you want to learn more about the ideas of bias, unbiasedness and efficiency, make sure to check the Fundamentals of Statistics course at LunarTech, because it explains these concepts very clearly and in detail; here I am assuming you know them, or at least I would suggest knowing them at a high level.
Now let's quickly check the OLS assumptions. The first assumption is the linearity assumption, which means your model is linear in parameters. One way of checking that is by using your fitted model and its predictions: y_test, the true median house values for your test data, and test_predictions, the predicted median house values for the unseen data. You plot the true values against the predicted values, and you also plot the best-fit line for the ideal situation in which you would make no error and your model would return the exact true values, and then you see how linear this relationship actually is. If the observed-versus-predicted pattern, where observed means the real test y values and predicted means the test predictions, is roughly linear and matches this perfect line, then assumption one is satisfied: your linearity assumption holds and you can say
that your data and your model are indeed linear in parameters. Then we have the second assumption, which states that your sample should be random; this translates into the expectation of the error terms being equal to zero. One way of checking this is by simply taking the residuals from your fitted model, model_fitted.resid, and computing their average, which is a good estimate of the expectation of the errors, where the residuals are the estimates of the true error terms. Here I just round to two decimals. If this number is equal to zero, which is the case here, so the mean of the residuals in our model is zero, it means that the expectation of the error terms, or at least its estimate based on the residuals, is indeed equal to zero. Another way of checking this second assumption, that the model is based on a random sample and therefore the expectation of the error terms is zero, is by plotting the residuals against the fitted values: we take the residuals from the fitted model, compare them to the fitted values coming from the model, and look at the scatter plot to check whether the pattern is symmetric around zero. You can see that the zero line runs right through the middle of the pattern, which means that on average the residuals are centered around zero, so the mean of the residuals is zero, exactly what we calculated before, and therefore we can say we are indeed dealing with a random sample. This plot is also super useful when it comes to the fourth assumption, which we will come back to a bit later.
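A minimal sketch of these two checks, assuming the fitted statsmodels results object is called model_fitted:

```python
import matplotlib.pyplot as plt

# Residuals are the sample estimates of the unobserved error terms
residuals = model_fitted.resid

# Check 1: the mean of the residuals should be (approximately) zero
print("Mean of residuals:", round(residuals.mean(), 2))

# Check 2: residuals vs fitted values should be roughly symmetric around zero
fitted_values = model_fitted.fittedvalues
plt.scatter(fitted_values, residuals, alpha=0.3)
plt.axhline(y=0, color="red", linestyle="--")
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs fitted values")
plt.show()
```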
So for now, let's check the third assumption, which is the assumption of exogeneity. Exogeneity means that each of our independent variables should be uncorrelated with the error terms: there is no omitted variable bias and there is no reverse causality, which means the independent variable has an impact on the dependent variable but not the other way around; the dependent variable should not cause the independent variable. There are a few ways to check this. One straightforward way is to compute the correlation coefficient between each of the independent variables and the residuals obtained from your fitted model. It is a simple technique you can use to quickly understand the correlation between each independent variable and the residuals, which are the best estimates of your error terms, and in this way you can see whether there is a correlation between your independent variables and your error terms.
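As a quick sketch of that correlation check, again assuming X_train holds the independent variables and model_fitted is the fitted OLS results object:

```python
# Correlation between each independent variable and the residuals;
# values close to zero are consistent with the exogeneity assumption
residuals = model_fitted.resid
for column in X_train.columns:
    corr = X_train[column].corr(residuals)
    print(f"{column}: correlation with residuals = {corr:.4f}")
```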
Another way you can do that, which is more advanced and leans more toward the econometrics side, is by using the Durbin-Wu-Hausman test. The Durbin-Wu-Hausman test is a more formal, more advanced econometric test to find out whether the exogeneity assumption is satisfied or whether you have endogeneity, which means that one or more of your independent variables is potentially correlated with your error terms. I won't go into the details of this test; I have put some explanation here, and feel free to check any introductory econometrics course to understand more about the Durbin-Wu-Hausman test for the exogeneity assumption. The fourth assumption we will talk about is homoskedasticity. The homoskedasticity assumption states that the error terms should have a constant variance, which means that when we look at the variation the model makes across different observations, that variation is roughly constant. In our case we have observations for which the residuals are small and others for which they are large; we see this clearly in the figure, and that is what we call heteroskedasticity, which means the homoskedasticity assumption is violated: our error terms do not have a constant variance across all observations, and different observations have very different variances. Given that we have heteroskedasticity, we should consider more flexible approaches such as GLS, FGLS, or GMM, which are somewhat more advanced econometric estimators.
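The video checks this assumption visually with the residuals-versus-fitted plot; as an additional, commonly used check that is not shown in the case study, statsmodels also provides the Breusch-Pagan test for heteroskedasticity. A minimal sketch, again assuming model_fitted is the fitted OLS results object:

```python
from statsmodels.stats.diagnostic import het_breuschpagan

# Breusch-Pagan test: H0 = homoskedasticity (constant error variance)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(
    model_fitted.resid, model_fitted.model.exog
)
print("Breusch-Pagan LM p-value:", lm_pvalue)
# A very small p-value suggests the homoskedasticity assumption is violated
```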
So the final part of this case study is to show you how you can do all of this on the traditional machine learning side, using scikit-learn. Here I am using the StandardScaler in order to scale my data, because we saw in the summary table from statsmodels.api that our data is on a very large scale: the median house values are large numbers and the housing median age sits on its own scale, and that is something you want to avoid when you are using linear regression as a predictive analytics model. When you are using it for interpretation purposes you should keep the original scales, because it is easier to interpret the values and to understand the difference in median house price when you compare different characteristics of the blocks of houses. But when it comes to using it for predictive analytics, meaning you really care about the accuracy of your predictions, then you need to scale your data and ensure it is standardized. One way of doing that is with the StandardScaler from sklearn.preprocessing. The way I do it is: I initialize the scaler with StandardScaler(), which I just imported from the scikit-learn library, and then I call scaler.fit_transform(X_train), which means we take the independent variables and scale and standardize them. Standardization simply ensures that some very large values do not wrongly influence the predictive power of the model, so the model is not confused by the large numbers and does not pick up a spurious variation, but instead focuses on the true variation in the data, that is, on how much a change in one independent variable is associated with a change in the dependent variable. Given that we are dealing with a supervised learning algorithm, X_train_scaled will then contain our standardized training features, the independent variables, and X_test_scaled will contain our standardized test features, the unseen data the model does not see during training but only at prediction time. Then we also use y_train, which is the dependent variable in our supervised model and corresponds to the training data. We first initialize the linear regression, the LinearRegression model from scikit-learn; this is just an empty linear regression model, and then we take this initialized model and fit it on the training data.
So we fit on X_train_scaled, the training features, and y_train, the dependent variable from the training data. Do note that I am not scaling the dependent variable; this is common practice, because you do not want to standardize your target, rather you want to ensure your features are standardized. What you care about is the variation in your features, and you want the model not to get confused when learning from those features, much less when looking at the impact of those features on your dependent variable. So I fit the model on this training data, the features and the dependent variable, and then I use this fitted model, lr, which has already learned from these features and the dependent variable during supervised training, together with X_test_scaled, the standardized test data, in order to perform the prediction, that is, to predict the median house values for the unseen test data. Notice that nowhere am I using y_test; I keep y_test, the true values of the dependent variable, to myself, so that I can then compare them to the predicted values and see how well my model was able to get the predictions right. Let's also do one more step: I import the metrics from scikit-learn, such as mean_squared_error, and I use the mean squared error to find out how well my model was able to predict those house prices. This tells us that on average we are making an error of about 59 thousand dollars on the median house prices.
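Putting the scikit-learn side together, here is a minimal sketch under the same naming assumptions (X_train, X_test, y_train, y_test are illustrative names, not necessarily the ones in the course notebook):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Standardize the features (fit the scaler on the training data only)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the training statistics, do not refit

# Fit a plain linear regression on the scaled training features
lr = LinearRegression()
lr.fit(X_train_scaled, y_train)

# Predict on the unseen, scaled test features and evaluate
y_pred = lr.predict(X_test_scaled)
mse = mean_squared_error(y_test, y_pred)
rmse = np.sqrt(mse)  # the root expresses the average error in dollars
print(f"Test MSE:  {mse:,.0f}")
print(f"Test RMSE: {rmse:,.0f}")
```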
Whether we consider that large or small is something we can look into. Like I mentioned at the beginning, the idea behind using linear regression in this specific course is not to use it purely as a traditional machine learning model, but rather to perform causal analysis and see how we can interpret it. When it comes to the quality of the predictive power of the model, if you want to improve it, the next steps could be the following: you can check whether your model is overfitting, and then apply, for instance, Lasso regularization, a Lasso regression, which addresses overfitting; you can consider going back and removing more outliers from the data, since maybe the outliers we removed were not enough; and you can consider slightly more advanced machine learning algorithms, because even though the regression assumptions may be satisfied, more flexible models such as random forests, decision trees, or boosting techniques might be more appropriate and give you higher predictive power. You can also consider working more with the scaled or normalized version of your data. Thank you for watching this video. If you like this content, make sure to check all the other videos available on this channel, and don't forget to subscribe, like, and comment to help the algorithm make this content more accessible to everyone across the world. If you want free resources, make sure to check the free resources section at LunarTech.ai, and if you want to become a job-ready
data scientist and you are looking for an accessible boot camp that will help you become one, consider enrolling in the Ultimate Data Science Boot Camp at LunarTech. You will learn all the theory and fundamentals to become a job-ready data scientist, you will implement the learned theory in multiple real-world data science projects, and, after learning the theory and practicing it with real-world case studies, you will also prepare for your data science interviews. If you want to stay up to date with recent developments in tech, the headlines you may have missed in the last week, the open positions currently on the market across the globe, and the tech startups that are making waves, make sure to subscribe to the Data Science and AI newsletter from LunarTech. This video was sponsored by LunarTech. At LunarTech we are all about making you ready for your dream job in tech and making data science and AI accessible to everyone, whether it's data science, artificial intelligence, or engineering. At LunarTech Academy we have courses and boot camps to help you become a job-ready professional. We also help businesses, schools, and universities with top-notch training and curriculum modernization in data science and AI, including corporate training on the latest topics like generative AI. With LunarTech, learning is easy, fun, and super practical. We care about providing an end-to-end learning experience that is both practical and grounded in fundamental knowledge, and our community is all about supporting each other and making sure you get where you want to go. Ready to start your tech journey? LunarTech is where you begin. Students and aspiring data science and AI professionals can visit the LunarTech Academy section to explore our courses, boot camps, and programs in general. Businesses in need of employee training, upskilling, or data science and AI solutions should head to the technology section on the LunarTech page. Enterprises looking for corporate training, curriculum modernization, and customized AI tools to enhance education can visit the LunarTech Enterprises section at LunarTech.ai for a free consultation and a customized estimate. Join LunarTech and start building your future, one data point at a time. Interested in machine learning or data science? Then this course
is for you. Today we will build a movie recommender system using feature selection, count vectorization, and cosine similarity, and at the end of the video we will create a web app using Streamlit. Building projects is one of the most effective ways to thoroughly learn a concept and develop essential skills, and this guide will walk you through building a movie recommendation system tailored to user preferences. We will use a 10,000-movie dataset as our foundation; while the approach is intentionally simple, it establishes the core building blocks common to the most sophisticated recommenders in the industry, think Netflix, Spotify, and others. We will harness the power and versatility of Python to manipulate and analyze our data: the pandas library will streamline data preparation, and scikit-learn will provide robust machine learning tools, namely CountVectorizer and cosine similarity. User experience is key, so we will design an intuitive web application for effortless movie selection and recommendation display at the end of the video. You will develop a data-driven mindset, understand the essential steps in building a recommendation system, master core machine learning techniques, dive into data manipulation, feature engineering, and machine learning for user recommendations, and create a user-centered solution that delivers a seamless experience for personalized movie suggestions. So let's get started. All right, let's now go over the dataset we will be using for the movie recommender system. We are using the TMDB movies dataset; this dataset is great for developing a system that recommends films tailored to your preferences and introduces you to new titles. We selected it for its comprehensive movie data: it includes ID, which is essential for movie identification, title, genre, original language, and many other features, but the key features that
we will focus on are ID, title, genre, and overview, and on top of this we will combine overview and genre together into a single tags column. We selected this dataset because of how big it is: it contains around 10K top-rated TMDB movies and, as you can see, many features. So let's move on to the next chapter, which is feature engineering. Okay, now let's go over the features we'll be using. Features in a recommender system are essentially the data points you use to make decisions about what to recommend;
these features help identify similarities between movies, which is crucial for generating personalized recommendations. For your system to be effective, it is vital to select features that offer meaningful insight into the content of the movies and the preferences of the users, so you have to be careful about which features you choose. For our movie recommender system we will focus on several key features. ID serves as a unique identifier for each movie, crucial for indexing and retrieving movie information accurately. Title is the most basic yet essential feature, enabling users to identify movies by their names. Genre categorizes movies into different groups, facilitating recommendations based on content similarity and user preferences; the genre plays a pivotal role in personalization. And of course the overview: offering a brief summary of the movie's plot, the overview acts as a rich source for content-based filtering through NLP. We will use the overview combined with the genre to create a comprehensive descriptor for each movie, which lets you recommend movies more accurately. Combining the overview and the genre into a single tags feature gives us a fuller picture of each movie; this combination helps the
system better analyze and find movies that are similar in theme, story, or style. For example, consider a movie like Inception: its overview might read something like "a thief who steals corporate secrets through dream-sharing technology is tasked with planting an idea into the mind of a CEO", while the genres could be listed as action, science fiction, and adventure. If you combine these, you get a fuller picture: "action science fiction adventure thief corporate secrets dream-sharing technology planting an idea CEO", which might lead to recommendations of other, similar movies. The main point is that when you combine the overview with the genre, you get a much richer feature that you can convert into a much more informative numerical representation, which you can then use to recommend movies more accurately. Before using the overview and genre data, for The Lord of the Rings for example, you would remove words like "the" and "all". We pre-process the selected features because many movies include stop words in their title, genre, or overview, and those are words that do not contribute anything, so we pre-process the text, remove them, and clean our data. These are, for example, words such as "the", "and", "in", "his", and others. So we have done our feature selection; now let's move on to content-based versus collaborative-filtering recommender systems. Let's explore the key recommender systems currently used by Netflix, Amazon, and other big tech companies. There are two main types: content-based and collaborative-filtering recommender
systems. Let's start with the content-based recommender system. A content-based recommender system only uses the features and the overview of the movies you have liked to recommend similar movies. Say you tell a friend, "I liked Iron Man because of its features", such as the director, the genre, or the overview: the system will recommend similar movies based on those features. It will not use any other data, for example what other people have liked, what their ratings of other movies are, or what ratings you have given to other movies; it is based solely on the features of the movies you have liked previously. For example, if you enjoyed Inception, a content-based system might suggest Interstellar, because both movies share a similar director, a complex narrative structure, genre, and overview. Now let's move on to collaborative filtering. On Netflix you often see, for example, that if you have watched and enjoyed Stranger Things, Netflix might recommend The Witcher to you, because other users who liked Stranger Things also enjoyed The Witcher. What is important to note here is that it does not use the features, the overview, or other content data points; it only uses what other users have liked, specifically other users with preferences similar to yours. That is the difference: the recommendation is made based on the viewing habits and preferences of a larger group of viewers with tastes similar to yours. This method does not rely on item features but on the wisdom of the crowd; it uses patterns of ratings or interactions from many users to predict what an individual might like. For example, if users who liked The Avengers also enjoyed Guardians of the Galaxy, you might receive a recommendation for Guardians of the Galaxy if you liked The Avengers. While both systems are effective on their own, combining them enhances the accuracy of the recommender: for example, you can start with a content-based recommender, and once you start getting more user data you can also use collaborative filtering to provide more accurate recommendations. In this session we will focus on the crucial
element of transforming text into numerical vectors. Our models cannot be trained on raw text, but they can be trained on numerical vectors, which means we have to convert our text into vectors. To do that we use CountVectorizer. This method simplifies text analysis by ignoring the order of words and focusing instead on their frequency; by translating text into numerical data we are able to classify and organize documents, a vital function that allows our system to process large amounts of text data efficiently. The aim is to give you a straightforward, practical understanding of this essential technique so we can move on to the next chapter, which is cosine similarity. To give a more targeted example, let's consider three short movie overviews: "a fun action-packed adventure", "adventure movies inspire me", and "we both love heart-racing adventures". If we collect the words used across these sentences, the vocabulary is roughly: fun, action, packed, adventure, movies, inspire, me, we, both, love, heart, racing. For the first overview, "a fun action-packed adventure", we use fun once, action once, packed once, and adventure once, and none of the remaining words, so its vector is 1 1 1 1 0 0 0 0 0 0 0 0. For the second overview, "adventure movies inspire me", we use no fun, action, or packed, but adventure once, movies once, inspire once, and me once, and none of the rest, which gives the vector 0 0 0 1 1 1 1 0 0 0 0 0. You can do the same for "we both love heart-racing adventures", but I think you get the idea. This step is key, as it transforms textual information into a numerical format, a vector: each movie overview is converted into a vector in a high-dimensional space, where each unique word is a dimension and the word's frequency in the overview is the value in that dimension. This structure allows machine learning models to interpret the data
for tasks such as classification, using movies as an example. Let's say we have Iron Man 1, Iron Man 2, Iron Man 3, and The Avengers, and we try to create vectors out of these titles. The words used across them include iron, man, 1, 2, and 3, so for Iron Man 1 you can expect the vector 1 1 1 0 0, for Iron Man 2 you can expect 1 1 0 1 0, and for Iron Man 3 you can expect 1 1 0 0 1. This process of count vectorization, translating movie titles into the world of vectors, is a cornerstone of text analysis: it allows us to convert unstructured text such as movie titles into a structured format that can later be used by our machine learning algorithms. We can extend this process to larger and more complex bodies of text; for instance, we could apply count vectorization to movie descriptions, reviews, or even entire scripts. Irrespective of the text size or complexity, the count vectorization method remains effective and allows us to handle a broad spectrum of text data.
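A minimal sketch of this idea with scikit-learn's CountVectorizer, using the three example overviews above (note that, without stemming, "adventure" and "adventures" count as different tokens):

```python
from sklearn.feature_extraction.text import CountVectorizer

overviews = [
    "a fun action-packed adventure",
    "adventure movies inspire me",
    "we both love heart-racing adventures",
]

# Bag-of-words: ignore word order, count word frequencies
vectorizer = CountVectorizer()
vectors = vectorizer.fit_transform(overviews)

print(vectorizer.get_feature_names_out())  # the learned vocabulary (one dimension per word)
print(vectors.toarray())                   # one count vector per overview
```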
data. To understand the concept of cosine similarity in the context of movies, let's take a simple example. Imagine we are comparing two movies based on their genres: one is "sci-fi thriller" and the other is just "sci-fi". Cosine similarity is defined as the dot product of two vectors divided by the product of their magnitudes: cos(A, B) = (A · B) / (|A| |B|). With the vocabulary [sci-fi, thriller], the movie "sci-fi thriller" converts to the vector A = (1, 1), and the movie that is just "sci-fi" converts to B = (1, 0). The dot product is 1 × 1 + 1 × 0 = 1; the magnitude of A is the square root of 1² + 1², which is √2, and the magnitude of B is the square root of 1² + 0², which is 1. So the cosine similarity is 1 / (√2 × 1) = 1/√2, which is about 0.71, and that is how similar the two movies are. Now let's take a broader example: we have Iron Man and we compare it with
The Avengers and Oppenheimer. Iron Man is action and sci-fi, The Avengers is action, sci-fi, and adventure, and Oppenheimer is drama and historical. We will calculate the cosine similarity between these movies based on their genres and recommend movies accordingly. If we build vectors out of the genres of each movie using the vocabulary [action, sci-fi, adventure, drama, historical], then Iron Man is 1 1 0 0 0, The Avengers is 1 1 1 0 0, and Oppenheimer is 0 0 0 1 1. Let's calculate the similarity between Iron Man and The Avengers. As I mentioned, cosine similarity equals the dot product of the two vectors divided by the product of their magnitudes, |A| × |B|. For Iron Man and The Avengers we have the vectors (1, 1, 0, 0, 0) and (1, 1, 1, 0, 0); the dot product is 1×1 + 1×1 + 0×1 + 0×0 + 0×0 = 2, the magnitude of Iron Man's vector is √2 and the magnitude of The Avengers' vector is √3, so the cosine similarity is 2 / (√2 × √3) = 2/√6, which is about 0.82. And if we calculate the cosine similarity between Iron Man and Oppenheimer, we get zero, because their genre vectors share no common words.
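A small sketch that reproduces these numbers with NumPy, using the genre vocabulary [action, sci-fi, adventure, drama, historical] assumed above:

```python
import numpy as np

def cosine_sim(a, b):
    # dot product of the two vectors divided by the product of their magnitudes
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

iron_man    = np.array([1, 1, 0, 0, 0])  # action, sci-fi
avengers    = np.array([1, 1, 1, 0, 0])  # action, sci-fi, adventure
oppenheimer = np.array([0, 0, 0, 1, 1])  # drama, historical

print(cosine_sim(iron_man, avengers))     # ~0.82
print(cosine_sim(iron_man, oppenheimer))  # 0.0
```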
So as you can see, we calculate the cosine similarity using this formula. All right, let's get started. I have written the code already, so I will just walk you through it. Let's start with step one, which is importing pandas: since we will be using pandas, we must first install and import it. This is the data exploration and pre-processing part. Okay, so let's start with the feature selection
part, which means we first list all the columns in the dataframe to identify the relevant features. These are all the columns we have inside our dataset. We are going to combine overview and genre into a new column named tags, and that's done, perfect: there is now a new column called tags, and this is the only column we will run our model on, which means we don't need overview and genre anymore, so we can get rid of them. We do that by saying movies = movies.drop(...) and dropping the columns overview and genre. Perfect, as you can see we now have only ID, title, and tags, and this is great. Now for the text cleaning part: we will import NLTK and the necessary modules for our text preprocessing, and running this cell first will download the required NLTK resources for pre-processing.
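A rough sketch of this step, assuming the dataframe is called movies and the file name and column names ('overview', 'genre', 'id', 'title') match the dataset used in the walkthrough; they may differ in your copy:

```python
import pandas as pd

# Load the TMDB movies dataset (file name assumed for illustration)
movies = pd.read_csv("top10K-TMDB-movies.csv")

# Combine overview and genre into a single descriptive 'tags' column
movies["tags"] = movies["overview"].fillna("") + " " + movies["genre"].fillna("")

# We only model on tags, so drop the original columns
movies = movies.drop(columns=["overview", "genre"])
print(movies[["id", "title", "tags"]].head())
```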
Okay, let's start with cleaning the text. First we want to handle the case where the value is not a string; we do that by checking if not isinstance(text, str). Then we want all of the text in the column to be lowercase, we want to remove punctuation as well as digits, we want to tokenize it, remove stop words, and join the words back together, and then we run our cleaning function and check the result. Perfect. All right, now on to the CountVectorizer part: we import it, and also install it if it is not already installed, but it usually comes installed. We apply the
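Here is a minimal sketch of such a cleaning function with NLTK; the function name clean_text and the column name tags_clean follow the walkthrough's naming as best it can be reconstructed, and the stopwords and punkt resources are downloaded first:

```python
import string
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download("stopwords")
nltk.download("punkt")

STOP_WORDS = set(stopwords.words("english"))

def clean_text(text):
    # Guard against missing values that are not strings (e.g. NaN)
    if not isinstance(text, str):
        return ""
    text = text.lower()
    # Remove punctuation and digits
    text = text.translate(str.maketrans("", "", string.punctuation + string.digits))
    # Tokenize, drop stop words, and join the remaining words back together
    tokens = word_tokenize(text)
    tokens = [word for word in tokens if word not in STOP_WORDS]
    return " ".join(tokens)

movies["tags_clean"] = movies["tags"].apply(clean_text)
```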
clean_text function to the tags column and create a new column called tags_clean, so we clean the data here. Okay, so let's finally set up the CountVectorizer: max_features is 10,000 and stop_words is 'english', so what we are saying here is that our maximum number of features is 10,000 and that we must remove all the stop words contained in the English vocabulary. Okay, now we fit and transform the text data into an array, so we vectorize it. Perfect, we
now have our vectorized data. All right, so now we can import cosine_similarity and start computing the cosine similarity. Through this line we calculate the cosine similarity between the movies based on the vector representations we built: we already vectorized the text through our CountVectorizer, so now we can calculate the similarity between the movies, and we do that by calling cosine_similarity and storing the result as similarities. Perfect, it has run; now let's look at the similarity matrix.
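A minimal sketch of the vectorization plus similarity step, continuing from the assumed tags_clean column:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Bag-of-words with at most 10,000 features and English stop words removed
cv = CountVectorizer(max_features=10000, stop_words="english")
vectors = cv.fit_transform(movies["tags_clean"]).toarray()

# Pairwise cosine similarity between every pair of movies (n_movies x n_movies matrix)
similarities = cosine_similarity(vectors)
print(similarities.shape)
```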
Great, now if we call .info() on the dataframe we see it is still the same: 10,000 IDs, 10,000 titles, and still some missing overviews or tags. Perfect, everything looks fine. Now we want to see how our model performs, so we are going to identify and print the titles of the top five most similar movies to a given movie based on cosine similarity, and here we will do it for the movie at index four; let's check which one movie four is, and it is The Godfather Part II. With this function we can see whether our model works. What we do here is find the movies that have the highest cosine similarity score with movie four, sort the list in reverse order so that the most similar movie comes on top, and take positions 0 to 5, which means we only recommend five movies, and then we print the titles of those five most similar movies to movie four. Perfect: movie four is The Godfather Part II, and the recommendations that come back are Godfather-related movies. All right, so let's now create a function that recommends movies based on the title of the movie, not the ID, just the title, and let's see if that works. Okay, so now let's save the modified data frame and the similarity matrix for later use; we are going to use them in our web app, so
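A sketch of that title-based recommend function and the pickle step, under the same naming assumptions (movies dataframe with a default integer index, similarities matrix); the pickle file names are illustrative:

```python
import pickle

def recommend(title, top_n=5):
    # Find the index of the selected movie by its title
    index = movies[movies["title"] == title].index[0]
    # Sort all movies by similarity to the selected one, most similar first
    distances = sorted(
        list(enumerate(similarities[index])), key=lambda x: x[1], reverse=True
    )
    # Skip position 0 (the movie itself) and return the next top_n titles
    return [movies.iloc[i]["title"] for i, _ in distances[1 : top_n + 1]]

print(recommend("The Godfather Part II"))

# Save the dataframe and the similarity matrix for the web app
pickle.dump(movies, open("movies_list.pkl", "wb"))
pickle.dump(similarities, open("similarity.pkl", "wb"))
```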
let's import pickle; we dump the movies list to a pickle file and load the saved similarity scores back to check that it works. Perfect. All right, I will see you in the last part, which is building the actual web app. As you can see, the front end will be provided to you, and the only thing you have to do is write the content of the app file: import streamlit, import pickle, import requests. We first have to write the function that fetches a movie poster using the movie ID; this function will be provided to you, and
it works by connecting to the API. Next we load the movie data and the similarity matrix; you can do that with movies = pickle.load(...): we did the pickle.dump in the notebook, and now we are loading it back, and we do the same for the similarity matrix and for the movies list. Let's now create the header for the web app, something like "Movie Recommender System". We also have to bring in the necessary components for Streamlit: we will be creating an image carousel, and for that we must first import the components, so we import the components needed to create an image carousel. We will use the carousel component from the provided front end, and as a baseline we want to fetch movie posters using the movie IDs, just so there is a base set of movies shown before anything is recommended and the page is not empty, and this works just fine. Of course, we also display the image carousel component and create a drop-down menu to select a movie, and now let's
create our recommend function. The way we are going to do it is by first finding the index of the selected movie in our data frame based on its title, then calculating its similarity scores against all the other movies and ranking them from the most similar, meaning the highest cosine similarity score, to the lowest, and we are going to recommend five movies; of course you can recommend ten or fifty movies, that is completely up to you. So let's first find the index, creating a variable called index, and then calculate the similarity scores and sort them by distance. As you can see, we first compute the similarity scores and return them in reverse order, from the highest ranking to the lowest, and then we initialize two empty lists, one for the recommended movie titles and one for their posters, and for each recommended movie we append the title and fetch the poster by the movie ID. Perfect,
and it works perfectly. All right, let me walk you through the code to show you exactly how you can do it as well. We first import streamlit, pickle, and of course requests, to fetch posters for our movie IDs, because we don't have the images stored locally. Then we load our data: we load our movies list and of course the similarity scores, and we grab the titles. Next we create the header of our web app, and to create a carousel we must first import the components for Streamlit. Here we initialize our carousel component and fetch some movie posters to serve as a basic list of available images, or rather movies, that people can browse before anything is recommended; it is just a basic image carousel. Then we display the image carousel component and of course create the drop-down menu. And here is the main function of our movie recommender system, which basically works like this: we first find the index of the selected movie, then we calculate its distance, that is, the similarity score against all the other movies, and return the ones that rank the highest, which means the most similar movies are returned. All right, so let's now look at the main function. This is our main function, which allows us to recommend movies based on the index: here we find the index of the movie, then we compute the most similar movies with respect to the selected movie, we initialize the lists of movie titles and posters, here we fill in the lists, and once everything is filled we just return them. This is the button we will click on to recommend movies; we
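A condensed sketch of what such an app file might look like, with the poster-fetching endpoint, API key, pickle file names, column names, and widget labels all assumed for illustration (the actual front end and fetch function are provided in the course materials):

```python
import pickle
import requests
import streamlit as st

# Load the artifacts saved earlier (file names assumed)
movies = pickle.load(open("movies_list.pkl", "rb"))
similarity = pickle.load(open("similarity.pkl", "rb"))
movie_titles = movies["title"].values

def fetch_poster(movie_id):
    # Illustrative only: a real implementation would call the TMDB API with a valid key
    url = f"https://api.themoviedb.org/3/movie/{movie_id}?api_key=YOUR_KEY"
    data = requests.get(url).json()
    return "https://image.tmdb.org/t/p/w500/" + data.get("poster_path", "")

def recommend(title, top_n=5):
    index = movies[movies["title"] == title].index[0]
    distances = sorted(enumerate(similarity[index]), key=lambda x: x[1], reverse=True)
    names, posters = [], []
    for i, _ in distances[1 : top_n + 1]:
        names.append(movies.iloc[i]["title"])
        posters.append(fetch_poster(movies.iloc[i]["id"]))
    return names, posters

st.header("Movie Recommender System")
selected = st.selectbox("Select a movie", movie_titles)

if st.button("Recommend"):
    names, posters = recommend(selected)
    cols = st.columns(5)
    for col, name, poster in zip(cols, names, posters):
        col.text(name)
        col.image(poster)
```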
will have five columns, and each column will have a text element, and it works just fine, as you can see when we click recommend for Iron Man. Thank you for watching this video; I hope you enjoyed creating the movie recommendation system. If you would like to watch more of this content, make sure to subscribe, and other than that I will see you in the next video. When you want to enter the industry, you need to at least show that you can do the work for the task you are going to be hired for;
my starting point is showing your data science portfolio, showing that you can actually do the work, right? A strong data science portfolio, plus the data science skills, communication and translation skills, and business acumen, are all a plus, but as a hiring manager, during the hiring process you pay attention to the projects candidates have completed, unless they already have industry experience. Hi everybody, welcome Cornelius, really excited to have you here: an experienced data scientist, a top voice in the field of data science, with a wealth of knowledge to share. Cornelius, you are a data science manager at Allianz, so can you walk us through your journey in the data science field and how you climbed the corporate ladder? Oh, that's a nice story, I think. I will go back to the beginning, before all the corporate stuff. Like every student, every aspiring scientist, I wanted to become a data scientist, but I did not major in any data science field; I actually majored in biology,
and my bachelor's and master's theses were all about biology; I am a researcher at heart, obviously. But there was a moment when I decided I wanted to become a data scientist. It came during my master's studies, when I was listening to a webinar and trying to see what kind of job I could have in the future, because as a biologist, especially in my country, Indonesia, it is a little bit hard to find and secure a job with real financial security. Even though biology is my passion, I tried to find something that was still related to my passion for research but that could actually make some money, and that is where I found this thing called data science. So during my master's I searched around, looked at what data scientists do, and tried to learn about it: okay, they are using statistical things, they are using machine learning things, they try to use data to actually solve a business problem, and it is actually used in business. That day I realized, okay, this could be my next career move. So at that time, during my master's, even though I was not there yet, I tried to learn as fast as possible about data science: I joined online courses, I joined communities on social media, and I read as much as possible. When I came back from my master's to Indonesia, I focused on self-learning again, I joined offline classes for data science, and I tried to make as many connections as possible from all the previous connections I already had, basically asking, could I become a data scientist in your company, how does that sound to you? And then, with a bit of hard work, from 2018 to 2019, in about a year and a half, I moved from my old field into data science and became a data scientist. Super cool, that is quite an impressive journey; in such a short amount of time you managed to go through all these different levels, because data science can be tough, especially in the beginning when you want to get your first job as a junior data scientist with almost no experience, right? And you went from biology, not a so-called traditional
data science study, all the way to the field of data science. So let's talk about that: can you talk about some of the challenges you overcame when you were just starting out with no experience at all? Yeah, there were a lot of challenges. When I first started as a data scientist I didn't know much: I knew a little bit about programming, a little bit about statistics, a little bit here and there. But once you are in the corporate world, and my job right now is in the corporate world, there are a lot of things to learn, especially on the business side. You can develop some model, you can run the code, but is it solving the business problem? Is it really valuable to our business? Is it convincing enough to be used by the business people and the business team, or, from the customer side, do they actually want to use this model, my work, my learning? This is something I learned during my first year and, honestly, something I am still relearning: what the business itself is about. As a junior data scientist you usually work to execute, to perform the task you are given, but in the corporate world it is a little bit different. In a startup you might have to be a bit of everything, but in a corporate everything is already structured and organized, the business is already moving, the business already knows what it wants, what makes money, and what data science should be there for: data science could be there to automate things or to figure out new things for the business. But as a data scientist, if you are a junior, you need to understand why the business process moves the way it does. So,
for example, in my previous experience as a junior, one of the first models I created was a propensity-to-buy model. It is a model that tries to predict which customers could be approached again to buy a new insurance policy, basically a new product. It seems easy: you just take the customer data, figure out who buys and who doesn't, and then create a model from that. But the tricky part is that there are a lot of puzzle pieces in there. Yes, you have the model, but what kind of product are you actually selling? Who is the target customer? Who is going to execute this list? Where is the communication going to happen? How long is the campaign going to run, is it continuous or just one time? There are so many moving parts, and you need to work a lot with the business. That is one of the areas where, as a junior, I was still learning, but it is also one of the things that boosted my career, because I learned so much about the business, and I also learned about communication with the business side. Like I said before, you want to convince the customer, and here the customer is the internal user who you want to use your model. I can do my job, but can you believe in my job? It takes really good translation between our technical terms and the business terms. I cannot present my model as, for example, "this is a random forest, this is how it works, this is the precision, this is the recall of the classification model"; no, that is not what they want to hear. What they understand is, for example, if I create a propensity model, then when we run a simulation it maps to their KPI, and the KPI is, for example, income,
like increasing monthly revenue: in the simulation it could increase the revenue 20% compared to the normal process. That is just an example, but it shows the kind of rethinking needed to go from our technical language to how the business actually speaks, so I always call it translation. I couldn't say it better than you just did; it is really important. Also, from my experience and from what I have seen in my colleagues, being a data scientist is not just crunching numbers. People think it is just statistics, or maybe some mathematics, or purely data, but it is, like you just said, all these different skill sets that have to come together, like the business acumen you mentioned, and communication, translation from business language to technical terms, because product managers will never come and tell you, "please build a classification model for me to classify this thing". No, they just come and say, "I need to do this", and then you need to realize, oh, you need a classification model, oh, you need to select those approaches. I think you described perfectly that it is a combination of different things: from one angle it might seem very easy, but when you dive deeper you have to clean the data, decide where to collect the data, where to store it, how to process it, how to make the impact, and how to measure it. So I totally understand how you could grow and climb the corporate ladder very quickly, because early on in your career you had the opportunity to learn and gain all these different skills. And for our audience, for our aspiring data scientists, this is definitely a note to make: if they want to grow quickly in their career, they need to be prepared to work on their communication skills, their business skills like you did, and their translation skills, how to translate business KPIs and OKRs into an actual data science problem. On that note, since you mentioned a very interesting project: was there any particular project early in your journey, when you were a junior or mid-level data scientist, that kind of made your career, that set
you apart from others and got you promoted? Yeah, precisely, although it is actually cumulative; I wouldn't say it was just one project, so maybe I can start a little bit from the beginning. One of the things I tried to do during my junior time was to take initiative. I was not just moving along like, okay, this is your project and you need to do this and that; even at my junior level I tried to communicate with my boss: I know this is a really interesting project, I want to take it, or I know this is going to have a good business impact, so can we talk with the business user about this kind of project we could create together? So I tried to take initiative in my own career, so I could know where I wanted to move at that time. As this accumulated, there is one project from that period that stands out: the NLP project. It basically tries to predict which customer emails contain a complaint or not. You know, customers sometimes really complain, but it is not just about whether it is a complaint; what we want to predict is whether this kind of complaint could be damaging to our reputation or not, so it is about how severe the complaint is. This kind of project was something I pushed to make happen, because a project like it had not been done in our company before, and I said to the business user and my boss that I wanted to take this kind of project, that I knew it could be really useful for our business, and that I would take responsibility for managing it. And it went well; even right now the business users still want to communicate with me for all kinds of projects, and that is because I started with this kind of project, they bought into it, and they still approach me when they have a problem or another initiative they want done. So it starts simple, it starts
just from yourself, but this kind of initiative will be seen, whether by your boss, your colleagues, or any business partner. So just take initiative; I think it really makes a mark, it really gets you noticed. So kind of be proactive: if you are already a junior data scientist, you already have some basic skills, you have worked with some senior data scientists under supervision for a couple of months, now it is time to take initiative, to look around, network, see what kind of projects are on the table, and then go and try to make something of them. Even if something seems boring or uninteresting, you kind of need to dig deeper, like you did, right? You basically identified the projects and said to your boss, well, can I work on this, and is this impactful? Because at the end of the day you will always move to the next level if the project you are doing has a lot of impact, right? Yes, yes, that's true. And also, even from my junior level, in fact even before I became a data scientist, I already knew what I wanted to be and what I wanted to do, and once I became it I did not stop there: I tried to make what I call a master plan, basically, this is going to be my career, and I need to take it into my own hands. This is also something my boss always said, and it really inspired me: your career is in your head, your life is in your hands, so every move you make, you need to be responsible for it, but it also needs to be something that makes you improve as a person or in your career. That is why I always try to take initiative. And it worked out, right? Because now you are a data science manager, you are a top voice in data science, so it worked out well, which is amazing. And on that note, given your non-traditional data
science background, because you have a background in biology, and there are many people out there among our listeners who also want to make a career change or who come from a non-traditional data science background, meaning they don't have a traditional statistics, mathematics, or data science master's degree, I wanted to ask you: can you tell our audience about the impact your unique background had on your career? It's really interesting, because as a biologist I am not using much of the specific content of my education, for example genes or proteins; of course I am not using that in my everyday work. It is more about how I learned to think methodically during my time working as a biologist, as a researcher: how I structured my work, how I used statistics. I think that scientific way of thinking and that part of my education really shaped how I can break down problems, because in academia you need to break things down: the structure of your work has to go from the theory, to the methodology, to how it is going to work, to the result and the conclusion. All of that previous experience actually helps me in working as a data scientist. Of course, every person has a different background; this applies to me because I come from the science side, where this research methodology is really useful in data science. But I know there could be people coming from literature, from philosophy, from languages, from something not related to programming or data at all, and I think they can bring a different, unique perspective as well. At the very least, the way you approached your work during your studies can carry over to the way you work now, and if you already have the basics, I think you can succeed even if you are not from a data science major. So you basically turned your background into an advantage, even though it was a non-traditional data science background. Yes. So what would be your advice to anyone who has this non-traditional background, in terms of specific steps? Yes, having a roadmap is, for me, the best one. I know it sounds a little bit cliché to say, but having a roadmap to follow is really helpful, and there are already
a lot of them online right now on how to become a data scientist step by step: first learn statistics, second learn programming, third learn machine learning, fourth build machine learning and data science projects, fifth try to apply, communicate, and learn how to present. These steps might seem boring, but following them is actually important, because it is a structured way to start. To become a data scientist you start small, but those steps, statistics, programming, and machine learning, are all connected to each other as skills, and they are going to help you become the data scientist you want to be. So following a roadmap that already exists is going to be more helpful than trying to learn by yourself, jumping here and there, learning one thing and then another, and then forgetting about the statistics, forgetting about the math, or learning only the programming and forgetting how to present it to the business user. Just follow the steps one by one, and I think you will already be in good shape. So, learn in an organized way, basically, by following the roadmap? Yeah. Okay, and have a clear plan, I suppose? Yes, having a clear plan is really helpful; this also comes from my own experience, because I created a plan for myself to become a data scientist and then followed it, and it actually worked, maybe not perfectly, but this kind of plan really helped me, because it makes you focus, and if you are not focusing you can just drift anywhere and lose your way to becoming the data professional you want to be. Right, right, I absolutely agree with you: have a clear plan and learn in an organized way, and not just go from one course to the other trying to learn everything; that would indeed be the best way, because otherwise everyone would spend a ton of time learning one skill, and by the time they finish, there is already a new skill to learn. Amazing. And when it comes to leading, because you are a data science manager, you have gone through these different steps already, and you are leading teams: what, in your opinion, has helped you to balance the two, the technical side and leading people? How do you manage projects and
people and, at the same time, drive your own career? Yeah, that is really hard, I must say, but this is also something I learned during my corporate time, because leading and being an individual contributor are different things. On one side, the technical part, we already understand, but to become a leader to your people you need to understand how to delegate, what kind of task, what kind of project to delegate, and then trust the others that they can actually get it done. And by trusting each other, I trust myself and I trust you as a colleague; I would say you are also my friend, in the sense that we are working together here, we are not just boss and subordinate where I say, okay, you do this, you do that, and goodbye. No, we try to communicate together: what are the problems here, how can we work together on this? But I try not to hold back either; I sometimes want to jump in there and do all the stuff myself, but trusting people and delegating to people is really important, and it comes with learning, of course. As an individual contributor I still really love coding, I still love programming, I still do it right now, and I always want to jump in and do the modeling and so on, but trusting others is a learning process, learning to manage the project, realizing that someone else can do this, and realizing that I can actually do my own job much better now that I am in this position. So yeah, it is still a learning process for me; it is a never-ending journey, I would say, but delegating to people I trust, having trust in others, is the key to balancing things. On that note, because maybe some of our listeners who are aspiring tech people and who haven't worked in the field don't know this concept
of an individual contributor that you just mentioned: for anyone who doesn't know this term, we usually have two different career paths in data science. When you start out, you usually join as an individual contributor, so you work on the technical stuff, and as you go through your career there are usually two ways to go: one is the individual contributor path, and the other one is the managerial path, which I assume is the one that you are following, because you are managing people. Yes. So, about that: you mentioned working as an individual contributor, and now you are a manager, which means you are managing people and supervising them. And on the note of trust, because that's something you mention a lot when explaining your leadership skills, the trust in others: how do you build trust as a data scientist when you have just joined your data science job? Yes, it's my proof of work, I think, because that's what I do to build trust with my boss and with my colleagues. I prove that in my
job I can actually do it, I try to present what my results are going to be, and I try to be proactive at my work. Trust is not built just by you saying it; trust is built by having proof of your work. Trust comes from the things that you have already done, from the action itself. I know communicating is maybe not that easy as well, but your work is like a promise during that time. So I know, as a junior data scientist, you can keep saying "I can do it, I can do it," but people will ask: okay, you say you can do it, but how is your work? It is the work that actually creates the proof behind the trust. So as a junior, I think the best way is to just do the work, even if it's small. As you said previously, it might be boring stuff, it might seem like it's not leading anywhere, but that kind of work shows that you can actually do the work. So: get the work done, basically. Yeah, get the work done. Right, right, that makes sense. So then you can build the trust and make sure that you help your supervisor, because at the end of the day that's the job of a junior data scientist: to help the senior data scientists and other leaders in the team and make their lives easier, whether that's, unfortunately, data collection, data
analysis, or something else. And on the note of leading teams and becoming a manager: in your opinion, how can one decide whether they want to take the path towards individual contributor, so become a principal data scientist and stay an individual contributor, or whether they should consider the path towards becoming a data science manager? What qualities do you need to have to go towards one path or the other? I think it comes from yourself, because every single person has a different evaluation of what they want to be. For example, on my side, I love both sides, I already said that: as an individual contributor and as a leader, I love both of them. So I wanted to actually have the experience of being a leader as well; because I already had experience as an individual contributor, I tried to move to that side too, to try the leadership part. But of course, it comes back to yourself again: what do you want to do in your future? Because I cannot tell every single person that becoming a manager is the best path because it means more money, or that staying technical is much better, because it really comes down to yourself, your comfort zone, and where you want to be in the future. So it needs to come from yourself. I mean, every single career path has its pros and cons, it's always like that; it's just that whatever decision you make, you need to be responsible for it. That makes sense. So what is it like to be a
data science manager? Walk us through your day-to-day, your responsibilities, for the listeners to get an idea of what it means to be a data science manager. Yes, I would say it is still about leading the team: having meetings with the teams, and managing the projects, what our projects are right now, how they are going, how they are progressing. So it is managing the progress of the work, and then trying to balance that with the business users, what their expectations are, and then translating those into our technical work so that we can actually make that kind of progress. So it is balancing with everybody, from every side as well. I would say that is the kind of work I usually do day to day: managing the people, what are you going to do this week, what are you going to do today, and then, when the business users come, how are we going to make this project successful? So that's the day-to-day, basically. That makes sense. So managing people, a lot of communication, meetings: that's definitely also part of the managerial position, that's for sure. So if you don't like meetings and you don't like communicating with product people and business people, you should not choose that path, right? That's true, that's true. I would say that is the con, because if you look at my calendar, there can be meetings at 8, 9, 10, 12, a lot of them, and then suddenly there is another one, like, oh, why do I need to join yet another meeting? There are a lot of meetings. Yeah, yeah, for sure, I have heard that a lot, so that's something for our listeners to take into account when choosing which career path they want to go down. And on the note of projects, because you have been leading projects and you have also been climbing the corporate ladder successfully, now a data science manager at such a young age: can you remember maybe one project, an impactful one, that you ended up going through?
It was a tough project, but you and your team completed it; what were the main takeaways from it? There have been so many projects, but I think the one that is really impactful right now, and is still ongoing, is a fraud detection project that we are trying to build. Basically, every company needs a fraud detection model. It is really impactful for the business because, you know, fraud usually does not happen that often, but when it happens it can damage the finances, damage the reputation, damage everybody in the business. This is a really long-running project that needs a lot of consideration about what kind of business and what kind of customers the data set is coming from, so it really involves a lot of business stakeholders and a lot of technical people. On this kind of project, I would say a lot of people have come and gone, because it is a really hard project. But in the end, I would say it is making an impact, because the business users are using it and they trust our results. There is always a lot of room for improvement, because, as I said, fraud detection can be really damaging if you do not do it properly, so we need to present the results from this fraud detection as something that can be understood by the business. So we try to make the model as good as possible, we need to make the explainability as good as possible, and we need to integrate all the business processes as well as possible. All those moving parts are still in progress, and it is a project that I will remember; the kind of work I did there, I even try to implement in other projects, because this kind of complexity really needs to be taken into consideration.
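Since this course walks through Python implementations elsewhere, here is a minimal sketch of the kind of setup Cornelius describes, not his actual pipeline: a classifier trained on a synthetic, heavily imbalanced dataset (fraud is rare), evaluated with precision and recall rather than accuracy, with a rough feature-importance ranking standing in for the explainability the business users would need. Every dataset, parameter, and threshold here is an assumption made purely for illustration.

```python
# A minimal, illustrative fraud-detection sketch (not the project discussed above).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Synthetic stand-in for transaction data: fraud is rare (~1% of rows).
X, y = make_classification(
    n_samples=20_000, n_features=12, n_informative=6,
    weights=[0.99, 0.01], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# class_weight="balanced" compensates for the rarity of the fraud class.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)

# Report precision/recall per class; accuracy alone is misleading on imbalanced data.
print(classification_report(y_test, model.predict(X_test), digits=3))

# Rough "explainability": rank features by importance for business review.
ranking = np.argsort(model.feature_importances_)[::-1]
print("Most influential features:", ranking[:5])
```

In a real fraud project the explainability step would usually go further, for example with per-prediction explanations reviewed together with the business stakeholders, which is exactly the balancing act between model quality and business understanding described here.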
Okay, amazing. Fraud detection sounds like a tough problem: if you have a high error rate, that can seriously impact your operations, and it is also money that we are talking about here. And on the note of money: you have gone through these different steps. You usually start as a junior data scientist, or as an intern in data science if you have no background at all and have not completed any projects, and then you need to go through the steps: usually you become a medior data scientist, then you become a senior data scientist, and then you become a data science manager, like you are today. So what does it take to get a promotion? Is it only the technical side, building that trust like you just said, making an impact, or do you also need to actively promote yourself? Because I know it really differs from company to company, but there are some common qualities that differentiate data scientists who stay in the same role for many years, even if they have a very good skill set, from the data scientists who, like yourself, very quickly go through these different
steps and get promoted a lot. Yes, I think in the corporate world, as I said before, it really depends on the business as well, because a promotion only comes, first, if there is a business need, if there is a position that needs to be filled; that is the first stage. Of course, if you want a really high position, it might not go as fast as mine did if your company does not even need it. But for a promotion, basically, a promotion that increases your money, increases your financial security, I believe the financial part is actually the most important one for most people. I know some people do not need to be promoted, they just want a higher salary rather than a bigger title. But in my case, the biggest quality was, as I said before, taking initiative: I discussed with my manager what I wanted my career to be, and I asked, can I get a promotion? At the time we then tried to, what's it called, negotiate, basically: yes, we can do it within this span of time, given your proof of work. Everything that I had already done, I had already shown. And it was not just six months; I am not saying that after six months I said I can be promoted, no, I tried to build it during that time, and it was actually about a year of building that before
I got to this point. But I asked; it really takes initiative from you, because you do not even know whether it is possible or not if people are not asking. I know some company cultures are a little bit more closed, or some people are just too shy, but it depends on the culture in the company. In my company, my boss is really open; it does not matter whether you ask to be promoted or not, it is an open discussion. But for a company that is a little bit stricter in that regard, you need to be a bit more proactive, showing you are the best in your work, showing that, compared to your colleagues, you are the one who stands out. I would say it is a competition; if you want to be the one who moves up, it is still a competition at the end of the day, if you put it that way, but it is a really healthy competition. Okay, amazing. So basically: don't be shy to ask, because if you don't ask you might not get it; read the room, meaning you need to know who your boss is and whether he is open or not to a conversation like that; understand the company culture and whether it is fine to promote yourself; and build that negotiation skill set to negotiate for a higher salary or a better business title, because sometimes that is also important
for your position in the company. But do it at the right time, meaning before that, show your work, and only go and promote yourself when you have the right cards in hand, right? Yes, yes, you basically need to have a strategy for that, a plan. But yeah, I know it takes time and it takes skill to do that, and I know not everyone will actually be brave enough to do it, or even think of doing it; everyone is different. But of course, you could try to copy a little bit of other people's strategies. They are online, or you can take what I said just now: if you want to copy my strategy, after a year of working hard and already taking initiative, you ask your boss. That strategy can work for you if your company is, like mine, a little bit more open. Okay, so that is very good advice, I think: don't shy away from asking, and know your strategy and plan in advance. Yes. On the note of that planning, and on
personal branding, so planning your personal brand: now, Cornelius, you have a huge following, you are a Top Voice on LinkedIn, and you have about 30,000 followers on LinkedIn, right? Almost, almost 30,000. Still almost, yes. If you are listening now, go and follow Cornelius, maybe we can get him to 30,000. Yes. So how important is it to have a personal brand as a data scientist? Yes, it is actually very important. I will talk about it as something outside of my job, because my job is of course separate, but outside of my work it gives me a lot of chances, it gives me a lot of opportunities. Right now, I am talking with you because I already have a personal brand, right? It opens a lot of doors: I get a lot of new friends, a lot of new networking, a lot of opportunities outside of my job, a lot of freelancing, a lot of writing. I love writing, so I try to focus on the writing side as well, because some of the people who become content creators actually do video, like TikTok, for example, or photos, but I love article writing. I have a newsletter, and I try to build my personal brand as a data scientist through writing, and that has shaped my career. I do not call it a side hustle anymore, because it has become my personal hustle that improves my income and my financial security. So personal branding can give you more career choices, it can give you a lot of opportunities. I read a lot about people who
are entrepreneurs: they usually do not have only one source of income, they have multiple, they build here and there, and I see that coming from personal branding as well. They end up with multiple sources of income because they promote themselves. Like myself: I try to promote myself as a data scientist who knows some of the stuff in the data science world, and then the opportunities keep coming. That is because I promote myself here and there and try to build the cards, the choices, that I can actually use in the future. More doors open, so in case maybe someday I get laid off or something, you never know in a company, you never know what will happen; it might look secure right now, but maybe in the future there is a financial crisis, a layoff, or even a pandemic like before, when people who felt financially secure suddenly got cut off. So I see my personal brand right now as a security measure as well, something I can turn into a runway in case something happens. Yeah, for sure, because, like you just mentioned, in many parts of the world there are currently many data scientists being laid off just because of the economic situation, and the message you are giving to our audience is that your personal brand will not just be something for that scenario, but also, just in general, it opens many doors for you: new opportunities, new networks, new contacts. And in case something happens, you are terminated or laid off for some reason, it will be a great way for you to have a source of income, so you do not put all your eggs in one basket. That is the message you are giving, right? Basically, yes, very precisely. And on the note of your newsletter, because you have a popular newsletter and it is called Non-Brand Data, can you tell us more about that? Yes. At first I just called it
Non-Brand Data because, in the early days when I was trying to create a newsletter and a brand, I did not want to box myself into one kind of branding, like being only about data engineering, or data science, or NLP and that kind of stuff. So this newsletter could talk about anything, as long as it is related to data. But at the moment I try to talk more about, first, career topics, so the business and career side of being a data scientist, those kinds of tips and my opinions, and second, more technical things. I still try to put my technical knowledge into my newsletters, talking about Python, MLOps, and how to integrate them. So basically I try to combine understanding the business with understanding the programming and technical stuff. It is still a data science newsletter that tries to help you, as a current data scientist, improve your career. Amazing. So if you are listening now, make sure to check out the Non-Brand Data newsletter from Cornelius, hit the subscribe button, and you will get a lot of help with how to start a data science career and also with the technical stuff, like Cornelius just mentioned. Yes. Also, let's talk about the coaching that you do, because you have a Topmate link and another way to share your knowledge, to coach other aspiring data scientists or people who are already in the field. Can you tell us more about what kind of coaching you do? Yes, it is mostly about how you can move in your career as a data scientist: where do you want to move, as I said before, manager or individual contributor, but basically how you want to move as a data scientist in your career. But of course it is also open to people who are not data scientists yet and who want to break into the data science field; I am open to coaching for that as well. Amazing. So if you are looking for a coach, make sure to get in touch with Cornelius; he might
be able to help you with your career but also with the technical part. Yes, he is on Topmate, so we will put the link in the description. So on the note of personal branding: let's say a person comes and asks for your coaching services, or just your advice in general. You have a huge personal brand at the moment, a huge following on LinkedIn. What would be your advice, in a more actionable way, on how to build a personal brand from scratch? Yes, I will speak from my personal experience. First, I actually did networking before I posted much of anything. Before that, I came up with a plan of what I wanted to be: I am a data scientist, I know what I know, so I tried to build on that; that was going to be my brand, it was going to be about being a data scientist. Then I tried to do the networking. The first time I posted on my social media, of course it did not have a lot of followers reading it or a lot of people liking it, and that is why I also tried to approach people who already had those big follower counts. I have a friend, his name is Carin; he is not that active anymore on LinkedIn, but he was the first one to help me get into the network in the data science field as a data scientist, and because of his followers I got more followers as well, and then I started getting more and more momentum there. So it is basically: know your brand, then try to network with the people who already have those numbers. I would say it is a numbers game, if you put it that way, but also be consistent. I am still consistently posting; it has already been five years, I think, and I am still consistent until now, trying to post something, even though I know one of the things that makes people kind of shy away after their first post is that there are not a lot of people
liking it or commenting on their content. But no, just keep posting. Every single post that you make might be valuable for someone. You might think, ah, this is just too easy, people already know this, but no, maybe someone is actually reading your content and understanding, okay, this is how you do it. I thought about that before as well: okay, not many likes, but there are people who actually say thank you for my post, and that makes my motivation higher. At first the results might not show that much, but just keep consistently posting, because it takes, I think, mental fortitude to keep posting when your numbers are not that high. But of course it still takes a strategy for your social media and personal branding. So what I think it comes down to is: know what your personal brand is about, have the network, and then keep consistent. I think those three are already good enough if you want to build a personal brand. Amazing. So if you are someone who is listening and who does not yet have any personal brand, following these steps will help you gain that personal brand online but also offline. So on that note, Cornelius, do you think that having coaching services, having a newsletter, and having a LinkedIn following, and of course also networking, is enough for your personal brand, or are there any other media or channels
that you usually use in order to build your personal brand as a technical person? I am talking about, for example, GitHub, you know, a place to store, create, and showcase your code, and also to showcase how you can tell a story about data. Yes, yes, that's true. GitHub is really the community, and not only for programmers, where you can have a portfolio on display. But also, for myself, because I was a writer previously and really got into writing, I use Medium to host all my articles. I am still quite active there, but right now a little bit more on my newsletter. Depending on your path, I think you need a strategy and you need to branch out a little bit. For example, on my side I go to Medium and GitHub, but maybe some people would prefer video; then maybe you could try YouTube or TikTok, because I know some of my friends are actually really active on TikTok compared to LinkedIn. But you really need to understand what you want your audience to be, the audience that you want. For example, I am on LinkedIn because it is a really professional social network, so the communication there is professional; that is why I am really active there and try to build my personal brand there. But maybe your personal branding wants to target more casual people,
then maybe X may be a better place. I know a lot of people on X, big data people are there as well, with a lot of followers. I try to post here and there on X too, but it is mostly in Bahasa Indonesia, so it may be a little harder for non-Indonesian speakers. But if you are just starting, try to focus on one platform first, have one social media platform, then try to build from there and branch out. As for GitHub and Medium, I think those are just the mediums, just the places to show your portfolio, so it is more about where you can show your work. They complement the personal brand that you are already building on social media, whether that is GitHub, Medium, or something else. Pick your starting place and then try to focus your content there; I think that is one important part. I think that is amazing advice, to keep it simple, because sometimes it can be overwhelming if you have so many social media accounts or different channels to keep up with; you will just run out of time or you will burn out. So keep it simple, basically, that is what you are saying, and then they will start to complement each other, whether it is X or TikTok or GitHub, Medium, or other places, LinkedIn, Facebook; it really depends on your target profile, like you just mentioned. Yes. Amazing. Let's now dive deeper and become a bit more technical, because data science is such a buzzword. There are people who
understand data science as data analytics, there are people who understand data scientists as machine learning engineers, and, with the AI revolution, there are now many people who understand a data scientist as someone dealing with LLMOps, large language models, deep learning, machine learning, so many parts that many data scientists across the world are learning and implementing right now. In your opinion, as a data science manager with a wealth of knowledge and experience in the field, what is a data scientist for you, and what does it take to be a data scientist? Yes, this is a question that I am actually still asking myself. Previously I said a data scientist is someone who works with data to bring value to the business, but right now the role is moving much further than that: a data scientist is someone who brings value to the business and enables better decisions for the business. And I always say "a business" because, as a data scientist, you always need a business to work with; even if you are a solopreneur, a solo data scientist, your product will eventually be promoted in some way and then it has some business value. So that is why I always say that right now a data scientist is not just someone who brings value but also someone who enables better decisions in the business. But yes, to become a data scientist, I would say it takes a lot of mental fortitude: it takes consistency, it takes a lot of learning, every day, every year, all the time, always consistently learning, right? And then, of course, programming and business, which I already mentioned before, so I do not need to repeat it again. But just keep learning and keep up with the latest technology. This latest technology, I believe, is going to change how we work; it is already changing how we work right now, day by day. So as a data scientist, try to keep following that technology, because if we do not follow it we will be, yeah, as I said, replaced. As data scientists we have an advantage, because we are the ones working on creating that AI and complementing that AI, right?
So, on that note, I think you already answered my question, because I wanted to ask you: do you think that AI is a buzzword? Many people think that this new era of generative AI, ChatGPT, you know, Claude, LLMs, is hype and it will just go away, but you just answered that in your opinion it is not going away anytime soon and it is going to make a huge impact. So can you walk us through some of the recent developments that you are aware of and believe are going to make a huge impact, and in which industry specifically, in your opinion? Yes, if you want to talk about technology, we already have a lot of it: OpenAI already has a lot of really capable models, Claude is also really great, and Sora is now trying to generate not just images but video. But what I really see right now is the business side, the non-technical people. Previously the implementation was just for coding or for simple little things, but now business people are starting to see how useful it is; a lot of not just technical but non-technical people are trying to build products based on it. In the past two or three weeks I have already met a lot of stakeholders, big stakeholders in Indonesia actually, who want to try to build this kind of product
and to use this AI to simplify their business processes. Basically, I feel that in Indonesia they already see the potential of how it could be used, but of course it is our job as well, as data scientists, to prove that this kind of AI is actually useful. It is not just the business wanting to use it and making the effort from their side; we as data scientists need to make an effort as well to show that, yes, this AI can be useful for your business, and to prove that when these two things combine, business and AI, it can become big, it can become a game changer. So yeah, that is why I am really quite confident that AI could actually change the world, because it is not just me personally using it and trying to get a benefit from it; everyone around me is already starting to use it, trying to implement it in their business, and it is actually useful. Well, that is amazing to hear, because when I hear talk about AI replacing data scientists, I feel that is highly arguable. So first I want to get your opinion on this: do you think AI will replace data scientists, and how can you future-proof yourself as a data scientist? Yes, I think it is not going to replace us wholly; I think it is more about some of the tasks that we do. For example, maybe
detection tasks or code generation could be delegated to AI, but restructuring all the code, deciding where the business is going to use it and how it is going to be managed, still takes a data scientist. That is why the data scientist role is simply evolving, right? Because those tasks could be replaced by AI, we as data scientists need to evolve as well: we do not just need to understand how to code well, we need to understand how to manage this code better, we need to actually document it better, we need to understand where in the business it is going to be used, and which of us data scientists is going to be really using this AI. So we need to understand all this latest technology as well. What I want to say is that data scientists need to become full-stack, and that cannot be avoided. Basically, that is my path as well: I try to learn a lot about MLOps and operationalization, because I do not think that is going to be replaced either. Yeah, I could not have said it better, because being able to become this full-stack professional means you know the data side, the machine learning side, the deep learning side, but also the recent developments in AI, at least at a high level, knowing what an LLM is, what LLMOps is, what those cloud technologies are and how they can be used, right? Yes. And also having the business acumen and the communication, because there is always communication, right, between the business and the data scientist, there is always a need for translation. And as long as you are able to do so and continuously develop yourself with the technology, there is no way that you will be replaced, because, by the way, the other day I was reading that there is currently an 85% gap between the demand for data science and AI professionals and the supply. So, unless you are doing a manual job, I always tend to say, unless you do something repetitive that can be replaced by AI, you are good to go. Yes. So you should definitely have the motivation
to get into data science if you like it. On the note of getting into data science, because you are now in the position of hiring data scientists: what are the skill sets that you pay attention to? Because with this recent buzz around LLMs and generative AI, many aspiring data scientists, instead of starting with the fundamentals, start with the difficult stuff, training neural networks, understanding RNNs, attention mechanisms, Transformers, diffusion models, and then try to show with such a project that they are an experienced professional. What do you pay attention to when you are hiring and getting all these different resumes? Okay, so the first thing that I do not actually pay much attention to is what their GPA is, their age, or how high their educational background is. In the end I find those kinds of criteria discriminative, so I try to make everyone equal in the screening. But of course, when you are in a business there is a business need, so the first thing is, of course, filling the business need when hiring. As for what I look for in a junior data scientist: I want to see whether you can actually do the work. So basically, have you at least already done some data science project, already created a data science project, and for this data science project, how did your thought process go, what motivated you to do this kind of project, what motivated you to write this kind of code, what motivated you to present this kind of result? Those data science projects are actually what I really look at. Of course, having internship experience or previous job experience, I look at that as well, but I know that is a little bit harder for aspiring data scientists to come by, because those positions are really hard to find. That is why, in the end, I take a look at the data science portfolio projects, so those are
the most important part that I check. As for communication, business acumen, or translating things into the business: I think those are the kinds of skills that you learn once you are already in a business. But when you are trying to enter a business, you need to at least show that you can do the work of the role you are going to be hired for. Of course, understanding a little bit about the business is still really helpful; it is a plus, it is always a plus, but my starting point is: show your data science portfolio, show that you can actually do the work. That is what matters to me. Right, so a strong data science portfolio, basically; the communication, the translation skills, the business acumen are all a plus, an extra. But for you as a hiring manager, during the hiring process you pay attention to the projects that they have completed, unless they already have experience. Yes. And on that note, what would you suggest, a couple of examples of such projects that you would recommend an aspiring data scientist with zero experience, just fresh out of college, maybe even without a data science education, put on their resume to impress you? There are a lot; I mean, a lot of people impress me because they are so smart. There are a lot of data sets out there that you can find, on Kaggle or UCI for example, for somewhat complex data science projects. But what would really
impress me is if you can actually formulate a business problem from that data set, explain why you developed this type of model, and show that the model you developed actually solves it. Or even without using a model: can you formulate a business problem from this data set and then use a data science technique, maybe just a clustering technique, maybe just a dimensionality reduction technique, to actually show: this is how I can solve the business problem that I formulated with this data set, and this is how I did it. That is what really impresses me, because what I want to see in your data science project portfolio is basically the thought process, and usually the thought process comes naturally when you already have a business problem that you want to solve. Of course, the data sets that are publicly available are maybe a little bit limited, but if, within a limited data set, you can think creatively about a problem and then try to solve it, that will be impressive. Right, so end to end, basically, and also showing this kind of extra skill set beyond just solving the problem in a technical way. Yes, beyond solving the problem in a purely technical way, you can say that.
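To make that advice concrete for readers following the Python side of this course, here is a minimal sketch, with synthetic data and hypothetical column names, of the sort of end-to-end portfolio exercise being described: start from a business question, apply a simple clustering technique, and end with a segment summary you could present to a business user. It is not a prescribed recipe, just one illustrative way to frame such a project.

```python
# A minimal portfolio-style sketch: "which customer groups should marketing
# target differently?" answered with a simple clustering technique.
# The data here is synthetic; in a real portfolio you might instead use a
# public dataset from Kaggle or the UCI repository.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "annual_spend": rng.gamma(shape=2.0, scale=500.0, size=1_000),
    "visits_per_month": rng.poisson(lam=4, size=1_000),
    "avg_basket_size": rng.normal(loc=60, scale=15, size=1_000),
})

# Scale features so no single column dominates the distance metric.
scaled = StandardScaler().fit_transform(customers)

# Three segments is an assumption; in practice you would compare several k
# values (elbow method or silhouette score) and justify the choice.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
customers["segment"] = kmeans.fit_predict(scaled)

# Summarize each segment in business terms: this table is the "data story".
print(customers.groupby("segment").mean().round(1))
```

The point is less the algorithm than the framing: the write-up around a project like this should explain why segmentation answers the business question and what each segment means for, say, a marketing team.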
Amazing. And one last question, because we have spoken about many important topics that I believe many aspiring data scientists would be interested in: what do you see as the future of data science in the upcoming five years? Well, it is going to change a lot, I think, with all this AI development and all the data science technology that is everywhere. I know that in five years data scientists will still be invaluable to the business. Like I said before, AI may have been a buzzword previously, but it is a buzzword that is now used by the business, and if businesses can actually use it well in their companies, then over these five years they will try to hire as many data scientists as they can, just to make sure that the business runs smoothly. And I am pretty sure that businesses in these five years will be using a lot of automation coming from data science technology. Amazing. Now, Cornelius, thank you so much for joining us today. I think your insights and all your tips were invaluable for our listeners who are interested in tech and in data science. For our listeners, make sure to follow Cornelius on LinkedIn, but also check out his newsletter, Non-Brand Data, and if you are looking for a coach then definitely
go for Cornelius; he will be able to help you. Thank you so much, Cornelius, it was a real pleasure to have you here. Thank you so much, thank you.
Joining us today is Adam Coffey, who brings over 21 years of experience in building businesses, having served as CEO for three major companies supported by nine different private equity sponsors. Adam has managed transactions worth over $2.5 billion and advised top Fortune 500 companies. During his time, Adam orchestrated 58 business deals and significantly increased company values, achieving five-fold returns for investors. He grew one company's value from 10 million to over a billion dollars, earning him recognition as one of the most influential leaders by the Orange County Business Journal. Adam is also a bestselling author, a popular speaker, and a mentor to aspiring leaders. His extensive background includes roles in healthcare, manufacturing, and beyond; his diverse skill set also includes being a licensed contractor, a pilot, an army veteran, and a former executive at GE for 10 years. Today we will dive into the proven strategies that aspiring tech entrepreneurs and fresh graduates need to thrive in today's competitive landscape. We will uncover invaluable insights on how to navigate the tech world, cut through the noise, build investor trust and secure funding, as well as forge lasting partnerships. Finally, we will learn how to plan and execute a lucrative exit that maximizes your hard-earned success. The podcast will be hosted by Vah Asan, an experienced software engineer and tech entrepreneur, the co-founder of LunarTech, which is on a mission to democratize data science and AI. So, without further ado, let's get started. Welcome, Adam, we are excited to have you join us today. Now, Adam, you are a big deal in private equity and in business: you have done deals worth over 2.5 billion, you have also advised top Fortune 500 companies, and
written bestselling books on business. Adam, could you please share your journey and how you got to where you are today with our audience? Happy to, happy to. And hey, by the way, good to see you, good to be here with all your listeners out there. You know, I think for all of us life is a journey, and it's the building of a set of experiences that make us who we are. As a young person I served in the US military; service in the military taught me something about discipline, teamwork, leadership. Engineering made me a meticulous planner. I'm a pilot, and pilots don't take off unless we know where we're going, so that taught me, as an entrepreneur, to plan an exit from the beginning, you know, and always have an exit and a destination in mind. I spent 10 years working for Jack Welch in what I call the Camelot era of GE. GE was the world's largest company, Fortune number one on the Fortune 500 list; the company was growing so fast it was doubling in size every three years, and that really informed my thinking about growth, and GE taught me how to run a business. Then I spent 21 years as a CEO building three different national companies for nine different private equity firms; I bought 58 companies, I'm a buy-and-build guy, a turnaround guy, and, you know, I've got $2.5 billion in CEO exits under my belt. That kind of led me to writing books about how to do this. I wanted to educate; I'm turning 60 here shortly, and I
wanted to start thinking about legacy and how I could teach the next generation of entrepreneurs and business owners to excel at this game, at this thing that has been so influenced by private equity. That kind of led me to hanging up my CEO cleats a couple of years ago. I started a consulting business, I've got clients all over the globe, I help them with scaling, with doing M&A, teaching them the tricks that the big institutional investors use to create shareholder wealth, and then I help people exit. I work with private equity firms, I work with individuals and founders, and I'm having a ball. I work more hours now than I ever did when I was a CEO, so so much for slowing down; I think I've actually sped up. Awesome. And can you share a few stories on how you acquired new businesses, grew them, and sold them, like the top businesses you worked on? Well, usually in my world, again, I've been doing this
with large institutional shareholders, and they always start with what's called a platform company. They're going to buy a business, and that'll be the base, and then on top of that base we're going to add other businesses to try to help it grow exponentially faster. So if I take my last company as an example, the company that I was hired to run: a private equity firm buys a company, it's a platform company, it has 200 million plus in revenue, and they buy it with a combination of debt and equity from their fund, and then they bring me in. The company has not done well, you know, and it's time to bring in the guy to turn it around, to fix it, to get it scaling again, and then start doing a buy-and-build. So I then bought 23 companies over a five-year period: eight in total in the first hold period, 15 in the second, and started bolting on these other businesses to go from regional to national, national to international, depending on which company I was building at the time. In addition to growing through M&A, I would also focus on improving the business that I started with, so usually investing in technology, trying to do my best to increase the revenues and the profitability of the base business, plus a lot of effort around organic growth, to get the business that was underwhelming and not doing really well to grow organically like it had never grown before. And so I've learned how to build, I'll
say, a very balanced, growth-oriented company, but no question that M&A is the largest component of shareholder value creation. In that example, for those 23 companies that we bought, on average I paid five times for each one of the companies; they were small and plentiful. I used 100% bank debt to buy those 23 companies, and I used the cash flow of the 23 businesses to service the debt while I was collecting them, buying them. And then when we go to market, we sell, and we sold, that first time, at a multiple of around 14 times. So things I was buying at five times I'm now selling at 14 times, and the net result is a tremendous amount of shareholder value created. Then you add in the organic growth, you add in the margin improvement, and that's kind of my recipe for the perfect exit. In that case, my first exit, over the three-year period, it was a 4X multiple of invested capital, so shareholders were happy, investors were happy, the management team was thrilled, and we made a ton of money; you know, that's when things go well.
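For readers who want to see the mechanics Adam is describing spelled out, here is a back-of-the-envelope sketch of the buy-at-5x, sell-at-14x multiple arbitrage. All of the EBITDA figures are hypothetical placeholders; it ignores fees, taxes, organic growth, and margin improvement, and it only isolates the multiple-expansion effect rather than reproducing Adam's actual deal numbers.

```python
# Hypothetical illustration of multiple arbitrage: buy add-ons cheaply,
# sell the combined group at a higher multiple. All figures are made up.
n_addons = 23                 # number of add-on acquisitions
ebitda_per_addon = 2.0        # $M of EBITDA per acquired company (assumed)
platform_ebitda = 20.0        # $M of EBITDA in the platform company (assumed)
buy_multiple = 5.0            # add-ons bought at ~5x EBITDA
exit_multiple = 14.0          # combined group sold at ~14x EBITDA

addon_cost = n_addons * ebitda_per_addon * buy_multiple      # funded with bank debt
combined_ebitda = platform_ebitda + n_addons * ebitda_per_addon
exit_value = combined_ebitda * exit_multiple
uplift = n_addons * ebitda_per_addon * (exit_multiple - buy_multiple)

print(f"Add-on purchase price (debt-funded): ${addon_cost:.0f}M")
print(f"Combined EBITDA at exit:             ${combined_ebitda:.0f}M")
print(f"Enterprise value at exit:            ${exit_value:.0f}M")
# The same EBITDA bought at 5x is revalued at 14x at exit, which is where
# most of the value creation in this simplified picture comes from.
print(f"Value uplift on acquired EBITDA:     ${uplift:.0f}M")
```

Even in this stripped-down form, revaluing EBITDA bought at 5x to 14x at exit shows why debt-funded add-on acquisitions can create so much equity value before organic growth and margin improvement are even counted.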
So, as you know, in the tech industry the opportunity to create value and wealth is immense, but it's also very competitive: we have many fresh graduates coming straight out of university and trying to build a new business, a new startup, but they have no idea how to do this. So what strategy or what mindset would you recommend for our ambitious tech entrepreneurs who want to get their foot in the door? Yeah, so tech is an entirely different world. You have to get used to different concepts: software as a service, or a tech-enabled platform, definitely brings a higher valuation at, call it, an exit. But oftentimes, from a tech startup perspective, I'd say that a lot of people out there are trying to create the next best thing, and oftentimes I tell people: instead of trying to create something new that doesn't yet exist, potentially solve an old problem, or put a new spin on something that's already out there. I think too often we try too hard as entrepreneurs to create something new and differentiated that the world has never seen before, and sometimes boring old problems still need help and still need solving, and they can be updated and solved in a new, modern fashion. So I think sometimes entrepreneurs overthink complexity. When I'm talking to people about what constitutes a great company, I tell them to think about basic human needs, think about
needs versus wants. In a bad economy, a down economy, if my business is focused on needs, I'm not going to get hurt as badly; my revenue streams will still be fairly consistent. But if my product or my service is a want, then if I'm laid off, or I'm unemployed, or I'm feeling a pinch from high interest rates, I can slow down, I can avoid, or I can completely ignore that spend for an extended period of time until the economy comes back. And so we have to be concerned with the cyclicality of the broader economies of the world: we go through up cycles, we go through down cycles, and the world can throw us curveballs like COVID. So we have to be very thoughtful: if we're going to start something, I want it to be needs-based. You know, if my roof is leaking and it's raining outside and I'm in my house and water is pouring on my head, I have to fix that whether I'm broke or not. But if I wanted to put fancy new accessories on my big monster truck out there, and I'm unemployed and I don't have any money, then I just look at the magazine and dream about what I would do, but I don't have the money and so I don't do it; it's a discretionary spend. So, needs versus wants. Then we want subscription-based versus, I'll call it, project-based: we want some type of product that customers are going to pay us a
monthly fee for. It's shocking to me, when I go to my credit card statements and look at all the monthly fees I'm paying, you know, for Adobe Acrobat and for Google Cloud and for Apple this or that, how much I spend every month on just these recurring, contracted-type charges. And that's also the key to entrepreneurial success: once I find a customer, I want to create a recurring revenue stream. Even in games, people might offer a free game, but there are in-app purchases to help augment it. So if I'm thinking from a tech perspective: needs versus wants, a contracted revenue stream versus one-time use or project-based, and then, in a perfect world, low capital expenditure, not a lot of money needed to further develop or refine a product once it's created. Creating a profile like that leads to high profitability and high free cash flow, and with high free cash flow comes the ability to service a lot of debt, which means buyers who want to use debt as a primary funding source can service a lot of debt because there's a lot of cash flow. So if you can build a business with high free cash flow, that's focused on needs not wants, with a recurring, contracted revenue stream, you're going to do much, much better. Yeah, that's great advice. Oftentimes entrepreneurs start working on some kind of new project; they think they are solving a problem in a certain
way, but actually they're not solving any new problem, and they end up hitting a wall when they talk to an investor and hear: but isn't someone else already doing that? Yeah, sometimes boring industries where we solve a problem are a staple, a mainstay of an economy. We can get a lot better traction when we solve a common problem for common people rather than create a new problem that someone doesn't know they have yet and then have to convince them they need our product to solve that problem. Yeah, 100%. And with startups: many startups need human capital, they need some kind of capital to be able to employ new people, to be able to invest in marketing or in other resources. Now, building trust with investors is very important, because they want to know that they are investing in someone who is trustworthy and that they will be able not only to get their money back but also to get a multiple return. So how would you advise new people entering the field on building trust with investors? Well, this is the age-old problem and the age-old question, right? Chicken or the egg: I have no revenue, but I need people. The venture capital investor says, I don't want to give you a bunch of capital that you're going to waste on the come; I need you to prove, you know, proof of concept, and prove that you can actually create these revenue streams. So it's a very delicate balance, and it makes startups a very difficult
place to be. And oftentimes I ask myself, do I want to build or do I want to buy? I'll look at the existing market and say, look, if I start from scratch I have a very high probability of failure, I have a lot of hurdles that I'm going to have to cross. And I might ask myself, is there an existing company that has the technology or the product that I can buy, that's pre-existing, and as a result I've got a company that has revenue, customers, a history of profitability, and then it's a different game. So in the startup world we use things like founders' equity, and we try to attract people by telling them how rich they're going to be one day down the road, and that's a hard sell. I've got to tell you, I get contacted constantly by people who want to offer me founders' equity to help them, and you know what, I work for cash flow, I don't work for founders' equity. And when I'm sitting on the boards of companies, they give me stock anyway, so I get stock in an existing company with real revenue, real customers, and I get cash flow. So I personally won't work in a tech-type startup world where there's founders' equity involved. So I think we have to be realistic and we have to profile. Anytime I need people, my goal and objective is to hire the best people I
can find for the company that I want to be in five years, not the company I am today. And one of my tenets is that I have to pay a fair market wage, but if I can't, because I'm cash constrained, then the only tool I've got is incentive equity to try to attract people, and then my profile might change. I may not be looking for an established executive who's used to making seven figures a year, because I have no money to pay them, so I'm looking for a different profile: it's a younger person, it's an up-and-coming person, it's a person with great skills, but they live in their mother's basement, or they live in an apartment and their cost structure is low, they don't yet have kids, they're not yet married, and as a result I can attract them with the equity potential despite the lack of cash, because their needs are lower. You know, I'm a seven-figure, eight-figure guy every year, so I don't work for free, I don't work for equity that may or may not pay off in 10 years; I work for cash and equity. So we have to think about the talent that we need, where we are going to find it, and how we are going to attract it and retain it. We have to build a profile for the type of person that we think would be uniquely qualified to go on this entrepreneurial journey with us, especially when we're cash constrained in the beginning and we just don't have the right level of capital. So I need brilliance on a budget, and I'm going to look for the profile of a person who has low cash flow needs, where my small, paltry salary will at least cover their basic needs, because they have cheap basic needs but brilliant skills, and they're trying to become, call it, the next tech billionaire or multi-millionaire. They'll believe in the journey and they'll use, call it, sweat equity to get there. Now, in your experience, how do successful companies
balance Innovation with Su sustainable growth like for example we have like love businesses that are innovating but they like they keep on innovating and in an unsustainable way a small Tech entrepreneur who's trying to create something that has you know a a legs I'll call it something that has longlasting ability to build a sustainable Revenue in future um at some point we have to shift entrepreneurial gears and say it's good enough it's good Enough for now and our Focus needs to be scaling and then the Innovation or investment you know we we we we don't
necessarily want to stop but we do need to throttle back so if I've gotten to a proof of concept you know I'm out in the marketplace you know there is a point where we have to be thinking about well if I continue to spend money I don't have you know innovating innovating innovating while this is important I'm never going to build a sustainable Business if I don't also keep my eye on the ball and the fact that my investors need to see a return and I need to create you know revenue and so as I
get out of the gates I get out of the market when I start to start to see Revenue coming in it's like we really have to drive Revenue hard and show sustainable high levels of Revenue growth and the high margins that that that were we were hoping for and we have to demonstrate this and and so our you know we have a Lot of initial effort to to call it on the technology side to innovate and create the product once we get out and launch that needs to to scale back and our efforts need to
be replaced by focusing on marketing and sales and building the revenue stream we have to remember in order to build the best business in the world it still has to be fed with cash and investors eventually will run out of patience and pull out the rug from under us if we can't prove That we've got revenue and and so I I think back to like Elon Musk in the early days of Tesla you know or Jeff Bezos at at Amazon you know on any given day Elon Musk could have you know Tesla could have come
crashing down if if the lenders started saying you know enough is enough I'm not loaning you anymore it's time for you you know to either make money or or shut down and you know he he was able to to to navigate that as was Jeff when he was building Amazon but The typical small entrepreneur isn't going to get that kind of treatment you know they are not going to be able to sustain Innovation and investment in in a hope in a prayer if they cannot prove that money is there so don't forget that while we
may be interested in technologically changing the world there's the commercial aspect of we got to make money and before I worry about making money big I need to prove to people I can make money small and once I've got my product kind of to a stage to where it's re ready to revenue I need to turn down innovation turn up marketing and really focus on driving Revenue creation and customer adoption so that I can then start generating cash which will let me then go back to innovating you know at a future time so we we
have to be balanced. A lot of times entrepreneurs forget the commercial aspect, and the commercial aspect is we've got to make money. We're so busy innovating we forget that we have to make money, and before long investors get tired of us, because there's a thousand other things for them to invest in; they pull the rug out from under us and we crash and burn. And so the best technology on the planet does not guarantee you commercial success; we have to drive commercial success as soon as we're able in order to prove the sustainability of our business. No, 100%, and I feel like ego has some kind of role in that, where an entrepreneur is very convinced, because of certain reasons but also because of their ego, that this innovation will only cause growth, although in reality it only hinders their growth. How would you... That's why dreamers dream and doers do, you know, so there's what I call the accidental arrogance of success. It's like people get so into their own self-promotion that, you know, I'm
God's gift and what I've done is going to change the world and you know I I see those pitches every day from people who are out there Adam my idea plus your wallet equals you know the best thing the planet has ever seen and I'm like first of all if you're talking to me about money you don't understand my value because my value is what's up here not what's in my wallet money is a commodity there's trillions of it out There looking for Investments right now you know and so all you have to do is
know where it is go go go get it treat it well give it an outsized return and you'll get funded you know and so money is a commodity money's not the issue people who are focused on money is my problem don't understand how money works and so in addition to being you know call it a tech genius they need to have a business Acumen you know Andor partner with someone who understands business And and they can be the the the strange person who's locked in the the dark room for 20 hours a day innovating and
creating something great, but they still need some business guy out there to be the front end. And so when we get arrogant, and keep in mind most of these people have not created anything yet, so if they have an arrogance of success and they have it before they're actually generating revenue and building something special, then boy, that's an entrepreneur who's going to have a hard time finding capital and finding money. There's a fine line between arrogance and confidence, and we need to be confident; we shouldn't
be arrogant and if we're Arrogant with no money and we're Arrogant with an idea but no Revenue you know then investors just simply walk away you know I I that's not a not an adventure that I'm going to back so we we have to be careful about Letting the arrogance of our genius Cloud our thinking and and ultimately investors see right through that and uh if there's you know what it's it's okay to be arrogant you know if you're the richest man on the planet and you have you know you you have arrived when you
have an idea and no revenue and you're arrogant, and you're arrogant with investors, that's not a good recipe for success. And we are almost hitting the time limit; could you tell us about your services? For example, we have new startups but they have no idea how to do business, they don't have the business acumen, so maybe they can talk to you? Yeah, so I do consulting work. People can read my books, they're cheap, I donate my royalties to charity, and all three of my books have
been number one bestsellers so so thank you to everybody out there who reads my books I've been on hundreds of podcasts just like this I Do these freely so that they you know if you go to listen notes.com and type in my name in the search window you'll find hundreds of podcasts that I've been on talking about these different you know different Concepts and those are free and uh you know from from there I teach seminars globally you know those are relatively low cost and I'll do boot camps where we spend two to four days
together and I really get in-depth about all things around growth and raising capital and selling businesses. And then I do work one-on-one with dozens of entrepreneurs; I have a peer group we call the chairman group, I do that with my business partner JT Foxx. You can reach out to me on LinkedIn, you can go to my website, adamecoffey.com. I'd say LinkedIn is where you'll find me the most, I'm most active there; it's really the only social media platform I'm on. Twitter, I post some things once in a while; I'm not on Instagram or Facebook. There's a fake Adam Coffey out there, believe it or not; I guess you've arrived when there are people who are imitating you, and so on Facebook and Instagram you'll find fake Adam Coffeys trying to take money from you. I'm trying to help people, not bilk them for money. So I'm a consultant and I do consulting work with all kinds of different people,
private Equity firms you know family offices Etc so thanks for having me on I appreciate you appreciate your listeners out there good luck take care of people and uh and revenue will happen thank you Adam the next question is what is gradient descent so gradient descent is an optimization algorithm that we are using in both machine learning and in deep learning in order to minimize the Loss function of our model which means that we are iteratively improving the model parameters in order to minimize the cost function and to end up with a set of model
parameters that will that will optimize our model and the model will be producing highly accurate predictions so in order to understand the gradient descent we need to understand what is the loss function what is the cost function which is another way of referring to the loss Function uh we need to also understand the the flow of neural network how the training process works uh which we have seen as part of the previous questions and then we need to understand the idea of iteratively improving the model and why we are doing that so let's start from
the very beginning so we have just learned that during the training process of a neural network we first do the forward pass which means that we are iteratively computing our activations so we take our input data, pass it with the corresponding weight parameters and bias vectors through the hidden layers, activate those neurons using activation functions, and go through the multiple hidden layers up until we end up computing the output for that specific forward pass, so the predictions, the y-hat. So once we perform this for our very initial iteration of training the neural network, we need to have a set of model parameters so we can start the training process in the first place, so we therefore need to initialize those parameters in our model, and we have specifically two types of model parameters in the neural network: we have the weights and we have the bias vectors, as we have seen in the previous questions. So then the question is, well, how much error are we making if we are using this specific set of weights and bias vectors, because those are the parameters that we can change in order to improve the accuracy of our model. So then the question is, well, if we use this very initial version of the model parameters, so the weights and the bias vectors, and we compute the output, so the y-hat, then we need to understand how much error the model is making based on this set of model parameters. That's the loss function, so the loss function or the cost function, which means the average error that we are making when we are using these weights and bias vectors in order to perform the predictions. And as you know already from machine learning we have regression type of tasks and classification type of tasks, and based on the problem that you are solving you can also decide what kind of loss function you will be using in order to measure how well your model is doing. And the idea behind the neural network training process is that you want to iteratively improve these model parameters, so the weights and the bias vectors, such that you will end up with the set of best and most optimal weights and bias vectors that will result in the smallest amount of error that the model is making, which means that you came up with an algorithm and with a neural network that is producing highly accurate predictions, which is our goal, our entire goal, by using neural networks. So loss functions, if you are dealing with classification type of problems, can be the cross entropy, which is usually the go-to choice when it comes to the classification type of tasks, but you can also use the F1 score, so the F-beta score, you can use the precision,
the recall. Now beside this, in case you have a regression type of task, you can also use the mean squared error, the MSE, you can use the RMSE, you can use the MAE, and those are all the ways that you can measure the performance of your model every time you are changing your model parameters. So we have also seen, as part of the training of a neural network, that there is one fundamental algorithm that we need to use, which we called and referred to as backpropagation, which we use in order to understand how much is
there a change in our loss function when we apply a small change in our parameters so this is what we were referring as gradients and this came From mathematics and as part of the back prop what we were doing is that we were Computing the first order partial derivative of the loss function with respect to each of our model parameters in order to understand how much we can change each of those parameters in order to decrease our loss function so then the question is how exactly gradient descent is performing the optimization so the gradient descent
is using the Entire training data when going through one pass and one iteration as part of the training process so for each update of the parameters so every time it wants to update the weight factors and the bias factors it is using the entire training data which means that in one go in one forward path we are using all the training observations in order to compute our predictions and then compute our loss function and then perform back propagation compute our first order Derivative of the loss function with respect to each of those model parameters and
then use that in order to update those parameters. So the way that GD is performing the optimization and updating the model parameters is by taking the output of the backprop, which is the first-order partial derivative of the loss function with respect to the model parameters, then multiplying it by the learning rate or step size, and then subtracting this amount from the original, current model parameter in order to get the updated version of the model parameters. So as you can see here, this comes from the previously showcased simple example of a neural network, and here, when we compute the predictions, we take the gradients from the backprop and then we are using this dW, which is the first-order gradient of the loss function with respect to the weight parameter, multiplying it by the step size, the eta, and then subtracting this from W, which is the current weight parameter, in order to get the new, updated weight parameter; and the same we also do for our second parameter, which is the bias vector. So one thing you can see here is that we are using this step size, the learning rate, which can also be considered a separate topic; we can go into details on this, but for now think of the learning rate as a step size which decides how large the step should be when we are performing the updates, because we know exactly how much change there will be in the loss function when we make a certain change in our parameters, so we know the gradient size, and then it's up to us to decide how much of this entire change we need to apply. So do
we want to make a big jump or do we want to make smaller jumps when it comes to iteratively improving the model parameters. If we take this learning rate very large, it means that we will apply a bigger change, which means the algorithm will make a bigger step when it comes to moving towards the global optimum, and later on we will also see that it might become problematic when we are making too big of a jump, especially if those are not accurate jumps. So we therefore need to ensure that we optimize this learning rate parameter, which is a hyperparameter, and we can tune it in order to find the best learning rate that will be minimizing the loss function and optimizing our neural network. And when it comes to gradient descent, the quality of this algorithm is very high: it is known as a good optimizer because it's using the entire training data when computing the gradients, so performing the backprop, and then taking this in order to update the model parameters, and the gradient that we got based on the entire training data represents the true
gradients so we are not estimating it we are not making an Error but uh instead we are using the entire training data when calculating those gradients which means that we have a good Optimizer that will be able to make accurate steps towards finding the global Optimum therefore GD is also known as a good Optimizer and it's able to find with high likelihood the global Optimum of the loss function so the problem of the gradient descent is that when it is using the entire training data for every time updating the model Parameters it is just sometimes
computationally not feasible or super expensive, because training on a lot of observations, taking the entire training data to perform just one update of your model parameters, and every time storing that large data in memory and performing those iterations on this large data, means that when you have very large data, using this algorithm might take hours to optimize, in some cases even days or longer, when it comes to using very large or very complex data. Therefore GD is known to be a good optimizer, but in some cases it's just not feasible to use it because it's just not efficient.
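As a rough illustration of the full-batch update rule just described (this is not code from the course itself), here is a minimal NumPy sketch of gradient descent on a simple linear regression with an MSE loss; the toy data, learning rate, and iteration count are made-up placeholders.

```python
import numpy as np

# Minimal (full-batch) gradient descent sketch for linear regression with an MSE loss.
# X, y, eta and the iteration count below are illustrative placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 observations, 3 features
y = X @ np.array([2.0, -1.0, 0.5]) + 1.0     # synthetic targets

w = np.zeros(3)                              # weight parameters
b = 0.0                                      # bias parameter
eta = 0.1                                    # learning rate (step size)

for _ in range(200):                         # each iteration uses the ENTIRE training set
    y_hat = X @ w + b                        # forward pass: predictions
    error = y_hat - y
    loss = np.mean(error ** 2)               # MSE loss (cost function)
    dw = 2 * X.T @ error / len(y)            # first-order partial derivatives of the loss
    db = 2 * error.mean()
    w -= eta * dw                            # parameter minus learning rate times gradient
    b -= eta * db

print(loss, w, b)
```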
The next question is: what is a loss function, and what are the various loss functions used in deep learning? So a loss function is used in order to quantify the amount of overall error that the model is making, whether it's a deep learning model but also in general the traditional machine learning models; in all these cases we need a way to measure the amount of error that the model is making, and in order to do so we are making use of this idea of loss functions. So a loss function is
a way to measure the amount of loss that the model is making which means the amount of overall errors that the model is making when performing the prediction so we can have a loss when we are dealing With classification model we can have a loss when we are dealing with regression model at the end of the day we know that independent of the type of problem we are solving we we are always going to have this this errors that we will have as part of the predictions so we can never get predictions which are exactly
equal to the True Values that we want to get therefore we need to know what are this this errors that the model is making and what is the overall error That the model is making such that we can then know how we can edit and adjust our model in order to improve it such that the model will make less uh loss therefore the idea behind the optimization techniques such as gradient descent SGD RMS prop is to minimize the loss function but to be able to do that we first need to have a proper loss function
that is measuring the overall error that the model is making. When it comes to the different examples of loss functions, depending on the type of problem we are dealing with we can use different sorts of loss functions: if we are dealing with a regression problem we can use the mean squared error, the root mean squared error, the MAE, which is another measure that is commonly used as part of evaluating regression type of problems. So as a loss function we can use these different metrics in order to compute what the overall error is in the predictions for that specific model type, and here we are using as an input the actual values of the y's, which are usually numeric values given that we have a regression type of problem, and the estimated values, which come from our machine learning or deep learning model. So once we have, per iteration, the predicted values, then we can use these predicted values, these numeric values of y-hat, and compare them to the actual y that we have as part of our validation set, training set, or testing set in order to compute the amount of loss the
model is making. So there are always these two sets of input values, the y-hat and the y, and then using the mean squared error, which is the mean of the sum of the squared errors that we are making as part of the model training (we sum the squared errors and then take the average of them, therefore it's also called the mean of the sum of squared errors). When it comes to the classification type of problems, we can use the cross entropy, which is also known as the log loss, in order to evaluate the performance of the deep learning model; this is handy when dealing with binary classification. When it comes to other types of loss functions that we can use for classification type of problems, we can use the precision, we can use the recall, we can also use the F1 score or the F-beta score, which is a more general version of the F1 score for when we know specifically what is more important for us, the recall versus the precision, whereas in the case of the F1 score we don't know or we don't care, and it's more that we want to have a good balance between the precision and the recall; the F1 score basically gives 50% importance to each of those two.
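As a small, illustrative sketch (not the course's own code), here is how a few of the metrics just mentioned could be computed with scikit-learn on toy arrays; all the numbers are made up.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, log_loss

# Regression: compare true values y with predictions y_hat (toy numbers).
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 7.5])
mse = mean_squared_error(y_true, y_pred)          # mean squared error
rmse = np.sqrt(mse)                               # root mean squared error
mae = mean_absolute_error(y_true, y_pred)         # mean absolute error

# Binary classification: cross entropy (log loss) on predicted probabilities.
labels = np.array([1, 0, 1, 1])
probs = np.array([0.9, 0.2, 0.7, 0.6])
bce = log_loss(labels, probs)

print(mse, rmse, mae, bce)
```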
The next question is: what is cross entropy, and why is it preferred as a cost function for multiclass classification type of problems? So cross entropy, which is also known as log loss, measures the performance of a classification model that has an output in terms of probabilities, which are values between zero and one. So whenever you are dealing with a classification type of problem, let's say
you want to classify whether an image is of a cat or a dog, or a house can be classified as an old house versus a new house. In all those cases, when you have these labels and you want the model to provide a probability for each of those classes per observation, such that you will have as an output of your model that house A has a 50% probability of being classified as new and a 50% probability of being classified as old, or the cat has a 70% probability of being a cat image, or this image has a 30% probability of being a dog image, in all those cases, when you are dealing with this type of problem, you can apply the cross entropy as a loss function. The cross entropy is measured as the negative of the sum of y log(p) + (1 - y) log(1 - p), that is, -Σ [ y log(p) + (1 - y) log(1 - p) ], where y is the actual label, so in binary classification this can be for instance one and zero, and p is the predicted probability, so in this case p will be a value between 0 and 1, and y is the corresponding label, so let's say label zero when you are dealing with a cat image and label one when you are dealing with a dog image.
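Here is a tiny NumPy sketch of the formula just stated, computed directly rather than through a library; the function name, toy labels, and the clipping constant used to avoid log(0) are illustrative details, not part of the original explanation.

```python
import numpy as np

def binary_cross_entropy(y, p, eps=1e-12):
    """Negative of y*log(p) + (1-y)*log(1-p), averaged over the observations."""
    p = np.clip(p, eps, 1 - eps)           # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([0, 1, 1, 0])                 # actual labels (e.g. 0 = cat, 1 = dog)
p = np.array([0.1, 0.8, 0.6, 0.3])         # predicted probability of class 1
print(binary_cross_entropy(y, p))
```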
The mathematical explanation behind this formula is out of the scope of this question, so I will not be going into those details, but if you are interested in that, make sure to check out the logistic regression model; this is part of my machine learning fundamentals handbook, which you can check out, and it includes logistic regression, which explains step by step how we end up with this log-likelihood function and how we then go from products to summations after applying the logarithmic function, so we get the log loss, and then we multiply it by minus one because this is the negative of the likelihood function, given that we want to ideally minimize the loss function and this is the opposite of the likelihood function. And in this case what this showcases is that we will end up getting a value that tells how well the model is performing in terms of classification, so the entropy will then
tell us whether the model is doing a good job in terms of classifying the observations to a certain class. The next question is: what kind of loss function can we apply when we are dealing with multiclass classification? So in this case, when dealing with multiclass classification, we can use the multiclass cross entropy, which is often referred to as the softmax loss function. So the softmax loss function is a great way to measure the performance of a model that wants to classify observations to one of multiple classes, which means that we are no longer dealing with binary
classification but with multiclass classification. So one example of such a case is when we want to classify an image as being from a summer theme, from a spring theme, or from a winter theme; given that we have three different possible classes, we are no longer dealing with binary classification but with multiclass classification, which means that we also need a proper way to measure the performance of the model that will do this classification, and softmax is doing exactly this. So instead of getting, per observation, two different values which will say what is the probability of that observation belonging to class one or class two, we will instead have a larger vector per observation, depending on the number of classes you will be having; in this specific example we will end up having three different values, so one vector with three different entries per observation, saying what is the probability that this picture is from a winter scene, what is the probability of this observation coming from a summer theme, and the third one, what is the probability that the observation comes from a spring theme
in this way we will then have all the classes with the corresponding probabilities so as in case of the Cross entropy also in case of the softmax when we are when we have a small value for the softmax it means that the model is performing a good job in terms of classifying observations to different classes and we have well separated Classes and one thing to keep in mind when we are comparing cross entropy and multi class cross entropy or the softmax is that we are usually using this whenever we have more than two classes and
you might recall from the Transformer model introduction, from the paper Attention Is All You Need, that as part of the architecture of Transformers a softmax layer is also applied as part of the multiclass classification, so when we are computing our activation scores and also at the end, when we want to transform our output to values that make sense and to measure the performance of the Transformer.
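As a small illustrative sketch (not taken from the course), here is one way the softmax function and the multiclass cross entropy could be written in NumPy, using the three hypothetical classes (winter, summer, spring) from the example above; the logits and labels are made up.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def multiclass_cross_entropy(y_onehot, probs, eps=1e-12):
    # Average over observations of -sum_k y_k * log(p_k)
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=1))

logits = np.array([[2.0, 0.5, -1.0],               # raw scores for (winter, summer, spring)
                   [0.1, 1.2, 0.3]])
probs = softmax(logits)                            # one probability vector per observation
y = np.array([[1, 0, 0],                           # true class: winter
              [0, 1, 0]])                          # true class: summer
print(probs, multiclass_cross_entropy(y, probs))
```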
The next question is: what is SGD and why is it used in training neural networks? So SGD is, like GD, an optimization algorithm that is used in deep learning in order to optimize the performance of a deep learning model and to find a set of model parameters that will minimize the loss function by iteratively improving the parameters of the model, including the weight parameters and the bias parameters. The way SGD performs the update of the model parameters is by using a randomly selected single or just a few training observations; so unlike GD, which was using the entire training data to update the model parameters in one iteration, SGD is using just a single randomly selected training observation to perform the update. What this basically means is that instead of using the entire training data for each update, SGD is making those updates to the model parameters per training observation, and there is also an importance to this random component, so the stochastic element in this algorithm, hence also the name stochastic gradient descent, because SGD is randomly sampling from the training observations a single or just a couple of training data points, and then using that it performs the forward pass: it computes the z scores and then computes the activation scores after applying the activation function, then reaches the end of the forward pass and the network computes the output, so the y-hat, and then computes the loss, and then we perform the backprop only on those few data points, and then we are getting the gradients, which are then no longer the exact gradients. So in SGD, given that we are using only a randomly selected few data points or a single data point, instead of having the actual gradients we are estimating those true gradients, because the true gradients are based on
the entire training data, and in SGD for this optimization we are using only a few data points. What this means is that we are getting an imperfect estimate of those gradients as part of the backpropagation, which means that the gradients will contain noise; and the result of this is that we are making the optimization process much more efficient, because we are making those parameter updates very quickly, per pass, by using only a few data points, and training a neural network on just a few data points is much faster and easier than using the entire training data for a single update. But this comes at the cost of the quality of the SGD, because when we are using only a few data points to train the model and then compute gradients, which are an estimate of the true gradients, then these gradients will be very noisy; they will be imperfect and most likely far off from the actual gradients, which also means that we will make less accurate updates to our model parameters, and this means that every time the optimization algorithm is trying to find that
global optimum and make those movements per iteration to move one step closer towards that optimum, most of the time it will end up making wrong decisions and will pick the wrong direction, given that the gradient is the source of that choice of what direction it needs to take, and every time it will make those oscillations, those movements, which will be very erratic, and it will end up most of the time discovering the local optimum instead of the global optimum, because every time it's using just a very small part of the training data it's estimating the gradients, which are noisy, which means that the direction it will take will most likely also be a wrong one; and when you make those wrong directions and wrong moves every time, you will start to oscillate, and this is exactly what SGD is doing: it's making those wrong choices when it comes to the direction of the optimization, and it will end up discovering a local optimum instead of the global one. Therefore SGD is also known to be a bad optimizer. It is efficient, it is great in terms of convergence time and in terms of memory usage, because training a model based on very small data and storing that small data in memory is not computationally heavy or memory heavy, but this comes at the cost of the quality of the optimizer, and in the upcoming interview questions we will learn how we can adjust this SGD algorithm in order to improve the quality of this optimizer. The next question is: why does stochastic gradient descent, or SGD, oscillate towards a local minimum? So there are a few reasons why this oscillation happens, but first let's discuss
what oscillation is. So oscillation is the movement that we have when we're trying to find the global optimum; whenever we are trying to optimize the algorithm by using an optimization method like GD, SGD, RMSprop, or Adam, we are trying to minimize the loss function, and ideally we want to iteratively change our model parameters so much that we will end up with the set of parameters resulting in the minimum, so the global minimum of the loss function, not just a local minimum but the global one. And the difference between the two is that a local minimum might appear as if it's the minimum of the loss function, but it holds only for a certain area when we are looking at this optimization process, whereas the global optimum really is the real minimum of the loss function, and that's exactly what we are trying to chase. When we have too many oscillations, which means too much movement when we are trying to find the direction towards that global optimum, then this might become problematic, because we are then making too many movements every time, and if we are making those movements that are opposite
or towards the wrong direction, then this will end up resulting in discovering a local optimum instead of the global optimum, something that we are trying to avoid. And the oscillations happen much more often in SGD compared to GD, because in the case of GD we are using the entire training data in order to compute the gradient, so the partial derivative of the loss function with respect to the parameters of the model, whereas in the case of SGD we learned that we are using just a randomly sampled single or few training data points in order to compute the gradients and to use these gradients to update the model parameters. This then, for SGD, results in having too many of these oscillations, because the random subsets that we are using are much smaller than the training data; they do not contain all the information in the training data, and this means that the gradients that we are calculating in each step, when we are using entirely different and very small data, can differ significantly: one time we can have one direction, the other time an entirely different direction for our movement in our optimization process. And this huge difference, this variability in the direction because of the huge difference in the gradients, can result in too many of these oscillations, so too much bouncing around the area to find the right direction towards the global optimum, in this case the minimum of the loss function. So that's the first reason, the random subsets. The second reason why in SGD we have too many of those oscillations, those movements, is the step size: the step size, the learning rate, defines how much we need to update the weights and/or the bias parameters, and the magnitude of these updates is determined by this learning rate, which then also plays a role in how different these movements will be and how large the jumps will be when we are looking at the oscillations. So the third reason why SGD will suffer from too many oscillations, which is a bad thing because it will result in finding a local optimum instead of the global optimum too many times, is the imperfect estimate: when we are computing the gradients of the loss
function with respect to the weight parameters or the bias factors then if this is done on a small sample of the Training data then the gradients will be noisy whereas if we were to use the entire training data that contains all the information about the relationships between the features and just in general in the data then the gradients will be much less noisy they will be much more accurate therefore because we are using this the gradients based on small data as estimate of the actual gradients which is based on the entire training data this introduces
noise, an imperfection, when it comes to estimating this true gradient, and this imperfection can result in updates that do not always point directly towards the global optimum, and this will then cause these oscillations in SGD. So at a high level I would say that there are three reasons why SGD will have too many of these oscillations: the first one is the random subsets, the second one is the step size, and the third one is definitely the imperfect estimate of the gradients. The next question is: how is GD different from SGD, so what is the difference between gradient descent and stochastic gradient descent? So by now, given that we have gone into so much detail on SGD, I will just give you a high-level summary of the differences between the two. For this question I would answer by making use of four different factors that cause a difference between GD and SGD: the first factor is the data usage, the second one is the update frequency, the third one is the computational efficiency, and the fourth one is the convergence pattern. So let's go into each of these factors one by one. Gradient descent uses the entire training data when training the model and computing the gradients, and it uses these gradients as part of the backpropagation process to update the model parameters; however, SGD, unlike GD, is not using the entire training data when performing the training process and updating the model parameters in one go. Instead, what SGD does is that it uses just a randomly sampled single or just a few training data points when performing the training, and it uses the gradients based on those few points in order
to update the model parameters. So that's the data usage, the amount of data that SGD is using versus GD. The second difference is the update frequency: given that GD updates the model parameters based on the entire training data every time, it makes far fewer of these updates compared to SGD, because SGD very frequently, every time for this single data point or just a few training data points, updates the model parameters, unlike GD, which has to use the entire training data for just one single set of updates; so this then causes SGD to make those updates much more frequently, while using just a very small amount of data. That's the difference in terms of update frequency. Then another difference is the computational efficiency: GD is less computationally efficient than SGD, because GD has to use the entire training data, make the computations, so the backpropagation, and then update the model parameters based on this entire training data, which can be computationally heavy, especially if you're dealing with very large and very complex data; and unlike GD, SGD is much more efficient and very fast, because it's using a very small amount of data to perform the updates, which means that it requires a smaller amount of memory to store data, it uses small data, and it will then take a much smaller amount of time to find a global optimum, or at least it thinks that it finds the global optimum; so the convergence is much faster in the case of SGD compared to GD, which makes it much more efficient than GD. Then the final factor that I would mention as part of this question is the convergence pattern: GD is known
to be smoother and of higher quality as an optimization algorithm than SGD; SGD is known to be a bad optimizer, and the reason for this is that the efficiency of SGD comes at the cost of the quality of it finding the global optimum. So SGD makes all these oscillations, given that it's using a very small part of the training data when estimating the true gradients, and unlike SGD, GD is using the entire training data, so it doesn't need to estimate the gradients; it's able to determine the exact gradients, and this causes a lot of oscillations in the case of SGD, while in the case of GD we don't need to make all these oscillations, so the amount of movement that the algorithm is making is much smaller. And that's why it takes a much smaller amount of time for SGD to find the global optimum, but unfortunately most of the time it confuses a global optimum with a local optimum; so SGD ends up making these many movements and it ends up discovering the local optimum and confusing it with the global optimum, which is of course not desirable, because we would like to have the actual global optimum, so the set of parameters that will actually minimize and find the minimum value of the loss function; and GD is the opposite, because it's using the true gradients and it is most of the time able to identify the true global optimum.
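To make the "estimated versus true gradient" point above concrete, here is a small, purely illustrative sketch (not from the course) that compares the exact full-batch gradient with noisy single-observation estimates for the same linear-regression MSE loss; the data and sample counts are made up.

```python
import numpy as np

# Compare the exact (full-batch) gradient with single-sample SGD estimates.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = X @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=500)
w = np.zeros(2)

def grad(Xb, yb, w):
    # Gradient of the MSE loss with respect to w on the given (sub)set of data.
    return 2 * Xb.T @ (Xb @ w - yb) / len(yb)

full_grad = grad(X, y, w)                              # true gradient (GD)
single_grads = np.array([grad(X[i:i+1], y[i:i+1], w)   # noisy estimates (SGD)
                         for i in rng.integers(0, len(y), size=100)])

print("full-batch gradient:", full_grad)
print("mean of single-sample gradients:", single_grads.mean(axis=0))
print("std of single-sample gradients:", single_grads.std(axis=0))
```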
So the next question is: how can we use optimization methods like GD in a more improved way, how can we improve them, and what is the role of the momentum term? So whenever you hear momentum together with GD, try to automatically focus on SGD with momentum, because SGD with momentum is basically the improved version of SGD, and as long as you know the difference between SGD and GD it will be much easier for you to explain what SGD with momentum is. So we just discussed that SGD suffers from oscillations, so too many of those movements, and a lot of the time, because we are using a small amount of training data to estimate the true gradients, this will result in having entirely different gradients and too many of these different sorts of updates in the weights, and of course that's something that we want to avoid, because we saw and we explained that too many of those movements will end up causing the optimization algorithm to mistakenly confuse the global optimum and a local optimum, so it will pick the local optimum and think that it is the global optimum, but that's not the case. So to solve this problem and to improve the SGD algorithm, while taking into account that SGD in many aspects is much better than GD, we came up with this SGD with momentum algorithm, where SGD with momentum will basically take the benefits of SGD and then also try to address the biggest disadvantage of SGD, which is these too many oscillations; and the way SGD with momentum does this is that it uses this momentum, it introduces this idea of momentum. So momentum is basically a way to point the optimization algorithm towards a better direction and reduce the amount of oscillation, so the amount of these random movements, and the way that it does this is that it tries to add a fraction of the previous updates that we made
on the model parameters which then we assume will be a good indication of the more accurate Direction in this specific time step so imagine that we are at time step T and we need to make the update then uh the what momentum does is that uh it looks at all the previous updates and uses the More recent updates more heavily and says that the more recent updates most likely uh will be better representation of the direction that we need to take versus the very old updates and this updates in the optimization process this very recent
ones, when we take them into account, then we can have a better and more accurate way of updating the model parameters. So let's look into the mathematical representation just for a quick refresher. What SGD with momentum tries to do is to accelerate this convergence process, and instead of having too many movements towards different directions and having too different gradients and updates, it tries to stabilize this process and have more consistent updates. And here you can see that as part of the momentum we are obtaining this momentum term, which is equal to v_{t+1} for the update at the time step t+1; what it does is that it takes this gamma, multiplies it by v_t, and adds the learning rate eta times the gradient, so v_{t+1} = γ·v_t + η·∇_θ J(θ_t), where the inverted triangle (the nabla) with θ underneath, applied to J(θ_t), simply means the gradient of the loss function with respect to the parameter θ. What this is basically doing is that it says we are computing this momentum term for the time step t+1, which is based on the previous updates through this term γ·v_t, plus the common term that we saw before for SGD and for GD, which is basically the learning rate η multiplied by the first-order partial derivative of the loss function with respect to the parameter θ. We then use this momentum term and simply subtract it from our current parameter θ_t in order to obtain the new version, so the updated version, which is θ_{t+1} = θ_t − v_{t+1}, where θ is simply the model parameter. So in this way what we are doing
is that we are performing the updates in a more consistent way, so we are introducing consistency into the direction by weighting the recent adjustments more heavily, and it builds up the momentum, hence the name momentum. So the momentum builds up speed towards the direction of the global optimum with more consistent gradients, enhancing the movement towards this global optimum, so the global minimum of the loss function, and this in turn will of course improve the quality of the optimization algorithm, and we will end up discovering the global optimum rather than a local optimum. So to summarize, what this SGD with momentum does is that it basically takes the SGD algorithm, so it again uses a small amount of training data when performing the model parameter updates, but unlike SGD, what SGD with momentum does is that it tries to replicate GD's quality when it comes to finding the actual global optimum, and the way it does that is by introducing this momentum term, which also helps to introduce consistency in the updates and to reduce the oscillations the algorithm is making, by having a much smoother path towards discovering the actual global optimum.
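Here is a minimal sketch of the momentum update just described, v_{t+1} = γ·v_t + η·∇_θ J(θ_t) followed by θ_{t+1} = θ_t − v_{t+1}; the function name, the toy quadratic loss, and the hyperparameter values are illustrative placeholders, not the course's implementation.

```python
import numpy as np

def sgd_momentum_step(theta, v, grad, eta=0.01, gamma=0.9):
    """One SGD-with-momentum update:
    v_new     = gamma * v + eta * grad(theta)
    theta_new = theta - v_new
    """
    v_new = gamma * v + eta * grad
    return theta - v_new, v_new

# Toy usage with a quadratic loss L(theta) = theta^2, so the gradient is 2 * theta.
theta, v = np.array([5.0]), np.zeros(1)
for _ in range(50):
    theta, v = sgd_momentum_step(theta, v, grad=2 * theta)
print(theta)   # moves towards the minimum at 0
```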
The next question is: compare batch gradient descent to mini-batch gradient descent and to stochastic gradient descent. So here we have three different versions of the gradient descent algorithm: the traditional batch gradient descent, often referred to simply as GD; the second algorithm is the mini-batch gradient descent; and the third algorithm is SGD, or stochastic gradient descent. The three algorithms are very close to each other, but they do differ in terms of their efficiency and the amount of data that they are using when performing each of the model training and model parameter updates, so let's go through them one by one. The batch gradient descent: this is the original GD; this method involves the traditional approach of using the entire training data for each iteration when computing the gradients, so doing the backprop and then taking these gradients as an input for the optimization algorithm to perform a single update of the model parameters, then on to the next iteration, again using the entire training data to compute the gradients and to update the model parameters. So here, for the batch gradient descent, we are not estimating the true gradients, we are actually computing the gradients, because we have the entire training data. Now batch gradient descent, thanks to this quality of using the entire training data, has a very high quality: it's very stable, it's able to identify the actual global optimum. However, this comes at the cost of efficiency, because the batch gradient descent uses the entire training data; it needs to put this entire training data into memory every time, and it is very slow when it comes to performing the optimization, especially when dealing with large
and complex data sets. Now next we have the other extreme of the batch gradient descent, which is SGD. So SGD, unlike GD, and we saw this previously when discussing the previous interview questions, is using stochastically, so randomly, sampled single or just a few training observations in order to perform the training, so computing the gradients, performing the backprop, and then using optimization to update the model parameters in each iteration, which means that we do not compute the actual gradients but are actually estimating the true gradients, because we are using just a small part of the training data. So this of course comes at the cost of the quality of the algorithm; although it's efficient to use only a small sample from the training data when doing the backprop and the training, because you don't need to store the entire training data in memory but just a very small portion of it, and we perform the model updates quickly, so we find the so-called optimum much quicker compared to GD, this comes at the cost of the quality of the algorithm, because it then starts to make too many of these oscillations due to these noisy gradients, which ends up confusing the global optimum with a local optimum. And then finally we have our third optimization algorithm, which is the mini-batch gradient descent, and this mini-batch gradient descent is basically the silver lining between the batch gradient descent and the original SGD, so stochastic gradient descent; and the way mini-batch works is that it tries to strike a balance between the traditional GD and the SGD: it tries to take the advantages of the SGD when it comes to efficiency and combine them with the advantages of GD when it comes to stability and consistency of the updates and finding the actual global optimum. And the way that it does that is by randomly sampling the training observations into batches, where the batch is much bigger compared to SGD, and it then uses these smaller portions of training data in each iteration to do the backprop and then to update the model parameters. So think of this like the k-fold cross-validation, when we are sampling our training data into these k different folds, in this case batches, and then we are using this in order to train the model, and in the case of neural networks to use the mini-batch gradient descent to update the model parameters such as the weights and bias vectors. So the three have a lot of similarities but they also have differences, and in this interview question your interviewer is trying to test whether you understand the benefits of each one and what is the purpose of having mini-batch gradient descent.
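As a short, illustrative sketch (not from the course), the three variants differ mainly in how many observations feed each parameter update; the sampling helper and batch size below are assumed placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 4))
y = rng.normal(size=1000)

def sample_batch(X, y, batch_size):
    """Randomly sample the observations used for ONE parameter update."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    return X[idx], y[idx]

X_gd,  y_gd  = X, y                          # batch GD: the entire training set
X_mb,  y_mb  = sample_batch(X, y, 32)        # mini-batch GD: e.g. 32 observations
X_sgd, y_sgd = sample_batch(X, y, 1)         # SGD: a single random observation

print(len(y_gd), len(y_mb), len(y_sgd))
```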
The next question is: what is RMSprop and how does it work? So we just saw that RMSprop is one of the examples of what can be defined as an adaptive optimization process; RMSprop stands for root mean squared propagation, and it is, like GD, SGD, and SGD with momentum, an optimization algorithm that tries to minimize the loss function of your deep learning model, to find the set of model parameters that will minimize the loss function. What RMSprop does is that it tries to address some of the shortcomings of the traditional gradient descent algorithm, and it is especially useful when we are dealing with the vanishing gradient problem or the exploding gradient problem. So we saw before that a very big problem during the training of deep neural networks is this concept of vanishing gradients or exploding gradients: when the gradients start to converge towards zero, so they become very small, they almost vanish, or when the gradients are so big that they are exploding, so they are becoming very large and they result in a large amount of oscillation. Now, to avoid this, what RMSprop is doing is that it is using an adaptive learning
rate: it's adjusting the learning rate, and for this process it uses this idea of a running average of the squared gradients, which is related to the concept of the Hessian, and it is also using this decay parameter, which takes into account and regulates the average of the magnitudes of the recent gradients that we need to use when we are updating the model parameters, so basically what amount of information we need to take into account from the recent adjustments. So in this case, this means that parameters with large gradients will have their effective learning rate reduced, so whenever we have large gradients for a parameter we will then be reducing its effective learning rate, which means that we will then control the exploding gradient effect; and of course the other way around holds true in the case of RMSprop: for the parameters that have small gradients we will be controlling this and we will be increasing their learning rate to ensure that the gradient will not vanish, and in this way we will then be controlling and smoothing
the process. So RMSprop uses this decay rate, which you can see here as this beta, which is a number usually around 0.9, and it controls how quickly this running average forgets the oldest gradients. So as you can see here, we have this running average v_t, which is equal to v_t = β·v_{t-1} + (1 − β)·g_t², where g_t² is basically our squared gradient. Then what we are doing is that we are taking this running average and we are using it to adapt and to adjust our learning rate. You can see here in our second expression that θ_{t+1}, so the updated version of the parameter, is equal to θ_t, so the current parameter, minus the learning rate divided by the square root of this v_t, the square root of this running average, where we are adding some epsilon, which is usually a small number, just to ensure that we are not dividing this eta by zero in case our running average is equal to zero, so θ_{t+1} = θ_t − η / (√v_t + ε) · g_t; we are ensuring that this number still exists and we do not divide a number by zero, and then we are simply multiplying this by our gradient. So as you can see, depending on our parameter we will then have a different learning rate, and we will be adapting this learning rate; and by adapting this learning rate, in the case of RMSprop we are stabilizing this optimization process, we are preventing all these random movements, these oscillations, and at the same time we are ensuring smoother convergence; we are also ensuring that our network, especially for deep neural networks, doesn't suffer from the vanishing gradient problem or from the exploding gradient problem, which can be a serious problem when we are trying to optimize our deep neural network.
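Here is a minimal sketch of the two RMSprop expressions just described, v_t = β·v_{t-1} + (1 − β)·g_t² and θ_{t+1} = θ_t − η/(√v_t + ε)·g_t; the function name, toy loss, and hyperparameter values are illustrative assumptions rather than the course's own code.

```python
import numpy as np

def rmsprop_step(theta, v, grad, eta=0.001, beta=0.9, eps=1e-8):
    """One RMSprop update:
    v_new     = beta * v + (1 - beta) * grad**2     (running average of squared gradients)
    theta_new = theta - eta / (sqrt(v_new) + eps) * grad
    """
    v_new = beta * v + (1 - beta) * grad ** 2
    theta_new = theta - eta / (np.sqrt(v_new) + eps) * grad
    return theta_new, v_new

# Toy usage on L(theta) = theta^2 (gradient 2 * theta).
theta, v = np.array([3.0]), np.zeros(1)
for _ in range(1000):
    theta, v = rmsprop_step(theta, v, grad=2 * theta)
print(theta)   # moves towards the minimum at 0 (slowly, given the small learning rate)
```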
The next question is: what are L2 and L1 regularizations, and how do they prevent overfitting in the neural network? So both L1 and L2 regularization are shrinkage or regularization techniques that are used both in traditional machine learning and in deep learning in order to prevent the model from overfitting, so trying to make the model more generalizable, like the dropout. So you might know from the traditional machine learning models what L1 and L2 regularization do: L2 regularization is also referred to as Ridge regression, and L1 regularization is also referred to as Lasso regression. What L1 regularization does is that it adds, as a regularization or penalization factor, a term that is based on the penalization parameter lambda multiplied by a term that is based on the absolute values of the weights. This is different from the L2 regularization, which is the Ridge regularization, and this regularization adds to our loss function a regularization term which is based on the lambda, so the penalization parameter, multiplied by the squares of the weights. So you can see how the two are different: one is based on what we are calling the L1 norm and the other one is based on what we are calling the L2 norm, hence the names L1 and L2 regularization. So both of them are used with the same motivation, to prevent overfitting; what L1 does differently from L2 is that L1 can set the weights of certain neurons exactly equal to zero, so in some way also performing feature selection, whereas L2 regularization shrinks the weights towards zero but it never sets them exactly equal to
zero; so in this aspect L2 doesn't perform feature selection, it only performs regularization, whereas L1 can be used not only for shrinking the weights and regularizing the network but also for performing feature selection when you have too many features. So you might be wondering, but how does this help to prevent overfitting? Well, when you shrink the weights towards zero and you are trying to regularize these small or large weights, then these methods, such as L1 or L2 regularization, will ensure that the model doesn't overfit to the training data. So you will then regularize the weights, and this will in turn regularize the network, because the weights define how much of this erratic behavior will be prevented: if you have too large weights and you reduce them and regularize them, it will ensure that you don't have exploding gradients, it will also ensure that the network doesn't heavily rely on certain neurons, and this will then ensure that your model is not overfitting and not memorizing the training data, which might also include noise.
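As a small, illustrative sketch (not the course's code), here is how the two penalty terms just described, lambda times the sum of absolute weights (L1) versus lambda times the sum of squared weights (L2), could be added to a loss; the function names, weights, and lambda value are assumed placeholders.

```python
import numpy as np

def l1_penalty(weights, lam):
    # Lasso-style term: lambda * sum of absolute weights (can drive weights to exactly 0)
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam):
    # Ridge-style term: lambda * sum of squared weights (shrinks weights towards 0)
    return lam * np.sum(weights ** 2)

def regularized_loss(base_loss, weights, lam=0.01, kind="l2"):
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return base_loss + penalty

w = np.array([0.5, -1.2, 0.0, 3.0])
print(regularized_loss(base_loss=0.8, weights=w, kind="l1"),
      regularized_loss(base_loss=0.8, weights=w, kind="l2"))
```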
The next question is: what is the curse of dimensionality in machine learning or in AI? So the curse of dimensionality is a known phenomenon in machine learning, especially when we are dealing with these distance-based, neighborhood-based models like KNN or k-means, and we need to compute distances using distance measures such as the Euclidean distance, the cosine distance, or the Manhattan distance. Whenever we have high-dimensional data, so we have many features in our data, then the model starts to really suffer from the curse of that dimensionality: the complexity rises when the model needs to compute these distances between the set of pairs, but given that we have so many features, it becomes problematic and sometimes even infeasible to obtain those distances, and obtaining and calculating these distances in some cases doesn't even make sense, because they no longer reflect the actual pairwise relationship or the distance between those two paired observations when we have so many features. And that's what we are calling the curse of dimensionality: we have a curse on our ML or AI model when we have high dimensionality and when we want to compute these distances between pairs of observations, and this can introduce data sparsity, this can introduce computational challenges, it can introduce a risk of overfitting for our problem, the model becomes less generalizable, and it also becomes a problem in terms of picking a distance measure that can handle this high dimensionality of our data.
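As a rough, optional illustration of why pairwise distances become less meaningful in high dimensions (assuming SciPy is available; the point counts and dimensions are made up): the relative spread of the distances shrinks as the dimension grows, so "near" and "far" neighbours become hard to tell apart.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)
for dim in (2, 10, 100, 1000):
    points = rng.random((200, dim))            # 200 random points in [0, 1]^dim
    dists = pdist(points)                      # all pairwise Euclidean distances
    # As the dimension grows, distances concentrate: the relative spread shrinks,
    # which is one face of the curse of dimensionality for distance-based models.
    print(dim, dists.mean().round(2), (dists.std() / dists.mean()).round(3))
```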