What's going on everybody welcome back to another video today we're going to be learning my sequel from beginner all the way to Advanced in under 4 [Music] hours sequel is one of the most important skills in my career and so I'm really exited that you are going to start learning SQL as well because SQL is super important in the data World in this really long lesson I'm going to Help you get your environment set up we're going to be walking through all of the basics so just selecting data how to query basic data but then
as we go through it we be working towards more advanced things like CTE temp tables and more at the very end we'll have two complete projects one for a data cleaning where we take a messy data set and we clean that data which is a very important skill to know how to do and then lastly we'll be doing exploratory Data analysis on our real data to really dive in and see how you can use SQL to dig into the data and understand it better so without further Ado let's jump on my screen and get started
all right so let's get started by downloading MySQL we're going to come right over here to dev. myql.com downloads sliner and I will have that link in the description so you don't have to write all that out but you should be seeing this page right here Now we have to select an operating system I'm using a Windows machine and if you aren't if you're using Linux or Mac or something else it should populate it for you but if it doesn't just select this drop down and select your operating system next we have two different downloads
we can install the MySQL installer Community or MySQL installer web Community this one is very small but then you actually do have to to download the installer it just gets it from the Web this one I'm going to download the actual installer it's larger but this is the one I'm going to do so I'm going to go ahead and select download it's going to ask me if I want to log in or create an account and I don't I'm going to say no thanks just start my download I'm going to save this in this desktop
folder doesn't really matter where you save it we're going to save that and it's going to download it should be done in just a few seconds I'm Going to go ahead and click on it and it's going to open it up when it's finished and we should get the interface or the UI for the actual installation for MySQL so here is the MySQL installer and first thing we need to do is choose a setup type now we're going to keep the developer default unless you really know what you're doing and you can select the server
only the client only full which is literally everything MySQL has to offer or custom so we're going to keep This developer default just installing the things that we kind of need so let's go ahead and select next and for whatever reason on my computer it's saying this path already exists you probably won't get that but I'm just going to go ahead and select next and then I'll select yes it keeps doing that I can't explain why but it keeps doing that for me even though I've deleted it from my computer completely um but it just
remembers it somewhere in its Memory now the next thing we need is to check requirements now I just have this one it says I need to download this uh Visual Studio I'm not going to do that but on your screen you may have multiple multiple requirements typically you're looking at something like this Microsoft Visual C++ redistributable package what you need to do is download this all you have to do is Click download once you download and install that on your computer and then we go back all of Those should be gone that's the one that
I see the most when I'm actually working with these requirements I had to install it myself when I got this new laptop so go ahead and install that if you need to but if yours looks like mine we don't need this visual studio for what we're going to do so we're going to go ahead and select next is giving us a prompt that we haven't satisfied all the requirements but that's okay we're going to go ahead and select yes as well now We're ready to install all of these things these are all things that myl
wants you to install the most important are the server and the workbench but it does not hurt to have all these other things as well some of these connectors are also important so we're going to go ahead and execute this will take just a few minutes I'll skip ahead uh when they're all done but this should take just a few minutes and then we'll continue on Installing my SQL so everything just completed and now we're going to select next and now we need to actually configure our product now really the only one that we actually
need to configure is the server the router says we need to configure it and the samples and examples say we need to configure it as well but really it's just the server let's go ahead and select next now we're not going to change anything for this type and networking unless you know what You're doing with the port the X protocol Port uh we're not going to change any of this we'll go ahead and select next next thing we need to do is Select an authentication method I'm going to be using us a password I'm not
going to be using the Legacy authentication method so I'm just going to go ahead and create a password now for you and I keep getting this error and I can't explain why right here for you you should be creating a password at The bottom it's remembering my password somehow and I I really can't explain it but I'm going to create my password or check my password uh this is one that I already created uh before I deleted it off my computer but it's still there um so it's saying my password is still good but if
you need to you should be entering a password and then confirming your password and saving it and then you should also be checking it as well and then we're going to configure this as a MySQL server as a Windows service I'm going to keep that checked uh and we're going to start the MySQL server at system startup I like that automatically being there I don't want to mess with that so I'm going to keep it as it has it we're going to go ahead and select next and the last thing we do is just need
to execute this and then everything we put in there is going to actually go so let's run this and execute it and that's just finished so let's go ahead And select finish now it's say config configuration complete for the server but we also need to configure these other two let's take a look at these really quickly we're not going to do anything on this it even says we really don't need to do this we just need to click finish and configuration not needed next we'll do samples and examples and we can input our password and
all this is really going to do is put in some sample databases for us in Our database which if you want you definitely can do that uh I just connected that worked I'm going to hit next and execute and it's basically just going to put in a database or two some sample ones for you to look at and the configuration is complete you don't have to do that one but we'll see that in just a second we're going to select next and now the installation is completely done and we can start my SQL workbench after
setup And start my SQL shell after setup now I'm not going to do the shell so I'm going to actually uncheck that and we're going to select finish now my SQL just popped up for us and this is exactly what you should be seeing now there's a lot of things in my squel to learn and know how to do we're not going to be taking a look at all of that stuff today but in future lessons we'll walk through a lot of these different things that kind of correlate With different lessons or things that we're
working on in my SQL the first thing that we're going to click on is right over here this is our local instance this is local to just our machine it's not a connection to you know some other database on the cloud or anything like that it's just our local instance we're going to go ahead and click on this so this is what you should be seeing right here this is where we're going to actually write all our SQL code And I'll show you all this in just a second but this is where we can actually
create our database and our database is going to go right over here on this left hand side this silicon one is actually a sample database it has a bunch of tables and Views St procedures functions has all these things in here if you want to go ahead and mess around with that what we're about to do is create our own database that we're going to be using throughout this entire series both the Beginner intermediate and advanced we'll use a lot of this and sometimes we'll import some other ones for different us cases but this will
serve for most of what we're trying to do throughout this entire series now what I'm going to do is I'm going to go ahead and I'm going to say open a SQL script file in a new query Tab and right here it opened up to a folder that I already created this MySQL beginner series folder within it we have this right here the parks and Rec create DB now in order to get this you just have to go to the GitHub and download this file that's all you have to do we're then going to open
this file so let's click on it we're going to say open and what you're now seeing is basically the query editor this is where you can write your code now we're not importing a database we're actually creating it by running code now because this is the first lesson in the beginner series I'm Going to assume that you don't know a ton about my SQL really all this is doing is creating the database name and then we're inserting a few tables into that database and then we're inserting data into those tables so this is all of
our data that will go into these tables that we create we only have 1 two three different tables that we're going to be using so all you have to do to run this is click this lightning button right up here we're going to go ahead and execute This if we come down to the bottom and we pull this up this is our output this says six rows affected and we have a bunch of other things like create table create table insert insert create table insert into these things are all working perfectly so now if we
go ahead and click refresh in our schemas with with this refresh button right here this Parks and Recreation table is populated if we go under the tables we see all of these things so now that we've actually Created our database and our tables that's really all we were trying to do in this lesson but I just want to open up a table really quickly show you what it looks like show you how we can run code and then in the next lesson we'll start actually learning how to query this data so let's go up to
employee demographics we're going to rightclick and select row limit 1000 this is going to open up a new window right up here and it's going to say select everything From this database dot this table employee demographics and it ends with a semicolon now right down here we have this output window this is the actual data that sits in our table we have columns right here so employee ID first name last name age gender and birth date and then here are all of our employees on each row so these are all separate rows so we have
Leslie nope Tom haford and it goes on and on so this is all of our data the most important things to Know when we're actually working with this and I'm going to zoom in is if we hover over this query right here and we run it we can select this execute which is this lightning bolt with this I we're going to execute this and it's going to run this because we're highlighted over it now if we have two queries let's say this one right here but let's change it to employee uncore salary we'll do underscore
salary let's say we want to query this Table so now if we highlight over this and we go up and select the lightning bolt with the I now we're looking at a different table but if we select even if we're hovering over this if we select this button we're going to execute everything in this editor window so let's run this and now you can see at the bottom we have two outputs the employee demographics and the employee salary so this button is going to run everything in this editor window whereas If we select this lightning
bolt with the I we're doing everything that's just under where we have the cursor where we have it highlighted the very last thing that I want to mention is that right over here you may have this up and you probably don't want that we're not going to do any SQL additions in this series you can get rid of that by clicking this button right here hello everybody in this lesson we're going to be learning about the select statement in my SQL the Select statement is used to work with columns and specify what columns you want
to see in your output the first thing that we need to do is open up a tab or an editor window so let's come right up here to the left hand side and we're going to create a new tab and I'm going to zoom in just a little now what we need to do is we need to select the actual table that we're going to be querying off of if you remember from the very first lesson when we set everything Up we came over here and we right-clicked and did select rows limit 1000 we're not
going to do that we're going to actually write it out so what we need to do to select that table the employee demographics table is we need to select everything that's what the star means the star means everything all tables all rows now we do have a limit on here we have a limit to 1,000 rows so if we had a table that had 50,000 rows This limiter would be an issue it would still limit it to 1,000 rows we would have to change that to 2,000 5,000 probably all the way up to 50,000 if
we wanted to view everything if we had say a million rows we would need to come up here and say don't limit and it would give us a million rows the reason they do this is mostly to keep the processing time low if you have a million rows it's going to take a long time for the output to actually appear so let's come right Back here the next thing that we need to do is we need to say select everything and now we need to say where we're selecting it from so we're going to come
right down here and we're going to say from and now we need to specify what table and we're going to say employee uncore demographics and at the end we need a semicolon now why do we need a semicolon this is going to tell my sequel that this is the end of this query so if you Write another one down here which we will in just a second it'll be able to distinguish between the two queries we're going to go ahead and we're going to run this and we'll just use this execute right here instead of
this one and there we have our entire table so we were able to get our table now there is one thing that is potentially wrong depending on what you're using it for but what we didn't do is we did not specify the actual database before it we Only specify the table and this works perfectly fine because if you look over here on this left- hand side we have Parks and Recreation it's in Black it's bold that means that we're hitting off of this database what's going to happen though if we come down here to the
Cy database and we double click on it now this database is highlighted so now when we're selecting this table we're trying to select this table from the CIS database let's go ahead and try This if you notice we have no output let's come right down here and pull this up it's going to say employees sis. employee demographics doesn't exist so it's assuming that we're highlighting this CIS database that means we're trying to pull from that database now we can still have this highlighted and still select the correct database by saying parkscore andore Recreation and let
me spell that right Dot so now we're selecting everything from Parks and recreation. employee demographics if we run this we do get the correct output that's just something to consider especially when you're working with a lot of databases and a lot of tables it's usually best practice to actually put the database in front of the table name although throughout this l we probably won't be doing that every time since we're only going to be using this Parks and Recreation database let's Go ahead and double click this so we have this highlighted again and let's slick
all of this let's copy all of this we're going to come down just a little bit right here now so far we've only selected everything but we don't have to do that we can actually just select one column if we would like to for example if we got rid of that star we say first uncore name we're selecting the first name column from this table if we highlight this query and we hit the Execute button with the I now we are only going to return in our output all of the first names and we can
add a lot more let's actually look at all these we can separate multiple columns with a comma so we can do first name comma lastore name and then we could do birth uncore date so now we have three separate columns let's go ahead and run this and now we have first name last name and birth date in our output now the way we just wrote it is All on one line and that's perfectly acceptable because my SQL is going to read it the exact same as if we did it in a different format as long
as it's still in this order but sometimes you'll see it like this where it select first name comma last name comma birth date all on different rows now there's a lot of different use cases for this or reasons for this but it typically can be easier to read also if you're doing any type typ of functions or calculations in The select statement it's easier to separate those out on its individual row now again we won't always be doing this but it does help sometimes if you're doing that it just makes it easier to visualize for
example if we added the age so let's add age in here let's run this let's say we were doing a calculation where we wanted to add you know 10 years to their age so we'll say age and we'll actually create a new row for this or new column we'll do age + 10 So now we can easily see that we're doing plus 10 here and this is another thing that you can do in the select statement things like calculations so if we go up here and we run this we'll now have an age column but
we'll also have an age plus 10 column where it just adds 10 to the age and we can at least visualize and really easily see this when we're doing these calculations now something really important to know about any type of calculations any math within My SQL is that it follows the rules of pemos now pemos is written like this it's pmde s now what I just did right here with this pound or this hashtag is actually create a comment so this code isn't going to actually run but it's just for note-taking or seeing things in
your actual editor window I'll come back to comments in just a second but just wanted to explain what that was now what pemos is is the order of operations for arithmetic or math within my SQL this Stands for for parentheses exponent multiplication division addition and subtraction so this is the order that these calculations are going to run in the execution engine that my SQL has so if I do age + 10 and we'll put that all in parenthesis and then we come over here and we add times 10 so we're doing plus 10 here and
then a times 10 here what's going to actually happen is it's going to say age + 10 so 44 + 10 = 54 then we're mul Ling time 10 the Parenthesis executes first because it comes first in this order parenthesis multiplication comes next because it's this one and then anything else after that if we did you know plus 10 could run this and you'll notice that it still follows the logic 10 was just added at the very end to all of these outputs now let's go right back up here let's select everything again from this
table let's pull up this table so we can see it a little better and let's go down because The last thing that I want to show you is something called distinct now this is really really useful and you use this a lot in my SQL what distinct is going to do is it's going to select only the unique values within a column let's go ahead and copy this employee demographics bring it right down here let's say select and let's do first uncore name so now we're just selecting the first name let's come right down here
There we go so now we're selecting just the first name from this column now these are all unique values so if we come right here and we say distinct nothing should happen to this table because these are all unique values let's go ahead and run this as you can see the output looks exactly the same but what if we were to do something like gender so let's come here let's do gender let's run this keeps going down I don't know why it's Doing that um but now we have male and female now these are not
all unique we have female female female and female and the rest are males so there's only two unique values here so if we come right here and we say distinct gender now there should only be two in the output male and female let's go ahead and run this so now we get male and female in our output now this works perfectly in one column but what happens if we have two columns So let's do first underscore name comma gender let's go and run this now the combination of first name and gender are no longer unique
now Leslie and female are being grouped together and it's taking the distinct between both of these columns so when we're only working with gender it's only looking at this one column for both male and female so it reduces it down to the only two unique values but because we added the first name all of these values Are unique so therefore the name plus the gender combination is always going to be unique the very last thing that I want to show you in this lesson doesn't actually pertain to the select statement but I want to save
this code let's say we wanted to update this or upload this into our GitHub or save this and send it to somebody we can do that we can save it by clicking this save button right here I'm going to go ahead and click this and now we're in our MySQL beginner Series folder I'm just going to save this and I can save this as anything I want so I'm going to say two. select statement tutorial so now when I save this you'll notice that the name gets changed up here to two. select statement tutorial let's
exit out of this I'm going to open up and now I'm going to come here to the select statement tutorial I'm going to open it and now I have our code again exactly as we had it written before Hello everybody in this lesson we're going to be taking a look at the wear Clause the wear Clause is used to help filter our records or our rows of data whereas the select statement is used to help filter or select are actual columns so when we're using the wear Clause we're only going to return the rows that
fulfill a specific condition let's take a look at exactly how this works let's say we come right up here we're going to say where and let's go down with that One let's say where and now we need to specify what column we're about to create this condition for so we're going to say first uncore name so we're saying where the first name we'll say is equal to and let's do quotes and let's say Leslie so we're saying the first name has to be equal to this value right here which is Leslie for Leslie nope if
we run this there's only going to be one row that's returned because Leslie is the only Leslie in this entire table now We just used an equal sign and that's actually called a comparison operator and there's a few other comparison operators that you can use let's take a look at some of these other ones let's pull this down right down here and and let's actually highlight the select from and we're going to run it with this one right here it's going to only select everything from the whole table so we didn't select that wear Clause
let's go right down here and let's look at this Salary field so I'm going to say wear the salary and I'm going to do a different comparison operator called greater than so when the salary is greater than 50,000 now one thing I want to note before we actually run this is that right down here we have haford who makes exactly 50,000 and I think there's one more Jerry gach which also makes exactly 50,000 if we run this you'll notice that both Tom and Jerry are not in this Output but in the salary field everything is
greater than 50,000 the reason for that is that Tom and Jerry made exactly 50,000 what we're saying right here is where the salary is only greater than if we want to include Tom and Jerry we have to say greater than or equal to and now we'll select 50,000 or above whereas right here before when you're doing just this it was greater than 50 it didn't include the 50,000 let's go ahead and include it and run This and now you'll notice that Tom and Jerry were both included because they had exactly 50,000 and we said greater
than or equal to now we could do the exact same thing but with less than so we have less than 50,000 and now we only have two people who make less than 50,000 that's April and Andy and if we say less than or equal to and we run that now we include both Tom and Jerry who make exactly 50,000 so it's less than or equal to $50,000 now what we're going to do is head on over to a different table we're going to do the demographics table make sure I spell that right and let add
our semicolon let's run this and what we want look at is the gender really quick so we're going to say where the gender is equal to we'll do in quotes female and if we run this we get all the genders that are equal to female but we do have something called the not equal To and it looks like this it's an exclamation point and an equal sign this is going to say where the gender is not equal to female so if we run this you'll notice that the the gender is all male now now so
far we've worked with things like integers which are numbers we've worked with characters or strings like names but there's a different type of data type as well in here we have a date column for these birth dates now in the wear Clause we can also filter on birth Dates let's come over here and we'll say birth uncore dat let's say it's greater than and within quotes we'll say 1985 - 01-1 this is kind of the standard default date format within my SQL which is year month and day if we go ahead and run this we
can also take all the people who are greater than or born greater than 1985 so all of these dates are greater than 1985 now the next thing that I want to take a look at is logical operators In the wear Clause so logical operators are things like and or and not now these are called and let's add this logic operators so logical operators allow us to have different logic and let's take a look at how this works exactly let's copy this down because we already have this one written out we're saying where the birth date
is greater than 1985 we can also say where the gender is equal to male so we can say and the gender is equal and then we'll say male So we're adding a different complexity or an additional conditional statement within our wear clause let's go ahead and run this so now we're only selecting birth dates that are greater than 1985 and where the gender is equal to male only the rows that fulfill both of those are returned now the and says both this and this have to be true but we could change this we could say
or what this means is is either this one has to be true or this one has to be true in order For it to be returned so let's go ahead and run this you'll notice that Jerry gurit Was Born much before 1985 but since he has a male gender he is in our output and we could also use the not Operator by saying or not gender equal to male so now what this is saying is the birth date could be greater than 1985 or it could not be equal to male which is female so if
we look at Leslie nope she was born before 1985 but because she is female she is in The output now like we talked about in the last lesson there is something called pemos and that actually applies to these logical operators as well so if we run this entire table let's go ahead and run this if we're looking at this entire table let's say we want to get someone very very specific let's say we're going to do uh where the first uncore name is equal to Leslie and their age has to be equal to To 44
that's extremely specific and we can actually just do it like this we don't need quotes um for integers we could just do the number if we'd like to this is very specific this is only one person but if we put this in parentheses we can add an or over here we could say or the age is greater than let's just do 55 let's go ahead and run this and then we'll take a look at it so Within These parentheses we have an and operator what that means is both this Condition has to be met and
this condition has to be met and that's only one person that's Leslie no but then outside of these parentheses we have another conditional statement or the age is greater than 55 so what we're saying Within These parentheses is that this is an isolated conditional statement Within These parentheses if this is true then in our output it'll be returned but then we have an or condition which says or someone with the age of greater than 55 Can also be in the output so these parentheses can be really helpful when you're actually using it in the wear
Clause with these and ores and Knots now I want to take a look at just one more thing and let's bring this down here and let's get rid of this entire thing now the last thing that we're going to take a look at is a like statement now the like statement is super unique because we can look for specific patterns we're not necessarily Looking for an exact match like here if we said where first uncore name is equal to Jerry if we're looking for Jerry it has to be exactly Jerry but if we take this
out say J and then we run it we get no output it has to be an exact match but here's where the like statement comes in because we can actually say like Jer and we can add two special sequences or special characters within our like statement so those special characters are the percent sign and the Underscore the percent sign means anything and the underscore means a specific value let's see how that actually works so what we're going to do is we're going to say like Jer percent sign that's the first one in this like statement
what this says is the first name is like starting with Jer but then has anything after it doesn't matter what it is as long as it has Jer at the very beginning it will be returned let's go ahead and run this now the only Person who starts with j r is Jerry but what if I took the J out of here now it's saying it starts with E and that's not anybody what we can do is we can add another percent at the beginning this is going to say anything comes before anything comes after all
we're looking for is er somewhere in their name let's go ahead and run this there still is only one person and that's Jerry now let's come up here and let's get rid of this and let's say We're looking for everyone's name who starts with a we can do that really easily by saying a perent sign all that says is it starts with a we don't have a percent sign before it which would say this string just has to have an A somewhere in it if we have it like this this means an A has to
come at the beginning let's go and run this in our output we have April and an Andy now let's take a look at the underscore if we get rid of this percent Sign and we do two underscores one two this is going to say it starts with an A and then it has two characters after it no more no less so if we run this an is going to be the only person who's returned because she has an A and then two characters after it now if we want Andy we can specify that by doing
another underscore that's 1 2 3 and now Andy is the only one in our output now there was also April in there but she had more than three characters But we can actually get her in our output by doing a percent sign so we can combine both the underscore and the percent sign and this is going to say it starts with an A has one two three characters and then it can have anything after that so it just has to have at least an A and have one two three characters after it so let's run
it now you can see April comes into here because she does have a the p r and I are the three next characters but then We have a percent sign that allows that L to be in the output as well now we don't just have to do this with strings or text like April and Andy we could also do this with birth dates for example Andy's birth date is 1989 we could say where the birth undor date is like let's say we want to look at everyone who who is 1989 or born in 1989 let's
go and run this and Andy's the only person born in 1989 but again we looked at the year at The very beginning so that is how the like statement works it looks for a specific sequence within that column that you can search for so it doesn't have to be an exact match as long as it has that specified sequence that you've put in there anywhere within that cell or that column hello everybody in this lesson we're going to be taking a look at group by and order by in my SQL now when you use the
group by clause in my SQL it's going to group together rows That have the same values in the specified column or columns that you're actually grouping on once you group those rows together you can run something called an aggregate function on those rows let's see how this actually works let's go ahead and copy this right here we'll bring that down and let me go back up one let's go ahead and write gender right here now we want to group on this gender column and we're going to say Group by gender let's go Ahead and run
this we'll see what we get and so we have male and female now we could get the exact same output by saying select distinct gender from this table what is groupy doing that the gender actually isn't doing well it's actually rolling up all of these values into these rows so later when we run aggregate functions like average min max we'll do it based off of these rows and all those rows are rolled up into these two rows and we'll see that in a little Bit now what if I was to come up here and in
this demographics we have a first undor name what would happen if I'm selecting the first name but I'm grouping by the gender let's go ahead and run this if we come right down here we pull this up you can see that the select list is not in group by clause and contains nonaggregated column what this means is that when you are selecting a column if it's not an Aggregated column like say average of something if we're not using the aggregate functions in the select statement it has to be in the group ey these have to
match so this gender has to match this group by if we're not performing an aggregate function on it let's go ahe and run this and now it works properly now let's go back up let's run this query because I want to select everything again but let's say we wanted To take a look at the average ages for gender so what we're going to do is we're selecting gender we're also grouping by gender but what we're going to do is add a comma and we'll say the average that's AVG that stands for average and then we're
going to put in here age so now this right here is an aggregate function this does not need to go in the group by we're just grouping on the gender and then we're performing this aggregate function or kind of a Calculation based off of those grouped rows for gender so let's go ahead and run this and take a look at the output so what this is telling me is that for the males all of the male rows that were grouped the average age is 41 let's say three and for female the average age is 38.5
so super quickly you can tell that the average age of females is lower than the average age of males now we'll take a look at aggregate functions more in just a little bit let's actually go to a Different table let's come right down here we're going to go to the salary table and just select everything for now let's go ahead and run this now what we're going to actually be grouping on is this occupation right here now there's a lot of unique values it's um not as distinct as the gender which only had two values
you'll notice we do have a few that are the same we have ones like office manager so when we come up here going to say Occupation and of course we need to group Group by the occupation as well now let's run this you'll notice that office manager only has one row let's say we also want to group on the salary let's say salary now we can group on multiple so we're going to say salary like this so we're grouping on the occupation as well as the salary now let's run this you'll notice that we have
two rows for office manager now this is because this salary and this Salary for those two employees are different we have 50,000 and 60,000 for this I just wanted to demonstrate that if these had both been 50,000 there would only be one row office manager 50,000 but because this is a unique value different than 50,000 they have their own individual rows which we would then perform our aggregate calculations on let's go and get rid of that because we will not be using that anymore I just wanted to demonstrate it really quickly So before we were
looking at gender and average age and we were also grouping on the gender we can perform other aggregate functions as well let's take a look at some of those we could look at the max age as well the max is going to show us the highest value within each of those groupings so we have a male and female the max age for those for the male is 61 and the highest age for the female is 46 we do the exact same thing except we can say men or the exact Opposite thing we can say the
minimum age so this is going to be the lowest or both the male and the female go and run this now we have female and male and the minimum age is 29 and 34 and there is one last one that I want to show you which is Count we're going to do count now count is going to count the actual rows within this age column so if we run this you'll see that we have four females per count and we have seven males it's just telling us a count of How many values is in this
column when we're actually grouping on the gender so that's how we can use groupy to actually roll up and group all of these similar values within a column or columns and perform our aggregate functions on them now let's come down here and what we're going to take a look at is order by so we're going to say order by now let's actually pull in this demographics table right here we're just going to say select Everything and run this really quickly after we had a semicolon so order by order by is going to actually sort the
result set in either ascending or descending order let's take a look at how this works at the very end we could say order by and we could order by the first underscore name so we're going to take this column we're going to order all of our rows based off of this one column let's go ahead and run this so it's going to do It based off ascend in order which means smallest to largest now this is a text column or a character column so we do it A to Z so Andy and April all the
way down to Tom Now by default this is in ASC order ascending order and if we run this it's going to be the exact same output but we can change this to do it the opposite highest to lowest or Z to A by doing descending so now if we run this you'll see that goes Tom all the way down to Andy now let's take a look At ordering on something like gender and age because we can do both at the same time so let's order by the gender first let's go ahead and run this and
you'll see that all the females are grouped together and then all the males are grouped together because that's just the order in which it is but we can do an additional column we could also do it based off of the age let's go ahead and run this so now within the female since that Came first in our order by we're ordering by the gender and then we're also ordering by the age after we've ordered by the gender so now it's 29 all the way up to 46 then 34 for males all the way up to
61 now we can change this just for the age let's say we want to do age descending so gender will stay the same in ascending order but now age will be in descending order let's go ahead and run this now female and male stay the same But now it starts at the highest EST down to the lowest now this is something that I would absolutely do in real life except sometimes you can make mistakes and sometimes you do the wrong column first let's do age and then we'll do gender now if we run this the
gender is not going to be used at all and this is because there are no unique values that are going to be on the same row so notice all these values are completely unique so the gender never is actually Used to order anything on because if there were things like 34 34 34 34 these would be ordered based off of the gender but since there's no unique Fields this is really pretty useless that's why the order of the order bu or the columns that you place in the order bu are actually quite important now the
last thing that I want to show you and I'll just go back to gender and age is that you don't actually have to use the column names we can use the column Positions now I will preface this by saying I don't recommend doing this but I sometimes do it in shorthand for just a quick query um if I know the column position and I don't want to write out the whole name so sometimes I do it although it's not best practice but let's take a look at it so gender is the one two 3 4
fth column I'm going to replace this with five and age is the 1 2 3 4 column so these are the positions of the fields but not the names of them If we run it we're going to get the exact same output because these represent these columns appropriately but again I just don't recommend it it's kind of a slippery slope that I've fallen down myself uh many times and when you get to more advanced SQL and you're creating things like store procedures and triggers and all these things this can actually cause a lot of issues
if you were to add any columns or remove any columns then you'd be Ordering by the wrong column because let's say this last name got removed we didn't want it for some reason then the gender is 1 2 3 4 now we're ordering on the wrong column and that would be a big mistake so just by best practice it is better to do gender um a age but I just wanted to show you that in case you want to be like me and kind of go down the wrong path hello everybody in this lesson we're
going to take a look at the difference between having and where now In the last lesson we looked at group by and order by the most obvious thing to do would be to come right here and say where and we're going to say this column which is actually named this we'll say where the average age let's say is greater than 40 which would only be the males so let's go ahead and run this and as you can see we're not getting any output let's bring this up and take a look at the error it says
invalid use of the group by Function what's actually happening is something to do with this group by gender right here when we're selecting ing gender and then we're performing an aggregate function this occurs only after the group by actually groups those rows together so when we're trying to filter based off of this column right here of average age it really hasn't been created yet because this group I hasn't happened that's where the having Clause comes into play so let's go ahead And what we're going to do is we're going to get rid of this we're
going to come right down here and instead of where we're going to say having now having was specifically created for this exact example it comes right after Group by and after Group by we can filter based off of these aggregate functions so now if we run this we're going to get an output that only has where the average age is greater than 40 now let's take a look at just one more Example and I'm going to show you how you can use both in one query so instead of demographics let's look at the salary table
and let's run it now in this salary table we have this occupation and remember we have this office manager that happens twice and this is going to be our main example so we're going to say occupation and then we'll say the average salary now we'll need to come Down here and we'll say Group by now we're going to say occupation so this should look pretty similar because right here we have our office man manager and one of the office managers made 50 one of the office managers made 60 so the average is 55,000 now I
can use the where by saying where then I'll say occupation like and let's see people who are managers so I'll say percent manager percent and close that quote so they're Like a manager and then I want to see where a manager makes more than let's say 75,000 so I won't actually say where I'm going to say having an average salary and I need to add a space there having an average salary greater than let's say 75,000 and let's run this so now I filtered at the row level right here in the wear Clause but then
down here I filtered at the aggregate function level This having is only going to work for aggregated functions after the group ey actually runs so that is the difference between the having clause and the wear Clause the wear claw you're most likely going to use a lot more but if you do want to filter on those aggregated function columns you have to use the having Clause hello everybody in this lesson we're going to be taking a look at limit and aliasing limit is just going to specify how many rows you want In your output if
we take this table for example if we come right here and we say limit let's do three if we run this it's only going to take the top three that we have let's go and run this as you can see we have employee 1 three and four Leslie Tom and April now this seems super straightforward really really easy but it can be combined with order bu to actually be really powerful for example let's say we wanted to take the three oldest employees all we'd have to do is Come right under here we say order by
and we'll order by the age in descending order so we're going to order on age descending and then it's going to take the top three so if we run this and very quickly we have the top three oldest people in this table now there is one additional parameter that we can use in limit and all we have to do to access it is have a comma here now what this is going to do and I'll put a one here what this is going to do is it's now going to Say we're going to start at
position three and then we're going to go one row after it now I actually want to take one of these people so let's start at position two and select the next one after it which should be Leslie nope so we're going to start at position two and we're going to select the one right after it so we're going to start at position two and then one means we're taking the next one row let's go ahead and run this and as you can see we got Leslie nope in our output now let's come right down here
we are going to now look at aliasing now aliasing is just a way to change the name of the column for the most part and it can also be used in Joins but we're going to take a look at joins or aliasing joints in the intermediate Series in a previous lesson we looked at a group bu that looked like this we selected gender and we said from I believe it was Employee uncore demographics then we say Group by gender And we also had the average and I think it was age there we go go and
we'll add our semicolon let's go ahead and run this in our output we have gender as our gender column the same as the column name but then average age is average age and so if we want to actually do something like a having where we say having the average age let's say greater than 40 like we had it we have to actually use this aggregate function in our having clause and we don't want to Always have to do that we can actually change the name name of this column and subsequently use it throughout our query
with that Alias name so I'm going to say as and that's the key word to actually change it we'll say as and we'll do average uncore age so now we've changed this name to average uncore age and we can come down here to having and say having the average age greater than 40 and when we run this it works perfectly and you'll notice that the name of the Column was actually changed now this as isn't actually one 100% needed it's kind of implied even if we get rid of it it's implied there's like this as
in there somewhere um but we don't have to have it if we took it out and ran it like this it would still work exactly the same hello everybody in this lesson we're going to be taking a look at joins joins allow you to combine two tables or more together if they have a common column that doesn't mean the column name Has to be the exact same but at least the data within it are similar that you can use there are several joins that we're going to look at today like inner joins outer joins and
self joins these are the two tables that we'll be working with the most throughout this lesson we have the employee demographics table as well as the employee salary table now within the employee demographics table we do have this employee ID column and if we look at the employee salary we Also have the employee ID column so in this instance the column name is actually the exact same and of course the data inside of it is also very similar so let's start by writing out an inner join this is probably one of the most common joins
one of the most simple joins as well an inner join is going to return rows that are the same in both columns from both tables so let's see how we can actually write out this join let's come right down here and let's Copy this this will be the first table that we start with and then we'll join the salary table onto this demographics table so what we need to do is we need to come right here and we need to say join now by default join represents an inner join although we can write inner join
here to make it more explicit explicitly writing out inner join then we're going to come up here and we're going to say employee salary so we're selecting everything from the employee Demographics we're doing an inner join on the employee salary now we have to tell my SQL exactly what columns we're supposed to be joining on I'm going to hit enter and I'm going to hit tab now you don't have to hit tab it just looks more codel likee and it's easier to read and that's how I've done it for other programming languages as well so
that's how I'm going to show you how to do it what we need to do is say on now this keyword is going to allow us to say We're joining the demographics table to the salary table based on these two columns so from the demographics table we're doing the employeeid is equal to and then in the salary table it's also the employee ID let's do employee and you spell that right employee ID now if we try to run this and let's do this we're going to get an error and let's bring this up it's going
to say column employee ID on the in Clause is Ambiguous now what does it mean ambiguous that means that it doesn't know what table this employee ID is from is it from the employee demographics table is it from the employee salary table we don't know because it's ambiguous now what we can do is we can specify it by saying employee demographics do employee ID and then employee salary. employee ID now if we run this we're going to get the output that we're Looking for and let's take a look at this real quick let me bring
this up so we're pulling everything from the employee demographics that's right here all the way through the birth date then we're pulling the employee salary table that's the employee ID all the way let's scroll over through the department ID so we're basically pulling in all of the rows or all the columns from both tables but we're not pulling in all of the rows remember an inner joint is only going to Bring over the rows that have the same values in both columns that we're tying on so in this employee ID we're missing number two are
we missing any other ones no we're only missing number two let's go back up and I'm going to run both of these tables and we're going to take a look so let's run this so you'll notice in the employee salary table we have a number two right here and that's Ron Swanson but in the employee demographics table we don't Have that I believe that Ron Swanson did this that Leslie nope would not know when his birth date was because he didn't want to bring that information I think that makes the most sense although Ron was
not willing to give a comment on that now if we run this again you'll notice that two is not in there since two is not in the employee demographics table the employee ID 2 is not going to be populated or brought over into this output from the employee salary table Now really quickly this is honestly uh giving me some anxiety because this is so incredibly long something that I mentioned in the beginner series is that you can use something called aliasing when using joins and it's really helpful this is what I mean so right here
we have employee demographics we're going to call this DM you can also do as DM you can do as salal these are just short names for demographics and short name for salary and we can replace these and Say dm. employee ID and s. employee ID oh that looks so much better now we're going to run this and it'll be the exact same output but now we're using these aliases which just makes it so much easier to read now one last thing that I want to show you while we're just looking at the inner join is
selecting the actual columns let's say we wanted to select the employee ID and we wanted to select age and then we wanted to select their Occupation if we try to run this we're going to get an error and it's going to be almost the exact same error that we got before which is column employee ID in field list is ambiguous so in our field list which is right up here in the select statement we have this employee ID it does not know which employee ID to pull from whether it's the demographics or the salary so
we have to tell it which one to pull from so let's pull it from the demographics by saying dm. Employee ID now when we run this we're able to get information from both tables in our output without having all of the information and if there are columns that are similar in both tables we have to denote that by using this Alias or the table name all right so that is inner joins now let's move down here I'm going to copy this and we're going to come right down here and we're going to look at outer
joins next and let's put that right here now for outer joins we Have a left join and we have a right join or a left outer and a right outer join a left join is going to take everything from the left table even if there's no match in the join and then it will only return the matches from the right table the exact opposite is true for a right join let's see how this actually works let's start by changing this to a left join or a left outer join they're both the same and you can
use them uh similarly I'm just going to say Left join and we're joining it on the exact same things and I'm going to take everything so I think that'll be easier to visualize and I'm going to run this now you may notice that this looks exactly the same and that's for a very good reason it's because in the left table which is the employee demographics table the from statement that's our left table and then the join where we're actually joining on that's our right table so this is our right table so Since we're doing a
left it's taking everything from the employee demographics now remember the employee demographics didn't have Ron Swanson it had no information so everything in the right table had a match let's change this to a right join and what this is going to do and I want to make it all cap make it all the same what this is going to do is it's going to take everything from the employee salary table but if there is Not a match in the employee demographics it just will have nulls let's go ahead and run this so now looks a
little bit different now we're taking everything from the employee salary so we're taking Ron Swanson but if there is not a match it will still populate that row but it'll have all nulles in it then any of the information that is overlapping are the same it will bring over so employee ID is matched to employee ID 1 then we'll bring all that information over From the left table and that's essentially what a left and a right join is with a left table you're taking everything from the left table and then matches from the right table
if you do a right join you're taking everything from the right table but only matches on the left table and again it populates it with nulls now let's go down and look at our last type of join that we're going to look at and this is a self let me spell that right a self join now what is A self join it is a join where you tie the table to itself now why would you want to do this let's take a look at a very serious use case let's do select everything let's do this
from uh employee unor salary and let's run this now let's say it's December 1st and the employee and Rex Department decided to do a secret Santa and they wanted to assign based off of their employee ID the person who they're going to have as a secret santa we can help orchestrate This very easily using MySQL well very easily is subjective I guess but let's take a look at how we can do this so just like any other join the first thing we're going to do is Select everything from employee salary and then say join and
then we're going to say employee salary again so we're tying it to itself now when we come down here and let me do this when we come down here and we say on we have to specify which table we're pulling from are we pulling from the Left table which is like the first table we're pulling from or we pulling from when we're joining on the right table we need to be able to distinguish these two tables because they are the same so I'm going to say emp1 and I'm going to say emp2 just to say
this is employee table one and employee table two so we're going to tie them based off the employee ID because we know those will be the exact same because we're pulling from the same Table so we'll do EMP p1. employee and just so you know if it populates like this you can hit Tab and it'll Auto finish that for you and EMP 2. employee ID now if we run this let's do this the output that we're going to get is literally just a one for one match it's all the columns and all the rows because
they all match exactly but now what We're going to do is we're going to assign an employee ID to the next employee ID and that'll be their Secret Santa so just keep it really simple the next highest person with an employee ID that is their Secret Santa so let's do an employee ID + one is equal to employee 2. employee ID so we're adding one over here and we're saying that's equal to this employee ID over here let's run this so now you can see Lesly nope is Now going to be assigned to Ron Swanson
who has an idea of two Ron Swanson is going to be assigned to Tom haford which I'm sure he's really happy about and so on and so forth now let's bring this down here and what we're going to do is try to simplify this and simplify this output a little bit because this is a little bit chaotic down here so we're going to specify what columns we want in our output what we're going to want is the employee ID first name last name and Then employee ID first name last name of the person who they
got for Secret Santa so we're going to start with emp1 employee employee ID and we can call this we'll just say as empore Santa then we'll do a comma and we'll come down now I need to spell m employee right so we have our employee ID and now we need our first name and last name now remember Leslie nope is going to be the secret Santa for Ron Swanson I don't know if I made that clear but that's I guess how it works so now we need to do uh emp1 Dot and we'll do first
uncore name and we'll do as we do as first unor namore Santa we can do a comma we'll do the exact same thing except for the last name so lastore name as lastore name Santa now all we have to do is do a copy all this bring it down here and change this to two we're pulling from the second Table and it'll look just like this and get rid of this comma and this is done let's run it and let's bring this up so we have employee Santa first name Santa Leslie last name Santa nope
then employee Santa and we actually need to change these names that is one thing we need to do we'll just change it to employee name first name employee and last name employee and now when we run this we Have our Santa and then we just have the employee who this person is going to be the Santa for now this is kind of a silly way to look at it but in essence this is exactly how a self jooin works now the very very very very last thing I promise you the last thing that I want
to show you is how we can join multiple tables together so we're going to say joining multiple you spell that right multiple tables together now not just one table together to another table I'm Talking about one table to another table to another table so let's go all the way back up we're going to take this right here and bring it all the way down and what we're now going to do is we're going to tie in this table right here the parks department let's actually look at this table and let's select everything real quick and
let's run this now let's go down here and we're going to say select everything we'll do this from Parore departments and let's run this now this is something called a reference table this is not a table that most likely you'll ever add a bunch of information to it's there to reference that we have these Department names tables like the salary table or employee demographics table are going to change pretty often as people get raises or as they get older with their age those are going to be updated fairly often whereas this parks department table is
just There for reference now if we look down here in the columns we have a department ID then we have a department name so we have the ID and the name of that ID if we run our join and we scroll all the way to the right you'll notice we have a d ID this stands for Department ID that's in the salary table so what we want to do is join this department ID to the department ID from the parks and wreck so what we can do is we're going to say enter join and now
we're going to Join let's scroll down just a hair there we go now we're going to take this and do it off this so we're going to call this PD for short and we're going to say we're joining it on now we cannot join this park Department to the employee demographics table why is that well the employee demographics table only has employee ID all the way through birth date there's no common column that we can tie to this parks department the Only table that has a common column is this department ID in this salary table
so what we need to do is actually take s so we'll say Sal Dot and then we're going to say d and we'll say Department ID is equal to the PD Dot and we need to take the department ID now notice these are not the exact same name they are a little bit different but they have the same values one thing I forgot to mention is that in this parks department there's no repeating that's why it's a Reference whereas in the salary the department ID repeats several times because multiple people are in the same department
so this reference table also usually does not have duplicates uh just one other thing to note but we have now tied it successfully let's try to run this and and if we come down here go all the way to the right we now have the department ID 1111111 and the department ID and Department name Parks and Recreation Healthcare Public Works Finance Public Works and Parks and Recreation so this worked perfectly so this is how you can tie multiple tables together if you have common columns between them even though employee demographics has no column that's related
to the parks department table we can still tie them together based through this employee salary because employee demographics can tie to employee salary employee salary can tie to the parks department and that really Is the majority of what you need to know in order to use joins well hello everybody in this lesson we're going to be taking a look at unions in MySQL a union allows you to combine rows together not like columns like we were doing before with joins where one column is next to the other a union allows you to combine the rows
of data from Separate Tables or from the same table it's up to you but you do that by taking one select statement and using a union To combine it with another select statement let's see how this actually looks so what we're going to do is right after this select statement we're going to come here and say Union then we're going to go right below the union and we're going to do another select statement so we're going to copy this place it right here instead of the demographics table just for example we'll do the salary table
now if we look at the demographics table let's say we Want to take age and gender and let's go and take a look really quickly at the salary table and let's say we want to take first name and last name so we'll do first uncore name and lastore name now let's go ahead and run this and see what it looks like and let's pull this up so as you can see we have age and gender that's from the very first select statement and that's also the column names but then we have all of the data
For the age and gender and then below if we move this over a little bit we have the last name and first name from the employee salary table it's just down here now what I just demonstrate is that this doesn't always work for everything you can't just combine random data together because this is bad data we shouldn't have age and gender mixed with first name and last name really when you're using this you need to keep the Data the same so for us we should take the first I'll actually just copy this the first and
last name from the employee demographics as well and let's run this and now we have all the names from all of the tables now you may be thinking where did all the other data go before we had a lot of rows but now we only have a unique Row for each one well by default this is actually a union distinct and if you remember distinct is only going to take unique values so when We're doing this Union is going to remove all the duplicates and the first name and last name from salary overlaps a lot
with the employee demographics table so when we ran this the only one that's actually somewhat unique to one table is that in the employee salary table we have Ron Swanson whereas we don't have that in the employee demographics now if we wanted to show all of them without the distinct there is something called a union all if we Run this now we're going to get all of the results without removing any of the duplicates so if we scroll down we're going to have duplicates in here but we're just showing all of the results from this
table and from this table now that we know how to actually use a union let's look at a use case so let's go right down here and let's copy this why not and let's put it right down here now let's say in the employee demographics We wanted to take the first name and last name where the age is greater than 50 and let's run this there's only one person but let's label them let's add a label we're going to say comma old so this person is old and if we run this it says first name
last name and old and we can even call this as label and if we run this the label is old so Jerry git is the only old person in this demographics table now why are we doing this well the parks department Is trying to cut their budget a little bit they want to identify older employees that they can push out and they also want to identify high paid employees who they can reduce their pay or push them out to save money so we just identified someone who's older who we're going to want to try to
push out but let's in the same output find people who are also highly paid so now we can come down here we can say Union and let's do this like this I need to spell This right all right Union and let's take this we're not going to be using this exact same query but we actually need to pull from the salary table so the employee salary so we also want the first name and last name but let's say where their salary is greater than let's say 70,000 that's a lot of money if you're making more
than 70 uh for sure the parks department is going to try to get rid of you but for the label we're going to change it to a highly paid Employee now let's go ahead and run this so now we have Leslie nope and Chris Trager they're both labeled as highly paid employees Now 50 I think is just a little too low um if I'm being completely honest I think we need to change this and we should do a union and then add another select statement let's bring this down I think the 50 is too low
let's change it to 40 and let's add one more thing let's say And the gender is equal to male and then we'll go down here and say where the gender is equal to female because we want to separate this out so we want to know who's the old man oh that's actually old lady this is the female one and for up here where it's male we'll say old man so we have three different select statements using two separate unions we're selecting the first name and the last name in all of them keeping the data consistent
and Then in our third column we're labeling it either old man old lady or highly paid employee let's go ahead and run this and let's look at our output now you may notice something really quickly that Chris Trager and Leslie nope are an old man and an old lady and Leslie nope and Chris Trager are both highly paid employees so these people meet multiple criteria so let's actually order by and then we'll do first underscore name comma lastore name because we want to Order by these to see so let's run and now we can easily
see that Chris Trager is both of these Donna is just an old lady J's just an old man and Leslie is both an old lady and a highly paid employee so now we can send this to whoever we need to send it to to make sure that these people get looked at first so that our job is still secure the job market is tough these days you got to do what you got to do so that is how we use Union and let's just take one More look at it there we go go so this
is how we can use unions it's kind of a real use case I've done something very similar to this in my real job but you know this is just an example of how you can have multiple select statements all combined or combining the rows using a union hello everybody in this lesson we're going to be taking a look at string functions now string functions are built-in functions within my SQL that will help us use strings and work With strings differently now we're going to look at a ton of different ones they all have different use
cases but I'll try to walk through some of those as we go along but we'll look at a lot of different string functions in this lesson we'll start off with one that's really simple this one is called length so if we select then we say length and let's say we put in and I don't know why it's popping up like that let's say we put in something like sky or Skyfall or Really anything if we run this it's going to give us the length of how long this string is so if I come down here
we say select everything from employee unor demographics and let's add a semicolon here let's run this one right down now what we can do is we can look at how long each person's name is what we can do is just take the first name but then we'll also do the length of the first uncore name so if we run this now we get Leslie Tom Jerry Donna and it gives us The length of their name if we wanted to we could even order by this so we could do order by and we could just do
two for now and we can order by the length from the shortest name all the way to the longest name now one use case that I've used length for in my actual job was when I was working with phone numbers I wanted to make sure that they were exactly 10 characters long otherwise something went wrong somewhere in the daa cleaning process so I would go and Look at the length and I would make sure they're all 10 if anywhere 10 I would go and specifically look at those and try to clean those and fix those
up now let's go on to the next one and these next ones are pretty simple ones at least I think they're fairly simple we're going to look at upper first and it's doing the same thing as the other one we'll do upper and let's say we're going to do Sky if we select upper Sky it's going to give us an all uppercase Or we can copy this and we can do lower so now let me add semicolons otherwise it's going drive me crazy uh let's try this lower now it's going to do all lower even
if I make it all capital so if I say all capital Sky it's going to make it all lower so if we come back up let's copy this and instead of doing the length now we'll do upper let's go ahead and select this so we have Leslie and then we have the upper first name so upper all case Leslie now this is actually really good this is really really helpful especially with standardization is what I found a great use case for it because sometimes it'll be all capital Tom and sometimes they'll put it in as
T lowercase om uh and just making them all uppercase or all lowercase can help correct those really simple standardization issues within a single column the next one that we're going to look at is trim now there's multiple trims we have trim left Trim and right trim trim is basically going to take the white space on the front or the end and get rid of it which is really helpful so what we're going to do is we're going to come right here say select and we'll start off with trim and let me add a semicolon every
time so then we'll do trim and for our actual string we'll do something a little bit odd we'll do some spaces and then we'll do a sky and then we'll add some spaces now let's run this and add our semicolon That's going to be the end of me in this lesson just adding semicolons now it fixes it completely now what if we don't add sky at we'll just keep it like this well you can see that there's spaces before and there's spaces after but that's what trim does trims gets rid of the leading and the
trailing white spaces now if we come up here and we just do the left trim it's only going to remove from the left hand side so we're only getting rid Of the left-and side white spaces this right hand side as you can see is really long it's still there and if we do our trim we go ahead and and run this one it gets rid of the white space on this side but it doesn't get rid of the left space on this side now let's keep going we have a lot to cover still we're going
to move on to what I think is probably my most favorite string function if I'm allowed to have a favorite string function and that's substring but I'm Going to kind of work us into substring a little bit by looking at two smaller functions which is left and right so let's select everything and we'll do that from the employee demographics again add our semicolon now I'm going to run this now I want to get the first name and I'm going to do left of the first underscore name just like this now when you're using this is
actually going to be an error let's see if I highlight Over this if it'll tell me what the error is it says the parenthesis is not a valid position they're expecting something else and basically what they're telling us is that this is not how it should be written we're looking for a different value that different value is actually a number we're going to do comma and let's do four that's what it was looking for it didn't want this at the end it needed this comma four and what we're actually specifying Is how many characters from
the left hand side do we want to select so we're selecting the first name and we're going from the left four characters let's go ahead and run this and so we have Leslie Tom April all the way down you can see that there's only four characters in each one so someone like Chris the S is no longer going to be there because we're only looking at the first first four characters now we can do the exact same thing and let's actually copy this Uh down here so we'll come we'll go like this make this a
little more professional and we'll do right so now we'll do right if we do the right four it's going to go from the right hand side of the string and go left four so we're looking at the far for most right characters now this can be useful in certain instances but if I'm being honest I don't use these that much for the most part I'm pretty addicted to using substrings I love substrings I Think they're fantastic and let's look at substrings like this so a substring is going to allow us to do a few different
things let's do first underscore name the second thing that we put within this function is the position that we want to start at so let's say we want to start at the third position and then we specify how many characters we want to go so with this we specified four but let's just do two so now we're going to the third position and we're Going over to the right two characters let's go ahead and run this so with Leslie we get SL so we go one two and three we start at the third position and
then we take two characters the S and the L I have found this one to be extremely extremely useful let's take this for example let's do comma let's do birth uncore date and let's run this I'm keeping everything in here although it might be a bit much let's say we have this birthday and this middle column is The month and we'll running some you know query we want to find the month that everyone is born so we can do that very easily using substring and we wouldn't have been able to do this very easily using
left or right so now we're going to take this birth date we're going to use this substring and we want to select these middle characters so what we need to do since it's all standardized we do 1 2 3 4 5 5 six we start position six and we want to select One and two let's go ahead and run this and now we've pulled out all of the months so we can say as birth uncore month and now we could save that put it into a temp table add it as a new column in our
table whatever we want to do uh but now we have this information that we desperately desperately wanted to know so that is left right and substring again substring is it's fantastic now let's keep going the next thing that we're going to take a look at is replace Now replace will replace specific characters with a different character that you want so let's actually copy all this right here because I don't want to keep writing this out and we'll say select everything now what we're going to do is we're going to take the first underscore name and
then we're going to say replace and then we'll also do the first uncore name but we can specify what we want to replace and then what we want to replace It with so we have two more parameters that we need to put in this function so let's say a and let's replace it with a z let's just see what that does let's go ahead and run this and so now when we see the letter a and we are specifying a lowercase a like Mark that is replaced with a z so that's really all replace does
specifies what you want to replace and then what you're going to replace it with now let's take a look at the next one and we're going to take a look at a Function called locate so if I say select and let's do locate I'm going to give it a string I'll say Alexander that's my name and I'm going to specify what I'm looking for so let's close this parenthesis the string that we're actually looking for comes first so what we're going to do is I'm looking for the letter X in my name so we'll do
X and Alexander let's go ahead and run this and it tells us that it is in position four so we have 1 2 3 and four that's Where our position is that's where it locates that sequence that we're looking for now if we pull this down here place this right here and we'll change this locate now let's say we're still looking at the first name but we want to locate people that have an a n like this in their name let's go ahead and run this and we get zeros for everybody except for and and
Andy so this might be something where we put it into a CTE or a temp table then we can filter Down Based off of these results to where it only equals one now the last one that we're going to take a look at and let's go right here we're going to do first name lastore name now this one is super super useful because what we can do is have a concatenation of multiple columns so let's go down right here so we have first name and last name but if we come down and we say concat
we can then combine these columns into one single Column so we'll do concat and we'll do first underscore name and then comma lastore name and if we run this it's going to be Leslie and nope combined into Leslie nope now this doesn't look perfect right we don't want it to look like that all we have to do is come in here and we could do a little space so we'll add a space in there and if we run that now we have Leslie and nope and we could call this as full uncore name and this
is something that I've done a Million times in my real job where there's multiple columns we want to create one column out of it or take two columns and create one column happens all the time so this concat is really really helpful to combine those columns really quickly hello everybody in this lesson we're going to be taking a look at case statements in my SQL a case statement allows you to add logic in your select statement sort of like an if else statements in almost all other Programming languages or even things like Excel let's see
how this actually works so let's bring this down and let's take this employee demographics table and let's take the first name and let's take the last name and let's add a case statement how we need to do this is we have to say case so that's going to signify that we're starting a case statement and then I'm going to go over here and say tab so this is where our Logic comes into play so I'm going to say when the age let's say is less than or equal to 30 then so I'm saying if the
age is less than or equal to 30 then what's going to happen we'll just keep it really simple for now we'll just say that this person is Young and then if we want to end the case statement we'll come down here and say end so this is a complete case statement let's go ahead and run it and let's take a look at the output so we have the first name we have The last name and then we have this case statement right here and if their age is less than equal to 30 they're young let's
actually add the age right here just so we can visually see that as well so we have the age so this person is the only person who's under or equal to the age of 30 that's April so she has a label of young the great thing about case statements is you can add multiple when statements so we can come down here and say when and then we could do Something like when age and maybe we'll say between so I don't know if in previous lessons we've looked at between but between just says between this number
and this number so we'll say between 31 and 50 if they're between 31 and 50 well good night uh that person is old so we're going to have it just like this we're going to run it and now we have a lot of people who are old these are all people between the ages of 31 and 50 but we still have more People outside of the age of 50 or older than 50 so we could do when the age and now we can say greater than or equal to 50 and we're going to say then
and then we're going to say on death's door uh because good night if you're over 50 my parents are going to love me for this one so let's go ahead and run this and then if we look at this we have on death store right there now this is huge this is massive so let's actually name this and we'll just say as At the end of end so right after end we'll say as a agage uh bracket and let's run this and this looks a lot better so now we have this age bracket just signifying
kind of where people are at and most people are quite old or Jerry uh you know can't catch a break that guy now let's go down and let's take a look at a different table so let's select everything we'll do from employee uncore salary now that we have our employee Salary table here is the scenario that we are given the pony Council sent out a memo of their bonus and pay increase for end of year and we need to follow it and determine people's end ofe salary or the salary going into the new year and
if they got a Bonus how much was it so the first thing we need to do is we need to get the pay increase and bonus and their pay increases look like this so if they made less than 50,000 then that equals a 5% raise very Generous and if they made greater than 50,000 that equals a 7% raise very very generous lastly if they work in the finance department that equals a 10% bonus just cash that goes into their bank account very very generous but only the finance department gets it so these are the guidelines
that the pony Council sent out and it is our job to determine and figure out those pay increases as well as the bonuses so let's come right down here we're going to have our salary Employee I actually want to be able to see these let me pull this up just a touch there we go so we want to be able to write this out so first thing we should do is just select the columns that we need first name last name probably salary as well and now what we can do is determine this first one
which is if they make less than 50,000 they get a 5% raise so let's say case and I'll also add end in here and we're going to say when their salary is less Than 50,000 what's going to happen then we say then salary so we're taking their initial salary and we're saying plus and we're going to do salary times 0.05 and if we run this should work let's pull this up really quickly so April lgate she made under 50,000 so she got a raise and her new salary is 26,2 250 we can actually call that
we'll say uh as pcore actually let's do newcore salary that's their new Salary and let's run this so the new salary is 26,250 Andy DWI is now making 21,000 now this calculation you can do it different ways we can do it exactly like this or we can just do times 1.05 should be the exact same thing uh just however you'd like to write it out it's just you know adding it or multiplying it by this so let's take this and now we're going to say when it is greater than 50,000 so let's say Greater than
50,000 they get a 1.07 so this is the 7% increase this is a 5% increase this is a 7% increase and let's run this and let's put this up here so now if they made greater so 50,000 that's 75,000 they got a 7% increase now unfortunately we did not make the rules the pony council did and the people who made exactly 50,000 unfortunately were not part of those brackets uh and that just wasn't up to us we couldn't control That so unfortunately Tom haford and Jerry gach just didn't get raises this year and that's not
our fault okay that's not our fault now the next thing that we need to do is determine the bonuses now let's come right back up here really quickly and let's just copy this because what we need to determine is how we know that somebody is in the finance department because if they're in the finance department that means they get a 10% bonus that's really Important now it's not in the employee demographics don't have anything about the department but if we look in the salary and we run this we do have the department ID now let's
open up and let's pull this up right here we'll look at the parks department and in the parks department here we go the finance is Department ID of six so if we're looking at the salary there's only one person who's in Uh Department ID equal to six so what we can do is another case statement we can say comma we'll do case and n and we'll do another one we're going to say when DTI so when the department ID is equal to six then we're going to give them a bonus so we're going to say
salary times10 and we'll call this as bonus let's go ahead and run this and let's pull it up so he gets a $7,000 bonus this year that's Ben Wyatt Uh because he was part of the finance department that just did uh an exceptional job this year apparently according to the pony Council so that is how case statements work they're really powerful really useful I honestly use them quite often and they're just a way to really add some logic and some you know labeling or even do calculations like we did right here with the salary hello
everybody in this lesson we're going to be taking a look at subqueries In MySQL now subquery is basically just a query within another query we can do this in a few different ways and I'm going to try to show you a lot of the different variations within this lesson the first way that we're going to use a subquery is in the wear Clause then we'll take a look at the select in the from clause also let's take this demographics table that we have down here what if we only wanted to select the employees who worked
in the actual Parks and Rec Department well we could do that if we had a few joins we have this salary table and one actually represents that they work for the parks and wreck if we come over here and we open this up we can see that parks and wreck is the department ID of one so we do have that option we could just join these two tables together but sometimes we don't want to do that and we'll use a subquery let's see how it works in the wear Clause so let's go ahead and get
Rid of this so what we're going to do is we're going to say select everything from employee demographics where and now we want to pull because this is the salary table we we want to pull employee IDs where the department ID is equal to one but remember we're querying off of this table so let's actually pull this up this is what we're working with so we want to say where the employee ID that's referencing this column in the demographics table is in What we're going to do is we're going to do a parenthesis here and
we can even come down and put a parenthesis down here so what we're going to do now is write our query which is our subquery and this is our outer query so now we're going to write an entirely other query within this so we'll say select and now we're going to say employee ID and let's just bring this over I usually have it something like this and I'm going to try to bring this down a little bit so Select everything and then we'll do from then instead of employee demographics we'll do employee salary and let's
just form about this a little better so select the employee ID from employee salary and remember we wanted to do where the department ID is equal to one now let's bring this back up and this is what the query is going to look like now just by itself let's run this subquery or this inner query when we run this it's going to create This list of just employee IDs where the department ID is equal to one so when we say where the employee ID from the employee demographics table is in it's going to try to
match those employee IDs to this list of employee IDs so just remember 1 2 3 4 5 6 and 12 let's go ahead and run this entire query now we have 1 3 4 5 6 12 if you remember from previous lessons the two is Ron Swanson and he's only in the salary table so since we're doing just The employee demographics table he's not in here so what we're doing is we're selecting everything from the employee demographics where the employee ID in this table matches or is in the select employee ID from the salary table
where the department ID is equal to one in essence this is what a subquery is it's a query within a query now what would happen if we have the employee ID but we also wanted to say the department ID because we just wanted to view this Let's go ahe and try to run this we are going to get no out output and we're going to get an error that says operand should contain one column the operand referring to this entire thing right here because this is an operator so this is our operand and we're returning
two columns in here which is saying we cannot do we have to only have one so now if we run this it works perfectly well and let's bring that down now we can also use the Subquery in a select statement so let's take a look at that next let's go down here and let's say we want to do select everything from employee salary and let me spell that right let's say we want to look at all the salaries just like how we have it now but in a column next to it we also want to
compare it to the average salary for everyone so we'll be able to see you know whether somebody's salary is above average or below average so what we Would try to do potentially is do something like uh first name salary and average salary and we Tred to run this and of course we're going to get an error it's going to basically tell us that we need to group bu if we're doing this so let's go back down and let's actually add that group bu and we'll say Group by first name and salary and we'll look at
this output and this is not looking good at all it's just looking at the average Salary for each unique row which is Leslie 75,000 so the average is 75,000 this is not what we're looking for this is not what we want here's what we really do want we want to just take the average salary of this entire column regardless of group buy or anything else so let's get rid of this and let's see how we can do that so let's come right down here we're going to say select select the average salary and then we're
going to say from at our Parenthesis because this is our subquery from the employee salary table just like that now if we run this we should get the exact output we're looking for so the average salary is 57,200 and we have our salary right here so we can compare really quickly just like that and we can also use a subquery in the from statement so let's go down here and let's say select everything from Employee unor demog Graphics let's have it autocomplete for me so we have the employee demographics table now let's create a group
by based off the gender column it add some aggregated functions and I'm going to show you how you can use this as a subquery so let's go up here say gender and then we'll go ahead and add our group by so we'll say Group by gender as well now let's add a few things we'll do average we'll do average age and then we can do let's just do all Of them based off the age we'll just do AG age men of age and count of ageo when I try to write fast it doesn't always go
right so we have this let's run this and this is what our output is going to look like now what if we wanted to get the average of the oldest age or the average of the smallest ages or you know see what the average count is for males and females well we can't do that Given this table but let's do something right here select everything and we're going to say from and in our from statement we're going to have a parenthesis we're going to paste our select statement and then close the parenthesis so we're going
to select everything from this output that is right down here so if we run just this we're going to get an error and forgot this was going to happen but every Drive table must have its own alas so you have To name a table I forgot it does that all we have to do to fix this is just name it so we'll say as and we'll say aggregated table we'll just call it aggregated table so let's run this and we get the exact same output but here's the neat thing is we can now select we
can do gender and these are actually the column names now so I can do the average of this column right here but I can't do it just like this because it's going to Give us an error and I'll show you why in just a second says unknown column age in field list so what it's saying is is we're trying to perform an aggregated function on the aggregation of an age column but we don't have an age column in our table right here let me run this again we have a column named this exact thing so
we actually need to do is do this back tick and back tick this is the actual name of the column it's not an aggregation anymore the back tick on my Laptop is right above the tab on the far left hand side um right under the Escape that's where mine is uh so these back ticks it's not a quote like this it's a back tick so you just need to find that on your keyboard but now if we run this it looks like we encountered another error says in aggregated query without Group by that's right now
we need a group by so now we need to group by gender sometimes you got to figure this out on the Fly and it should work There we go so now we can perform aggregations on this table now this doesn't actually work perfect because we're still grouping by the female male but let's get rid of this for a second and we'll get rid of this groupy entirely and if we run this we're now looking at the averages of this column right here max age now when you're doing something like this it's actually really smart to
rename these we'll say as average Age we'll say as maxor age and it makes it so much easier you won't have to do these back ticks anymore um and as Minore age and so on and so forth and I would probably format this better and stuff like that we don't have to go through everything right I'm just kind of giving you an example but then when we're using this table these columns are actually named this so I don't have to do these back ticks anymore I can just take this whole Thing oops get rid of
that back tick now I can just take this column because this is the column name so let's go ahead and run this and it's still going to work perfectly so this one's pretty cool because you're basically creating this kind of like a temp table um you're just creating your own little output then you can query off of it and you can do you know more advanced calculations this way it's actually really useful but there are better ways to do something like This uh like a CTE or a temp table that we'll look at in the
advanced series but this is at least how you can do it and you can actually try it out using subqueries so hello everybody in this lesson we're going to be taking a look at window functions now window functions are really powerful and are somewhat like a group by except they don't roll everything up into one row when grouping window functions allow us to look at a partition or a group but they each keep Their own unique rows in the output we're also going to look at things like row numbers Rank and dense rank at the
end of this lesson so before before we jump into writing a window function and seeing how the syntax Works let's actually write out a group ey and then we'll compare the two when we actually do write the window function let's say we want to take this demographics table and we want to take this gender and compare it to the actual salaries so What we actually need to do we need to say join and we're going to join on the employee salary let's go like this get rid of all of this and we'll do salary and
we're going to say on and and let's do demm and Sal for the aases we'll see dm. employeeid is equal to sal. employeeid now we're going to come up here and we're going to say gender comma and we want to look at the average salary and we need to get rid of this Right here and we need to come down to the bottom and say Group by gender now let's go ahead and run this query see if it works and it did so we have our gender and we have our average salary from our salary
table and we can rename this as average and we'll do average uncore salary just like that so this is how groupy works it rolls everything up into one row now let's try doing something pretty similar except we're going to use a window function let's Come right down here and let's paste this and let's start writing out our window function now we don't have to use use the group ey we're going to go ahead and get rid of that and right here for gender we can keep that the exact same all we're really going to change
is this part right here we're going to say average salary and that is part of creating a window function typically with a straightforward window function all we have to put is over with a closed Parenthesis this is going to say we're looking at the average salary over and normally in here you'll specify something and we'll get to that in a little bit but we're just going to look at an average salary over everything so let's go ahead and run this output so this is going to look a little bit different right so the male and
female all have their own individual rows which is not the same as group ey and this average salary is looking at the average Salary of everybody we're not breaking it out by the gender like we did up here here we rolled it up now we're looking at the average salary for the entire column now what we can do do is actually Partition by now Partition by is going to separate it out kind of like grouping it so let's say partition Partition by and we'll say gender so just like when we did the group by the
group by rolled everything up into one row this is not going to Roll everything up but it is going to perform this calculation based off of the different genders the unique values in this column let's go ahead and run this and if you'll notice the female is 53 ,750 the maale 574284 name so we'll do dem.or name we can do last or dem. lastore name so we can add other information and it doesn't affect this column at all because we're using a window function if we try to add these Exact things I'm going to go
up here and do it if we try to add these exact things to this let's see if um yeah that works and then we also have to group by this if we run this query now it's going to be completely different because we're using a group by we're grouping by the first name the last name and the gender we're breaking everything out based off of the unique values in these columns whereas down Here it's completely independent of what's going on in these other columns all we're doing is we're doing a window function just based off
of that column so I think that's pretty amazing and there's a a lot of additional functionality that we can do with these window functions and we're going to take a look at a lot of those things in just a little bit let's try another example really quickly let's literally just copy this paste it down Here and all we're going to do is we're going to change this to sum so now instead of the average salary we're looking at the sum of salaries and we're still partitioning by the gender let's go ahead and run this and
let's pull this up so all the men together make $42,000 all the females make $215,000 now what we're about to do is something called a rolling total if you've never heard of a rolling total a Rolling total is super cool and can be done within my SQL a rolling total is going to start at a specific value and add on values from subsequent rows based off of your partition so all we have to do is add an order buy and we're going to order by let's say the employee ID let's go ahead and take a
look at this and it looks like the employee ID is ambiguous I had a feeling so I just need to say uh dem. employee ID let's try this one so now we have Something called a rolling total I'm I'm going to actually name it as rolling unor total because this is super cool that window functions can do this and this is something that a lot of people in like Finance do I did it myself when I worked in healthcare and it partitions based off the female and you can't see the employee ID but there's an
employee ID that we're kind of ordering on in the background now what it's doing is it's starting with Leslie nope and she made 75,000 then the next person April she made 25,000 which equals 100 and just to actually see this better I'm going to add salary and so Leslie no had 75,000 then we're adding this 25,000 to the 75 and we get 100 then we're adding the 60,000 to 160,000 then we're adding 55,000 to 25,000 so we're adding every single time we're adding this salary to the already existing total all the way up to our
grand total which was 215,000 the exact same thing happens with the males so we start with 50,000 then we add 50 then we add 90 then we add 70 all the way up to 402,000 now you you can do this in a lot of different configurations on a lot of different columns but in essence this is exactly what a rolling total is that's how it works and we were able to partition based off of this column we don't have to use Partition by we could do this completely regardless of the partition But I thought it
was interesting to at least Break It Out by female versus male so now that we know how to use a window function let's look at some special things that you can really only do with window functions or window like functions so we're going to bring this down and what we're going to do is get rid of this entire thing and we're going to look at something called row number then we're going to look at Rank and then we'll look at dense rank so let's Look at row underscore number and this is just like an agregate
function like we doing the average age or average salary or something like that this is what we're doing we're doing a row number now we're going to do this over and we'll just do everything for right now so let's go ahead and run this and just see what it looks like let's bring it up and what we're doing is we're saying okay we have first name last name gender salary that's all great but then We get to row number and we're doing a row number based off of everything it doesn't matter what it is so
we're starting at one which is the very first row and we go all the way down to the bottom just like an employee ID so let's actually add that let's do dm. employee ID just like this so we have this 1 2 3 4 5 6 7 8 9 101 now if you remember on this table we are missing Ron Swanson so it kind of Skips that but it's basically like an Employee ID we're kind of giving it its own unique value and these row numbers are not going to repeat itself if you do it
like this now they can repeat themselves if we do a partition and let's do a partition on the gender again because we know how to do that one we'll do Partition by I spell that right Partition by the gender now we're going to add a row number based off the gender but again it's broken out or partitioned By gender let's look at this now it goes for the females 1 2 3 4 then for the males it restarts 1 2 3 4 5 6 7 now this is just in a random order based off how
the you know data was stored in the table itself now what if we wanted to kind of rank these based off of the highest salary first down to the lowest salary you nailed it we just add an order by will order by salary and if you want to do it from highest to lowest giving the highest salary the Number one and the lowest salary you know later down we'll do descending and let's run this and you'll see that for female we're still partitioning by gender for female the highest salary is one next is two three
and then four then for males the highest salary is one all the way down to seven so that's what row number does it just gives a row number based off of whatever your partition by or ordering by in your window function now let's go over here And add a comma and we're going to add and let's go down just a hair let's add rank so we want to do Rank and we'll do our parenthesis now rank is going to give it more of an official Rank and let's see how this works so we'll do Rank
and we'll do over Partition by salary descended the exact same thing and while we're here I'm going to rename name these I'm going to say as as row num and we'll call this one uh rankor num so let's go ahead and run This and it looks very very very similar except for one small thing this right here so when we're using the row number whatever we are partitioning by it's not going to have duplicate rows within that partition it just won't so even if there's 50,000 right here it's just gonna automatically assign it based off
of something that is running in the background whether it's the order of how the data is stored in the table or some other order by that you are using on the Table now rank is a little bit different because rank is going to take it just like it did the ronom except when it encounters a duplicate based off of the order bu which is the salary it's going to assign it the same number so this is five and five what's unique about rank is that the next number is not going to be the next number
numerically it's going to be the next number positionally so this is 1 two 3 four five this is kind of like a six and then it goes to Seven so it skips number six now there's another one let's copy this rank there's another type of rank called dense Rank and we'll do denore rank so we'll do denore Rank and let's run this and let's pull this up there we go now now dense rank is ever so slightly different than rank in the fact that when it gets down to duplicates it's still going to duplicate them
so it's going to have a five and a five but it's going to give the next Number numerically not positionally that is the only real difference between Rank and dense Rank and again row number is just not going to have duplicates it's going to give it its own unique within that partition so I know I just threw a lot at you but that's row number rank and dense rank in a nutshell and you can review this mess around with it all of these things because you know these are actually really really useful hello everybody and
welcome to the first Lesson in the advanced MySQL tutorial Series today we are going to be looking at CTE now CTE stand for Common Table expression they're going to allow you to define a subquery block that you can then reference within the main query now that may not make perfect sense but we've looked at subqueries in the past or in previous lessons in the intermediate Series so you kind of understand that it's kind of like a query within a query except we're going To name this subquery block and it'll be a little bit more standardized
a little bit better formatted than actually using a subquery let's take a look at the basics of writing a CTE let's pull this down really quickly and all we're going to do is we want to create this as a CTE so we'll say with and that is our keyword to Define our CTE so we're going to say with and then we're going to name our CTE and we'll just call it CTE and we'll do underscore example and then We're going to say say as so this is how we Define it and now we need to
actually put it in parentheses now you can do this in several different ways I'm going to do it kind of like this just to really emphasize that this is within the CTE now CTE are unique because you can only use the CTE immediately after you create it so if we come right down here and we come right below it we'll say select everything and we're going to say from CTE example so we'll say from CTE Example and let's bring this back up now if we run this we're going to get the exact same output now
this should seem pretty familiar almost like we're using a subquery and within our subquery we have this right here we're kind of building our own little table and then we can query off of it down below so we can come down here and let's actually change the names in here we're going to say average uncore cell and we'll change all of these real quick Just because don't uh I don't like having to actually put the you know uh the tick marks I don't like doing that so here we're going to say Max then we'll say
Min and then we'll say count and let's go ahead and run this again and so now we have these different names and when we come right here we can say select and then we'll just do something really simple we'll just do the average of aage average uncore cell So the average salary and let's run this and so this is the average between both the males and the females kind of the purpose of these CTE is to be able to perform more advanced calculations something that you can't easily do or can't do at all within just
one query another reason to use a CTE is just the readability you can absolutely write this using a subquery and let's do that really quickly and it's just going to be a little bit tougher to read and look at So let's come right up here we're going to say from and we'll do right here we'll say select everything we'll do select average s from here and we're going to need to name this so we'll say um do example unor subquery we'll get rid of this and then we just need to get rid of this and
we can run this query and we get the exact same output now if I formatted this Exactly the same just like this and the names down there if we look at this the syntax is just a little bit more difficult to read we're selecting the average of average s from and then we have our subquery right here and then we're naming it at the bottom if we scroll up and compare this this one just looks a lot better now when you're writing in MySQL sometimes it doesn't matter if it looks pretty or not as long
as it gets the job done that is true Especially if you're just going to be using it yourself but in a more professional environment when you're using this in your actual job there going to be people who've been using this for 10 20 years and they're going to expect you to write it well they don't want it to be really messy they aren't most likely going to want it to be written like this I've been using it for quite a long time and I much prefer CTE over subqueries just visually and it Makes it a
lot quicker to actually read through so that is just one of the reasons although you get the exact same output now there is some additional functionality within CTE as well now one thing that I mentioned just a second ago is that when you build a CTE you can only use it immediately after you can't use it right below it so let's go ahead and let's copy this query and we're going to bring it right here if we try to run this and let's do this we're Going to get an error and let's pull this up
it says table Parks andre. CTE example doesn't exist so we're looking for a table called CT example in our database but it's not there now the reason this happens is because you're creating a CTE you're not creating a permanent object like a temp table which we'll look at in the next lesson and you're not creating a real table and you're not creating a view you're really not creating anything it's just a Common Table expression to create this table right here this basically almost like a temporary table almost but then you're just using it to query
off of it you're not saving it you're not storing it in memory you're not really doing anything with it it's it's just like writing a regular query so this is why you can only write it immediately after creating the CTE you can't write it down below and reuse it because it's just like calling a query that you wrote before it Just isn't going to work now the next thing that I want to take a look at and let's copy this down here next thing I want to take a look at is that you can actually
create multiple CTE within just one so if you wanted to do a more complex query or joining more complex queries together we can do that all within one CTE so let's come right here let's get rid of all of this and we're going to say uh from the demographics table we're going to say Where birth underscore date and let's just do is larger than 1985 D1 D1 so we have one query and we'll take just a few columns from this table so we'll take let's say the employer or employee ID we'll take the gender and
the birth underscore date so this is one query and we're filtering just based off of this birth date now when we create this this is the CTE example but we can have a comma here we come down below and then we can say Ctor example 2 and I need to combine that so two and then we can say as and then we have another query so then right here here we could say select everything we'll change that in a second from employee say salary and in the salary we'll just do a simple one we'll do
where salary is greater than 50,000 and we'll actually just take the employee ID and the salary now if I come right down here I can say select Everything from CTE example which is and let me scroll up so we can see everything that's our original our CT example is this first query right here then we're creating our second one right here and we can join basically on these two common table Expressions so now we can say join then we'll do ctor example 2 and Need change that X and then we'll say on then we're just
going to do TT example employeeid is equal to DG example 2. Employeeid and not an equal sign but a DOT there we go now if we run this it should work and we can pull this down and look at our output now this is just an example this isn't a real use case because of course we just join these two tables together normally but you can imagine you have a much more complex query or you're doing a lot of functionality within this table and you just want a certain subsection of this table and you're wanting
to combine Those this is how you can do that with a c so now we have all of our information right here and that can be extremely extremely helpful now one last thing that I want to show you we're going to go all the way back up really quickly right here let's run this one one more time and let's actually take everything and let's run this so here we have our G average salary Max salary men salary and count salary the last thing that I want to show you and this is more Of something that's
just somewhat helpful you don't have to actually do it in your main query is before we went in here and we changed all the column names by doing an alias by saying as and then saying the average salary and the as is just implied here but we're changing this via an alias we don't have to do this in fact we could come right here and we could do a parentheses we could call it gender we could call it it average uncore salary maxcore salary Minore salary and let's do count uncore salary so now if we
were to run this let's change this up we'll do capital on this one if we wanted to run it like this when we run this it'll change all of those names to what we have it right here so this will be the default this will overwrite the column names that you have in your actual CTE expression or the query that you have within your CTE so that is all we're going to take a look at in this lesson on CTE these are Very very helpful definitely help with more complex queries and they're just really easy
to read and understand which is why I personally use them a lot hello everybody in this lesson we're going to be taking a look at temporary tables now temporary tables are tables that are only visible to the session that they're created in so if I create a temp table right now and I exit out of my SQL and I come back back in it's not going to be there anymore and we'll look at that in Just a little bit now temporary tables can be used for a lot of things but how I've mostly used them
especially as a data analyst is for storing intermediate results for complex queries somewhat like a CTE but also for using it to manipulate Data before I insert it into a more permanent table so let's take a look at how we can create a temp table there's two ways that you can do it I'll show you the first way which I don't think is as popular and then I'll show You the second way which is how I typically use at the most now the first way to create a temp table is to create a temporary table
I need to sound it out like that it's the only way I can spell uh so we're going to do tempore table so this is our name now if we just took this out and we created a table this would create a table in our Parks and Recreation database but we don't want that we don't want to create a temporary Table that just lives inside of our memory or the memory within our computer now we're going to create this temporary table much like we would a regular table and we're going to need to name the
columns as well as the data types so let's do first underscore name and our data type can be varar let's say 50 and we'll do a comma then we'll do lastore name we're going to keep this really simple we'll do varar 50 again and then for our last one we'll do favorite unor Movie and for this one it needs to be longer so we'll do varar let's say 100 now let's get rid of this and let's actually run this after we do our semicolon let's actually run this and nothing's going to happen let's click refresh
nothing's going to happen at least you can't see it happening let's pull this up and you can see that create temporary table Says Z is affected but it was created now in order to actually see it we can do select everything and We'll do this from our temp table and we'll add a semicolon and then we run this and we have this empty table right here now what's really great about these temp tables is then you can insert data into it and it basically is like a real table except it just lives in memory and
they go away after a while but you can reuse this temp table over and over and over again now let's insert some data into here and then we'll take a look at this again so let's come right down here Let's insert data we'll do insert into and we want to insert that into the temp table and we're just going to say values now we just say values I'll use myself for this one we'll do Alex freeberg and what's my favorite movie give me a comma that'll be Lord of uh I think it's like that Lord
of the Rings The Two Towers uh it's probably my favorite movie of all time now let's go ahead and insert this data and let's pull this down here and let's run it all the way Down here after we add our semicolon and when we run this you'll notice that now we have data in here so now we can use this table much like any real table so that's the first way to create a temp table not my personal favorite way although there have been some use cases where I've done it like that I'm going to
show you the way that I typically do it and for this let's select everything from the employee salary Table let's run this now let's say I just wanted a subsection of this data to sit in this temp table where the salary is greater than let's say 50,000 I could easily easily do this I'm going to say create temporary table and let's do this one as salary overcore 50k now one thing about naming either temp tables or CTS or subc or any of these things where you need to name something I try to typically name it
something that actually makes sense so The salary over 50k is something I would actually name it in my real work I wouldn't normally name it something like temptable the reason for that is because when you're in a work environment you have lots of temp tables you're creating really Advanced story procedures really Advanced queries you have hundreds or even thousands of tables and different databases it gets really complex so naming conventions are actually pretty important so they become more important Uh the more you get entrenched in this stuff so just something to think about now we're
creating this temp table now we don't have to really insert data into it more than we're just going to select data from an already existing table so I'm going to say select everything from I'm going to say employee salary then we're just going to say where the salary is greater than 50,000 now I want uh Tom and or Jerry I want them to be included as well so I'll actually say Greater than or equal to so now we're creating a temporary table based off of an already existing table and we're just selecting data into this
temporary table so when we run this now we can select the salary over 50k and let's run this and it works perfectly now the great thing about temp tables is they last as long as you are within that session meaning if I copy this query let's go to a new window and let's paste this in here let's zoom in a Little bit and let's run this it still works even in a new window but if I'm to exit out and come back in then it is no longer going to be working now let's exit out
of this let's come back in and we'll see if these temp tables still work let's go ahead and exit out oh jeez I'm embarrassed all right let's go to my SQL let's come over here to the local instance so now it pulls right back up zoom in once again on both these and let's try to pull up our salary over 50k Temporary table let's run this and we're not getting an output let's go back it's going to say error code the table salary over 50k does not exist so it only lasted as long as we
were within this session so that is how we create our temp tables and that's how we use our temp tables now in the last lesson we had looked at CTE CTE and temp tables both have their own use cases within my SQL for temp tables this is usually for the more advanced things so I'm usually Using these in store procedures when I'm really manipulating data and I'm doing a lot more complex queries overall and often times I'll use multiple temp tables and I'm joining them together and I'm just doing a lot of more advanced stuff
with CTE it's typically more simple things because you can't make as advanced CTE or as complex CTE so with those I'm usually keeping it to just one level of transformation I have my base CTE or my base subquery or query however You want to call that and I'm changing it or doing one level of advanc thing on top of that query that's what it see CT is really great for temp tables you can just get a lot more advanced with it they also last within the session and if I'm using it multiple times throughout something
like a store procedure then it makes so much sense to use a temporary table hello everybody in this lesson we're going to be taking a look at stored procedures stor procedures are a Way to save your SQL code that you can reuse over and over again when you save it you can call that store procedure and it's going to execute all the code that you wrote within your store procedure it's really helpful for storing complex queries simplifying repetitive code and just enhancing performance overall so let's take a look at how we can create a store
procedure now we're going to start by just creating a really simple query we'll make it a little bit more advanced As we go along and take a look at the different things within store procedures that you can do now let's change this query let's say where the salary is greater than let's do 50,000 let's actually do greater than or equal to 50,000 we want to include Tom uh and Jerry as well so let's go ahead and run this now what we want to do is save this really complex code within a store procedure let's come
right down here and we can create a super super super simple Store procedure by just saying create procedure and pasting that now we just have to name it so we have create procedure and we'll call this large uncore salaries and then we do a closed parenthesis now this is as simple as it can possibly be it does not get any simpler than this so let's go ahead and run this and if we go down we pull this up you can see that it says create procedure zero row affected it looks like it worked and if
we come over here To this refresh button you should see now that under store procedures it drops down and we have our large salaries that's exactly what should have happened we wanted to save that into our Parks and Recreation now if you wanted to be careful you could say use parore andore Recreation this is not a bad idea but you don't have to but you can specify what database within your actual editor window sometimes that is helpful but now we've created it now let's see How we can call it all we have to do is
say call we're going to copy this entire thing including the parenthesis and let's end it with that's right a semicolon let's go ahead and run this and as you can see it worked because we got the exact output so we actually called this store procedure and this code ran so it's just a select statement so it worked perfectly now you can also come over here to large salaries and there's this little tiny Little button here that looks like a lightning bolt and if you click it it's going to open up a different window and we'll
say call Parks and recreation. llarge salaries so you can do it that way as well but uh we're not going to be doing it that way now what we've written right here is not best practice by any means and I'm going to copy this down here because there's a lot of different things that you need to take into account when you're creating a St Procedure for example this right here is most likely not what you're going to be putting into a store procedure this is super super simple typically you'll be having multiple queries and let's
see what happens if I try to put another query in here and let's get rid of this so we're going to select everything where the salary is greater than 50,000 then we'll select everything where it's greater than 10,000 which is everybody let's call this large salaries 2 so we Have two different statements in here and we want them all to be under this large salaries 2 let's select everything and let's run this and we're getting an output which is already not a good sign but we created the store procedure and then we selected everything so
what's actually happening here pull this back down what's happening is is this is creating the story procedure and this is just some other you know random query but that's not what we want what we want Is everything or both of these queries within one story procedure the best practice is to use something called a delimiter now this right here is a delimiter the semicolon so this semicolon separates our queries from one another it tells my SQL hey you know this is a different query don't be mixing these and cause errors you know that's essentially what
a delimiter does now we can change the delimiter by coming up here and saying delimiter and We can can change it to almost anything we want now in my actual job I've seen it done many different ways I've seen these forward slashes also seen dollar signs this is probably the one that I've seen the most when I worked with data Engineers data scientists database developers you this one I see a lot and then you'll come into the code and you'll say begin and let's go over here and let's tab all of this and then we'll
say end Now when we end we're going to end it with this dollar sign so here's what's happening we're changing the delimiter right here to dollar sign we're creating our store procedure and within it we are keeping all of this so all of this code is going to go into this one store procedure then at the end we are saying this is the end right here of this store procedure these semicolons no longer are the delimiter that's telling us when it is the end of the store procedure that's What the delimiter does now it is
best practice at the end to change it back right uh let me spell it right because if you don't then you're going to have to start using uh these dollar signs for everything and how do you spell delimiter oh man there we go now we've changed it back to a semicolon afterwards so then we can go and write other queries and it'll act appropriately uh let's go down so this is getting closer to best practice let's Go ahead and run this entire thing and if we pull this up we're not getting an output that's a
good sign if we pull this up it's saying we already created number two change that to three my apologies let's go down here now we've created the store procedure three now let's go over here we're going to right click on this we're going to say alter stored procedure and now you can see that we have both of these queries within the store procedure Let's get rid of this and we're going to go and call this so let's copy this large salary is three bring this all the way down and let's say call that's St procedure
if we run it you'll notice we get two outputs we have six and seven this result six is where it's greater than 50,000 or 50,000 uh or greater this one is where it's greater than 10,000 which is essentially the entire table now so far we've done Everything just by writing it all out and that's fantastic fantastic but you can also come over here to store procedures you can right click and say create store procedure now let's actually copy this we're just going to create the exact same thing we'll create store procedure and we can just
paste this in here and let's go ahead and do that there we go and sure we'll call it new procedure why not and if we say apply you'll notice that it generates This script right here and we can apply it and we can create it we will in just a second but let's take a look at it so we're going to use ARC and Recreation that's what I was mentioning before we're then going to say drop procedure if exists now this is something that I was going to show you later but I'll just show it
to you now sometimes it is really beneficial to write something like this before you create it in case you've already created a story procedure With that name that you're wanting to replace so it's checking if it's there and if that new procedure is already there it's just going to drop it then it comes down and let me see if I can zoom in on this and then it's going to create our delimiter which it uses dollar signs so my SQL is even you know validating what I was saying earlier we're going to use Parks and
Recreation again and now again we have to use instead of a semicolon We're using dollar signs then we're creating the procedure which is new procedure we're saying begin end and then it's even changing the delimiter back so basically everything that I said this is kind of doing it for you automatically now when I click apply it went ahead and executed that SQL statement and our new one is ready so we can go ahead and alter that store procedure and it looks exactly the same as this one Out here which was uh large salaries number three
so it looks exactly the same now let's go ahead and get rid of this get rid of this and let's go down below the next thing I want to take a look at is something called a parameter now before I actually get into this I'm going to copy all this down here because I don't want to rewrite all of it I'm being honest so let's paste this in here now parameters are variables that are passed as an input into a store Procedure and they allow the store procedure to accept an input value and place it
into your code let's take a look at what that actually means now before I do anything I'm just going to change this to uh number four so I don't forget so let's get rid of all of this we're going to keep it somewhat simple because we're looking at something new now when I say we're passing through a parameter I'm talking about when we're calling it so let's say we've already Created this one I'm not going to you know run this yet but let's say we've created it let's say I want to pass in an employee
ID I want to pass in a specific person and I want to retrieve their salary I know their employee IDs I just want it to pull up their salary for us so what we're going to do is we'll get rid of this and when we're calling it put this down when we're calling it I'm going to pass through value like one that's Leslie nope and then I want the Salary to be the output so so I'm going to select the salary so we're selecting salary from the employees salary but how do we know that this
one is the person we're looking for well when we're actually creating this parameter we create it right in here that's what tells the store procedure to accept an input value when we're calling it down below we're going to call this employeeid now after we call it after we name our parameter we need to then give It a data type so I'll call this an integer so we're telling the store procedure when somebody calls the store procedure they have to pass through an integer it can't be a string or it can't be a date it has
to be an integer now what we're going to go do is right down here we'll say where the employee uncore ID that's from this column in the actual table we'll say is equal to the employee uncore ID which is our parameter right here now you may be thinking that's Really confusing they name the exact same thing can I change it the answer is yes I actually encourage it so there are some naming conventions that are out there that I think are helpful ones that I personally use um but remember this is just kind of a
variable parameter name you can kind of call it whatever you want so if I wanted to say Huggy muffin I could uh and this could be Huggy muffin so let's try it with Huggy muffin I just came up with that off the top of My head so don't judge me um but we're going to create the story procedure and then when we call it later we want it to return the salary where the employee ID right here is equal to whatever was passed through that parameter that input parameter and we we're going to keep it
as one so it should return 75,000 let's go ahead we're going to create this and now let's go right down here and we're going to run it and we can see that that is the salary and it worked perfectly Now like I was saying that is not what I would actually name it uh there are some naming conventions likeor Pam at the end so you kind kind don't want to keep it at least I recommend you try to keep it similar to what you're actually looking for and you can either end it inore pram or
there's another way you can do it which is come right over here and do pcore and these are just ways that you can tell the code or you can just be able to visually see the difference in The code so this is just what I recommend then you put it right down here you say where the employee ID is equal to pcore employee ID saying this is the parameter that's being passed through and put into our actual query hello everybody in this lesson we're going to be taking a look at triggers and events a trigger
is a block of code that executes automatically when an event takes place on a specific table for example let's take a look at these Two tables now when a new employee is hired they're put into this table with their salary information and everything but sometimes people forget or don't add their information like uh you know who right here uh they're not put into this demographics table and we wanted change that because we want to have everybody in here so when somebody is put into this salary table we want it to automatically update with the employee
ID first name and last name into this Table right here for the employee ID first name and last name so we're going to write a trigger when data is updated into the salary it's going to also update the employee demographics for us now let's go right down here and we're going to take a look at how we can do that now if you watched the last lesson on store procedures we'll do a lot lot of the same writing style or same formatting for triggers and events so we're going to start with is the Delimiter we're
just going to do that right off the bat before we get into anything and we're going to change that to the double dollar sign now the delimiter again in case we have multiple lines of code which we're going to have if we have multiple lines of code when we're creating this trigger this delimer is going to help us have multiple queries within our create trigger statement so this is really important we'll just start out by doing that now Let's create our trigger and we do need to name this so we'll say employee insert and we'll
just call it like that did I spell that right yeah employee insert so we have our create trigger we've named it now we need to specify what event needs to take place in order for this to be triggered so we're going to say after an insert and I need to spell insert right after an insert on And we'll do the employee _ salary table so after we insert onto the employees salary table down below we're going to write what's actually going to happen now we're writing after because we're doing it where when new information is
put on the salary table it's automatically updated into the demographics table but you could also do before which means if data is deleted from the employee salary table something could happen but we're not doing any Deleting or any updating we're doing insertion so we're going to say after an insert on now the next part that we need to write is for each row now this for each row means that the trigger is going to get activated for each row that is inserted so if we had an insert statement that inserted four different people who were
just hired that means this trigger is going to be activated four times now some SQL databases like Microsoft SQL Server have things like Batch triggers or table level triggers that'll only trigger once for all four of them and in my opinion those are really really nice uh I've used those I like them the way that my SQL has it right here is not the most optimal way to do it unfortunately but we don't have access to the batch level or the table level triggers at this time so this is really just the setup for what
we're about to write so after it's inserted on the employee salary table for each row What is going to happen we're going to go down here and we're going to say begin and we'll have end now the code that we're going to write here is what's going to happen after this event takes place so what we're going to do is we want to take from this table let's bring this back up real quick when we insert a new person we want to take the employee ID the first name and the last name and automatically put
it into the demographics table so we want to say Insert and let me do tab insert into we're going to insert into the employee unor demograph table and we're not taking everything so let's actually specify what columns we're doing we're doing employee ID first underscore name and then the lastore name now we need to specify what the values are now from the employee salary table we're taking employee ID first name and last name but we don't want to take all of them right we don't Want to take every single employee ID every single first name
every single last name we only want to take the new Val vales that were just inserted well lucky for us there is something that we have for this so let's do values new parentheses we have something called new Now new is going to say we're only taking the new rows that were inserted there's also an old like this where it takes rows that were deleted or updated but of course for us we're going to be Using new so we'll say new. employeeid new. first unor name and then new . lastore name and we'll close that
then we'll come down here and we'll do our delimiter and we'll change this as well we'll say delimiter back to a semicolon now we're getting this error because we need this right here so let's recap what we've created then we'll actually create it and try it out so we're creating our trigger called employee insert after a row is inserted Into the employee salary table for each row here's what's going to happen we are going to insert into the employee demographics table the employee ID the first name and the last name those are the columns that
we're going to insert into then we're taking the values new. employee ID new. first name and new. last name now MySQL understands that when we say new we're talking about the event that takes place this is the data that's being inserted it just knows that So let's go ahead and create it we're going to run this and it should work let's pull this up says create trigger so that worked now the thing about triggers uh that's unfortunate it doesn't have its own little section under here right but it does have under the employee salary let's
go right here and then under the triggers so we can find it which is great so we have this employee insert if we rightclick we can't really do anything with it that's The unfortunate thing we can't alter it we can't change it we can't drop it we can't do anything um that's the unfortunate part but let's actually test it so now we're going to say insert into and we're going to insert into this employee salary that's how we're going to trigger it so insert into the employee salary and then we'll do employee ID these are
all the columns the first underscore name lastore name occupation uh bear with me for a second Then we have salary and then Department ID so this is what we're inserting into now we have to do our values now this should be should be shorter hopefully we'll do 13 we'll call him uh Jean ralphio there we go last name is sapperstein just like that and not actually just like that that's not spelled right so we have saprin his occupation is entertainment 720 CEO how much is he making uh let's say a million is that a Million
a million he's making a million dollars and he's really not part of any uh department so we're just going to have null so what we're about to do is we're only inserting on the employee salary table but we're putting all the values that we need into the appropriate places let's add a semicolon let's go ahead and run this make sure it worked it says insert into and then one row affected now let's come back up let's look at our salary table First and get rid of this if we pull this up you can see John
ralphio sapin he was added let's go over to the demographics is the moment of truth let's see if it worked and as you can see it worked perfectly we have Jean ralphio sapine now they do need to come back and fill in this information but it's already in here kind of queuing them up saying hey we need this person's age gender birth date all that other information so that is how we can create A trigger based off of a specific table and then when it happens it just automatically does it for us we don't have
to really think about it we just know that we've created a trigger and we can actually go and insert data on that table that trigger is going to work it's going to do what it's supposed to do and that's really really helpful in the real world when you're working with a ton of tables a ton of things need to be Automatically done and you don't want to have to manually do this so having these triggers can save you a ton of time now let's scroll down and we're going to take a look at events now
event is kind of similar to a trigger a trigger happens when an event takes place whereas an event takes place when it's scheduled so this is more of a scheduled automator rather than a trigger that happens when an event takes place these can be fantastic For a lot of things like when you're importing data you can pull data from a specific file path on a schedule you can build reports that are exported to a file on a schedule you can do it daily weekly monthly yearly really whatever you'd like it's just super helpful for Automation
in General now let's say the pwy council comes up with some new legislation they need to save some money especially in the Parks and wreck department we're just spending too much Or they're spending too much and what they want to do is retire people who are over the age of 60 immediately and give them lifetime pay so what we want to do is create an event that checks it let's say every month or every day and then if they're over a specific age we are then going to delete them from the table and they will
be retired this is a fake example so you know go with it so what we're going to do is come right down here we'll select everything from Employee demographics and let's run this and let's pull this up so let's say if they are over the age of 60 which unfortunately is Jerry gach like I don't make the rules but if they're over the age of 60 they're going to be automatically retired so let's come right over here we're going to say create event and we'll call this the delete delete uncore Retirees now before when we
were creating the trigger we were saying based off of a specific event but here we're going to schedule it we're going to say on schedule and then we're going to say every and we could do one month maybe we'll look every single month but here we'll do let's do every 30 seconds every 30 second now we're going to go down we'll say do and this is going to say here's what needs to happen every 30 seconds so we'll say begin and end now What's going to happen every 30 seconds is we're just going to start
with a select statement and I'll just copy this actually we'll start with a select statement then we'll update it to a delete statement but we'll come right here we'll say where the age is greater than or equal to 60 so if we just run this query right here that's only one person as Jerry gach now if we want to write this correctly we'll do the Delimiter we'll have the dollar signs we'll have the dollar sign right down here as well and we'll say the limiter back to a semicolon now every 30 seconds we don't want
to select people who are that age we want to delete so let's go right here we're going to change this CU now we know it should be deleting the right person and we're going to go ahead and create this event let's go ahead and run this and let's make sure it was created Properly it looks like create event zero is affected this should be working let's go back up to the demographics table and let's run this and let's pull this down and pull this up and as you can see unfortunately Jerry gach is no more
um you know he's just too old and the pony Council they recognize that and so it wasn't my rule that was unfortunately Pony council's Rule now really quick if that did not work let's say you couldn't create your Event at all let's go down here we're going to show variables and we'll run it just like this um I'm going to show you how you may need to fix this so we can say where variables is like and then we'll say uh event do it just like this so I have a vent scheduler where the value
is on if yours is off which sometimes that can happen your you're just going to update this to on now another issue could have happened and I just want to explore this For just one second you may not have permissions to delete things if you do not come right up here let's try to figure this out together it's actually edit references and I want to say it's right here into the SQL editor at the very bottom yeah so save updates rejects updates and deletes with no restrictions this needs to be unchecked so go to preferences
go to the SQL editor down at the bottom unclick this if that didn't Work now if everything worked perfectly you don't need to change a thing but if it didn't I just wanted to work through you know some uh troubleshooting that you may just have to Google or chat gbt or something to try to figure out so that is how we can create an event in my squel to run on a schedule now typically you wouldn't do it on every 30 seconds you would do something like every one month or every one year or you
know a longer time frame but you get the Picture of what we're trying to do hello everybody and welcome to the very first project in the MySQL Series today we're going to be focusing on data cleaning now if you don't know what data cleaning is it's basically where you get it in a more usable format so you fix a lot of the issues in the Raw data that when you start creating visualizations or start using it in your products that the data is actually useful and there aren't a lot of issues with it so that's
really What data cleaning is now what we're about to do is create a database we're going to import a data set this is a real data set and what we're going to do is we're going to clean the data so I'm going to show you and walk you through all the steps in order to clean the data the data set that we're going to be working with will be in the GitHub so you can just go and download that I'll have a link somewhere in the description but let's get started first thing we're Going to
do is create a new database so we'll go right over here to create a new schema and we're just going to call this one we'll do this is worldcore layoffs so if you can't tell already uh we're going to do World layoffs uh that's the data set they're going to be doing we'll just click apply and that creates our world layoffs right here now we're going to go into here there are no tables we're going to right click on tables and go to table data Import wizard now we haven't done this yet uh in this
series we have an import any data but that's what we're doing here we're going to show you how to import data so we'll go ahead and click browse and as you can see right here we have this layoffs data set let's open this up and we're going to click next and we're going to create a new table there's no existing table in this database you can drop it if it exists uh if you'd like to doesn't matter this is New we're going to go ahead and select next now right here is where you configure import
settings now my SQL is going to automatically assign a data type based off of the data in these columns so we'll take a look at the data later now there is one thing that you can take a look at real quick we have this date column now in here it assigned it as a text that's because of the format we are going to import this as the raw data we're not going to try to Change anything in the import settings we're just going to assume this is how the data was in the table so I'm
not going to change anything although this may be something that you would want to change to something like a date time and go and fix that but we're going to import this as the raw data let's go ahead and select next we're going to import it we just select next now this could take a little bit uh so while this is importing I'm just going to skip Ahead this should take just a few minutes to import all right this just finished let's select next and we imported 2361 records let's go ahead and select finish we
can get rid of this and let's refresh this perfect we have our layoffs table so we'll select everything and I'm going to go and double click on the world layoffs because I don't want to write out the whole thing every time so we're Going to say from layoffs and let's see what we get so let's take a look at the data that we're going to be working with in this data cleaning project so this data set is layoffs from around the world starting I think 2021 and we'll take a look at that in this date
column later but it has the company so it has the company that did the layoffs it has the location of where they are what industry they are part of how many they laid off The percentage that they laid off so the percentage of their company the date the stage which refers to the stage that the company is in whether it's a series B post IPO uh they don't know then there's the country and then we have funds raised million so we have a lot of information here and in the next project we're going to be
doing explor at data analysis so we're cleaning all of this data and then in the next lesson we're going to actually dive into it and try To find Trends and patterns and all these other things so what we are going to do is we're going to go through multiple Steps step number one is we are going to try to remove duplicates if there are any that is the first thing I typically do especially if I know this data shouldn't have any duplicates or it'd be you know repetitive or unnecessary to have duplicates the second thing
is going to be to to standardize the Data that just means that if there are issues with the data with spellings or things like that we just want to standardize it to where it's all the same as it should be number three is we'll look at the null values or blank values and there's a lot of null values in here there's even a blank value right here and we're going to see if we can populate that if we can and there are times where you should there are times where you shouldn't I'll kind of walk
Through that as well and lastly we want to remove any columns and rows that aren't necessary and there's a few different ways to do that uh this one is a little bit you know um let me write this actually real quick remove any columns so I'm just going to say there are instances where you can do this there instances where you shouldn't do this when you're working with massive data sets and you have a column that's you know completely irrelevant Completely blank you don't have any ETL process that is required for it um you can
get rid of it and it can save you time when you're querying your data now with that being said uh and we'll talk about this later in the real workplace often times you have processes that automatically import data from different data sources if you remove a column from the raw data set that's a big big problem so what we're going to do is something I would actually do in my real Work which is I would create some type of staging or raw data set let's say this one's our raw one and we could have even
called this layoffs raw we're going to create another one we're going to create a table so we'll say create table and let's call this one layoffs unor staging and we literally just want to copy all of the data from the raw table into the staging table so we can do that really quickly by just saying like layoffs and if we run this and we Refresh you'll see we have the staging database and let's copy this here we go and we'll do layoffs underscore staging and so now we have all of the columns and all we
have to do is insert the data so we're just going to say insert then we're going to say layoffs staging go right here and we'll select everything from layoffs and let's run this and if we select the table we now have all the Data over so super super easy and now we have these two different tables now again why do we do this is because we're about to change the staging database a lot if we make some type of mistake we want to have the raw data available this does happen this is something that you
do in the real workplace because you're not going to work on the raw data it's just you shouldn't do it it's not best practice so I'm going to show you what I would actually do in my you know like a Real job so that's what we're going to do now we're only going to be working off the staging database and we can copy this and make different databases for different things as long as we have our raw data we can really do anything we want going forward and that's what we're going to do so the
number one thing we're going to look at is to make sure that we are removing duplicates we want to make sure we don't have any duplicate data in here And if so we're going to get rid of it now really quickly if you did my Microsoft SQL Server project we did something very similar but we had an extra column over here that gave the unique row ID which made it really easy to remove the duplicates here there is no identifying Factor that's going to be easy for that so I'm just going to tell you up
front removing these duplicates is not going to be easy but we'll walk through it every step of the way so what We can do is try and do something like a row number and we'll match it against all of these columns and then we'll see if there are any duplicates now I'm just we're starting off strong okay we're jumping into kind of some of the more advanced things it does get actually easier as we go but this is the actual order that I follow so uh I'm going to keep it so let's try to identify
duplicates so let's copy this let's pull this down do underscore Staging there we go now what we can do is we can do row number and we'll do that Partition by basically we could do every single one of these columns that's kind of what we're doing so what we can do is we can say everything then we can do a comma and we'll say row underscore number and would be just like this and we're going to do this over and we want to Partition by all of these columns essentially we could just do a few
for now to see if we get any hits and then We can look at that but they're going to be multiple companies that have layoffs in the same location and Industry although their total laid off would probably be different the date would probably be different so if we do something like uh company let's do industry we will do total uncore laid uncore off comma percentage laid off and then let's do date now I'm doing date with the back ticks because date is A keyword in my SQL so if we do it like this it just
really makes it easy so we're going to Partition by all of these things so let's do Partition by and let's bring this down real quick so I'm just going to say over Partition by and we're going to call this as rowcor num now let's try running this let's see if it works really quickly that's important and over here you can see that we have our row number now these mostly are unique and these all look unique I'm Not going to scroll through all of them but we want to be able to filter on this so
we can filter where the row number is greater than two if it has two or above that means there's duplicates that means there's an issue so let's go ahead and we're going to take this we'll put it into either a uh subquery or a CTE I'll create a CTE for this uh cuz it's really easy so we'll say four or not four we'll say with then we'll do uh duplicate CTE as then we'll just do Our parenthesis we'll paste this in here get rid of that right there and now we're going to say select everything
from this duplicate CTE then we'll say where rowcor num is greater than one let's run this and let's add a semicolon let's run this and you you can see that these ones have duplicates so these are our duplicates actually and we want to get rid of these exact rows now just to confirm that these are uh the duplicates Let's look at this one I never heard of this company um but we'll take it really quick and let's select we'll say where company is equal to and we'll call this Oda so let's run this and it
looks like these no no no no these aren't duplicates that's a good thing we checked okay because it looks like um these aren't the exact same although they're very very close these technically are not duplicates so I'm Glad we checked this we need to do this partition bu over every single column that's what I'm realizing so we'll do company comma location I'm glad I'm genuinely glad we you know it's good to make mistakes um and figure things out as you go it really is important so company location industry total laid off percentage laid off date
then we'll do stage and then we'll do country and then funds underscore raised underscore Millions so We're changing the CTE to partition over everything so now let's run this okay Oda is not in there that's the only one we checked um but let's look at Casper I know this these are the um aren't these the mattress people didn't know they had layoffs poor guys um all right let's take a look it looks like this row and this row are duplicates these are our duplicates so we are going to want to remove only one of those
we don't want to remove all of those so um just Looking at this one example it looks like this uh query is working well so here's our duplicates now we need to identify these exact rows we don't want to delete both of them when we looked at C asper there's the real one that we want to keep then there's a duplicate that we want to remove we don't want to remove both that would be bad now in my squel it's a a little bit trickier to remove things than it is in something like Microsoft SQL
Server post Grace SQL Um they have different ways that they can delete rows for example in Microsoft SQL Server we could literally identify these row numbers in the CTE and delete them from it and it would delete it from the actual table we can't do that in my SQL and I'll show you uh let's actually copy this we'll go like this and we'll say uh let's say we want to delete these we'll say delete from We're deleting this from where the row number is uh greater than One what am I writing right here delete there
we go so delete from this duplicate CTE where the row num is greater than one that's all these duplicates we want to remove them let's try to do this let's run it let's go down if we look at the bottom it says the target table duplicate CTE of the delete is not updatable so you cannot update a CTE a delete statement is like an update statement um essentially so what we are going to do is we're going To do something a little bit different because this is how I would love to do it that makes
it super super easy to remove duplicates but that is not always the way that things happen in the real world I think what we should do is take this right here and let's run this we should take this right here and put this into let's say a staging two database and then we can delete it because we can filter on these R nums and we can delete those which are equal to two so it's Essentially like you know creating some type of table and then uh just deleting the actual column so we're that's exactly what
we're going to do so it's essentially just creating another table that has this extra row then deleting it where that row is equal to two so you know somewhat fairly straightforward but um let's try it and let's see what happens so we're going to come down here and do is create our table uh let's try doing that with here let's uh let's copy To clipboard a create statement let's see if this works perfect that's exactly what I wanted now all we're going to do is say we're creating the table layoff staging two now this is
a create table statement and we're naming the columns and then we're also assigning the data type so we have all these things but we want one more do a comma and we want to add row underscore num and I need underscore num and that should be an integer data type so we're just keep it Just like this let's go ahead and copy this and let's run it see if it worked bring this up looks like it worked properly uh and let's say let's go back up I want to rewrite things that I don't have to
let's run this so now we have this empty table so we want to insert this information right here so we're going to insert into so we'll insert Into and then we'll do this right here so insert into staging two now let's try to run this see if it works and let's run it and let's select that table and now we have it so let's pull this back up and I'll walk through what we just did cuz I know I'm going quick but we have so much to cover um in this lesson so we just inserted
basically a copy of all these columns but in this new table we added one more the row num so now we can filter we can say where I need to Spell that right where row num is equal to two or we should should say greater than one because some might have multiple duplicates and there you go here are our duplicates now we're going to delete these so all we have to do is come right back down where did I go copy this come right back down here and we're just going to say delete from we
just did a select statement I always recommend doing that to identify what You're deleting then you change it to delete and now if we run this go and I'm actually going to keep this um let me see there we go and let's run it again and now they're gone and if we say um just the whole table this looks wonderful now this row num is going to going to be a column at the end that we probably don't need anymore right it's a redundant column it adds up extra space in memory and storage and all
these other things and Processing times we're just going to get rid of it uh that'll be at the very end I'm sure so it looks like we are good to go that's how we remove duplicates now um there are different ways to do it when you have different columns like if you have a unique column over here makes it so much easier so so so much easier but we didn't have that so we had to kind of do a workaround uh Welcome to the Real World now now let's look at standardizing data so standardizing data
Is finding issues in your data and then fixing it so I'm already noticing right here looks like we have a space at the beginning uh we could easily just do a trim on this column um and let's I I don't even think I was um I did this when I wrote out all the the scripts for this let's just do from this table why am I writing it all out again we actually want to select the company and then the or actually we'll just do distinct company Distinct company let's run this and if we do
a trim around this let's run this again and that looks better so if we do uh company company comma and then we'll just do the trim I don't want to we don't need to do distinct right now we'll do the company this just looks better so we're going to update that uh it's super easy now if you ran to nichu Just a second ago uh I may need to help you change that so if you couldn't update or delete those things earlier I should have told you this earlier I apologize all you need to go
is to edit you just need to go to edit go to preferences at the very bottom go to SQL editor go all the way down to the bottom and right here we have safe updates on if you have this selected that means you can't update anything that's a problem so what you need to do is Select this uh Or unselect it like I have it and save it you may have to even restart your myal potentially in order for those changes to take effect but then you should be able to update that now all we're
going to do is update this table and we're going to set and now we need to come back here and we'll say we're going to set the company equal to trim now if you don't know what trim is or you haven't taken that lesson trim just just takes off the white space off The end so it took the white space out of here or off the right hand side as well so we're going to update this and let's do a semicolon a semicolon let's run this let's select this again and it was updated properly so
we're already off to a great start now the next thing that I want to take a look at is the actual industry so let's go back's copy this and let's take a look at the industry so we'll do industry and we'll Run it now if you look in here there's a ton of different Industries um and there's marketing and marketing oh because I haven't done distinct uh please ignore me let's do distinct and there's a ton of different Industries in here Transportation Healthcare consumer uh there's a blank one which we'll take a look at Aerospace
there's a lot of really unique ones let's actually order this we'll do order by uh and let's just do one which is the First column we're just ordering it on S so we have null we have blank that's a problem we'll take a look at that later uh but this is an issue crypto cryptocurrency and cryptocurrency these are all the same thing these should all be uh on or labeled the exact same thing the reason we need to change this is because when we start doing uh the exploratory data analysis visualizing it these would all
be their own rows their own unique thing which we don't want we Want them all to be grouped together so we can accurately look at the data let's take a look at any other ones fintech and finance that could be the same thing I'm not 100% sure I'm not a fintech person um I think for now the only one that I'm confident in changing is this one right here which is cryptocurrency so let's go ahead and update that so all we have to do and we need to actually let's select really quickly where it's like
crypto so we'll say uh where Industry and we want to select everything where the industry is like and we'll just do crypto they all start with crypto right yeah we'll do crypto just like this and let's run this and let's just take a look lot of layoffs in the crypto industry good night all right let's find where it's cryptocurrency okay so even this one it's crypto and I know Gemini crypto crypto and then it says cryptocurrency So these should be all crypto you see how 95% of them are crypto so we're going to update these
other ones oh this one is CR YP is that how you spell crypto jez I don't know anything all right so we want to update all of them to be crypto so what we're going to do is we're going to say update layoffs industry 2 we want to set the industry equal to crypto just like this where and we can do it a few different ways we can say industry We I think we can do like let's try this real quick I I some of this stuff I don't have planned out I'm just kind of
going with it as we go um which I like better you know we kind of we work together on this we figure these things out together that's what I like um then we'll do like crypto just like this exactly like we had it up here so if it's like crypto it should be crypto let's try this let to see if it ran CU it may not have I can't remember yeah it Worked okay so it updated uh three rows and that looks correct now let's go back up and let's run this as we scroll down
they are all the exact same beautiful beautiful beautiful so if we do uh distinct industry again let's get rid of this if we run this query and we scroll down crypto is its own thing beautiful and it looks great we can look at those later on how we can update those um but let's keep going let's look at our whole Table again and these blanks and these nles are actually an issue we do need to deal with them but I I my instinct is telling me go fix it um but my you know tutorial side
is saying okay stick with uh the tutorial the order that we agreed on um so let's go take a look we've looked at company we've looked at industry um let's just real quick look at uh distinct uh location now it's good to look at most of these things right there Could be small tiny issues that you just never saw um and we're just going to order by order by one just do a real quick just a scan to see if we find any issues um that could be an issue but that could just be another
language if I'm being honest I don't know I'm as I'm just scrolling through here because I want to make sure because this is not something I had in my uh pre-written script this looks pretty good to me um Let's do everything we'll run this and now let's look at Country so we'll do distinct country and let's run this and let's scroll down again this is sometimes just what I actually do all right we got an issue right here super common somebody put a period at the end some Dingus uh and we're not going to judge
that person I don't know who it was or who ruined this data set but um yeah that's a problem so we're going to need to just update that It looks pretty simple um but I'll just say where country is equal to or let's say like and then I'll say like United States there we go and oops want to say select everything I just want to see um where it's at oh jeez there's too many let me see if I can spot it I can't spot it it looks like they're supposed to be United States not
United States dot that's the issue um we can easily easily fix this and we can probably let's do um really Quickly let's do select oops select distinct and then we'll do country comma and then we'll do a trim cuz we want to get rid of that um that one we'll do country now just doing the trim won't fix it let's go to the bottom so that doing the trim doesn't fix it but here's what you can do it's a little trick of the trade here we're going to do something called trailing which means coming at
the end so what's trailing the period from country let's try running This scroll to the bottom and it fixed it so this is a little um a little Advanced little tidbit for the trim here we can do trailing from the country and we're looking for something that's not a white space we're specifying we're looking for a period so now we can do is we can say update we can set the country do update um this table and we'll oops and we'll set what am I doing what's going on here we'll set the country equal to
and we'll do it Just like this we're only going to do it for Country right uh so we'll say is equal to trim and we'll say where country is equal to or actually let's say like and let me see if I have this oh I don't let's let's just say like United States like we had it before just like this so let's go ahead and update this after I put my semi colon in let's run this and let's run This again again it shouldn't need to fix it anymore it's just one row that's perfect that's
exactly what we wanted now one thing that's really important uh and this is you know this is a longitudinal it's not the right word at all give me a second I can't I can't speak and right at the same time so sometimes I just say uh dumb things um uh if we want to do not longitudinal but um time series that's the word I'm looking for if we're trying to do time Series um exploratory data analysis time series visualizations later on this needs to be changed right now it's text and we can look at that
by going right actually let's refresh this we're not looking at staging we looking at staging two if we look at the columns and we come down here to date it is a text column that's not good if we're trying to do uh time series stuff we want to change this to a date column now how can we do that let's take a look so let's do Date and let's not actually do it like that let's do date backs slash so we're just going to look at the date now let's change this cuz we want to
format it how we want to format it with it which is month day year so how can we do this well there's something that's very very helpful works perfectly in the situation and is exactly what we're going to do it's called string to date so we're going to do string underscore there it is right there underscore 2core date it literally helps us go from a string which is a text that's the data type to a date so it's perfect now all we need to do is pass through two parameters we have to pass through the
column which is the date column and then what format we wanted in now if you haven't done date formats before I'm going to kind of walk you through it while we're looking at it um in order to format this properly you use a percent sign so it's going to be a formatting For a month a lowercase m a capital m is something completely different I believe it's spelled out I need to I we can look at that in a second if we want to actually and then we can do this right here and then we'll
do another one so we're formatting it in the way that we want it but also converting it to an actual uh date column so now we want month and then we want day lowercase day we'll do a forward slash and then another percent sign and then a capital Y which stands for I believe the four um four number long year uh have a let's just um let's look at this real quick so it worked perfect so we're it's taking in this format that it's in right over here and converting it into the date format so
this is the standard date format that you're going to find in my SQL now let's see what happens really quickly just for fun uh let's see if we do capital M uh it looks like that's not going to work at all uh let's do lower Case Y and um just formatted it to 2020 I think it took the first two numbers it looks like I don't know why it's doing that if I'm being honest um but if we keep it with the capital Y as we should this looks perfect this looks exactly like what we're
trying to do so you can mess around with it it depends on the how the data is formatted in your original column when it converts it to the string to date and there's a lot of Different stuff you should just look up um date formatting in my SQL really interesting stuff so we're going to update this date column to this which is our new date column let's go ahead and do that we're going to say update you guys should be getting used to this by now that's the whole point is getting used to doing these
things so we're going to set date equal to and then we're going to put in this right here the string to date go ahead and do this And let's run it make sure it worked 2355 rows it looked like it did every single one uh but let's go ahead and get rid of this and let's run it and it looks like it worked perfectly now there were some nlls it looks like and that'll be something we have to look at later when we talk about nlls but um overall I believe this looks proper now if
we refresh this let's refresh let's Come down to the date you'll notice it is still a text it's date it's called text but now it's in the date format now that's really important and maybe I should have done that earlier if I'm being honest um tried to convert it to a date column it wouldn't work it would give us an error um you'll just have to trust me on that one but now we can do it where we can change it to a date column so let's do alter table now only do this never ever
do this on your raw Table only do this on things like a staging table because we're about to completely change the data type of the actual table so we want to change the layoff staging two and then we're going to come down here and we're going to say modify column and what column are we modifying it's this date column there we go and we want to change it to what data type a date and am I spelling this right yeah I just need a semicolon here whenever I see an error I Always yet a just
look for the semicolons so let's go and run this and let's refresh see if it worked and the date was changed to a date which is perfect it's all we wanted to do uh just to make sure we were doing what uh or we'll set ourselves up later in the future really well let's look at our table all right this is very good so we fixed a few uh just issues with the company I believe something with the industry oh the cryptocurrency we Changed the country um I'm just going to go ahead and tell you
right now this one uh we're not going to look at until we look at the um nulles and whatnot in just a second so we're not looking at that one yet and then uh we have this extra column that we've done so we've done a lot so far but the next thing in the process step one was remove duplicates step two was standardization step three is working with null and blank values now this is going to happen You're going to have nulles and you're going to have uh blank values in here I it's somewhere um
it's just going to happen and so we need to think about what we're going to do with that information whether we want to make them all nulles make them all blanks try to populate that data let's see what we're going to do so let's start off with the total laid off we'll just do uh where total underscore laid underscore off is null So in order to look at the null we say is null let's try equal to null it's not going to give it to us we have to say where it is null so we
have these values these are completely null uh there's quite a few of them but remember this is also useful information but if they have two nulls uh that probably is pretty useless to us um that's something I think we'll take a look at in a little bit actually we'll say we and we may save this query Percentage uh laid off is null so if they're both null like these these are all I believe fairly useless to us these might be ones that we remove so let's actually look at this um in step four we look
at removing rows and columns but one thing we should take a look at I remember this industry let's do uh industry do distinct this industry had some missing values and let's take a look at That okay so we have a missing value and we have a null here so let's look at this query and let's say where industry is null or do industry is equal to a blank like this we'll select everything to run this all right so it looks like there are a few that are blank now what we can try to do is
see if any of these have one that's populated let's take Airbnb for example let's search for this really quickly and this is 100% um you know It's just helpful it's really really helpful to be able to populate data that isable is that a word um let's try it so we'll say uh select everything I just wanted to do where spell that right where company is equal to and let's do Airbnb there we go let's run this and it looks like we have this one right here so for example um these whether they have them or
not we're going to try to populate these if this Bal or carvana or Jewel had multiple layoffs these ones should if these ones aren't blank if they have one that's not blank we should be able to populate it for example um not the one I was trying to do if we look at Airbnb this one has travel so we know this is the travel industry so we can populate this with travel again we want this data to be uh the same so if we're trying to look at you know what industries were impacted the most
this row isn't going to be Affected or this row won't be in our output because it's blank we want that to be traveled to represent the data properly so we want to update it so if this one has travel we should be able to update this row with this travel right here so let's see how we can write this and let me give myself some rows right here all right now what we're going to need to do is try to do a join here so let's try writing out in a select statement then we'll just
change it to An update if it works so we're going to select everything and we're going to do this from staging two from staging two and we'll call this uh st2 and then we'll join on itself cuz what we're going to do is we're going to check in this table does it have one that is blank and not blank if so update it with the nonblank one that's essentially uh in layman terms what we're trying to write but writing it out could be a little bit more difficult um So we're going to join on itself
and we'll call this uh let's actually call this table one T1 and T2 because they're the exact same table uh and we'll we'll do this on and we're going to say t1. compan is equal to t2. company so the company has to be the same that's important and we probably should do the location is the same as well now we'll do and t1. location is equal to t2. location I'm imagining you know there's another Airbnb in like South America Somewhere that's called Airbnb but you know I'm just imagining a scenario right where we have to
think about different use cases rather than just large companies so those other ones they may have ones that are in different locations we don't want those um we don't want to change them if they're not the same so these are the same now what we want to find is we're going to say oops we want to say where then we'll do t1. Industry is null and then we want to check that t2. industry is not null we'll say and t2. industry is not null and let's just run this let's see if we get anything so
let's think this through because we got nothing in our output we're selecting everything we're joining on the company and the company um and the location where T1 industry is null and T2 industry is not null let's just get rid Of this for a second I just want to see if this changes anything it doesn't and it's possible actually that instead of doing is null we could do or and this I'm glad we're walking through this we can do or is equal to blank and let's try running this there we go okay so it looks like
there's Jewel carvana and Airbnb these ones all have Industries um where it's null or blank and an industry is not null so that's really good now if We scroll over see the industry here this is our T1 this is our first table if we scroll over I bet we'll see the t2 industry where it's not null let's scroll over and here's our industry we have travel transportation and consumer so this worked exactly as we had hoped I can even um pull this up here just to show kind of show you a little bit easier what
that's doing and we'll do t2. industry this is kind of like what we're trying to do so if it's blank this One is going to be populated into here here if there is one that is not blank so that's essentially what we're going to do let's write the update statement and we're going to see if it works this we have to translate this to an update statement so we'll do update and we're going to update uh this right here so we'll say update T1 and then we'll do the join right there and now we have
to do a set statement so we'll set the t1. Industry equal to and I'll just copy this t2. industry I just don't like I don't like writing things out um then we say where so we do this it's like that and let add a semicolon okay let's confirm so we're updating this table T1 we're joining on T2 where the company is the exact same we're setting T1 industry equal to T2 industry so the T1 should be the blank one so where the T1 industry is null or Blank and T2 industry is not n let's go
ahead and run this semicolon see if there were about three updated yeah rematch zero R affected though let's go take a look um we have to let's run this query looks like those are still null let's run this uh that one is still blank now let me think here I'm I'm trying to think of why this didn't work and I want to walk you through my thought process it is Poss possible that because these are blanks And not nulls that it's not working I and I will say that is something I typically do where I
set these blanks to nulls first so let's actually try that and see if that changes anything I'm just going to update uh this I'm going to say set the industry equal to null we'll say where IND industry is equal to Blanks so we're just changing it to null where it's blank let's try this let's go back down here to our select Statement so these are all nulls okay I think I think this is now going to work because now you can see on this side it's going to there's only one option for it to populate
it before there were those um blanks which I think was causing the issue um let's get rid of of this part cuz now we have no nulls and now let's try running this we're workshopping this on the Fly guys uh let's see three rows affected heyo All right let's go see if it worked um let's run this query and we have none that's perfect let's look at Airbnb all right all right R into into some issues but we work through it we figured out the issue and now it's working properly and we can even come
back up here to select everything and it looks like Bailey's is the only one that still has a null let's look up Bailey's real quick and we'll say we're company is like uh Bailey let's run this yeah and there's only one so there wasn't another row all these other ones like carvana and um I can't remember the other Jewel and Airbnb those ones had an extra row they did multiple layoffs this one only did one layoff so we don't have another populated row where it's not null actually populate the null row that's really all that
happened uh that's why that worked that way so I'm really happy that worked awesome job guys uh I was Starting to question myself do I even know how to use my SQL I mean I was really starting to question my abilities here um take a look uh I think that is all we're going to do for populating null values now here's why things like total laid off percentage laid off um funds raised how are we going to populate that with the data that we have here I don't believe we can now we might be able
to populate oops we might be able to populate some of this if we had The um company total like if we had the original Total before laid off because then we could do calculations like um oh these companies went completely out of business that's not good 1% that means 100% was laid off um but if we had you know the total if they had 50 employees and 100% were laid off we could populate the total laid off whoops did it again we could populate the total laid off by saying if this is 50 100% was
laid off that's 50 people were laid off we don't Have that data so we can't go and populated I don't believe funds raised we might be able to scrape some data from the web and populate this but that's a totally different thing um not part of this project so I think the data cleaning for the null values and blank values I think I think that's going to be done um it's possible that the stage could be the same and if you want to go check you can but we're going to keep chugging along because we
want to remove Columns and rows that we need to now if you remember we were looking at this before did I save that uh query let's go look here we go bring this down to the bottom all right these rows let's let's really take a look at these um and think about if there's going to be Health dust um what we are trying to do with this data in the near future is we're not just trying to identify a company or a location that had layoffs and maybe we Are maybe that maybe we are trying
to do that but these have no layoffs and no percentage laid off so in my opinion I don't know if these laid off any at all um I believe that we can get rid of these now deleting data is a very interesting thing to do you have to be confident am I 100 % confident no not really but I'm confident enough to know that what we're about to look at in the next one we're going to be using these total Aid off a lot percentage laid off A lot when we're looking at um you know
actually querying the data and doing some exploratory data analysis so we're going to use these a lot I don't think uh these I'm not even sure if these are accurate I'm not even sure if they actually did have a layoff it's saying they did it doesn't show if they laid off any so um can we delete this yes should we delete this it's iffy uh I'm not 100% if I'm being completely honest and there's a lot of rows like that this Is this could be like 100 or so I'm really not I mean I could
run a query and run it but I don't want to I don't it's not a big deal the point being I don't think we need this information so we're going to get rid of it if nothing else just to show that you can do it so now we'll say uh Delete and then we'll do from here there we go so now we're going to delete these rows let's try to select them again again and they are gone so we deleted the ones where the Total laid off was blank and the percentage laid off was blank
we just I can't trust that data I really can't um and let's go back down from right here a semicolon so I sometimes I have to walk myself through these things um all right this R num I mean come on we don't need that anymore let's get rid of it um so what we can do now it's a little bit different syntax we want to drop a column from this table so we have To do the alter table again we're going to alter table layoff staging two and then we're going to say drop column and
rowcor num if we run this then we run the table again should be gone and it is so this is it this is our finalized clean data now in the next project we're going to be doing exploratory data analysis on this cleaned data we're going to finding Trends and patterns and writing complex queries it's going to be Phenomenal I'm super excited about it and I love this data cleaning one um I made some mistakes I'll be the first one to admit but cleaning data is not always a straightforward thing um you know you have to
you kind of mess around with it figure it out and and you know that's what we did uh whoa took a while so just a recap we remov duplicates we standardized the data we looked at the null values or blank values then we removed any Columns or rows so we did a lot um if you go back and you actually scroll through here and look at some of this code that we wrote uh it's not super beginner stuff so if you're following along with these things and you are getting this project this is a fantastic
project to put on your portfolio I myself would put this project on my portfolio because it's a very very relevant thing hello everybody in this Pro project we're going to be focusing On exploratory data analysis now in the first project we worked with this exact data set and we cleaned up the entire thing and that was a really good project and it set us up to explore the data and with all that clean data we'll be able to look at our data much better and find better insights while we are using it now normally when
you start the Eda process or the exploratory data analysis process you have some idea of what you're looking for sometimes not always And sometimes when you're exploring the data you also find issues with the data that you then have to clean so even though I did a data cleaning video and then an exploratory data analysis video and they're kind of separate projects sometimes those coincide together where you're exploring it and cleaning it at the same time now what we're going to be doing here with this data set we're just going to be kind of exploring
it I don't have any agenda I don't have any you Know one thing that I want to look at I just kind of want to look at everything and we'll kind of discover and go uh about things as we are learning and looking at this data set we will however start off really simple with kind of the basics work a little bit more towards the tougher stuff and then at the end we'll have some more advanced things that I think will be really fun so with that being said let's start off with kind of more
easier things we'll kind of Just ease our way into exploring this data set let's pull this down and let's copy this right down here now we're going to be working with this total laid off and percentage laid off or most likely this total laid off quite a bit the percentage laid off isn't super helpful because we don't know how large the company is we don't have another column here that says here's how many total employees they had and then okay they had a percentage laid off you know We won't work as much with this one
but we'll work quite a bit with this total laid off let's look real quick we can look at something like the Max uh total and I need to use a parenthesis Max total laidor off and let's look at this so on one day there was somebody out there who had the Max total laid off of 12,000 people that's a lot of people to lay off in one you know one go that's a lot uh let's also take a look at the max and I think it was Percentage laid off let's run this and it looks
like one now one represents 100 that means 100% of the company was laid off um and that's you know that's not great it just means an entire company went under essentially we can actually take a look at that because I'm interested to see you know if there's any companies I recognize or can see um where and you come right down here where the percentage laid off is equal to one Let's go ahead and look at this and let's take a look so we have this ahead I'm just going to go through here and see if
I recognize any of these sumon the crypto space blockfi I feel like I recognize that one I don't know uh let's keep going deliver it's not good they left luck of 120 people I'm just curious I mean I'm I'm just kind of scrolling through here trying to see if I recognize any these are companies that like completely went Under or or lost all their employees Vault Bank interesting just interesting to me we're going to be taking a look at a lot of stuff um but these are companies that completely went under um that's you know
fortunate we can also order by uh total uncore laidor off in that's not how you spell it in descending we'll see which company went under had the largest so this one had 2,000 construction company had 2400 people they went um under Doesn't say what stage they were at but that's in the United States we can also take a look at and there's another column over here called funds raised in millions let's look at that one so I want to see um these are companies that had a lot of funding or potentially a ton of funding
uh let's go over this is like $2.4 billion I believe like I think it's like a ton of money um quibby I believe I know this company uh in blockfi I I Thought I had heard of them I'm pretty sure I know that is so whby is one that I'm definitely familiar with it was like a short form uh Media Company yeah yeah yeah then there's British volt which looks like an electric company that went under so you know some big companies that went under um in 2023 2020 2022 so that's interesting so we have
a lot of companies here and we're just looking at um that had total Aid off but let's take a look let's let's use Group by real Quick I want to look at the company and I also want to look at the sum of the total laid off and for that we need to use a group by the company and let's just start with this and I'm sure we'll use an order by in a second yeah let's order order by order by let just do two for now in descending and two stands for one two this
is the total Aid off so uh for the total for this table and we don't know how far go back I goes we haven't Checked that yet we'll check that in a second but for this table you should recognize a lot of these companies so I think it starts in like 2020 until like sometime in 2023 but this is Amazon Leto of 1,800 people Google 12,000 I'm guessing that's at one time because that was the max that we looked at earlier uh this is Facebook or meta Salesforce Microsoft Phillips Uber Dell Cisco pelaton I mean
these are a ton of big companies arvana they let go of Thousands and thousands and thousands of people Twitter that's not surprising uh given what's the change of things Groupon um ton of ton of people or ton of companies and that's a lot of people that have been let go now let's really quickly uh before we keep going I want to look at our date ranges real quick so let's select everything um whoops we'll do from there and how do we want to do this let's do minimum of date and let me do it like
this Date and then we'll do uh the max as well because I want to look at the date range that we have here let's run this it looks like it starts in 2020 of 311 so right when like I believe the pandemic started the uh Co started I want to say that's like right when it hit at least Us in the United States then this is almost exactly 3 years later so early 2023 so just in those three years you know here's some of what we're looking at these companies Have let go of quite a
few people or had layoffs we could also take this exact thing oops what did I do here copy this again we can also take this exact thing and look at quite a few other things there was um the industry so we look at industry like what industry got hit the most during this time or had the most layoffs um all we're looking at right now is total laid off we can also look at um percentage in a little bit but it looks like consumer got hit really hard Retail really hard that makes a lot of
sense with shops closing down because people couldn't come in for the Corona virus now we're just making assumptions right um but you know during that time it was mostly covid that impacted a lot of stuff then we have transportation Finance Healthcare food real estate um yeah there's a lot a lot of people let's look at the lowest ones manufacturing ftech Aerospace energy legal so low numbers on those High numbers on these So really really interesting uh let's go back up just want to look at our whole table really quickly see what we got while we're
looking at this stuff and let's run this now we're looked at the company looked at the industry I would really be interested to look at the country as well which countries at least from this data set and we can copy or we can go right here country because I believe that United States had the most Holy mackerel they had by far the most uh then India this is 256,000 people um lost their jobs we'll look I think we'll look at the dates in a little while like a kind of like time series like how many
per year per month per day or whatever you want to look at but goodness gracious uh that's a lot of people within just three years in the United States India Netherlands Sweden Brazil Germany United Kingdom then it Goes down and down and down but these are just reported um from this data set that I I had gotten so really really interesting good night the United States had much more than than most for sure um let's actually look at that date real quick or we can look at it by year um so we have this date
and if we do it like this and we can by date real quick so this is going to do it by individual date and let's order by let's do one so this is the Most recent date so it's literally by date that's reported and we don't want that let's do it by the year so 2020 2021 2022 2023 we can do that fairly easily we'll use this year function and we'll group grout by the year as well let's try running this there we go it looks like in 2020 880,000 people 2021 uh 16,000 160,000 in
2022 this looks like the worst year and then it's only we Only have three months of data in 2023 there's 125,000 holy smokes so in 2023 it looks like we're ramping up because I'm recording this in 2023 about a month after this dat set um that we got this data set there's 125,000 people um around the world you know but just in those first three months so this is going to be a lot higher than even 2022 that's pretty wild um very very interesting one other one one while we're looking at group by um there's
There was a column and you can go back and look at it if you'd like but it's called stage and this shows the stage of the company and if we run this we're all just looking at total a off but if you look at the um stage of the company this is like the different series that they're in ABC d a I believe is like a series a funding that's like a super super starting oh this is like a seed phase then there's series a and then it goes up up up up until usually they
go Um like they do IPO or they get acquired or something now if we go up here and we do two descending I want to see which one had the most so this is post IPO this is the Amazon the Googles of the world the large large company compies that are post IPO or initial public offering then there's unknown we don't know which that is um a lot of you know layoffs from Acquisitions cdb all the way down so it looks like um most of it's coming from you know these Ones right here really really
interesting let's go look at percentages I'm just going to Lally trying to say literally I'm going to literally copy these um and with percentages I don't think uh let me look at percentage I don't think the sum is going to be a good indicator I don't know if this is a good one to even look at because and then we're looking at company right now because percentages refer to a percent Of the company right so we don't have hard numbers because we don't know how large these companies are so now that we're actually looking at
this this percentage laid off isn't super relevant um really the one that's kind of more you know has better this is a better use for what we're looking at this total Aid off because again we don't know these sums we could we could look at like the average right um but again that just doesn't help us that much I don't think Um think we're going to really dive into that too much is my uh is my feeling now one thing that I would be really interested in is to kind of look at the progression of
layoff right uh you could call this a rolling sum so start at the very early EST of layoffs and do a rolling sum until the very end of these layoffs um and let's go to the bottom this is where it's going to start getting a little tougher um and there's you know we're just doing a little bit Of exploratory data analysis you know do digging into this a little bit you can go and dig into this as much as you'd like you don't have to just do what I'm doing but I'm just trying to show
you some stuff now let's try to do rolling total of layoffs um we could do that on the day although I feel like that's going to be way too many rows let's do it based off the month so right here in this month now let's see if we do just the month let's do something I'll show You the month and that's going to be an issue and I'll in my head I already know but let's look at it we could do something like select um from and let's get this there we go so if we
do um we'll do substring let me add a sem let's do substring and we want to pull out this month right here so we'll go 1 2 3 4 5 six so start at position six um and this is of course in the date column we'll start position Six and then we'll take two let's just run this really quickly and there's our month so this we could do this as month right um or like this is that correct yeah so as month so this is our month that we're doing it now if we group on
this and we do like something like a sum of total uh laid off I think that's the column and then we do a group Buy on this month so it' be like this right here we'll Do group by this let's try running this we should be able to do month as well let's try this real quick as well because I don't want to have this if I don't have to run it perfect so the months right here don't show us the year so if we're trying to get a rolling to total of just the month
it's actually could work fine when we actually implement the the rolling total use um you know a window function but the issue with this is it's Just going to show us month so this is 2020 this is January of 2020 2021 2022 2023 any other years we have it this is not a great rolling total what if we did one all the way to I want to say it's 7 six7 let's try this now this is going to give us a much better let's order this order by a one sending this is just our first
column so now uh well we should do it where it's not give me a second I'm I'm I'm figuring this out as we go we'll do Where uh the month write that where the month is not null I'm just going to get rid of that one and of course uh that doesn't work because we're looking at the substring so let's try doing this let's there we go um it just wasn't reading in that month that I was trying to use let's go down now here's what we're going to do is we want to take it
from the very first month and we're grouping everything so these are all the Layoffs from 2020 of 03 so that's March of 2020 and we have April May and these are the layoffs so this is really good this is exactly what I was imagining in my head so we want this this is just you know 12 months in a year and we go all the way to the bottom and I want to do a rolling sum of this so let's see how we can do that and we'll use this logic in a little bit let's
copy this and let's do select everything we'll do right here now what we actually Want to do now that I'm thinking about is we you want to take this data and we want to do the rolling sum based off this exact thing so we actually need to take uh this let's get rid of this and we'll do it with a CTE so we'll say with and we'll do rolling undor total that we'll say as and then we'll put this in here just like that so with rolling total as now we're going to say select and
we'll just do from here now what we need to do is we need to select The month so let's go ahead and select that month and we'll take it just like this so we'll select the month and we need to do a rolling total all we have to do for that is the sum of which colum we're doing let's actually change this real quick um we're going to call this as um total off I'm just going to keep it simple so the sum of total off so now we're doing that but we want To do
it over and all we need to add into here is an order by we're not going to Partition by anything because in here we already did a group by so it's you know kind of like partitioning it we just need to say order by we just need to order by the month I believe so let's try that and let's run it let's do that and we actually need to since we're doing um this we need this at the end and we can rename this if we'd like so we can do This as rolling uncore total
lowercase let's try running this and let's see what we get okay and this looks correct so starting in 2020 of 03 we had 9,000 layoffs then the next total we added onto here now this visually isn't the best I would like the month right here as well so let me actually add um let me create its own row put a comma here then right here I want to keep is total off so we can visually see better much Better okay so we have the month and as it goes down we're having more laid off now
this is our rolling total here's essentially how this works it starts with 9,000 628 then it adds on the next month which is 26,000 which equals 36,000 then it adds on the next month and we get 62 adds on the next month 69 right it keeps going all the way down this just shows each month how many were laid off and this shows a month-by-month progression all the way down to the Bottom so let's keep let's just you know take a look in 2020 of 03 we had 9,000 by the end of 2020 we had
about 81,000 or so then at the beginning right here all the way down to 2021 by the end of 2021 we only had 96,000 so 2021 was a good year it looks like um comparatively we had 90 80 well let me see 91,000 people let go and here we only have 96,000 let go so that's what our 81 it's only like 15,000 people that's like Nothing um comparatively then in 2022 uh things start ramping up dramatically it looks like we have um 12,000 people 17,000 16,000 and they're adding up it's going from 97 all the
way up to good night um right before the holidays in 2022 of this past year I mean we had uh 247,000 people so that's like 30 some thousand myth my math really bad it's like 150,000 and then we only have oh we have even more here actually and then we only Have the first three months of 2023 so these months right here were really devastating just around the world now we can also break this out potentially by country so we can see how many per country but this just around the world that's a lot of
people losing their jobs all the way up to 383,000 so in this range 383,000 from March of 2023 all the way back to March of 2020 lost their jobs and this is just reported I'm sure there Was uh you know much more than that this is that like companies larger companies that have like series a funding IPOs Etc um but a lot of small businesses went out of business um so we don't we don't have that information in this data set so I think that's what we're going to do next let kind of look at
the company maybe because I'm always interested in the company and actually earlier let's not do that one earlier we're looking at the company the sum of total aof let's Um let's bring this down let's run that CU that's what rolling total is by the way rolling totals are great I'm really good for visualizations as well um let's see yeah so I want to take a look at these companies but I want to see how much they were laying off per year so instead of just looking at it as a total will break it out by
the year now I'm just going to warn you this is probably going to this most likely be our last one in the in the lesson this is going To be probably our hardest one yet um potentially we'll see maybe the other one was earlier uh was harder earlier now let's use this kind of as um a starting point but what we're going to need to do is we want to take the company but I also want the date so I need to do a comma then date so we need our date here here and I'm
going to do that I need to group by the date as well so we'll do date and let's run this all right now this is just doing the you Know company and the exact date we don't want to do that let's actually do the year let's just look at the year I think that'll be plenty you could also do the exact same thing as we did above with the substring um although I think that's going to get a little Messier um Stu you know just a thought let's run this okay so now we're looking at
just the we're grouping by year let's order by uh let's say the company and we'll do that in sending there we go and let's Run this so now we have it open let's see who you know you can see people who made multiple layoffs this is in 2020 they let go of 200 and then in 2023 they let go of 155 this is a company I've never heard of so this is already looking really good now let's say we wanted to use this and what we want to do is we want to rank which years
they laid off the most employees now this is just a small uh sample we'll look at more just a little bit we can actually Look at um let's just do three uh three descending just like this should be large companies so you know some of these companies like Microsoft even Amazon right here and Amazon right there they let go of multiple or thousands of people in different years so I want to rank those I want to say you know the highest one uh based off of the laid off should be ranked number one that's the
year that They laid off the most people so let's go ahead and try to do that thing we need to do is uh do a CTE we'll start with that let me um add some more things down here so we're good to go so let's do we'll do with let's do uh companies this is going to be the company year underscore year we'll do it as and that's what this is going to be this is is our company year and we can do select everything from company year it's going to be the exact that we're
looking at Let's go ahead and run this okay so this is good now I do want to change these columns and I can do that right here we'll do company um let's call this years and then we'll do I'm going do total laid off again so total uncore laidor off this the sum right total laid off per year so let's go go ahead and run this now there we go we have company years and total laid off so this looks much better and what we're going to do is Select everything but we want to partition
it uh probably based off this years right here and then we want to rank it based off how many they laid off in that year so we'll get to see who laid off the most people per year cuz some companies like Amazon they let they let off multiple people per year what was it the highest per year that's kind of what we're going to look at um so we'll do Dore Rank and we're going to do that over now We're going to Partition by oops that's not how you spell partition Partition by we want to
Partition by the years so all of the 2021 layoffs will be in the same partition all the 2022 will be in the same partition and we'll do years and we want to also order by the total laid off now we want to do that in descending so we'll do total laid off descending and then we want to to um add this dense rank to it so let's try it let's run this good night that's a a big one so Let's take a look so in 2021 it looks like um or 2020 it looks like uber
had the highest and we want to take out these NES so let's do um where years let's say is not no and let's run that here we go so in 2020 and that's what we're partitioning on first it looks like this is one two three these are the top ones um and let's order by and let's do the let's order by the rank um first let's call this as bring it down what do we want to call This we'll call this as ranking there we go order by ranking a in there we go now we
have our ranking so in 2020 this is the biggest one of layoffs 2021 this is the biggest layoff I guess we'll have to take a look in meta in 2022 they had the biggest layoff and Google had the biggest layoff total uh for 2023 so this looks correct but I kind of want to filter on this ranking to be able to only filter Maybe the top like five um companies per year and I think we can do that let's actually get rid of this I think what we should do is we should add this as
another CTE and query off of that so now we'll call this company uncore yearor rank so now we have the year rank as we'll have our query oops have our query so now this is our company year rank so now if we do select everything thing from company your rank this we run it so now we have our Rankings let's come down now we have our rankings but I just want to filter it based off of that ranking we'll say uh where ranking is greater than or equal to let's say five we'll look at the
top five rankings let's run this and I said greater than I wanted uh less than run that that's look good okay so really quickly we have in 2020 we had these are the top five people who laid people off Uber Booking.com Groupon swiggy Airbnb in 2021 the largest layoff was bite dance which you think is Tik Tok right uh Cera Zillow uh yeah these are top five so 2021 or 2022 and 2023 were definitely the largest as well we have meta 11,000 people Amazon Cisco p and carvana as well as Phillips they tied that's why
we have the DSE ranking because some of these will be ties then we have Google uh in 2023 all the way down to Dell these are all ones I know Microsoft Ericson Amazon sale force and Dell so this is really really interesting just looking at a year-by-year snapshot right these are the total laid-off for each company and we could even go back and change this for like industry or you know really whatever we want to change this to this is just an interesting query in general to look at you know per year and we could
go back and change per month or lots of stuff we Can change in here but this is really interesting to me um it just looks like a lot of the large tech companies had some took some big L's took some big hits um let's recap this query really quickly in case you know it's tough to follow but we created this query up here and we were looking at the company by the year and how many people they let off then right over here we said with the company year we Chang these columns this is our
CTE so we created our first CTE then we went and we gave it a rank and we wanted to you know filter on that rank so we did this rank as another CT we just did a comma had a second CTE and we hit off the first CTE the company year which is right here so we hit off our first CTE to make this second CTE and then fin finally we um queried off of the final CTE definitely not an easy query to kind of think through and walk through but I hope you know you're
able to follow um because you know that's a a A really good query this is something I've definitely done in a real job when I was working with a lot of healthare data this is u a lot of stuff that I would do and so this is a you know pretty good um pretty good query to know how to do but with that being said uh we are done with this lesson I hope this wasn't too short I don't know how long I ran but um you know we looked at a lot of different stuff
let's go back to the top again we were just exploring the Data we looked at lay it off a lot um looked a lot at the company uh when these dates actually started for these layoffs in this data set we looked at the country the actual year of laid off uh then we went to a little bit more difficult things we looked at it per month so per month how many layoffs they had and then we did a rolling total this one was a pretty good one using that substring um I love substrings man they're
awesome Or lady they're awesome uh and then we came down here and we did the one we just did with multiple ctes in the company I think it was a a really really good solid project um combine that with that data cleaning project and man you got just a really good start with some MySQL projects and this one can be expanded upon don't stop where I stopped right let me go back up the top don't stop where I stopped right this data set has so much data in it you can do a lot Of different
things and even if you want to you could go and find these companies right over here and you could try to uh get their total uh total company that they had and you could use this column a lot more that'd be really interesting with some calculations there so with that being said that is the end of our exploratory data analysis project I hope you enjoyed it I hope you learned something both in the data cleaning project and in this exploratory data Analysis project that's what this is all about about and getting the confidence and gaining
the experience to create these projects and add those to your portfolios speaking of which if you haven't already check out my video on how to create a free portfolio website uh using GitHub awesome I highly recommend it you can add these to your portfolio so with that being said thank you so much for watching I really appreciate it if you like this video If You learned anything at all be sure to like And subscribe below check out my channel for tons of other videos just like this one and more I will see you in the
next video [Music]