Hello and welcome to video 12 in this DP-600 exam preparation course. We've made it: this is the final video covering the study guide, and we're going to be looking at querying data by using T-SQL. Now, you'll notice that I've added an extra video, because I really wanted to give you one more video to help you prepare for the exam. In it we'll go through how to prepare, some other resources I think you should take a look at before you take the exam, and some advice for when you're actually sitting the exam as well. But that's for the next video.

For this video we're focusing on three sections of the study guide: querying the Lakehouse and the Data Warehouse using T-SQL; the visual query editor, which is a feature of both of those SQL endpoints; and finally, connecting to and querying datasets (semantic models) using the XMLA endpoint. As ever, I've released study notes to help you go a little deeper and make sure you're covering all the right points, with links to further resources if you want to go deeper on anything I mention in this section of the study guide. And as ever, I've also got five sample questions to test your knowledge at the end of this video.

So let's start by looking at the different ways we can access the T-SQL engine within Fabric; there are a few to be aware of. One of them is the Lakehouse SQL endpoint. One thing to bear in mind here, as we've already mentioned quite a lot, is that this endpoint is read-only: all you can really do is SELECT statements and a limited amount of DDL. You can't do any INSERT, UPDATE or DELETE from the Lakehouse SQL endpoint. The more obvious place to write T-SQL is within the Fabric Data Warehouse. Here you can write both DDL and DML: inserts, updates, deletes, SELECT statements, all of that, so the Data Warehouse gives you the most flexibility for writing T-SQL scripts within Fabric.

On top of the T-SQL query editor, you can also create T-SQL-like queries using the visual query editor, which is quite similar to Dataflow Gen2, if you've ever used the visual editor in a dataflow. We can do things like merging tables, filtering tables and adding columns, all through a no-code visual interface, and we'll take a look at that in more detail shortly.
The other option for writing SQL is via the XMLA endpoint. As we've mentioned previously in this course, to connect to the XMLA endpoint we go into our workspace settings, grab the connection string, open SSMS (in this example), connect using the Analysis Services server type, and paste in the XMLA endpoint. That brings through all of the lakehouses and warehouses within that workspace, and then we can write different queries from that XMLA endpoint depending on the use case.

So now we have a good understanding of where we can write T-SQL. For most of this exam you need a pretty good level of T-SQL, or at least an appreciation of what a lot of different T-SQL functions do, and the ability to read T-SQL quite well. I don't think there's a definitive list of which T-SQL functions you need to be familiar with, but I definitely recommend being comfortable with the following: the difference between WHERE and HAVING; GROUP BY summarizations; UNION and UNION ALL; the different joins and when to use them; Common Table Expressions; window functions like LEAD, LAG and ROW_NUMBER with partitioning; subqueries; and cross-warehouse queries. Rather than just describing all of these, I think it's better to jump into Fabric, open up a Data Warehouse and show you some of these functions in action.

Okay, so here we are in SQL Server Management Studio, and I've connected to one of my gold data warehouses, called DW_Gold, using the SQL connection string. You can see that currently we don't actually have any data in this warehouse; the tables are empty.
So the first thing we're going to do is bring in some data from other tables. To do that, I'm using CTAS: CREATE TABLE AS SELECT. It lets us create a new table in this data warehouse using existing data from another warehouse or, in this example, from a lakehouse, connecting via its SQL endpoint. This is an example of cross-database querying, because we're creating a table from another Lakehouse altogether: we're selecting all of the data from the dbo.Revenue table in our Lakehouse_Bronze and creating a new table called dbo.Fact_Revenue. If we refresh the tables we should now see dbo.Fact_Revenue, and we can do the same for Dim_Date and Dim_Branch. We'll be using these datasets to demonstrate some of the functions you need to be familiar with for the exam.
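To make the CTAS plus cross-database idea concrete outside Fabric, here's a minimal runnable sketch using Python's sqlite3, where an attached database stands in for the Lakehouse SQL endpoint. The table and column names here are illustrative stand-ins, not the course's actual schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")                       # stands in for DW_Gold
con.execute("ATTACH DATABASE ':memory:' AS lh_bronze")  # stands in for the Lakehouse

# Seed the "Lakehouse" with a small revenue table.
con.executescript("""
    CREATE TABLE lh_bronze.revenue (branch_id INTEGER, revenue REAL);
    INSERT INTO lh_bronze.revenue VALUES (1, 100.0), (2, 250.0);
""")

# CTAS: create a new warehouse table from data living in the other database,
# analogous to CREATE TABLE dbo.Fact_Revenue AS SELECT ... in Fabric.
con.execute("CREATE TABLE fact_revenue AS SELECT * FROM lh_bronze.revenue")

rows = con.execute(
    "SELECT branch_id, revenue FROM fact_revenue ORDER BY branch_id"
).fetchall()
print(rows)  # [(1, 100.0), (2, 250.0)]
```

The cross-database part is the two-part name `lh_bronze.revenue`; in Fabric the equivalent is the three-part `Lakehouse.dbo.Table` naming across items in the same workspace.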
We're not going to cover all of them, because to be honest I'm not sure of the exact breadth and depth of T-SQL expected in the exam, but we'll go over some common types of problems you might see. So now if we refresh our tables again, we've got dbo.Fact_Revenue, Dim_Date and Dim_Branch. Let's start by looking at the data, specifically Fact_Revenue. These datasets come from the same source we used in a different part of this exam preparation course: they relate to car sales and car dealerships. In our Fact_Revenue table we've got a revenue figure, a date ID column and a branch ID column, and that's what we'll be using for the rest of this analysis. What I'm going to do is present you with a series of problems and then walk through how you might solve each one in SQL, starting off quite simple and then adding more and more functionality as we go.
What if we wanted to calculate the top five branches by total revenue? To do that we need to perform some sort of aggregate. Our fact table currently has BranchID and Revenue, and we want the top five branches, so we can select the top five branch IDs with the sum of the revenue. What we're doing here is a simple GROUP BY on BranchID, because we're looking for totals per branch, and we order by the aggregate, SUM(Revenue), descending, so the TOP 5 gives us the five biggest branches. Now let's change the problem a little: maybe we want to return only the top branches in Spain, say the top three. Spain is a country name that comes from a different table in our dataset.
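As a runnable sketch of this top-N aggregation pattern, here's the same idea in SQLite via Python, with made-up toy data; note that SQLite uses LIMIT where T-SQL uses TOP:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_revenue (branch_id INTEGER, revenue REAL);
    INSERT INTO fact_revenue VALUES
        (1, 10), (1, 15), (2, 50), (2, 5), (3, 30), (4, 1), (5, 2), (6, 100);
""")

# Top 3 branches by total revenue: GROUP BY the key, aggregate with SUM,
# ORDER BY the aggregate descending, then take the top N.
rows = con.execute("""
    SELECT branch_id, SUM(revenue) AS total_revenue
    FROM fact_revenue
    GROUP BY branch_id
    ORDER BY total_revenue DESC
    LIMIT 3
""").fetchall()
print(rows)  # [(6, 100.0), (2, 55.0), (3, 30.0)]
```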
What we've added in this example is an INNER JOIN on Dim_Branch, joining on BranchID because we have that key in both datasets. We're using an inner join because we only want to return rows where the key exists on both sides. Again we're aggregating with SUM(Revenue), but this time we've also brought through a few columns from the branch dimension table. Let's run it all and see what we get. Okay, the query has returned the top three, and we've filtered the results to country names equal to Spain. Notice we've had to add some new columns into the GROUP BY, because CountryName and BranchName are now mentioned in the SELECT. We've brought through the branch name so we have more than just the branch ID; maybe you're not familiar with the branch IDs and want the actual branch names. And we can see it has brought through the top three branches in Spain by total revenue.
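The join-plus-filter pattern can be sketched like this in SQLite, again with illustrative branch names and toy figures; the key points are the INNER JOIN on the shared key, the WHERE pre-filter, and the extra non-aggregated columns appearing in the GROUP BY:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_revenue (branch_id INTEGER, revenue REAL);
    CREATE TABLE dim_branch (branch_id INTEGER, branch_name TEXT, country_name TEXT);
    INSERT INTO fact_revenue VALUES (1, 40), (1, 10), (2, 90), (3, 20), (4, 70);
    INSERT INTO dim_branch VALUES
        (1, 'Madrid',    'Spain'),
        (2, 'Barcelona', 'Spain'),
        (3, 'Valencia',  'Spain'),
        (4, 'Lisbon',    'Portugal');
""")

# INNER JOIN keeps only rows whose branch_id exists in both tables; WHERE
# filters to Spain BEFORE the aggregate runs; every non-aggregated column
# in the SELECT must appear in the GROUP BY.
rows = con.execute("""
    SELECT f.branch_id, b.branch_name, SUM(f.revenue) AS total_revenue
    FROM fact_revenue AS f
    INNER JOIN dim_branch AS b ON f.branch_id = b.branch_id
    WHERE b.country_name = 'Spain'
    GROUP BY f.branch_id, b.branch_name
    ORDER BY total_revenue DESC
    LIMIT 3
""").fetchall()
print(rows)  # [(2, 'Barcelona', 90.0), (1, 'Madrid', 50.0), (3, 'Valencia', 20.0)]
```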
That solves that problem, but what if we wanted to filter after the aggregate? For example: give me all of the branches that have total revenue greater than some amount. Here we're not using a WHERE clause, because WHERE filters before the aggregate; when you want to filter on the aggregated value itself, you use HAVING, which comes after the GROUP BY statement. Previously the WHERE clause was pre-filtering on country name; now we're looking for aggregation results greater than, in this example, 50 million. So we've got a very similar setup: we've removed the WHERE clause, because now we're not just interested in Spain, we want all the branches, and we've added HAVING SUM(Revenue) greater than 50 million. You can see it has returned only two branches, which makes sense. What if we change the HAVING threshold and remove one of the zeros?
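The WHERE-versus-HAVING distinction can be sketched as follows (SQLite toy data again); note that no single row here exceeds the threshold, so a plain WHERE would return nothing, while HAVING on the aggregate finds two branches:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_revenue (branch_id INTEGER, revenue REAL);
    INSERT INTO fact_revenue VALUES (1, 30), (1, 30), (2, 10), (3, 45), (3, 20);
""")

# WHERE filters rows before the aggregate; HAVING filters on the result of
# the aggregate, so it must come after GROUP BY. Here: branches whose TOTAL
# revenue exceeds 50.
rows = con.execute("""
    SELECT branch_id, SUM(revenue) AS total_revenue
    FROM fact_revenue
    GROUP BY branch_id
    HAVING SUM(revenue) > 50
    ORDER BY branch_id
""").fetchall()
print(rows)  # [(1, 60.0), (3, 65.0)]
```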
At 5 million more branches qualify, whereas at 50 million there are only two branches with more than 50 million revenue over the full dataset in the fact table. Now, in these examples we've been using the inner join because that's what this specific use case needed, but for the exam you need to be familiar with LEFT JOIN, RIGHT JOIN, INNER JOIN and FULL OUTER JOIN, all of the different join types and when they're useful for different scenarios. We're not going to go through all of them here because there's quite a lot.
But I definitely recommend, if you're not familiar with them, learning the differences and when you might want to use one or the other.

Another thing you might need to be familiar with for the exam is Common Table Expressions (CTEs). These are really heavily used in the world of SQL, for example when creating views, so you need to know how to construct one: what the keywords are, what the general structure is, that kind of thing. In general, a CTE lets us define a named result set that we can reuse later in the script. Here we've got three lines of query: we select the top five branch IDs, sum the revenue, and group by branch ID, similar to what we saw before. We can select just those three lines and look at what they return, and then we save that result as a named expression, here called top_five_rev.

The syntax always starts with WITH, which is the keyword for a CTE; you give it a name, then AS, open brackets, and put your expression inside the brackets. We can then use top_five_rev later in the query. One of the other benefits of CTEs is that we can create more than one: for any subsequent expression we need a comma first, and then we can declare a second one, branches. Subsequent CTEs don't repeat the WITH keyword; you only need it on the first one. Here I've defined branches as SELECT BranchID, BranchName FROM dbo.Dim_Branch, so we're just getting the branch ID and branch name and storing it as branches.

The final section of a CTE is the SELECT statement: you always need to return something from the Common Table Expression, and we do that with a closing SELECT. As you can see, we're selecting the branch ID, branch name and total revenue; these come from top_five_rev, the first expression we defined, joined with branches, the second one, and we've given them aliases. To run this we select all of the sections together and press Execute, and we get the result: the top five branches by revenue, joined back to the branch names in the last section of the CTE.
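The two-CTE structure described above can be sketched as a runnable SQLite example (illustrative names and toy data): one WITH keyword, each further CTE separated by a comma, then a final SELECT that consumes the named result sets:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_revenue (branch_id INTEGER, revenue REAL);
    CREATE TABLE dim_branch (branch_id INTEGER, branch_name TEXT);
    INSERT INTO fact_revenue VALUES (1, 80), (2, 50), (3, 20);
    INSERT INTO dim_branch VALUES (1, 'Madrid'), (2, 'Barcelona'), (3, 'Valencia');
""")

# WITH starts the first CTE; a comma (no second WITH) starts the next;
# a final SELECT joins the two named result sets.
rows = con.execute("""
    WITH top_rev AS (
        SELECT branch_id, SUM(revenue) AS total_revenue
        FROM fact_revenue
        GROUP BY branch_id
        ORDER BY total_revenue DESC
        LIMIT 2
    ),
    branches AS (
        SELECT branch_id, branch_name FROM dim_branch
    )
    SELECT t.branch_id, b.branch_name, t.total_revenue
    FROM top_rev AS t
    INNER JOIN branches AS b ON t.branch_id = b.branch_id
    ORDER BY t.total_revenue DESC
""").fetchall()
print(rows)  # [(1, 'Madrid', 80.0), (2, 'Barcelona', 50.0)]
```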
Now, obviously this could be written in a different way, and you probably don't especially need a CTE to return that result, but I wanted to show you the structure of a Common Table Expression, because you might get asked about it in the exam, or get shown code in this format that you need to understand.

Another thing to be aware of for the exam is the LAG and LEAD functions. These create new columns that reference other rows of a column with some specified offset, so it's worth taking a look at an example. I've defined another CTE here: in the first part we get the top five branches, divide the revenue by 1 million just to give us easier values to comprehend, and take the FLOOR of that division so it rounds down to single digits. We define that column as rev_m (revenue in millions), just to make it easier to see what's going on with the LAG and LEAD functions we're about to introduce. So this is what the dataset looks like: five branches, with rev_m values that are all single-digit whole numbers. We've stored that as rev_t.

Next we introduce a LAG function over this dataset: we select BranchID and rev_m, and add another column using LAG. What does LAG do? Let's start with an offset of one and run it. LAG looks at the column you pass in, rev_m, and for each row returns the value from the row at the current row number minus the offset. For row one it looks at row zero, which doesn't exist, so it returns NULL; for row two it returns the value from row one of the column we passed into the function. Another thing we've done here is order by rev_m, so the rows are now in ascending order: 1, 2, 2, 3, 6. We can change the offset to anything we want: with a lag of two, the offset parameter is two, which introduces another NULL, because now we're offsetting by two rows.

Hand in hand with LAG we have the LEAD function, which looks in the opposite direction: rather than looking back, LEAD looks forward through the row order. Let's change the offset back to one and swap LAG for LEAD. When we run LEAD on the same rev_m column with an offset of one, the first value in the lead column is the value from the second row, the next is from the third row, and so on; and now we get a NULL at the end of the dataset, because there's no sixth row to pull from. One thing to bear in mind is that we can't do a LEAD or a LAG with an offset of minus one; the offset parameter cannot be a negative value.
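The LAG/LEAD behaviour, including the NULLs at the edges of the window, can be reproduced in SQLite (which has supported window functions since version 3.25); the data is a toy stand-in for the rev_m values above:

```python
import sqlite3

con = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
con.executescript("""
    CREATE TABLE rev (branch_id INTEGER, rev_m INTEGER);
    INSERT INTO rev VALUES (1, 1), (2, 2), (3, 2), (4, 3), (5, 6);
""")

# LAG looks back by <offset> rows in the window's order (NULL when no such
# row exists); LEAD looks forward. The offset must not be negative.
rows = con.execute("""
    SELECT branch_id, rev_m,
           LAG(rev_m, 1)  OVER (ORDER BY rev_m, branch_id) AS prev_rev,
           LEAD(rev_m, 1) OVER (ORDER BY rev_m, branch_id) AS next_rev
    FROM rev
    ORDER BY rev_m, branch_id
""").fetchall()
print(rows)
# [(1, 1, None, 2), (2, 2, 1, 2), (3, 2, 2, 3), (4, 3, 2, 6), (5, 6, 3, None)]
```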
Another function to be aware of is ROW_NUMBER with partitioning. ROW_NUMBER, as the name suggests, basically runs through your dataset and numbers the rows sequentially. Let's run the first section of this code and look at what it's doing. We've defined ROW_NUMBER, given it the name row_num, and passed in a PARTITION BY value, so it's going to group all of our dealer IDs. In the output you can see that for the first dealer ID, DLR001, all of those rows fall within the same partition, they all share that dealer ID, and ROW_NUMBER assigns a number based on the ORDER BY, here revenue ascending, giving us row numbers 1 to 8 within that partition. Then for the next dealer ID, DLR002, there are ten rows within the partition, ordered the same way, with row numbers 1 through 10.

That's the section inside the CTE; after it I've just done SELECT * FROM parts (the name of our CTE), with a WHERE clause to bring back only two of the dealer IDs. In this example that's DLR001 and DLR0017. Make sure you understand row numbering, partitioning and ORDER BY for the exam, because this is something that could easily come up.
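Here's the PARTITION BY pattern as a small runnable SQLite sketch; note how the numbering restarts at 1 for each dealer and follows the ORDER BY within the partition (dealer IDs and figures are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (dealer_id TEXT, revenue INTEGER);
    INSERT INTO sales VALUES
        ('DLR001', 30), ('DLR001', 10), ('DLR001', 20),
        ('DLR002', 5),  ('DLR002', 15);
""")

# ROW_NUMBER restarts at 1 for each PARTITION BY group, numbering rows in
# ascending revenue order within that partition.
rows = con.execute("""
    SELECT dealer_id, revenue,
           ROW_NUMBER() OVER (PARTITION BY dealer_id ORDER BY revenue) AS row_num
    FROM sales
    ORDER BY dealer_id, row_num
""").fetchall()
print(rows)
# [('DLR001', 10, 1), ('DLR001', 20, 2), ('DLR001', 30, 3),
#  ('DLR002', 5, 1), ('DLR002', 15, 2)]
```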
The last topic I wanted to cover is subqueries. In all of the other examples in this video we've been using SELECT ... FROM a table; if we scroll back up, we're selecting columns from dbo.Fact_Revenue. A subquery doesn't have that SELECT * FROM dbo.Table structure: instead we open brackets and, rather than reading the whole table, we write an inner query that pre-filters or performs some T-SQL, and the result of that subquery is returned up to the top-level query. We also have to give it an alias, here sub; it doesn't have to be called sub, that's just what I've named it. What this does is first evaluate whatever is in the brackets, getting the revenue figures for just two dealer IDs, and then SELECT * from that result. Again, this is just a bit of a toy example.
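The toy example above can be reproduced in SQLite like this (illustrative dealer IDs); the inner bracketed query runs first, and the outer SELECT treats its aliased result like a table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_revenue (dealer_id TEXT, revenue INTEGER);
    INSERT INTO fact_revenue VALUES
        ('DLR001', 10), ('DLR002', 20), ('DLR003', 30), ('DLR001', 40);
""")

# The subquery in brackets pre-filters to two dealer IDs; the outer SELECT
# reads from that result set. The alias ("sub") is arbitrary.
rows = con.execute("""
    SELECT sub.dealer_id, sub.revenue
    FROM (
        SELECT dealer_id, revenue
        FROM fact_revenue
        WHERE dealer_id IN ('DLR001', 'DLR002')
    ) AS sub
    ORDER BY sub.dealer_id, sub.revenue
""").fetchall()
print(rows)  # [('DLR001', 10), ('DLR001', 40), ('DLR002', 20)]
```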
That shows subqueries in action, or at least the structure of a subquery, so that if you come across one in the exam you know what it is. Okay, so here we are in a Data Warehouse, the same DW_Gold we used for the first part of this video, with our Fact_Revenue, Dim_Date and Dim_Branch tables. Now I specifically want to show you the visual query editor, which we can access by clicking New visual query.
Immediately we get an introduction: to build a visual query, we need to drag some tables onto the canvas. Let's drag on Fact_Revenue and see what we get. The visual query editor basically tries to make it as simple as possible to transform the data in your Data Warehouse, or in the Lakehouse SQL endpoint as well. So we've got our Fact_Revenue table on the canvas; what can we actually do with it? Along the top there are some quick options: choosing specific columns, removing columns, filtering (removing certain rows), sorting rows, and transforming, including group-bys; we can also merge and append different datasets. So if we dragged another table onto the canvas we could combine them using either Merge or Append, but we're not going to do that in this example. Another thing you can do in the visual query editor is click this plus button, which gives us access to a few more commands.
There we've got all of the commands from the top menu, plus some extra transformations: text transformations on a specific column, length, extracting the first characters, and adding columns using, for example, a conditional column or a column from examples. If you're used to the Power Query engine in dataflows, you might be familiar with creating a new column from examples; we can also add a column from text using these methods. Say, for example, I want to do a group-by on this Fact_Revenue table and get the median value: I change the aggregate to Median, point it at the Revenue column, and group by our branches.

We also have fuzzy grouping here. If the grouping column isn't particularly good quality, you might want fuzzy grouping, which basically looks at the likeness or similarity between different values in that column; if it's above a certain threshold, say the values differ by just one character, it adds them to the same group. That's what fuzzy grouping does; we don't want to enable it here because we've already managed our data quality. If we click OK, you can see it has added this step, and along the bottom the result has updated: we've got a group-by on BranchID and a new column holding the median value within each group. I'll also remove Dim_Date from the canvas, because we don't want to do anything with it.
We've got our Fact_Revenue now, and what we can do is either Save as table, which saves the results as a new table, or save it as a view. You can see that Save as view is actually greyed out, with the warning that the query on Fact_Revenue is not supported as a warehouse view, since it cannot be fully translated to SQL. The reason is that we've used Median, and MEDIAN is not actually an aggregate function in T-SQL. So instead of aggregating on the median, we change the aggregation to Sum, and as expected the warning message goes away. We might also rename the column to sum of revenue, and then we can save it as a SQL view: we do the revenue sum aggregation, give our view a name, and it even shows us the SQL statement for the view, which we can copy to the clipboard if we want.
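The view the editor generates would look roughly like the sketch below, shown here in SQLite with invented names; the point is that SUM translates cleanly to a SQL view, whereas a MEDIAN aggregate could not, since T-SQL has no MEDIAN aggregate function:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_revenue (branch_id INTEGER, revenue INTEGER);
    INSERT INTO fact_revenue VALUES (1, 10), (1, 20), (2, 5);

    -- Roughly the kind of view the visual query editor generates: a SUM
    -- group-by is fully translatable to SQL, unlike a median aggregate.
    CREATE VIEW v_branch_revenue AS
        SELECT branch_id, SUM(revenue) AS total_revenue
        FROM fact_revenue
        GROUP BY branch_id;
""")

rows = con.execute("SELECT * FROM v_branch_revenue ORDER BY branch_id").fetchall()
print(rows)  # [(1, 30), (2, 5)]
```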
Then we just save it as a view, which you can query from a Power BI semantic model, acknowledging that a view will fall back to DirectQuery mode if you're using Direct Lake mode to access this data. So that was a tour of the visual query editor in the Data Warehouse. For the exam, just understand what it's capable of; have a look at the different functionality, because you might get a question or two about the visual query editor. Okay, so let's round up this video by looking at some practice questions you could expect for this section of the exam.
Question 1: the following SQL script creates the results shown below; what is the function in this example? We're doing SELECT BranchID, Column1 and then the mystery function, which you have to work out, FROM Sales, returning the table below. Take a look at the answer options on the right-hand side, have a think about which function would produce the results in the table at the bottom of the page, pause the video, and I'll reveal the answer shortly.

The answer here is A: we want a LAG function. We can see that we're transforming Column1 into Column2, and the pattern shows an offset: row three of Column2 maps to row one of Column1, so the offset is two. We know it's a LAG because Column2 is two rows behind Column1, and you can see that replicated in rows four and five as well. Another clue is the two NULL values at the top: if the NULLs are at the top, it's going to be a LAG function, because there's no earlier value to fill row one or row two of Column2, so those will always be NULL. So we know it's a LAG with an offset of two. It can't be B, a LEAD function, because LEAD looks ahead rather than back, and the bottom two options have an offset of minus two; the offset in a LAG or a LEAD is always positive, so those wouldn't be correct either.

Question 2: you have the following query analyzing sales data for various products. Your goal is to analyze the sales data by product name and year, but only for the products that have a yearly sales amount of more than $50,000.
How would you complete this query? Take a look at the different options, think about how you would reach the conclusion the question is asking for, and I'll reveal the answer shortly.

The answer we're looking for is B. The question asks you to analyze sales data by product name and by year, so that's the first clue: in our GROUP BY we need to group by the product name and the year, not the product key and the date key, because that wouldn't be the right aggregation for what the question is asking. The second part asks for products with a yearly sales amount of more than 50,000; that yearly amount comes from the SUM of the sales amount, which is calculated during the aggregation, so we need to use HAVING. We're trying to filter on the result of the aggregate, not before it; filtering before the aggregate would just filter out individual rows in the table, whereas we want rows where the SUM of the sales amount, not any individual sales amount, is more than $50,000. So the answer is B: GROUP BY product name and year, and a HAVING clause to return only the product names with yearly sales of more than 50,000.

Question 3: you're trying to inspect a join between two tables to spot referential integrity violations.
Which of the following T-SQL join types would make it easiest to identify keys on either side of the join that don't have a match on the other side? A: LEFT JOIN; B: RIGHT JOIN; C: INNER JOIN; D: FULL OUTER JOIN; E: CROSS JOIN. The answer here is D, the FULL OUTER JOIN. Again there are two parts to this question. First, we need to understand what a referential integrity violation is: when you join two tables, you're joining on a specific key, and a violation occurs when one of the keys in the left-hand table is not in the right-hand table, or vice versa. To identify that with a SQL join we need the FULL OUTER JOIN, because it brings back, firstly, the rows where the keys do match, and secondly, all of the non-matching rows from both tables. When we get the result back, we'll see NULL values wherever there isn't a match for the table on the left, and equally for the table on the right, and investigating those NULLs in the output is how we identify the violations. The LEFT JOIN and RIGHT JOIN won't help, because we need to identify violations on both sides of the join; the INNER JOIN only returns the matching keys from both sides; and the CROSS JOIN just returns all the combinations of the keys that exist in the two tables.
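A runnable sketch of this referential-integrity check, with invented keys: older SQLite builds lack FULL OUTER JOIN, so this emulates it with a LEFT JOIN in each direction, keeping only the rows where one side is NULL, which are exactly the violations a full outer join would surface:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE fact_sales (branch_id INTEGER);
    CREATE TABLE dim_branch (branch_id INTEGER);
    INSERT INTO fact_sales VALUES (1), (2), (99);   -- 99 has no dimension row
    INSERT INTO dim_branch VALUES (1), (2), (3);    -- 3 has no fact rows
""")

# Emulated FULL OUTER JOIN: outer-join in each direction, keep the rows
# where the key is missing on one side (the NULLs flag the violations).
rows = con.execute("""
    SELECT f.branch_id AS fact_key, d.branch_id AS dim_key
    FROM fact_sales AS f
    LEFT JOIN dim_branch AS d ON f.branch_id = d.branch_id
    WHERE d.branch_id IS NULL
    UNION ALL
    SELECT f.branch_id, d.branch_id
    FROM dim_branch AS d
    LEFT JOIN fact_sales AS f ON f.branch_id = d.branch_id
    WHERE f.branch_id IS NULL
""").fetchall()
print(rows)  # [(99, None), (None, 3)]
```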
So none of those would really help us identify referential integrity violations.

Question 4: you have two warehouses, Warehouse1 and Warehouse2. You want to create a SQL view in Warehouse2 that combines data from both warehouses, and your solution should minimize development effort. Which solution do you recommend? A: use a data pipeline Copy activity to copy the table in Warehouse1 to Warehouse2; B: create a shortcut from Warehouse1 to Warehouse2 to perform the query; C: use cross-database querying between Warehouse1 and Warehouse2; or D: use a Dataflow Gen2 to read the table in Warehouse1 and set Warehouse2 as the destination.

The answer here is C: use cross-database querying between Warehouse1 and Warehouse2. Within a warehouse we can query any other warehouse in the same workspace, so this is the most economical option, requiring the least effort, because we can query the table in Warehouse1 directly from Warehouse2. B is incorrect, because you can't actually create a shortcut from one warehouse to another; that functionality doesn't exist at the moment. The data pipeline Copy activity and the Dataflow Gen2 would technically meet the requirements, apart from the requirement that your solution should minimize development effort. You'll see that phrasing on quite a few questions in the exam; when you do, it's a bit of a flag that you should look for the most efficient method, since there's normally one option requiring little to no effort against others that require more development work. So the answer is C, cross-database querying.

Question 5: you're using the visual query editor, and your goal is to add a new column to your dataset called salary bins. This column bins a continuous salary variable into three bins: less than 30,000; between 30,000 and 60,000; and more than 60,000. Which functionality should you use to add this new column? A: add a conditional column; B: add column from examples (from all columns); C: add column from examples (from selection); D: duplicate the column; E: add column from text.
The answer here is A: add a conditional column. In the visual query editor there are a few different options under the plus menu to add columns, and for this specific use case, creating a new column that bins the values of another column, we want the conditional column, because that's where we can add the logic for less than, less than or equal to, and more than, with more than one condition on that column, which achieves our goal of creating the salary bins. The add-column-from-examples options (B and C) do exist in the visual editor, but it would be very difficult to implement this logic from examples; it's not really a good use case for that type of column-adding functionality. D, duplicating the column, won't do much, it just duplicates the column, and similarly E, adding a column from text, won't achieve what we're looking for either. So the answer is A: add a conditional column.
We did it! Congratulations, we've made it to the end of the study guide. This is video 12 out of 12, and we've covered an awful lot of ground over the last six weeks, so well done for sticking with it. It's a very tough exam covering a very wide range of topics, from T-SQL to PySpark to DAX, plus all of the planning material, so getting this far is a really good effort. Now, as a bonus, I'll be recording one more video in this series, basically to help you prepare to take the exam: things you need to know before the exam, when you're booking it, and while you're in it, to help you get as good a score as possible. Thank you very much for joining me in this series; I'll see you in the next video, which will be the final one. I look forward to seeing you all there.