Hello and welcome back to the channel. Today we're continuing our DP-600 exam preparation course, and we're up to video 9: designing and building semantic models. As you can see from the course plan, we're making really good progress — just a few more important sections to go — and today we're starting the Power BI and semantic modeling part of the exam. Specifically, we'll look at the different storage modes (Import mode, DirectQuery, Direct Lake), at composite models and what we mean by that, including aggregations, and at large format semantic models. Then we'll dig into a practical example in Power BI Desktop, defining different DAX measures: we'll cover variables, iterators, table filtering, window functions, information functions, and some of the more advanced or more recent features as well, including calculation groups, dynamic format strings, field parameters, that kind of thing. As ever, we'll have five sample questions at the end of the video to test your knowledge.

Bear in mind that I wouldn't classify myself as a Power BI developer. I used Power BI quite a lot about six or seven years ago; recently I've been more focused on data engineering and data science. I'll explain these concepts as best I can, but I definitely recommend doing your own research — I'll leave links to a lot of really good resources for this kind of thing in the community so you can go into more detail there.

So first up, we're looking at storage modes. I've mentioned these a fair amount on the channel, and lots of people talk about
storage modes within Power BI and Fabric, so we'll do a bit of revision on what we mean by the different storage modes, their advantages and disadvantages, and how you can choose between them. This diagram from the documentation does a pretty good job of framing the three storage modes — you might also hear them called connection modes. If you've worked in Power BI for a while, you'll definitely be familiar with Import mode, the one in the middle, and potentially with DirectQuery at the top as well.

For those not coming from a Power BI background: Import mode basically takes a copy of your data from the source and loads it into Power BI, so a copy of all your data is stored in the Power BI data model itself. That makes it really fast when you're building reports, because your data is right there. It does have some limitations, though. Because you're copying your data into Power BI, there are limits around model size — that's one of the main constraints of Import mode. And because you're doing this lift-and-shift import on a schedule — potentially every hour — the data in your Power BI model won't always be up to date; there's a chance your data becomes a little stale.

On the other hand, we have DirectQuery, the one at the top. In DirectQuery, whenever a user views a particular visual or report page, Power BI actually sends a query back to your source, runs that query against the source, and gets back fresh data. In doing so it's near real time — that's one of the benefits of DirectQuery. Some of the downsides: we can't perform much transformation on that data in Power BI — it has to be transformed in the source, so you need views and tables that are already shaped correctly. And in practice DirectQuery can be really slow, because you have to run that query against the original data source every time. For the user, that means sitting and waiting for their visuals to update every time they click a filter or change the page — it takes time, and it can be a pretty bad user experience.

The final one here is the new mode that came with Fabric: Direct Lake. Direct Lake creates a connection between what you build in your Power BI report and the underlying Parquet files, so it reads the Parquet files in your OneLake environment directly. So let's have a look at potential reasons why you might
want to choose Import mode, Direct Lake mode, or DirectQuery mode. Some of the key considerations: as we mentioned, Import mode is really good when your data is small enough to fit within a Power BI data model (including large format semantic models, which we'll talk about in a short while). Another good use case for Import mode is when you want really good read performance in your dashboards — interactivity and a good user experience for your dashboard users — and when you don't have requirements for near-real-time updates. If you want to use calculated columns or calculated tables, you'll want Import mode. And if you want to combine data from multiple different data sources, that's another good use case for Import mode — which leads me nicely into when you would choose Direct Lake mode.

Well, the first limitation of Direct Lake is that your data has to be stored in one Fabric data store — in a Lakehouse or a Data Warehouse, for example. You can't use Direct Lake mode to access data across lots of different data stores, so that's one limitation. The prime use case is when your dataset is really, really big — we're talking tens or hundreds of gigabytes here. Obviously that would be too big for most Import mode models, but that use case is a really good fit for Direct Lake mode. Direct Lake will also require a slightly different skill set within your team, because you have to do a lot of the data modeling further upstream, in your Lakehouse or your Data Warehouse: you need to materialize Parquet files that can be read by the Direct Lake connection, and typically that means your data transformation and data modeling happens in your Lakehouse, using either Spark or T-SQL. That might be a slightly different skill set to what you'd have in an Import-mode Power BI team, for example.

OK, so finally, let's talk about when you might choose DirectQuery mode for your semantic models. Again, if you need near-real-time updates, that's going to
be a really important one. Again, you'll need to do your data transformations further upstream — in your data source, wherever that might be. DirectQuery is also an important part of what we call composite models, which we'll look at in more detail now.

So, composite models. A composite model combines more than one of the different connection modes we just discussed. Commonly this is a DirectQuery fact table with Import-mode dimension tables. Think about the typical characteristics of a fact table versus a dimension table: a fact table has a lot of rows — normally a lot of data, could be millions or even billions of rows — and it's likely to be updated very often, maybe every minute or even every second; in some OLTP transaction-processing scenarios there could be hundreds and hundreds of records arriving every second. Because of those characteristics, DirectQuery can be a good match for that type of dataset: on data that's changing very often and very fast, DirectQuery gives you near-real-time access to fresh data. Dimensions, on the other hand, might change a lot more slowly, so it makes sense to use Import mode for them. Your product table might be updated once per day, for example, so a DirectQuery connection for that table wouldn't make much sense: you'd create user-experience issues on the front end without getting much benefit, because the underlying data isn't really changing very often. Another benefit of composite models is that they provide a way to model many-to-many relationships without the need for bridge tables.

Next up, aggregations. In this context we're talking about a specific Power BI feature for managing aggregations. What this feature does is take a really large dataset, create an aggregation — either automatically or user-defined, where you specify the aggregation you want to build — and then cache that aggregation. When you have really large data models, it
can improve performance, because you're caching the aggregation rather than loading in a really long fact table, for example — and because of that, aggregations are often used in conjunction with composite models. You can either create the aggregation in your data source and just pull it into your Power BI data model, or bring the detail data into Power BI and use the Power Query engine to create the aggregation, which then gets loaded into the Power BI engine. As I mentioned, there are really two types of aggregation. User-defined aggregations use the Manage aggregations dialog in Power BI Desktop to define the aggregations for a specific aggregation column: you choose how you want to summarize — Min, Max, that kind of thing — plus the detail table and detail column properties. If you have access to a Premium Power BI subscription, you also get automatic aggregations, which basically use machine learning to try to optimize DirectQuery semantic models, looking for the best aggregations to improve performance. Another thing we need to be aware of
for this part of the exam is the large format semantic model. Large format semantic models provide a highly compressed in-memory cache for optimized query performance, enabling fast user interactivity. So if you've got a semantic model that's bigger than, say, 10, 20, 30 GB, you can convert that small format semantic model into a large format semantic model; it applies this compression and in-memory caching, and it improves the performance of the model. And it's not just models over that roughly 10 GB threshold where you might want to convert: in many cases, even below it, there are benefits in converting to a large format semantic model. Firstly, you get the performance benefits anyway; secondly, it's commonly used when connecting to third-party tools via the XMLA endpoint.

Another feature of large format semantic models is on-demand loading. If you've watched some of my previous videos, you'll know that the Direct Lake connection mode also uses on-demand loading. What it means is that when a user is viewing a particular page in a report, the full dataset doesn't need to be loaded into memory: on-demand loading looks at the underlying files and loads only the data required to render what's being requested at that specific time. That can really improve performance, and again, it's a feature shared between large format semantic models and the Direct Lake connection mode.

OK, so for the next part of this video we're going to switch over to Power BI Desktop, and we're going to be using this to
explain some of the key things we need to know for this module of the exam. We'll look specifically at variables, iterators, table filtering, window functions, information functions, calculation groups, dynamic format strings, and field parameters, and for each of these features and DAX expressions I've got a little example to talk you through the implementation.

Let's start with variables. Variables are very common in pretty much every programming language that exists, and DAX variables can help us avoid code repetition and potentially improve the performance of your DAX code too. Here we've created two variables, one for total revenue and one for total days. Notice the syntax: you use VAR to declare a variable, which stores the result locally so you can use it later on in your DAX expression, and typically you'll also need a RETURN statement. In this example, we declare total revenue and total days as variables, then use them in the RETURN statement to give us the overall average revenue per day.
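As a quick sketch of that VAR/RETURN pattern — note that the table and column names here are illustrative, not necessarily the exact ones from the demo file:

```dax
Avg Revenue Per Day =
VAR TotalRevenue = SUM ( Sales[Revenue] )        -- total revenue in the current filter context
VAR TotalDays = DISTINCTCOUNT ( Sales[Date] )    -- number of distinct days with data
RETURN
    DIVIDE ( TotalRevenue, TotalDays )           -- DIVIDE handles a zero-day denominator safely
```

Each variable is evaluated once and reused, which is where the repetition and performance benefits come from.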
Next up, iterator functions. Iterator functions in Power BI basically enumerate through all of the rows in a table, perform some calculation on each row depending on the specific iterator you choose, and then aggregate the results. Examples include SUMX, COUNTX, and AVERAGEX — most of them have that X suffix. To showcase iterators, we're using SUMX in a cumulative revenue measure. We run SUMX over a FILTER of the date table, and for each row we compare the date on that row with the maximum date in the current context, so the set of filtered rows grows as we go. On the first pass through the table there's only one qualifying row — the top one — and we calculate the sum of revenue on that date. On the second row, the filter expands to two rows, so we get cumulative revenue across the first and second dates; on the third row, the sum across the first, second, and third; and so on. When we run this through the whole table, the result is a cumulative revenue chart from the first date all the way through to the last date in our dataset, 31 May 2020, at about 19 billion in revenue.
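A hedged sketch of that cumulative measure, using the standard running-total pattern (table and column names are again illustrative):

```dax
Cumulative Revenue =
SUMX (
    FILTER (
        ALL ( 'Date'[Date] ),                  -- iterate over every date in the date table
        'Date'[Date] <= MAX ( 'Date'[Date] )   -- keep only dates up to the current one
    ),
    CALCULATE ( SUM ( Sales[Revenue] ) )       -- revenue on each qualifying date
)
```

MAX here is evaluated in the outer filter context, so it picks up the "current" date of each point on the chart while FILTER expands the set of dates being summed.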
Next up, we have table filtering. Table filtering uses the FILTER function, which returns a table representing a subset of another table or expression. In this measure we're calculating total revenue by category: we use the CALCULATE function, pass in the sum of revenue, and then filter by specific product names. That produces this chart — we get a revenue figure for each particular category, because we're filtering on particular product names (Audi, Tata, Hyundai — these are all product names), and our filter expression keeps the rows where the product name equals the SELECTEDVALUE of product name.
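A sketch of what that CALCULATE/FILTER combination might look like (table and column names are assumptions):

```dax
Total Revenue by Category =
CALCULATE (
    SUM ( Sales[Revenue] ),                            -- base expression to evaluate
    FILTER (
        Products,                                      -- subset of the Products table…
        Products[ProductName]
            = SELECTEDVALUE ( Products[ProductName] )  -- …matching the currently selected name
    )
)
```

FILTER returns the subset table, and CALCULATE evaluates the sum under that modified filter context.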
Next, we'll look at window functions. There are actually three functions in DAX classed as window functions — WINDOW, INDEX, and OFFSET — and we're going to focus on the WINDOW function in this example. As for use cases: if you've used window functions in SQL, the results are quite similar, though the way you implement them is actually quite a bit different. A typical use case is rolling averages — if you want to create a three-month moving average of revenue, for example, you might want window functions (obviously there are lots of ways to do moving averages in Power BI, and window functions are one of them). The window can also be over a categorical variable — say, the average revenue for each department in a company.

If we take a look at the documentation for the WINDOW function, at a really high level it returns multiple rows positioned within a given interval: we give it a "from" and a "to", and it returns the rows between them. There are a few other parameters to be aware of here, including from_type and to_type, where we can specify absolute or relative positions. Absolute (ABS) takes the whole table from top to bottom and picks the absolute position for your "from" parameter — the first, the second, the 90th value. Relative (REL), rather than looking at all the values in the table from top to bottom, positions the window relative to the current row — so minus 10 might be a relative "from" parameter, meaning ten rows before the current one.

Now, on its own the WINDOW function isn't particularly useful. Normally — if we scroll down to an example in the docs — it's used in combination with some other measure, typically inside an iterator function: we use WINDOW to get a window of data, then apply AVERAGEX over that window to return the three-day average price. That's an example of how you can use the WINDOW function in practice.
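Roughly following that pattern from the documentation, a moving-average sketch might look like this — treat the table, column names, and exact argument shapes as assumptions to verify against the WINDOW reference:

```dax
3-Day Avg Price =
AVERAGEX (
    WINDOW (
        -2, REL,                        -- from: two rows before the current row
        0, REL,                         -- to: the current row itself
        SUMMARIZE (
            ALLSELECTED ( Prices ),     -- the relation the window operates over
            Prices[Date], Prices[Price]
        ),
        ORDERBY ( Prices[Date] )        -- order rows by date before windowing
    ),
    Prices[Price]                       -- average the price over that three-row window
)
```

The REL/REL pair is what makes this a rolling window anchored to the current row rather than a fixed slice of the table.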
Information functions are another class of functions in the DAX language, and there are a lot of them — you've probably used some if you've used DAX before: CONTAINS, CONTAINSSTRING, HASONEVALUE, ISBLANK, ISERROR, SELECTEDMEASURE, USERPRINCIPALNAME, and so on. They basically check a particular value and return a result depending on the function — normally a Boolean, but sometimes something else. USERPRINCIPALNAME, for instance, isn't looking at a table at all; it looks at your Power BI session and returns the logged-in user. In this example we've got a very simple table of transactions — transaction IDs and some revenues — and we're using the information function ISBLANK inside an IF statement: if ISBLANK returns true, we show "No revenue recorded", and if it returns false — i.e. there is some value in that column — we show "Revenue recorded", giving us this table.
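As a sketch of that IF/ISBLANK pattern (table and column names are illustrative):

```dax
Revenue Status =
IF (
    ISBLANK ( SUM ( Transactions[Revenue] ) ),  -- information function: is the value blank?
    "No revenue recorded",
    "Revenue recorded"
)
```

ISBLANK returns a Boolean, which is exactly what IF needs as its first argument.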
Next up, we have calculation groups, which provide a simple way to reduce the number of measures in a model — or at least the maintenance burden of those measures. Say you want to create daily average, monthly average, and yearly average measures, and you want them for revenue, for cost, and for salaries: there's a lot of repetition in that DAX code. What calculation groups let us do is basically parameterize those DAX measures, so the actual maintainable code you write is a lot smaller. You can create these in Power BI Desktop now (as of a few months ago) and also in Tabular Editor. I've got an example here where I've created a calculation group called Averages. To have a look at it, go over to the Modeling tab: you can see the Averages calculation group and its calculation items, and obviously you can create a new calculation group from here too. In this calculation group I've actually got three calculation items. The first is just Total, which is simply SELECTEDMEASURE — that's kind of your baseline measure — and then we reuse it, or rather call it, within the other items: the Daily Average calls SELECTEDMEASURE per day, and the Monthly Average also calls SELECTEDMEASURE, but this time over the month.
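As a sketch, the three calculation items might be defined like this — the date table and column names ('Date'[Date], 'Date'[Month]) are assumptions:

```dax
-- Calculation item "Total": the baseline, just whatever measure is in context
Total = SELECTEDMEASURE ()

-- Calculation item "Daily Average": average the selected measure per day
Daily Average =
AVERAGEX ( VALUES ( 'Date'[Date] ), CALCULATE ( SELECTEDMEASURE () ) )

-- Calculation item "Monthly Average": the same pattern, per month
Monthly Average =
AVERAGEX ( VALUES ( 'Date'[Month] ), CALCULATE ( SELECTEDMEASURE () ) )
```

Because SELECTEDMEASURE stands in for whichever measure the user drops on the visual, these three items work for revenue, cost, salaries, or anything else without rewriting the logic.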
Next up, we're looking at dynamic format strings, and this is a pretty cool one. It basically allows you to apply string formatting to a numeric measure without changing the underlying data type, so it can remain a numeric measure. In this example we've created a measure called Dynamic Format Measure — we're using a simple sum of revenue just as an example — and we're giving a few different options in a SWITCH statement: if the value is less than a thousand, we don't format at all; between a thousand and a million, we add a "K" for thousands; an "M" in the millions range; and "bn" in the billions range. The benefit is that in our tables — or here, a card visual — you get a much nicer presentation of the number, 5,246M say, rather than lots and lots of digits.
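A sketch of the dynamic format string expression — this is the format expression attached to the measure, not the measure itself, and the exact format-string tokens (the scaling commas and escaped suffix letters) are assumptions worth checking against the dynamic format strings docs:

```dax
-- Dynamic format string: each trailing comma after the digits divides the display by 1,000
VAR CurrentValue = SELECTEDMEASURE ()
RETURN
    SWITCH (
        TRUE (),
        ABS ( CurrentValue ) < 1000, "#,##0",               -- under a thousand: plain number
        ABS ( CurrentValue ) < 1000000, "#,##0,.0 \K",      -- thousands: e.g. 12.3 K
        ABS ( CurrentValue ) < 1000000000, "#,##0,,.0 \M",  -- millions: e.g. 5.2 M
        "#,##0,,,.0 \b\n"                                   -- billions: e.g. 1.4 bn
    )
```

The expression returns a format string, not a value — the measure's own result stays numeric.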
The big difference between this and the static format string on the Modeling tab is that here — if we click on a specific revenue number, for example — the measure keeps its numeric data type: we maintain the format as a whole number while still formatting the presentation. And that's important, because we might want to do something like visualize this data in a chart.

So the final feature we're looking at today in Power BI is field parameters. Field parameters allow the report user — the end user coming into your report — to select different categorical variables, and measures as well, dynamically. Depending on how they want to slice the data, you can set up field parameters to give them this flexibility.
In this example we've got a number of different parameters, and we're looking at revenue by each of them: currently by location, but you might also want to look by dealer, by country, by model, or by product, and you're giving the end user the flexibility to decide. To set up field parameters, go to the Modeling tab, then New parameter > Fields; give the parameter a name and pass in the different fields you want (you can also rename them here if you like). That sets up your field parameter, and in the model it creates this parameter table — maybe I could improve the naming here — which basically holds one entry per field: the name, such as Location, and an index, 0, 1, 2, 3, so that when you click the filter it updates the visual.
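Under the hood, the field parameter Power BI generates is a calculated table along these lines — the field names follow the demo but treat them as illustrative:

```dax
Parameter =
{
    ( "Location", NAMEOF ( 'Sales'[Location] ), 0 ),   -- display name, field reference, sort index
    ( "Dealer",   NAMEOF ( 'Sales'[Dealer] ),   1 ),
    ( "Country",  NAMEOF ( 'Sales'[Country] ),  2 ),
    ( "Model",    NAMEOF ( 'Sales'[Model] ),    3 )
}
```

NAMEOF stores a reference to the actual model column, which is what lets the slicer swap the field on the visual rather than just filtering values.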
And on the visual side, the y-axis holds just the parameter we created, plus the measure we want to visualize — sum of revenue. In this instance the field parameter is on the y-axis, but we could also have a field parameter for the revenue side as well: maybe you want sum of revenue, average revenue, median revenue, whatever, and you want to give the user the flexibility to change these dynamically — that's how you would do it.

OK, we've been through a lot in this video. Now let's test your knowledge of what
we've been through, with some of the questions you might face in this section of the exam. Question one: the DAX expression AVERAGEX is an example of (a) an information function, (b) a calculation group, (c) table filtering, (d) a window function, or (e) an iterator function. Pause the video here, have a little think, and I'll reveal the answer shortly. The answer is, obviously, an iterator function — the big clue is the X at the end of AVERAGEX, which generally denotes an iterator. An iterator function enumerates through every row of a particular table, performs some calculation, and then combines the results in some way, depending on how you set it up. It's not an information function and it's not a calculation group. Table filtering — well, you might actually do some table filtering within your iterator function, but AVERAGEX itself is not a table filtering function. Similarly with window functions: you might use an iterator like AVERAGEX within a window function, but AVERAGEX itself is not actually
a window function. Question two: on-demand loading — loading only the data needed for a particular query — is a feature of which two of the following? (a) Import mode, (b) Direct Lake mode, (c) DirectQuery mode, (d) large format semantic models, or (e) the XMLA endpoint. The answer is (b) Direct Lake mode and (d) large format semantic models. We mentioned this when going through the slides: on-demand loading is actually shared by these two. It doesn't really make sense in Import mode, because in Import mode we've got the full dataset there for us to query anyway. Now, you could argue that what DirectQuery mode does is very similar to on-demand loading — whenever a query comes in, you go back to the data source, query it directly, and load only the data that comes back — but I do think there's a distinction between what DirectQuery mode is doing and the specific feature called on-demand loading; the two are slightly separate. So although DirectQuery does a similar job, it isn't actually leveraging on-demand loading, and the XMLA endpoint (e) is just not a correct answer.

Question three: dynamic format strings overcome which significant limitation of the DAX FORMAT function? Is it (a) the FORMAT function is slow on large datasets; (b) the FORMAT function returns a string value, so the values can't be used in chart visuals; (c) the FORMAT function can't
handle date/locale conversion while formatting; or (d) the FORMAT function can't be used with field parameters? The answer is (b): the FORMAT function returns a string value, so the values can't be used in chart visuals. One of the major benefits of dynamic format strings is that you retain the original data type for the measure: if you've got a numeric value and you want to format it as millions or billions, a dynamic format string formats the presentation while the underlying data type remains numeric, so you can still use that field in charts — maybe you want a time-series chart, say. That wouldn't be possible with FORMAT, because FORMAT returns a string, and you can't plot a string value in a chart like that. So the answer is (b).

Question four: the DAX expression SELECTEDMEASURE is most likely found in the construction of which of the following? (a) a calculation item in a calculation group, (b) field parameters, (c) an iterator function, (d) large format semantic models, or (e) a window function. The answer is (a), a calculation item in a calculation group. As you'll remember, SELECTEDMEASURE is what makes a calculation item dynamic — parameterizable, let's say. It's not part of field parameters, iterator functions, or window functions. Large format semantic models — well, of course you could have a calculation group within your large format semantic model, but the most likely place you're going to find SELECTEDMEASURE, because it's pretty much necessary there, is in a calculation item when you're creating calculation
groups. Question five: which of the following is an irreversible operation — meaning it can't be changed afterwards, you can't go backwards? (a) changing the cross-filtering of a relationship to bidirectional; (b) changing the storage mode of a table to Import; (c) renaming a calculation group; (d) converting a semantic model into a large format semantic model; or (e) creating a window function. The answer is (b), changing the storage mode of a table to Import mode. This assumes the original storage mode of the table was DirectQuery; moving it to Import mode is then an irreversible operation. I'll leave a link to the documentation that specifies where that's the case. For (a), we can change the cross-filtering of a particular relationship without a problem; we can also rename calculation groups; and (e), creating a window function, doesn't particularly make sense here, because of course you can just delete the window function, or delete the measure. Now, (d), converting a semantic model into a large format semantic model, is an interesting one: I was under the impression that this was also irreversible, but when I actually went to research and test it, I could convert a large format semantic model back into a small format one. For that reason I've marked it false — but let me know if you think (d) is also a correct answer for this question.

Congratulations — that's the first part of the semantic modeling section of this exam complete. In the next lesson we'll look at model optimization and security, so make sure you click here to join us for the next lesson. I'll see you there!