hello and welcome back to the channel today we're going to be continuing our DP 600 exam preparation course and we've made it to video 11 of 12 in this series and today we're going to be looking at performing exploratory data analytics now there's this video and then one more video to go so we're very close to the end so keep going we're almost there and within this video we're going to be looking at descriptive diagnostic predictive and prescriptive analytics and specifically Al we're looking at how to implement those things within powerbi we'll also be taking
a look at the data profiling tool which is part of the power query experience so you'll see that within the data flow Gen 2 and also within the power query engine in powerbi as well towards the end of the lesson we'll be finishing with five sample questions just to test your knowledge of this section of the study guide and as ever I'll be leaving links to further learning resources if you want to go into more detail about anything I mentioned in this video and I'll leave a link to the school community in the description below
if you want to grab those learning notes okay so just to kick us off I think it's worthwhile just going through those four types of analytics and looking at what they mean in Microsoft's own words so we'll start by looking at descriptive analytics so when we talk about descriptive analytics we're talking about analytics that interpret past data and kpi to look for Trends and patterns so we're looking at what happen happened in the past we're describing what happened in the past when we talk about descriptive analytics and we'll be going into some examples of descriptive
analytics and some visuals you can use to perform descriptive analytics but for now let's just look at these definitions the second one to know is diagnostic analytics so diagnostic analytics varies from descriptive analytics because we're not just worried about what happened in the past but we're looking at analytics that describe which data element will influence specific Trends and the possibility of future events so when we talk about diagnostic analytics we're not just talking about what happened but we're inferring why something happened so we don't just care about okay what happened in the last 12 months
but we're looking for more detail around particularly why certain events or certain Trends might have occurred now typically this uses techniques like correlation analysis and data mining at least in the words of Microsoft and there's various techniques that we can use to kind of INF use our visual reports in powerbi to try and expose some of this diagnostic information number three is Predictive Analytics so with Predictive Analytics we're going to be using statistics or machine learning as well to forast Future outcomes with statistical models and machine learning techniques as I mentioned now these analytics provide
context and Clarity for future decisions so in Predictive Analytics we're not just worried about what's happened in the past but we're using that data our historic data to think about what might happen in the future with some amount of certainty and finally the next level of analytics is prescriptive analytics so in prescriptive analytics we're not just predicting what's going to happen in the future but we're providing recommendations and we're recommending actions that might be the best course of action to either prevent a particular scenario or to increase the chances of a particular scenario particular outcome
for your business now with all of these four the clue really is in the name so if you're struggling to remember what each of these types of analytics is and it helps just to go back to the first word in each of them and really understand what each of them mean so in the first one we're describing so it's descriptive analytics we're describing what happened in the past with diagnostic analytics we're diagnosing so we're not just describing what happens but there's some sort of cause there we're thinking why something happened in the past so we're
diagnosing a particular Trend obviously with Predictive Analytics we're going to be predicting what happens in the future and with prescriptive analytics obviously the key word there is prescribe we're not just predicting what's going to happen in the future we're also prescribing a suitable course of action that you should follow to optimize a particular outcome in the future now one thing I'll say before we start here is that most of the content for this section of the exam comes from or at least is inspired by the pl300 exam for the PBI data analyst so that is
what we'll focus on for this section of the study guide for me it's a bit less about descriptive diagnostic predictive and prescriptive analytics although you could argue that the visuals that they're talking about here align to one of these four categories really it's more about powerbi and the visuals in powerbi when you should use which Visual and when you should use specific powerbi features so that's what we're going to be focusing on in this lesson and when it comes to visuals and choosing which ones you should use when well visual selection is somewhat subjective I
would argue for the exam recommend that don't try to be too clever when you're thinking about visual selection so they might ask you when should you use this particular visual or given this scenario which visual would you choose so instead of trying to be too clever instead I would recommend thinking about what Microsoft see as the main use case for a particular visual you know get inside the heads of the examiner and of Microsoft and think about how they want you to use the tools keep that in mind as we go through the following examples
okay so let's go through some of the commonly used visuals in powerbi and when you might consider choosing them or maybe not choosing them as well so let's start with the table and the Matrix visualization so these are really good for visualizing fine grain details in your data allowing your user the report user to explore the data themselves right and these are commonly used in drill through functionality so when you want to provide the user the option to drill through to find more detail about a particular metric you might allow them to drill through and
we'll be talking through that functionality in more detail a little bit later on they can also be used to display aggregate information so for example the revenue broken down by month for example one of the drawbacks of the table and Matrix visualizations that obviously it's difficult to spot long-term trends at least visually next we have the bar and the column chart so when you have one categoric variable and one numeric variable for example you might be looking at the revenue by region these are particularly useful they can also be stacked if you want to add
in another categoric variable into your analysis so maybe Revenue by region but then also by product type as well just to give it an extra layer into your analysis again bar and column charts they can be used to visualize time series information but it get a bit messy if you've got lots and lots of time periods along your xaxis generally it's better to present time series information on a line chart which brings us nicely into the next one which is the line and the area chart as I mentioned these are really good for visualizing time
series information you can fit a lot of information into one chart over large time ranges we can also use the legend to kind of break up a single line into multiple lines to compare how that metric is changing within each each category over time now another variation of this is the area chart on the right hand side there my personal opinion is I don't really like the area chart it's difficult to interpret all these different areas because you know the color of each area is kind of impacted by the color underneath it so it can
be difficult to interpret in my opinion from a user perspective again this is not about my personal opinion it's about what Microsoft sees as the core use cases for each of these charts next we have the card visualization now these are obviously really good for kpi metrics and you can also include percentage change metrics you can add a bit more context about what's happened is that sales amount an increase or a decrease since the last sales amount as well now out of the box by default it doesn't really give you that longer term Trend but
it can be coupled with things like a spark line so if you want to show the momentum of a particular metric over time adding that spark line can be really useful to give your users a bit of context about how that's changing over time and the card visual is something that has been developed quite a lot by Microsoft over the last few months and years and it's actually quite feature Rich now so you can really add a lot of information into these card visuals the pie chart donut chart and tree maps are a little bit
controversial but their main goal is to show ratios between different categories now the reason why pie charts and donut charts these kind of ratio charts can be somewhat controversial is that they're more difficult for the human brain to interpret the difference between an area which is what we're showing in a pie chart or a tree map for example and a more linear comparison that you get with like a bar chart another potential downside of this type of visual is that by presenting ratios it does somewhat obscure the overall numbers so you might be comparing the
sales in USA to the sales in the UK as a ratio we don't really by default get a view of the overall sales in each category now you can add that to the labeling and also potentially a tool tip to add that information in but I would argue if you're going to do that there's better visualizations to choose another downside is that we can't really see the trends over time so similarly with the card visual we're only getting the point in time metric for that particular metric right we're not seeing how that metric has changed
over time which is normally more useful for the user next we come to combo charts now these are typically used to visualize more than one metric on the Y AIS so you can see here we've got the bar chart showing one particular metric and then we've got a line chart showing another metric on top of the same visual right now this can be useful in some scenarios but you do have to be careful because sometimes this allows the user to come to a certain conclusion about correlation between these two metrics which you might not actually
be correct that correlation and it can also be difficult for a user to interpret which access is showing which metric you know often you need a really good Legend to explain the differences between these two axes these two metrics as well next we come to the funnel visualization now these are really good for showing some sort of movement through a linear process example here would be tracking website conversion so at the top of your funnel you might have a website viewer visiting your website then a next stage in that linear process might be okay they're
going to a product page then the next stage might be okay clicked on buy they've added something to their cart for example and then the next stage in the process might be they've actually bought the product and so on and so on so when you've got linear processes and you're trying to visualize some sort of movement through that process funnel visualizations are really useful now if I to be picky I think you could argue that is difficult for some users to grasp the scale of the difference between each of the levels in this funnel but
again that's probably not something for the exam that's just personal preference next is the gauge chart and the gauge chart is useful when we want to show progress of a particular metric towards a goal so say you have an annual revenue Target and you want to show halfway through the year that we're actually at 53% of our Target so we're on track for example next up is the waterfall visualization so the waterfall visualization is useful for showing a running total over either a time period or within specific categories right so the goal here is really
we want to to understand which periods or categories contribute most to that change to that overall figure right and again from personal experience the waterfall chart I think should be used carefully because it can be in my opinion somewhat difficult for users to interpret but that's just based on my experience next we have the scatter chart which is generally used when you want to visualize two numeric or continuous variables now again you can add more information to the legend of these types of visuals if you want to you know add a further layer of analysis
so you might be visualizing someone's age versus their height and then your third variable that you want to add into that analysis might be the country that they grew up in or the country that they live that might be an extra variable that you want to add into this analysis and you can do that by color coding the dots also changing the shape of those dots as well now obviously you need to be careful when you do that adding a third variable because you know having a scatter chart with lots and lots of different categories
of different colors can be quite difficult to interpret from a user perspective another kind of warning with these types of charts is it can sometimes lead the user to come to conclusions about correlation between these two variables and as we know correlation doesn't always equal causation so that's something to Bear In Mind as a report author you might want to compare these two variables when in reality they might not actually have a causative relationship next we've got custom visuals so custom visuals help you go beyond the out thebox visuals that come with with powerbi and
there are lots and lots of custom visuals that you can explore the app Source now these can be really useful if you want to use slightly more Niche visual types that maybe haven't made their way into the core powerbi visuals set yet and of course bear in mind that if you use a custom visual these are normally built by Third parties and sometimes they involve licensing and things like pay walls you might be able to use it for a short period and then you have to pay but that's something just to be around custom visuals
now the final visual type we're going to look at is the Q&A visual in powerbi which allows users to ask natural language questions about the data in your underlying data models now this sounds good but in practice I think it's difficult to implement well obviously the questions currently need to be carefully articulated to kind of match your data model so it requires the user to understand the columns and the tables in your data set to craft good questions that can give good results basically currently it's only available in English and Spanish as well if you
enable it in the admin settings so that's a summary of some commonly used visuals now let's look at some powerbi features that can help us as report developers to go kind of an extra layer in our analysis to help with maybe some of the diagnostic elements or the predictive elements that we're looking for in this part of the study guide starting off with the drill down functionality so drill drill down basically allows your user to explore your data through layers of a hierarchy and that hierarchy can either be explicit so as a a powerbi hierarchy
or it can be implicit so maybe you haven't actually defined it as a hierarchy but there is some implicit hierarchical relationship between these variables now a classic example of a drill down might be to visualize at the top level of your drill down Revenue in a particular country and then you might want to to drill down and look at Revenue by specific State and then within that state you might want to look at Revenue by store in a specific state for example now alongside drill down and drill up functionality you also have drill through and
drill through is a powerbi feature that allows users to drill through from one report page into another report page and it carries through that information they click on so in example here you might want to drill through this SharePoint category and take them through a Different Page in the analysis pre- filtered based on whatever you drill through on obviously with this drill through you need to think carefully about the user journey in your report and the navigation Journey that they're going through you might want to add in things like back buttons to make sure that
they don't get lost in your report now with both of these features again this is just from personal experience it requires your end user to have knowledge of drill down and drill through so I would argue this is a bit of a downside for these features but something to bear in mind when you're authoring these reports next up grouping so grouping allows report authors to group two or more categories within a visual we can create groups of months in this example or anything really that makes sense for the particular visual that you're creating now when
your data is continuous so it's numeric you might want to use binning and binning basically allows you to create different bins for these continuous variables now an example of this might be to create a salary bin so rather than just have salary as a number you might want to create salary ranges so from 0 to 30k 30k to 60k for example and obviously that opens up different visual types that you might want to explore and use for this type of data some other features to be aware of are reference lines so this allows report authors
to provide a static reference line across the x or the Y AIS to give the user some more context so it might be the average sales amount for sales people so maybe you're you're visualizing how all the different sales people have performed and you can see all of the data it might be useful to show report users what the average is so they can make a comparison give them some context for that comparison another feature here is around visualizing errors so if your data set contains some sort of errors in them normally it's around prediction
uncertainty or measurement uncertainty then we can use error bar to show that uncertainty to the user now these can either be markers or they can be lines or they can be shaded areas maybe if you're doing like a Time series forecast that kind of thing you can have a shaded area for the uncertainty of a particular prediction obviously this requires data on errors or prediction uncertainty next we have what if parameters so we can use what if parameters to perform some what I would say is kind of rudimentary scenario analysis so giving your report uses
the option to change a particular variable particular parameter in this case we've got discount percentage and they can see what impact that has on a particular metric in the chart below now to actually get what if parameters to work you need to do quite a lot of data engineering and potentially prediction algorithms as well so it can be quite difficult to set up a good what if analysis finally we've got time series forecasting and this can be accessed in that analy Antics pane so the third icon when you're setting up a visual is the analytics
Pane and here if you have a Time series data set you've got revenue numbers for the last 24 months for example powerbi gives you the functionality to create a Time series analysis obviously this is quite limited and I would argue you should be careful here you know if you want to be doing any sort of serious time series analysis I would argue that it shouldn't be done solely within powerbi but the functionality is there and you you might get asked about it great so let's finish off this lesson with a look at the power query
data profiling tool okay so let's just explore the data profiling tool which comes with the power query engine so we can use this in a data flow Gen 2 or also in the power query engine in powerbi powerbi desktop if you're using that as well now what I've got here is just a query on this Revenue data set and I want to explore some potential data quality issues in this data set now to do this we can go to the view Tab and have a look at the data view enable column profile and then we
got a few different options here so show column quality details if we just do that one to begin with and let's just explore what that gives us so we can see we've got these three kind of categories so it goes through each column and it gives us a percentage of how many values in that column are valid how many are errors and how many are empty so it gives you a really quick indication of column quality Now by default you can see down here that the column profiling is based on the top 1,000 rows So
currently it's looking at the top 1,000 rows and for this dealer ID column saying that none of them are empty there's no errors and 100% are valid now if we change this to the entire data set obviously going to take a bit longer to calculate but then it's going to look at your entire data set every value in this column and it's going to perform that same categorization as you can see this data set looks pretty good we got 100% valid for all columns no empty and no errors so that's good if we go back
back to the data view now we click on the column value distribution let's have a look at what that gives us so now we've got this quite high level view on the distribution of values within this column and if we make that a bit bigger here we can begin to see how many of these values are distinct now to explain the difference between distinct and unique I think it helps to look at this column here so we've got true and false values in every Row in this column now it's showing as a distinct count of
two which is obviously referring to true and false so it's kind of like a distinct count of the values in that column unique means values for which there is only one row so here there's no unique values because there's true in more than one row you know these are not unique it's included in multiple rows and false there are actually multiple falses so if in our sample size here we only had one false value then it would be in fact unique so we'd get that one in the unique column but currently because we have more
than one false value in this column it's not unique so zero unique values in this column so next up to explore we have the details pan so if we enable the details pane here and then we click on a specific kind of distribution we've got here make this a bit smaller so we get a bit more information about the distribution in a particular column so we can see the counts the error counts the null counts we can also see the distinct count unique count empty string counts and the minimum and the maximum values now obviously
this is a minimum maximum of a string value but it would also work with numeric numbers as well I think this column only has unit sold one yeah so the Min and the max is also one so it's not a particularly good example here and we've also got things like the average standard deviation number of odds number of evens so it give you a bit more detail about what there is in that column now once you've got a pretty good idea about the kind of distribution of column you might be able to spot things like
duplicate values in here what we can do is we can right click on it and we can remove duplicates or you might want to remove errors as well you can do that from just right clicking on this top section here and it gives you a quick way to remove or replace these errors or duplicates as well okay let's just round off this video by going through some practice questions to test the knowledge of what you've learned during this section of the exam of the study guide question one a company annual report shows a net profit
of $34 million now the company has 12 business units each with their own net profit or loss amount which of the following visual types could best be used to visually show how each business unit contributed to the overall net profit metric is it a a line chart b a scatter chart c a matrix chart d a waterfall chart or e a question and answer visual pause the video here have a think and I'll reveal the answer to shortly so the answer here was the waterfall chart as you remember when we were going through the waterfall
chart one of the core use cases of that waterfall chart is to break down one top level metric into its constituent parts to allow the user to explore how each in this case business unit contributes to the overall net profit metric now the other visuals that we've listed here you know you might be able to glean some of that information but as I mentioned previously what we're after here is what be the best visual to visualize this information and for me that's the waterfall chart question two you want to add measurement error bars on a
Time series line chart to show the potential error in each measurement where would you go to add this information to your visual is it a the analytics pain B the format visual pane C view Tab and show error bars D the build visual pane or e in the model view so the answer here is the analytics pane when you're configuring the settings for a particular visual there's obviously three panes or three different windows that we can use to declare different settings and the error bars functionality is included in the analytics pane it's not going to
be the build visual pane that's where you add your different variables different columns to a particular visual format visual that's where you change things like the text and the border and all that kind of thing it's not going to be the model view CU that's where you define relationships and things like that and it's not going to be in the view tab show error bars that doesn't actually exist that functionality I just made it up question three you want to visually compare two continuous variables age and height of survey respondents in one chart which of
the following visual types could best represent this data is it a a line chart b a stacked bar visual c a scatter visual D question answer visual or e a matrix visual so the answer here is C the scatter visual obviously when you're visualizing two continuous variables so age and height then probably the best visual to use for that would be the scatter chart put your height on the y axis or your age on the AIS and then you can plot different dots for each of your survey respondents again the other visual types not really
going to help with that kind of analysis You could argue that the The Matrix would potentially show you all of that but it's going to be a really big chart if you've got lots of respondents so the answer here is C the scatter visual question four by default the data profiling tool reviews the top n rows of your data set to show you potential data quality issues so what here is n so how many rows is it a 10 B 100 C 500 D 1,000 or E 10,000 so the answer here is D 1,000 rows
as you can see here from that visual by default the column profiling is based on the top 1,000 rows obviously we can change it to include the entire data set as well but by default it's 1,000 rows question five which of the following features of the data profiling tool can help you identify duplicate values in a column that you plan to use as a key to join on is it a column quality details B column value distribution C column duplicate analysis D column key constraints or E Group Buton so the answer here is B the
column value distribution so as we mentioned one of the key use cases for the the value distribution is to look for duplicate values because if you've got a key that you plan to join on you want to be looking for duplicate values in that key and in the column with no duplicate values is going to look like this with a just a flat line so every value is going to be distinct here on the left hand side this would indicate that there are some duplicate values there's some values here that have a count of more
than the others right so this would be a potentially problematic key on which to join on but this date ID column all of the values here are unique at least in the sample of data that you're using for profiling the column quality details a doesn't give us this information the column duplicate analysis C doesn't actually exist neither does d the column key constraints and E the group buy okay you could potentially use a group buy to look for duplicate values but it's not a feature of the data profiling tool itself congratulations we're nearly there only
one section of the study guide to go in the next lesson we'll be looking at the final section of the study guide we'll be looking at how to use SQL to analyze data via the Lakehouse SQL endpoint in the data warehouse and also via the xmla endpoint as well so make make sure you click here to join us in the last lesson in this series I'll see you there