Hey everyone, welcome back to the channel. Today we're continuing the DP-600 exam preparation course, looking at SQL, the data warehouse, and how we can schedule things to run in Microsoft Fabric. This is video six in the series, so we're nearly halfway; we've covered a lot of ground already, but there's still some way to go until the end of the course.

In this video we're covering: creating views, functions, and stored procedures in the data warehouse experience; how to add stored procedures, notebooks, and data flows to a data pipeline; and then how to schedule data pipelines, data flows, and notebooks. Throughout this section of the course we're focusing on what you need to know to prepare for the DP-600 exam. A lot of these topics go really deep, but we'll stick to what I think is sensible to learn in case they come up. This lesson will be pretty much 100% practical, so we'll be going into Fabric and creating all of these things ourselves. At the end of the lesson we'll do four sample questions, and as ever I've got some key points and links to further learning resources if you want to brush up on your skills in a particular area for the exam.

Okay, let's start by looking at creating views, functions, and stored procedures in the data warehouse experience: what they all are, when you should use them, how you should use them, and the things to bear in mind for the exam. We're starting in a workspace where I've created some Fabric items, the most important one being this data warehouse, DP600 Data Warehouse. If we open it up and have a quick look around, I've created some really simple tables; we'll be using one or maybe two of them for this tutorial, specifically the dbo.employees table. It's only got four rows, but that's enough to show the functionality of a function, a stored procedure, and a view.

Now, we're not actually going to work in the Fabric online experience; we'll use SQL Server Management Studio (SSMS). So go into the settings of the data warehouse, copy the SQL connection string, and then open SSMS; you can download it for free, and I'll leave a link in the description or in the community. Then we can connect to our Fabric data warehouse from within SSMS. I've already set up the connection here, but if it's your first time using SSMS and you haven't connected before, just connect to a new database engine, add your server name, and use Microsoft Entra multi-factor authentication; it will ask you to authenticate with Fabric in the online experience, and then you should see your databases listed. We've got the DP600 data warehouse we were looking at previously, and you can see some tables: the employees table, the gold table, and a few others.

Let's start just by exploring what we've got. A very simple SELECT * FROM dbo.employees returns all of the data, so we can have a quick look and make sure everything is connected correctly. We get our four rows and three columns: employee ID, name, and age. Perfect.
So say we've got this table, dbo.employees, and for a Power BI report we want to do a bit of transformation on it; we don't want the raw table just as it looks here. It doesn't really matter what the transformation is for this demo; maybe we want a WHERE clause, say WHERE name LIKE 'Jack%'. This brings back all of the employees called Jack. We're using the % wildcard operator, which means that if the first four characters are J-a-c-k, anything after that still matches and appears in the result set. As you can see, we've only got one Jack in the data set, so that's coming back correctly.

Now imagine that in Power BI we want to query just this result set. What we can do is save this query as a view. This is a bit of a toy example, but sometimes a view holds really complex transformation logic; the result is not stored in the database, but whenever we query the view, the transformations are done at query time. To create a view from this, add CREATE VIEW, give the view a name, dbo.view_employee_get_jack, then use the keyword AS; everything that follows is part of your view. If we execute that, it completes successfully, and then SELECT * FROM dbo.view_employee_get_jack gives us the same result as before. So when you're building your Power BI report, you can query this view rather than the underlying table to get the transformed data back.

Okay, that's the basic syntax for creating a view. I've written down some key points to understand, both in general and for the exam. For a view, the transformed data is not stored; only the transformation logic, the SQL code, is stored, and every time you query it, maybe from a Power BI report, it performs that transformation at query time.
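Pulled together, the view from the demo would look roughly like this sketch; the column names are assumed from the employees table described earlier:

```sql
-- Reconstructed from the demo: a view wrapping the LIKE filter.
-- Only the SQL logic is stored; it runs against dbo.employees at query time.
CREATE VIEW dbo.view_employee_get_jack
AS
SELECT employee_id, name, age      -- column names assumed from the demo table
FROM dbo.employees
WHERE name LIKE 'Jack%';           -- % wildcard: anything after "Jack" matches
GO

-- Query the view like any table:
SELECT * FROM dbo.view_employee_get_jack;
```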
Now, you can create a view from another view; for example, we could query view_employee_get_jack from inside another view. We do have to be careful with performance: when you chain multiple views together and they're doing lots of heavy processing, it can have an impact, and it can also be difficult to understand and maintain if views are constantly querying other views. That's something to bear in mind.

With a view we can't specify any sort of parameters; it's simply a saved SELECT statement, you query it as-is, and whatever the result is, that's what you get back. Parameters are possible with functions and stored procedures, as we'll see in a minute. And as mentioned, reading a SQL view into a semantic model falls back to DirectQuery mode; Direct Lake mode is not possible. That makes sense, because Direct Lake works by reading the underlying Delta tables, and for a view those Delta tables don't exist.

So if that's a view, now let's look at a function. Say we want to parameterize the view we just had: we want to pass in a parameter, maybe first name, so that instead of just getting back Jacks, we can use it to get a list of employees for any first name. Let's look at what that might look like. This is the declaration of a function. You'll notice it's similar in some ways to the view, but there are some key differences. We start with CREATE FUNCTION and give the function a name; this one is function_employee_first_name_search. Then we open brackets, and the brackets are really important in a function; it's a bit like a function in Python. Inside them we pass in a parameter. Parameters are optional, you don't have to pass one in, but it is an option.
We define our parameter with the @ symbol, so @first_name, then give it a data type, varchar(20), and a default value. So we have one parameter: its name is first_name, its type is varchar(20), and its default value is an empty string.

The next really important piece of syntax is RETURNS TABLE. If you've used SQL functions in other flavors of SQL, you'll know there are a few different types of functions. In Fabric we're talking about table-valued functions, which means they return a table; we're not talking about scalar functions, because those aren't currently possible in the Microsoft Fabric version of T-SQL. Calling a function always returns a table, so in our syntax we have to declare that: RETURNS TABLE AS RETURN, open brackets, and then whatever we want in our SELECT statement. This is very similar to our view declaration, but here we're making it dynamic: we're parameterizing it using our @first_name parameter. So let's run this.

Okay, we've created our function; let's see what that looks like. If we go to Programmability in SQL Server Management Studio and open Functions, you can see that SSMS actually differentiates between scalar-valued functions and table-valued functions. Ours is under table-valued functions: dbo.function_employee_first_name_search. We can call it like this: SELECT * FROM dbo., then the function name, open brackets, with the parameter. If we execute this, it returns the same as before, but parameterized; instead of just Jack, we can also call Sarah, or whatever it needs to be. We're encapsulating all of that logic into a function, and we've parameterized it.
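Reconstructed as one sketch, the inline table-valued function from the demo might look like this; the body and column names are assumed from the walkthrough:

```sql
-- Reconstructed sketch of the table-valued function from the demo.
-- Fabric warehouse T-SQL supports table-valued functions; scalar UDFs are not available.
CREATE FUNCTION dbo.function_employee_first_name_search
(
    @first_name VARCHAR(20) = ''   -- one input parameter, with an empty-string default
)
RETURNS TABLE
AS
RETURN
(
    SELECT employee_id, name, age  -- column names assumed from the demo table
    FROM dbo.employees
    WHERE name LIKE @first_name + '%'
);
GO

-- Called with SELECT, like a table:
SELECT * FROM dbo.function_employee_first_name_search('Jack');
SELECT * FROM dbo.function_employee_first_name_search('Sarah');
```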
So you can use it in multiple different places in your T-SQL code base; it just packages that logic up into a nice reusable function. What are some of the key points for a function? As I mentioned, it's useful for packaging up logic you might want to reuse in different places, and it can be parameterized. A function can only contain SELECT statements: you can't update rows or do any sort of insert; it's read-only. To call a function, we can only really use a SELECT statement like this, or call it from within a stored procedure or from within a view (which is read-only as well; by read-only I just mean SELECT statements). It can't be orchestrated directly, so we can't use it in a data pipeline on its own, although we can embed it in a stored procedure, as mentioned previously. It allows one or more parameters, specifically input parameters, and the output type is always a table; in Fabric we're only using table-valued functions.

Okay, that's the function; now let's move on to the stored procedure. This is a very basic definition of a stored procedure: CREATE PROCEDURE, we give the procedure a name, then AS, and here we're just doing a SELECT statement; there are no parameters. You'll notice that when we execute a stored procedure we use EXEC, which is short for execute; that's one of the differences between a stored procedure and a function or a view, the way in which we call it. At the moment this is about the most basic stored procedure possible: we pass in a SELECT statement, execute it, and get back the full result set.
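A minimal sketch of what that basic procedure might look like; the procedure name here is hypothetical, since it wasn't spelled out in the demo:

```sql
-- A minimal stored procedure (name hypothetical): no parameters,
-- just a SELECT, and it is called with EXEC rather than from a SELECT.
CREATE PROCEDURE dbo.sp_employee_get_all
AS
SELECT employee_id, name, age   -- column names assumed from the demo table
FROM dbo.employees;
GO

EXEC dbo.sp_employee_get_all;
```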
Now let's step it up a gear and add a parameter. This is very similar to what we were looking at before: CREATE PROCEDURE, now named dbo.sp_employee_get_by_first_name, and we declare a parameter, again called @first_name, of type varchar(20). Underneath that we declare what the stored procedure actually does: select all of the employees where the name is LIKE @first_name, passed in as the parameter, with the % wildcard on the end so that whatever follows, such as the surname, still matches. Let's execute this to create the stored procedure. Ah, it already exists, so let's drop it first: DROP PROCEDURE IF EXISTS dbo.sp_employee_get_by_first_name. Now that we've dropped it, we can recreate it just to show that it works: completed successfully. To execute it, we use the EXEC command, pass in the name of the stored procedure and our parameter, which is Jack, and we get our result.

So that's the basic definition of a stored procedure. Now let's think about some of the key points. With a stored procedure we again have the ability to define input parameters, but also output parameters; I didn't show that in this lesson, maybe it's one for another time on the channel, but output parameters are useful in the data pipeline scenario, and for the exam it's probably enough to know they're possible. Procedures are called with the EXEC command, not as part of a SELECT statement. You can call other stored procedures from a stored procedure, so you can create three stored procedures and then a kind of master stored procedure that calls each of the others in series. And one of the key use cases of a stored procedure is security: you give people access just to the stored procedure and not to the underlying data, so you can use it as a security mechanism.
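Pulling the pieces together, the parameterized procedure and the permission pattern just described might look like this sketch; the role name report_user is illustrative, not from the demo:

```sql
-- Reconstructed parameterized procedure, with the DROP ... IF EXISTS
-- pattern used in the demo to recreate it cleanly.
DROP PROCEDURE IF EXISTS dbo.sp_employee_get_by_first_name;
GO

CREATE PROCEDURE dbo.sp_employee_get_by_first_name
    @first_name VARCHAR(20)
AS
SELECT employee_id, name, age
FROM dbo.employees
WHERE name LIKE @first_name + '%';
GO

-- Called with EXEC, not SELECT:
EXEC dbo.sp_employee_get_by_first_name @first_name = 'Jack';

-- The security pattern: grant EXECUTE on the procedure only, so the
-- caller never needs rights on dbo.employees itself (name illustrative).
GRANT EXECUTE ON dbo.sp_employee_get_by_first_name TO [report_user];
```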
By giving people access only to the stored procedure, they only get the results of that procedure, not any of the underlying data. And one of the key points to understand in Fabric is that the stored procedure is one of the only warehouse objects we can embed in a data pipeline, and we can pass the parameters we've seen here from other activities, such as notebooks. That's really important to understand: stored procedures become a lot more useful because they can be orchestrated as part of a data pipeline, which makes them really useful for data transformation in your data warehouse and in your architectures in general.

The one thing missing from this list so far is the ability to do inserts, updates, and deletes. This is what's unique to the stored procedure, at least in this list: here we only used a SELECT statement, but we can use stored procedures to update and insert into the underlying data sets. So they can be really powerful tools for any sort of data transformation or data loading; all of these kinds of things are possible with a stored procedure.

Okay, let's focus now on the stored procedure and the fact that it can be embedded in a data pipeline. Let's go back into Fabric and have a look at what that looks like. Here we are in a data pipeline, and you'll notice that on the Activities tab there are a lot of different options. The three mentioned in the DP-600 study guide are the stored procedure activity, the notebook activity, and the data flow activity. We'll go through each one in a little detail, look at some of the settings we can apply when creating these activities in a data pipeline, and then move on to how we can schedule these things, starting with the stored procedure.
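Before moving to the pipeline canvas, a quick hypothetical sketch of that insert/update capability mentioned above; this procedure and its logic are illustrative, not from the demo:

```sql
-- Hypothetical sketch: unlike views and functions, a stored procedure
-- can modify data, not just read it.
CREATE PROCEDURE dbo.sp_employee_birthday
    @employee_id INT
AS
UPDATE dbo.employees
SET age = age + 1                  -- a simple data modification
WHERE employee_id = @employee_id;  -- column name assumed from the demo table
GO

EXEC dbo.sp_employee_birthday @employee_id = 1;
```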
When we drop the stored procedure activity onto the pipeline canvas, we can look at the settings we get out of the box. Most of the configuration happens on the Settings tab. You navigate to the specific data warehouse you want to connect to, noting that the data warehouse currently has to be in the same workspace as your data pipeline; we've mentioned some of these cross-workspace limitations with data pipelines quite a lot on the channel, so just be wary of that. We connect to the warehouse and then choose from the stored procedures in that data warehouse. We can define any parameters here, and there's an option to import parameters automatically: it looks into your stored procedure and picks out any parameters it finds; here it's found one called first_name, and it knows it's of type string. Currently we're just hardcoding the value 'Jack', but one of the benefits of the stored procedure activity is dynamic content: you can parameterize the activity and pass something into this value using data pipeline parameters.

On the General tab we've just got the name, which you can update, and you can set the activity to active or inactive. We can also add a timeout, which can be quite useful for some data pipeline activities: maybe you've got a really long-running stored procedure that might take half an hour to run, so you add a timeout of one hour, because if it runs for an hour or longer, something has probably gone wrong and you don't want it locking up your data warehouse. Another setting to bear in mind is retry. The retry setting is available on quite a few of the data pipeline activities, and as the name suggests, if the activity fails, it will be retried the number of times you specify in this box. We can also give it a retry interval: try it once, wait for 30 seconds, then try again. Secure output and secure input specify whether the results (or inputs) of the activity are written to the logs.

So that's the stored procedure activity; now let's look at the notebook activity. We've got a notebook here, and again under Settings you can define your workspace and your notebook; I've got a notebook called "load to silver". As with the stored procedure, we can declare parameters: if you've got a parameters cell in that notebook, you can pass values into it from other activities in your data pipeline. The settings are very similar to the stored procedure activity, including the number of retries. With a notebook, a retry is probably more important, and more likely something you'll use: you're obviously using the Spark cluster and potentially querying REST APIs, so there's a lot more that can go wrong in a notebook than in a stored procedure. I think Microsoft recommends setting the retry to two or three, so that if your Spark cluster is busy with lots of computation and the session times out, or something else goes wrong with your notebook execution, it gets retried; it's always recommended to add a retry or two to a notebook activity. Again we can specify the retry interval down here, and secure input and output, the same as the stored procedure activity.

The final one we'll look at is the data flow activity. Similarly, you go over to the Settings tab, find your workspace and your data flow, select the particular data flow you want to run, and that's basically it.
If we go back to the General tab, you'll see the settings here are exactly the same: we give it a name and description, set the activity state to active or inactive, and set a timeout, a number of retries, and the retry interval. So that's a little bit about the stored procedure activity, the data flow activity, and the notebook activity in data pipelines.

Now let's look at scheduling a number of these different items in Fabric and the different options available to us. Here we are back in our workspace. There are a few different ways we can schedule things to run in Fabric, mainly these ETL jobs; we can schedule them to run maybe every hour, or whatever suits your use case. We'll look at scheduling data pipelines, data flows, and notebooks in a bit more detail. With a data flow, we can actually schedule from the workspace: click the ellipsis (three dots) next to a particular data flow, open its settings, go down to the refresh settings, and configure a scheduled refresh. We can turn that on and change the update frequency to daily or weekly, or, if you want more fine-grained refreshes than that, add specific times: 1:00 a.m., maybe 2:00 a.m., 3:00 a.m.