Hello all, my name is Krishnan, welcome to my YouTube channel. So guys, we are in the seventh tutorial of Amazon SageMaker. In my previous tutorial you saw that we deployed the machine learning model as an endpoint, and I also showed you billing management, how you should check the billing itself. Right now it is showing $0.04, since I have been working for the past four days and this charge has happened just in SageMaker.

Now, after the deployment of the model as an endpoint, we are going to do the predictions. And again guys, we will solve this in two to three steps: first we will do the prediction, then we will create a simple confusion matrix to see what the output is and how the performance looks, and then we will probably take the further steps.

For the prediction, first of all you need to import a library called csv_serializer. When you give data to the endpoint, the endpoint will be accepting some input, and that input is usually a tabular dataset, like an Excel or CSV file; that whole CSV needs to be serialized and passed to the model, and the model will then give you the output. If you go to the top, we had already done the train/test split over here, so I have my test dataset. So first I will import this library: from sagemaker.predictor import csv_serializer. Then, from the test data, I will drop my dependent features, that is y_no and y_yes, with axis=1, convert this into an array, and save it in a variable called test_data_array. After that, whenever you are using csv_serializer, you also need to set the model's content type, and this content_type should be set to 'text/csv', since our dataset is completely in CSV format. Then I also have to set my serializer, and this serializer is again csv_serializer, which we have imported, because my dataset is a CSV file. After this I can use xgb_predictor, which is my model, call predict, and give it my test data. After that I also have to decode the result with UTF-8; this decoding is required because the prediction comes back in an encoded format, and that encoded format is decoded using UTF-8. Once I get my predictions, from that prediction string I will skip the first character, so that we get the values for the binary classification based on the probability. So once we do this, and once we try to find out predictions_array.
shape: now let's see, once I execute this, guys, you will see that I get some shape, which tells me how many records I have in my test dataset. And if you really want to see predictions_array itself, you will be able to see that this is the output for all the test data. Now for the next step, obviously everybody knows about it, we will be creating a confusion matrix. This whole confusion matrix code is actually taken from the AWS documentation. What they have done is implement a crosstab: they have taken the prediction array, assigned some column and row labels, and in short, based on purchase and no purchase, they have calculated all the values that are required. Now you have to tell me: what is this particular formula, and this one, and this one, and again this one? We know false positive, true positive, true negative and all, but what does each of these formulas actually specify? You have to tell me, okay? So just try to execute this, and here you will get a nice confusion-matrix-like table. Here is your predicted axis, and you can see no purchase and purchase for both axes. With respect to this cell you are getting 91 percent; with respect to the cell where the predicted value is purchase but the real value was no purchase, it is somewhere around 9 percent. Here you can see that the accuracy is quite low, probably because this kind of problem statement involves an imbalanced dataset. Similarly, with respect to this cell you are getting somewhere around 66 percent, but again this is predicted purchase while your observed value is no purchase. So on the left-hand side you can consider the observed value, and on the right-hand side the predicted value. So in the observed column, if it is no purchase but it has been
predicted as purchase, this value you will be getting is somewhere around 34 percent, which again is very low; we should try to increase this value with the help of hyperparameter tuning. So in short, the classification accuracy that you are getting is somewhere around 89.7 percent, but I would definitely have a look at the precision, recall, and all the other metrics, because this is purely an imbalanced dataset and those values are very low; this accuracy alone is not enough.

Now after this, guys, always make sure that once you have done the prediction from the endpoint, you don't keep it running continuously, because the charges will keep accumulating; once an endpoint is created, you need to delete it. For deleting, you have this specific code, and when you delete, you basically delete all the endpoints. You say sagemaker.Session().delete_endpoint(), and in the brackets you give your estimator's endpoint, which is xgb_predictor.endpoint. You also specify your bucket name, so that whatever folders have been created, with your dataset and even your model files, everything will get deleted. So here you are getting the information of your bucket name, and then you call bucket_to_delete.objects.all().delete(); if you use this function, everything will get deleted. So there are two main things you have to do after your model is deployed to an endpoint: make sure you don't keep it running for a long period of time, and focus on deleting those endpoints as soon as your work is done. Later on, if you really want to practice again on the free tier, you can create them and test again, but after this step just try to delete all the endpoints. Right now, if I go to my SageMaker S3 bucket, you will be able to see that it has all these particular files, the bank application data and so on. I'll just reload it, and you can see it over here: in AWS SageMaker you have this bank application. Now what will happen if I execute this? Again, I'll show you guys: inside this, all the folders are present, this folder and this folder, everything is present. Now if I execute this, it will go and delete each and every folder: xgboost, this test.csv, output, model, train.
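The full flow described above can be sketched as follows. This is a minimal sketch assuming the video's variable names (test_data_array, xgb_predictor, predictions_array, bucket_to_delete); the endpoint call and the cleanup calls are shown only as comments, and dummy probabilities stand in for the real endpoint output so that the confusion-matrix step can be run offline. The crosstab layout follows the pd.crosstab pattern from the AWS sample code the video refers to.

```python
import numpy as np
import pandas as pd

# In the notebook, the probabilities come from the deployed endpoint, e.g.
# (SageMaker Python SDK v1 style, as used in the video):
#   from sagemaker.predictor import csv_serializer
#   xgb_predictor.content_type = 'text/csv'
#   xgb_predictor.serializer = csv_serializer
#   predictions = xgb_predictor.predict(test_data_array).decode('utf-8')
#   predictions_array = np.fromstring(predictions[1:], sep=',')
#
# Dummy stand-ins so the evaluation step below runs offline:
rng = np.random.default_rng(0)
observed = rng.integers(0, 2, size=1000)            # 0 = no purchase, 1 = purchase
predictions_array = np.clip(rng.random(1000) * 0.8 + observed * 0.3, 0.0, 1.0)

# Cross-tabulate observed vs. predicted classes (rounding probabilities at 0.5),
# following the pd.crosstab pattern from the AWS documentation sample.
cm = pd.crosstab(index=observed,
                 columns=np.round(predictions_array),
                 rownames=['Observed'],
                 colnames=['Predicted'])

tn, fp = cm.iloc[0, 0], cm.iloc[0, 1]   # observed no purchase
fn, tp = cm.iloc[1, 0], cm.iloc[1, 1]   # observed purchase
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(cm)
print(f"Overall classification rate: {100 * accuracy:.1f}%")

# After evaluation, delete the endpoint and the S3 objects to stop the charges:
#   sagemaker.Session().delete_endpoint(xgb_predictor.endpoint)
#   bucket_to_delete = boto3.resource('s3').Bucket(bucket_name)
#   bucket_to_delete.objects.all().delete()
```

Since this is an imbalanced problem, it is worth computing precision and recall from the same cells, tp / (tp + fp) and tp / (tp + fn), rather than relying on the overall accuracy alone, as the video points out.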