Hi, I'm Priyanka Vergadia, and this is AI Simplified, where we learn to make your data useful. In the previous episode, we learned how to create datasets in Vertex AI and trained a machine learning model using AutoML. In this video, we help our friends at Fuel Symbol build custom machine learning models. Their team of machine learning experts has decided to write their own training code to predict the fuel efficiency of a vehicle. Join me as I walk them through the different ways you can run custom training in Vertex AI, and finally we will kick off our training job in the console and see how it's done.

So what do we need to create a custom job? Before we submit any custom job to Vertex AI, we need to create a Python training application or a custom container with the training code and its dependencies. There are pre-built containers to run our code if we write our training application in Python using TensorFlow, scikit-learn, XGBoost, or PyTorch. If you are using pre-built containers, you just provide the path to the training package in Cloud Storage. You can also build your own custom containers if you wish; in that case, you provide the location of the container in your Container Registry or your Artifact Registry. If you're not sure whether to use pre-built or custom containers, check out the link I've included below. In both cases, you will need a folder within a Cloud Storage bucket to put your model output artifacts into.

When running custom training jobs on Vertex AI, we can also make use of Vertex AI's hyperparameter tuning service. If you're not familiar with hyperparameters, these are variables that govern the process of training your model, such as the batch size or the number of hidden layers in a deep neural network. In a hyperparameter tuning job, Vertex AI creates trials of your training job with different sets of hyperparameters and then searches for the best combination of hyperparameters across a series of those trials.

Now, to run the training, you need compute resources: either a single node or multiple worker pools for distributed training. This is also the step where you select the machine types, the CPUs, the disk size and disk type, and accelerators such as GPUs.
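For reference, when you submit a custom job through the Vertex AI API, these compute choices end up in a worker pool specification. Here is a minimal sketch of what that structure looks like; the machine type, accelerator, disk, and image URI values are illustrative placeholders, not recommendations:

```python
# Sketch of a Vertex AI CustomJob worker pool spec.
# All values are illustrative placeholders: swap in your own machine
# type, accelerator, disk settings, and container image URI.
worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",        # CPUs and memory
            "accelerator_type": "NVIDIA_TESLA_T4",  # optional GPU
            "accelerator_count": 1,
        },
        "replica_count": 1,  # single node; add pools for distributed training
        "disk_spec": {
            "boot_disk_type": "pd-ssd",  # disk type
            "boot_disk_size_gb": 100,    # disk size
        },
        "container_spec": {
            # Custom container holding the training code and dependencies.
            "image_uri": "gcr.io/your-project/your-training-image:latest",
        },
    },
]
```

A single-node job uses one pool with `replica_count` of 1; distributed training adds more pools (for example, a chief pool and a workers pool) to the same list.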
Once training is done, we need to serve the trained model at an endpoint for predictions. In Vertex AI, we can do that by using pre-built containers for supported runtimes such as TensorFlow, scikit-learn, and PyTorch, or you can build your own custom serving container stored in Container Registry or Artifact Registry. Of course, if you just need to download the model artifacts, you can skip these steps altogether, or you can use the model for batch predictions instead.

Now that we have a basic understanding of our custom training choices, let's help Fuel Symbol train and serve a custom model that predicts the fuel efficiency of a vehicle. We will use TensorFlow to build this model, and then we'll package our training code in a Docker container. So let's see this in the Google Cloud console. Before we begin, make sure to enable the Vertex AI, Compute Engine, and Container Registry APIs. In this example, we will use Notebooks to create the code for our Docker container, but you can use any environment you like. Create a new notebook instance with the latest version of TensorFlow Enterprise.
For this particular demo, I am training without GPUs, but choose GPUs if your use case needs them. Now that our instance is created, let's open JupyterLab and navigate to the terminal window in our notebook. For the purposes of this demo, we will use training code that already exists in the TensorFlow docs to predict the fuel efficiency of a vehicle; it uses the Auto MPG dataset from Kaggle, and I've included the link to that page below. What we will do here is submit the training job to Vertex AI by putting our training code in a Docker container and pushing the container to Google Container Registry. Using this approach, we can train a model built with any framework.

Our first step is to create a Dockerfile. This Dockerfile uses one of the Deep Learning Containers TensorFlow Enterprise Docker images. The Deep Learning Containers on Google Cloud come with many common machine learning and data science frameworks pre-installed; the one we're using here includes TensorFlow Enterprise, pandas, scikit-learn, and others. After pulling that image, the Dockerfile sets up the entry point for our training code.
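The Dockerfile can be as small as this sketch. The base-image tag below is an assumption; check the Deep Learning Containers image list for a current TensorFlow Enterprise tag:

```dockerfile
# Base image: a TensorFlow Enterprise Deep Learning Container.
# The tag is illustrative; pick a current one from
# gcr.io/deeplearning-platform-release.
FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-6

WORKDIR /

# Copy the training package into the image.
COPY trainer /trainer

# Run the trainer module when the container starts.
ENTRYPOINT ["python", "-m", "trainer.train"]
```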
We haven't created these files yet, but we will do that now. Before that, we need a Cloud Storage bucket to export our trained TensorFlow model into; Vertex AI will then read the exported model assets from this bucket and use them to deploy the model. For our model code, let's create a trainer directory and a train.py file, and add our training code to the train.py file. Here I have adapted the code from the TensorFlow docs for a model that predicts vehicle fuel efficiency. If you're interested in using the same code, follow the codelab that's linked below.
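The heart of trainer/train.py is a small Keras regression model. Here is a hedged sketch of its shape: the bucket name is a placeholder, the layer sizes and data handling are illustrative stand-ins for the codelab's code rather than a copy of it, and the dataset URL follows the TensorFlow docs version of the tutorial:

```python
"""Sketch of trainer/train.py: a small Keras regression model that
predicts fuel efficiency. Values here are illustrative, not the exact
codelab source."""

BUCKET = "gs://your-bucket-name"  # replace with the bucket created earlier


def export_dir(bucket: str) -> str:
    # Vertex AI reads the exported SavedModel from this folder.
    return bucket + "/model"


def build_model(num_features: int):
    # TensorFlow is imported inside the function so the helpers above
    # can be used even where TensorFlow is not installed.
    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu",
                              input_shape=[num_features]),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1),  # single output: predicted MPG
    ])
    model.compile(loss="mse",
                  optimizer=tf.keras.optimizers.RMSprop(0.001),
                  metrics=["mae", "mse"])
    return model


def main():
    # Invoked by the container entrypoint in the real script; not called
    # here so the sketch can be imported without downloading data.
    import pandas as pd

    url = ("http://archive.ics.uci.edu/ml/machine-learning-databases/"
           "auto-mpg/auto-mpg.data")
    cols = ["MPG", "Cylinders", "Displacement", "Horsepower", "Weight",
            "Acceleration", "Model Year", "Origin"]
    data = pd.read_csv(url, names=cols, na_values="?", comment="\t",
                       sep=" ", skipinitialspace=True).dropna()

    # One-hot encode the categorical Origin column.
    data = pd.get_dummies(data, columns=["Origin"])

    labels = data.pop("MPG")
    model = build_model(num_features=data.shape[1])
    model.fit(data, labels, epochs=100, validation_split=0.2)

    # Export the SavedModel where Vertex AI will look for it.
    model.save(export_dir(BUCKET))
```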
We need to make sure to update the bucket variable with the name of the storage bucket we created earlier. Next, we build and test the container locally. Let's run the container within our notebook to see if it works correctly. It's building now, and now that it's built, let's run the container. There, it's working fine. In a production environment, training will likely take longer, which is why you would want to run it in the cloud. Now let's see how to do that.
We push the container to Google Container Registry so Vertex AI can access it. With our container pushed to Container Registry, we are now ready to kick off a custom model training job. We go to Training in Vertex AI and select "No managed dataset", since our dataset is pulled from its source inside our model code. Vertex AI gives us two options for training models. With AutoML, we can train high-quality models with minimal effort and machine learning expertise; we covered this in our previous episodes, so check them out. Here, we are using custom training. Give the model a name, and in the container settings we have two options. Pre-built containers support PyTorch, scikit-learn, TensorFlow, and XGBoost, and require us to package and upload our application code and dependencies to a Cloud Storage bucket. Because we have our own custom container on Google Container Registry, we will select "Custom container" here, which supports not just the common machine learning frameworks but also non-machine-learning dependencies, libraries, and binaries. We provide our image URI and our Cloud Storage bucket.
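If you prefer code to the console, the same job can be kicked off with the Vertex AI Python SDK. This is a sketch under assumed names; the project, region, bucket, and image URIs are placeholders you would fill in:

```python
def image_uri(project: str, image: str, tag: str = "latest") -> str:
    # Container Registry path format: gcr.io/PROJECT/IMAGE:TAG
    return f"gcr.io/{project}/{image}:{tag}"


def submit_training_job(project: str, region: str, bucket: str,
                        train_image_uri: str, serving_image_uri: str):
    """Sketch: submit the custom-container training job via the SDK.

    Mirrors the console flow of a custom training container plus a
    pre-built serving container; all arguments are placeholders.
    """
    # Imported here so this file can load without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region,
                    staging_bucket=bucket)

    job = aiplatform.CustomContainerTrainingJob(
        display_name="mpg-custom-train",
        container_uri=train_image_uri,  # our training container
        model_serving_container_image_uri=serving_image_uri,
    )

    # One standard machine, no GPUs, matching the console choices.
    model = job.run(
        replica_count=1,
        machine_type="n1-standard-4",
        model_display_name="mpg-model",
    )
    return model
```

`job.run()` blocks until training finishes and returns a Model resource that can then be deployed to an endpoint.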
We will look at hyperparameter tuning in later episodes, so let's leave it unchecked for now and continue. Under compute and pricing, we set up a standard machine, since this model trains quickly, but you're welcome to experiment with larger machine types and GPUs if you like; just remember that if you use GPUs, you will need a GPU-enabled base container image. The prediction container section is where we specify how we would like to serve our model. Here we use a pre-built container and provide the path to our Cloud Storage bucket, and we're ready for training.

We can check on the training in progress. This can take some time, so I have fast-forwarded the process for you. Now that our model is trained, we need to deploy it to an endpoint to make predictions. Give the endpoint a name; since this is a demo, we will just leave the traffic split at 100 and enter 1 for the minimum number of compute nodes. Depending on your use case, you can set a maximum number of nodes for autoscaling when traffic on the endpoint is high. I'm selecting a standard machine type here.
Pick the one that makes sense for your use case, and deploy. Deploying the endpoint takes a few minutes, and when it's done, we can see it in our Endpoints tab. Now that our endpoint is deployed, we can start making predictions. We will get predictions from our trained model in a Python notebook using the Vertex AI Python SDK, but you can get predictions from any environment you'd like. So I'm back in our notebook instance, creating a Python 3 notebook from the launcher. We install the Vertex AI SDK, then add a cell to import the SDK and create a reference to the endpoint we just deployed; replace the endpoint ID and the project ID with your own values. Finally, we make a prediction against our endpoint with some test data, and there's our predicted value.

With that, we've helped our friends at Fuel Symbol train a custom model using Vertex AI. We provided the training code in a custom container and used a TensorFlow model here, but you can train a model built with any framework using custom containers. Then we deployed the TensorFlow model using pre-built containers.
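Pulled together, the notebook side of this is only a few lines. In this sketch the endpoint ID, project ID, region, and feature values are placeholders, and the helper for building the instance is hypothetical, added only to show the expected payload shape:

```python
def make_instance(features: dict, column_order: list) -> list:
    # Hypothetical helper: the model expects a plain list of numbers
    # in the same column order used during training.
    return [float(features[name]) for name in column_order]


def predict_mpg(project: str, region: str, endpoint_id: str, instance: list):
    """Sketch: query a deployed Vertex AI endpoint for a prediction.

    project, region, and endpoint_id are placeholders for your values.
    """
    # Imported here so this file can load without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=region)
    endpoint = aiplatform.Endpoint(endpoint_id)

    # predict() takes a list of instances, so one example still goes
    # inside a list; predictions come back in the same order.
    response = endpoint.predict(instances=[instance])
    return response.predictions[0]
```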
Finally, we created a model endpoint and generated a prediction. In the next video, we will build a vision model using AutoML. In the meantime, let's continue our discussion in the comments below, and don't forget to like and subscribe!