Hello everybody. Today I'm going to provide an introduction to diffusion models, a family of models that have recently shown tremendous promise in the image generation space. While many approaches have been implemented for image generation, some of the more promising ones over time have been model families like variational autoencoders, which encode images to a compressed size and decode them back to the original size while learning the distribution of the data itself; GANs, which pit two neural networks against each other: one network, the generator, creates images, and the other, the discriminator, predicts whether the image is real or fake. Over time the discriminator gets better at distinguishing between real and fake, and the generator gets better at creating real-looking fakes. Autoregressive models generate images by treating an image as a sequence of pixels; the modern approach with autoregressive models actually draws much of its inspiration from how LLMs handle text. One of the newer image generation model families, and the focus of this talk, is diffusion models. Diffusion models draw inspiration from physics, specifically thermodynamics, and while they were first introduced for image generation in 2015, it took a
few years for the idea to really take off. Within the last few years, though, we've seen a massive increase in both research and industry work with diffusion models; they underpin many state-of-the-art image generation models that you may be familiar with today. Diffusion models show promise across a number of different use cases. Unconditioned diffusion models, which take no additional input or instruction, can be trained on images of a specific thing to generate new images of that thing, such as faces. Another example of unconditioned generation is super-resolution: increasing image quality. Conditioned diffusion models give us things like text-to-image, generating an image from a text prompt, and image editing, customizing an image with a text prompt. Let's dive into diffusion models and talk at a high level about how they actually work. In this talk, for simplicity, let's focus on unconditioned diffusion. Now, this is a really interesting idea, quite different from other image generation approaches: destroy structure in a data distribution through an iterative forward diffusion process, then learn a reverse diffusion process that restores structure to the data. The goal is that by training a model to denoise,
that model will be able to take in pure noise and from it synthesize a novel image. Now, I know there's a bit of math notation on this slide, so let's break it down a bit. We start with a large data set of images; let's take a single image, shown here on the left. We start the forward diffusion process to go from x0, the initial image, to x1, the initial image with a bit of noise. We can do this over and over again, iteratively, to add more and more noise to the image. This distribution we call q, and it only depends on the previous step. We can do this over and over, iterating to add more noise. Ideally, once we do this for a high enough T, we have reached a state of pure noise; the initial research paper implemented this with T = 1000. Now we want to do this in reverse: how do we go from x_T, a noisy image, to x_{T-1}, a slightly less noisy image, at each step of the way? We also learn this reverse diffusion process. That is, we train a machine learning model that takes in as input the
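The forward diffusion process described above can be sketched in a few lines of Python. This is a minimal illustration assuming the standard DDPM-style linear beta schedule; the function and variable names here are my own, not from the talk:

```python
import numpy as np

def forward_diffusion(x0, t, betas, rng=np.random.default_rng(0)):
    """Sample x_t from x_0 using the closed form of q(x_t | x_0).

    q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)          # cumulative product up to each step
    noise = rng.standard_normal(x0.shape)   # epsilon ~ N(0, I)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise
    return xt, noise

# A linear noise schedule over T = 1000 steps, as in the original DDPM setup.
T = 1000
betas = np.linspace(1e-4, 0.02, T)

x0 = np.ones((8, 8))                        # a toy stand-in for an image
xT, _ = forward_diffusion(x0, T - 1, betas)
# By t = T - 1, alpha_bar is near zero, so almost all structure is destroyed
# and x_T is essentially pure Gaussian noise.
```

Note the convenient property that x_t can be sampled directly from x_0 in one shot, rather than by looping through all t intermediate steps.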
noisy image and the timestep T and predicts the noise. Let's take a look at this from another angle: we can visualize a training step of this model. We have our initial image x, and we sample at time step T to create a noisy image. We train a denoising model to predict the noise; this model is trained to minimize the difference between the predicted noise and the actual noise added to the image. In other words, this model is able to remove noise from real images. To generate an image, we can start with pure noise and send it through our denoising model. We can then take the predicted noise and subtract it from the initial noise. If we do this iteratively, over and over, we end up with a generated image. Another way to think about this is that the model learns the real data distribution of the images it has seen and samples from that learned distribution to create new, novel images. As I'm sure we are all aware, there have been many advancements in this space in just the last few years. While many of the exciting new technologies on Vertex AI for image
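The training step just described, sample a random timestep, add that much noise, predict the noise, and minimize the squared error, can be sketched as follows. This is a toy illustration: the single linear layer stands in for a real denoising network (in practice, a U-Net), and all names here are my own:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the denoising network: one linear layer mapping a
# (flattened noisy image, normalized timestep) pair to a noise prediction.
W = rng.standard_normal((65, 64)) * 0.01    # 64 pixels + 1 timestep feature

def predict_noise(xt_flat, t, T):
    features = np.append(xt_flat, t / T)    # concatenate the timestep feature
    return features @ W

def training_step(x0, betas, lr=1e-3):
    global W
    T = len(betas)
    t = rng.integers(0, T)                  # sample a random timestep
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)     # the actual noise we add
    xt = np.sqrt(alpha_bar) * x0 + np.sqrt(1 - alpha_bar) * eps
    eps_pred = predict_noise(xt.ravel(), t, T)
    loss = np.mean((eps_pred - eps.ravel()) ** 2)   # the MSE objective
    # Gradient of the MSE w.r.t. W for this linear model, one SGD step.
    features = np.append(xt.ravel(), t / T)
    grad = 2.0 * np.outer(features, eps_pred - eps.ravel()) / eps.size
    W -= lr * grad
    return loss

betas = np.linspace(1e-4, 0.02, 1000)
x0 = np.ones((8, 8))
losses = [training_step(x0, betas) for _ in range(200)]
```

The key design point is that the loss compares predicted noise against the exact noise that was added, which we know because we added it ourselves; no labels are needed beyond the images.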
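The iterative generation loop can be sketched roughly like this. One caveat: the DDPM update rescales the predicted noise at each step rather than subtracting it directly, so this is a simplified sketch of that rule, with a dummy callable standing in for a trained network:

```python
import numpy as np

def sample(predict_noise, betas, shape, rng=np.random.default_rng(0)):
    """DDPM-style sampling: start from pure noise and iteratively denoise.

    predict_noise(x_t, t) stands in for the trained model.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(shape)          # x_T ~ N(0, I): pure noise
    for t in range(len(betas) - 1, -1, -1):
        eps_pred = predict_noise(x, t)
        # Remove the predicted noise, rescaled appropriately for step t.
        x = (x - betas[t] / np.sqrt(1 - alpha_bar[t]) * eps_pred) / np.sqrt(alphas[t])
        if t > 0:                           # re-add some noise except at the last step
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)
    return x

# With a dummy "model" that predicts zero noise, the loop still runs end to end.
betas = np.linspace(1e-4, 0.02, 50)
img = sample(lambda x, t: np.zeros_like(x), betas, shape=(8, 8))
```

With a real trained model in place of the lambda, each pass strips away a little of the noise, and after all T steps the result is a novel image drawn from the learned distribution.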
generation are underpinned by diffusion models, lots of work has been done to generate images faster and with more control. We have also seen wonderful results combining the power of diffusion models with the power of LLMs for incredible context-aware, photorealistic image generation. One great example of this is Imagen from Google Research. While it's a bit more complicated than what we've talked through in this session, you can see that at its core it's a composition of an LLM and a few diffusion-based models. This is a really exciting space, and I'm thrilled to see this wonderful technology make its way into enterprise-grade products on Vertex AI. Thank you for watching, and please feel free to check out our other videos for more topics like this one.