[ MUSIC ] Trupti Parker: Hello, everyone. I'm Trupti Parker, Product Manager in Azure AI at Microsoft. Marlene Mhangami: Hi, everyone.
I'm Marlene, and I'm a Senior Developer Advocate with the Python Azure team here at Microsoft as well. Trupti Parker: Yeah. Hope you had a good lunch today.
Today we are going to learn about the journey of GenAIOps and how Azure AI Foundry and the Azure developer toolchain help provide support all along the way. We have all heard that the opportunity with AI is vast, but then what really is the holdup? There are a lot of new challenges that our customers are facing while harnessing the power of gen AI.
Gen AI is reshaping enterprise operations. However, as organizations scale their applications, they're struggling to manage their AI application workloads while making sure they comply with privacy, security, and responsible AI policies. All of this, with the rapid evolution of the field, can make it seem like a moving target.
It's really difficult to figure out which tooling is right for you and then find adequate guidance for it. Therefore, it has become incredibly clear that there is a need for new tools and processes, as well as a change in mindset in how technical and non-technical teams collaborate to manage their AI practices at scale. Let's take a step back to understand what operationalization really means.
From a business perspective, what does the business mostly care about? It's all about making sure the customers are happy, right? In order to do so, it's all about adapting to the changes happening in the industry.
Making sure you're capitalizing on the advantages of LLMs and making AI-driven business decisions. For IT, it's all about navigating the technical challenges while harnessing the power of gen AI, but without reinventing too much of the software development life cycle. How many of you have heard the concept of MLOps?
I see some show of hands. Awesome. How many of you have heard the concept of GenAIOps?
Awesome. I see a couple of hands. Yeah, so we're going to learn about it today, so you don't need to already know it.
But as we move towards the adoption of gen AI practices, we are witnessing a shift from traditional MLOps and LLMOps to GenAIOps. Because gen AI models are non-deterministic and dynamic in nature, they require new strategies like prompt engineering, vector search configuration, data grounding, and so on. With that, we need to start thinking about the new target audiences that have come into the picture.
It's not just about researchers and data scientists. It's also about app developers and AI engineers who can now start leveraging this. We also need to think about the new metrics that have emerged.
Previously, it was all about data science metrics like sensitivity and accuracy. But now we also need to be mindful of responsible AI metrics, such as toxicity, harm, and bias, while managing your workloads. We also need to think, from a collaboration perspective, about the new assets that have emerged.
Now prompts are also assets that we need to be thinking about, as they're crucial for adoption. Luckily, with all these gen AI improvements, your traditional MLOps practices are not going to waste. You can still start there and adapt to the new technologies.
Marlene Mhangami: So when we think about the process of building out LLM applications, it works a little bit differently from your traditional application. More specifically, with the gen AI lifecycle, we think about these three sort of iterative loops. And the first thing is that it's more experimental.
So when you're getting started, you want to have this ideation or exploring phase. In this phase, you just want to get started right away with having ideas, throwing those ideas at an LLM, and testing that LLM with some inputs, not many inputs. Maybe you'll just hard code some information and try some different prompts out.
And in this process, you just want to figure out whether your idea is actually feasible and that the LLM is giving you the response that you would like. And then once you have completed that ideation process, you want to be able to move and bring your idea from a proof of concept into an actual application. And you'll start the development phase or the building and augmenting phase.
And in this phase, what you're going to do is you're going to test your application across a range of different inputs. So you might go ahead and do some prompt engineering and try out different user inputs and see how you can improve your prompts in that application. And then after that, you also want to make sure that the output from the LLM is as high-quality as possible.
With that, you might do some RAG and make sure that the LLM output is grounded in as much information as you can throw at it. And then from there, once you feel pretty confident about your application and you think it's ready for users to start testing, you want to move on to that final phase where you're bringing your app into production. And in this phase, what's important is that you want your users to have a positive experience.
And one of the ways that you can do that is to monitor your application and make sure that the performance is steady and good quality. And something else that you can do is pull that data in from your performance metrics and also make sure that you're evaluating that data, bringing it back into your development environment and iterating so that you make your application as good as possible. Trupti Parker: Awesome.
Thanks, Marlene. So this all seems great, but now let's actually try to understand from step-by-step process, how do we actually get started? Right?
The first step is really understanding what your business use case is, the why and the idea of your project. Once you understand that, you start exploring the different models that we have and finding which is best suited to your business use case. Once you feel comfortable with the models and the technologies that you're using, you can go to the next phase, where you start experimenting and ideating and also perform local debugging.
Once you feel comfortable with that, you can actually start evaluating your application, bring in your own data, and see that you get good results for your business use case and that it's tailored to your application. Once you're comfortable with the evaluations of your application, you can start with the deployment, take your application to the production environment, and put it in the real world. With that, we also need to start thinking about responsible AI and the cost management aspects that come into the picture, making sure you are scaling to your needs and we are providing the right flexibility for that.
Marlene Mhangami: Great. Trupti, this is all really helpful information in theory, but as a developer, I would like to know how I actually take these theories and actually put it into practice. What tools do I have available for me to do that?
Trupti Parker: That's true, Marlene. This is where a lot of us are right now in this journey, right? How do I actually get started?
How do I actually start building these things? That's when the tooling that we have been building comes into the picture. The interesting thing happening in the technology world right now is that traditional app development and AI development, along with the AI personas that we were showing earlier, are coming together and becoming a single thing.
What it means is that now AI just becomes another tool in your developer world. So as you start experimenting with this, our tooling must represent that change. We have some new and upgraded tooling to help you throughout the process.
Using Azure AI Foundry, you can get started with the prompt engineering piece. You can also use our Azure AI templates to get preset templates for experimentation purposes. Once you're feeling comfortable with that, with our tracing and evaluation portfolio, you can start experimenting and seeing what suits your application, and then finally you can go into production with our tools as well.
So now let's actually get started and see one such demo in action. Marlene, over to you. Marlene Mhangami: So we're going to go ahead and look practically at these tools and at what using them looks like for developers.
And for me, the first stage is the ideation stage like we mentioned earlier. And in this stage, I really don't want to be thinking about having to download all sorts of SDKs or having a super complicated setup. And so fortunately for us, we have three really great options for you today that you can take a look at when you are starting with ideation.
I am going to hope that it shows on the screen. Great. Oops.
So the first one is the Chat Playground in Azure AI Foundry. Some of you probably have seen this interface before. This is a fantastic way to get started without even having to open up VS Code or an IDE, whatever IDE you have.
And one of my favorite features about this UI is this Generate Prompt feature. And in this feature, what I like about it is that you can type in a single sentence and give it some instructions in order to build a base system prompt. And so for example here, we can see this prompt that I generated earlier.
And basically what I asked for was a system prompt to be able to generate queries to help me search on the Internet. So we'll see what that looks like in practice. And I also asked for it to be in JSON output.
So I'm just going to copy that. Oh, Ctrl-A, Ctrl-C. And then I'm going to show you the second option you have for ideating.
And this is the AI Toolkit for Visual Studio Code. And this is really exciting because this is a new extension that was just released, I want to say, at Ignite. It's the first time I've really had a lot of time using it.
And this is actually part of the wider Azure AI Foundry SDK, which is a unified SDK that allows you to be able to get all of the features that you have in Azure AI Foundry and to bring that into your local VS Code development environment. So for example here, if we want to be able to use this, I've already grabbed the extension, and we can go to the extension toolkit here and we can click on Models and we can see this Azure model catalog. And then from there, I'm going to go ahead and choose GPT-4o.
And once I've chosen that, I'm going to go ahead and paste the system prompt that I had generated earlier. And I'm going to ask, "Can you tell me what trends are" -- what camping trends, let's say camping trends, people like camping here -- "trends that are there in winter." And then let's run that.
Hopefully it works. Trupti Parker: In winter, you want to be adventurous. Marlene Mhangami: Exactly.
And probably not while it's snowing. But here we can see that it generated a list of queries. And this list of queries is what we can use later on, as you'll see, when we build out a more complex application.
So this is a great way to also start out locally in VS Code, with a similar playground experience there. Another tool that we can use is Prompty. And Prompty also gives us access to this same model catalog and allows us to do a little more robust prompt engineering.
And here we can see a basic Prompty file. And what this Prompty file does, if I press that button, it generates for me this basic Prompty. And it has on the top here some YAML, some information about the Prompty.
It includes the model configuration that I'm going to use. I'm going to use GPT-4o. And it includes the name of the person that created this Prompty.
And this is some context that the model will have and then also a question that we're going to ask. So to run the Prompty file, I'm just going to press the play button. And hopefully it runs successfully.
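For reference, a minimal Prompty file along these lines might look like the sketch below. The exact field names depend on your setup, and the values here (deployment name, sample data) are illustrative:

```yaml
---
name: Basic Prompt
description: A basic prompt that answers questions grounded in some context
authors:
  - Marlene Mhangami
model:
  api: chat
  configuration:
    type: azure_openai
    azure_deployment: gpt-4o
sample:
  firstName: Jane
  context: >
    The Alpine Explorer Tent has a detachable divider and a waterproof fly.
  question: Can you tell me about your tents?
---

system:
You are an AI assistant who helps people find information. Address the user as {{firstName}}.

# Context
Use the following context to give a grounded answer:
{{context}}

user:
{{question}}
```

The YAML frontmatter between the `---` markers carries the configuration; the body below it is the prompt template itself.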
Great. It ran as expected. And so we see some output here.
We had asked it, "Can you tell me about your tents?" And it gets that information for us based on this context. And what I can do if I want to edit this, I can say the name should be Marlene, which is my name.
And then I can say, "Give me more information." And add emojis. More emojis, but I think that should work.
And great. As you can see, that took effect. That worked.
And it's saying, "Hey, Marlene." And it's adding more emojis. And it's giving us more information.
So this is a great way for you to start iterating on your prompts. You can start doing some prompt engineering. You can also go ahead and test this out directly in Prompty with those different models.
And then compare and see, okay, how does 4o perform here? How does the Cohere model perform here? And that allows you to make better decisions for your applications.
So something I also find developers struggle with is taking a prompt, now that you've written it, and actually putting it into code. And a great solution that we have with Prompty is that you can just left-click here, and then you can say "Add Code".
And then you can choose which code you want. So, for example, we'll use the Prompty runtime in this example. And here, the code is immediately generated for us.
So this will allow you to run the Prompty directly in your code. This line in particular is the one that actually executes the Prompty file, and the Prompty here is the basic Prompty from before.
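Conceptually, the generated runtime code boils down to three steps: load the .prompty file, render its template with your inputs, and send the rendered prompt to the model. Below is a stdlib-only sketch of just the load-and-render part, to show the shape of what's happening; the real prompty package wraps all of this (plus the model call) for you, and the file contents here are illustrative:

```python
import re

# An inline stand-in for a .prompty file on disk (illustrative contents).
PROMPTY = """---
name: Basic Prompt
model:
  configuration:
    azure_deployment: gpt-4o
---

system:
You are an AI assistant. Address the user as {{firstName}}.

user:
{{question}}
"""

def load_prompty(text: str) -> tuple[str, str]:
    """Split a .prompty file into its YAML frontmatter and prompt template."""
    _, frontmatter, template = text.split("---", 2)
    return frontmatter.strip(), template.strip()

def render(template: str, inputs: dict[str, str]) -> str:
    """Substitute {{name}} placeholders with the given input values."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: inputs[m.group(1)], template)

frontmatter, template = load_prompty(PROMPTY)
prompt = render(template, {"firstName": "Marlene", "question": "Tell me about your tents."})
print(prompt)
```

The real runtime would then pass the rendered prompt to the model configured in the frontmatter.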
And we can also go ahead and add some Langchain code. I love Langchain personally. It's one of my favorite frameworks.
And here, our Langchain code is generated for us to use right away. So this is great. What we've seen is three different ways to ideate.
Then we've seen a way to go directly from your prompt into code. And now, what do we do when we want to scale up? We want to take our code, and we want to actually add infrastructure, or we want to make it so that we add new features or different functionality.
Trupti mentioned earlier the Azure Developer CLI, which is a tool we have available to developers. With just the commands on the screen to the left, once you run them and then run azd up, you can move your code directly from your local dev environment into the cloud with just that one command, which is really cool. And then another thing is that you have some templates available to you.
And with azd, we have this templates gallery. If you go to this specific URL, you get all of these really pretty templates available for you to use. And today, in the rest of this talk, we'll be looking at Creative Writing Assistant, which is a multi-agent solution.
So that's the one we're going to be looking at. And to be able to install it, you just need to run this command, azd init -t contoso-creative-writer. And it will download the template code for you.
And as you download that code, this is what is included. So the Creative Writing application is a multi-agent application that allows you to be able to generate blogs by running the parts in this diagram that you can see on the screen. And this application has several agents included in it.
The first is a research agent. And this agent generates queries, like we saw with the Prompty earlier. It generates those queries, then chooses which functions to run to call Bing Search and get information.
And then after that, it grabs that information, those trends from online, and also pulls in some products from an Azure AI vector store. And then it combines all of that research from online and the products into a helpful article using GPT-4o. So it writes that article out with that.
And then finally, it uses an editor agent to be able to read the article and decide whether or not the article should be accepted. So we're going to take a look at what that looks like in a minute. So, wait, I'm going to switch here.
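Put together, the flow just described (research, pull products, write, edit) can be sketched as plain Python. In the real template each step is a Prompty-backed LLM call or a search; the function names and stub return values here are purely illustrative:

```python
def research_agent(instructions: str) -> list[str]:
    # Real version: an LLM generates search queries, then calls Bing Search.
    return [f"web result for: {instructions}"]

def product_agent(instructions: str) -> list[str]:
    # Real version: vector search over the product index.
    return ["Alpine Explorer Tent", "TrailMaster sleeping bag"]

def writer_agent(research: list[str], products: list[str], assignment: str) -> str:
    # Real version: GPT-4o combines the research and products into an article.
    return f"{assignment}\nSources: {research + products}"

def editor_agent(article: str) -> bool:
    # Real version: an LLM reads the article and decides whether it meets the bar.
    return len(article) > 0

def create_article(research_req: str, product_req: str, assignment: str) -> str:
    research = research_agent(research_req)
    products = product_agent(product_req)
    article = writer_agent(research, products, assignment)
    # If the editor rejects the draft, ask the writer for a revision.
    while not editor_agent(article):
        article = writer_agent(research, products, assignment + " (revised)")
    return article

article = create_article(
    "latest winter camping trends",
    "tents and sleeping bags",
    "An 800-1,000 word article on winter camping",
)
print(article)
```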
One thing that is helpful here is that if you look at this diagram, we have a number of resources. And if it was just Marlene putting together this application, deploying those resources would take a super long time, because I'm not a DevOps person or an infra person, and I don't want to have to think about deploying the resources. And so part of using these templates is being able to have these resources deployed on your behalf.
And one thing that you can do is when you run that command azd up, those resources are automatically deployed for you. So here is a list of all of the resources. I just ran azd up.
I ran this command that I had earlier. And then I ran azd up, and then these resources were immediately provisioned for me. And we can take a look at the website that was generated by clicking this link.
And it will take us to our container app that has a URL. And when we click that URL, here's our application. And we can click on the example button there.
And when I zoom in, you can see some information. So we're first asking it, "Can you find the latest camping trends and what folks are doing in winter again?" That's the research we want to include in our article.
We're also saying, "Can you include some tents and sleeping bags?" Because this is an article that's going to help an outdoor company that sells outdoorsy things. And then finally, we're saying, "Can we have an article between 800 and 1,000 words?"
So to be able to see this in action, let's click the "Start Work" button. And hopefully, if the demo gods are in my favor, it will work. Oh, it's kind of working, but not quite.
This seems a bit odd. Trupti Parker: Yeah. Marlene Mhangami: Can you see that the article is being cut off?
This doesn't seem like a good thing. This seems like a bad thing. Something is going on.
What are the tools that we have to debug this problem? Trupti Parker: Yeah, let's look into that. But firstly, I want to call out that when you did the azd up, it was amazing to see how infrastructure that would typically have taken weeks to set up
was ready within a couple of minutes. So as we take a look into the code, let me give you a glimpse of what's actually happening in the backend. As Marlene was saying, it's a multi-agent application.
We have multiple agents for writing, for researching, and finally for editing, to make sure the article that gets generated meets the standards being requested. So as you can see here, for example, the writer.prompty basically tells the model you are an expert writer, provides the context of what kind of article it should write, and then generates the article as required.
We have also enabled telemetry to make sure that you can get visibility into your system. At Ignite, we are launching tracing in Azure AI Foundry to help you provide more visibility into the system to find and fix the issues. Let's take a quick look at how it is set up, and then I'll walk you through the key capabilities that it has.
As you can see here, with just this one line of client.telemetry.enable, I am able to enable telemetry for not just our inference APIs, but also Azure OpenAI and then Langchain.
It's incredible, because now our tracing is orchestrator-agnostic, which means whether you are using Azure OpenAI, Semantic Kernel, Langchain, or other orchestration tools like Prompty, you will still be able to use the same tracing capabilities. The other cool thing is that we are also sending the data to Azure Monitor, and we'll see in a bit what kind of magic that creates. So now that I have set up this telemetry, let me actually see what's happening in the backend.
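In spirit, what enabling telemetry does is wrap each call in a span that records its name, duration, and attributes like token counts, then export those spans to a backend such as Azure Monitor. Here is a toy stdlib-only sketch of that shape; the real implementation emits OpenTelemetry spans, and the span names and token numbers below are made up:

```python
import time
from contextlib import contextmanager

SPANS: list[dict] = []  # stand-in for an OpenTelemetry exporter / Azure Monitor

@contextmanager
def span(name: str, **attributes):
    """Record a named span with its duration and any attributes (e.g. token counts)."""
    record = {"name": name, **attributes}
    start = time.perf_counter()
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)

# Each instrumented agent call would be wrapped like this.
with span("research_agent", prompt_tokens=120):
    pass  # the LLM call would happen here
with span("writer_agent", prompt_tokens=450, max_tokens=20):
    pass

print(SPANS)
```

A trace viewer then renders these records as the per-call timing and token breakdown shown in the demo.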
We have multiple tools to help you kind of visualize this telemetry while you are in the local debugging phase. As a developer, it's crucial to understand and see this debugging right into your local development phase as you want to make sure your product is running in a healthy fashion. So if we kind of go into the Prompty trace viewer, which gives you a visibility into your traces, right here by just downloading the extension of Prompty, you'll be able to see the traces.
If I click on one of the traces, let me give you a quick walkthrough of what's happening in the backend. So if you see, firstly, the research agent gets called, and it does the research on the latest trends happening in the market related to the article. It sends that information to the writer agent, and then the writer agent writes the article.
Here we can quickly get visibility into the total time, prompt tokens, and completion tokens needed in each of these calls. As something was failing in the writing part of the article, let's take a quick look at what might have gone wrong there. Here, I can see that the max tokens I was sending was just 20.
That seems a bit too low for an application that's supposed to generate a 1,000-word article. Marlene Mhangami: Yeah. Trupti Parker: So maybe let me fix that, and let's see if that solves the problem.
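For reference, the change lands in the writer Prompty's frontmatter. Assuming the usual Prompty shape for model parameters, it is roughly:

```yaml
model:
  api: chat
  configuration:
    azure_deployment: gpt-4o
  parameters:
    max_tokens: 2000   # was 20 -- far too low for an 800-1,000 word article
```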
I'll change the max tokens in the Prompty, and let me rerun the application. While this application is running, I'll give you a quick glance at what's happening in the backend. When the researcher agent is called, it does the research.
It finds the information from the Bing Search API and then sends that information to the writer agent, and the writer agent then actually writes the article. Oh, sorry. The demo gods didn't work out.
Marlene Mhangami: Yeah. Trupti Parker: Yeah, sorry. So it takes a bit of a while to actually generate the article, as it's a 1,000-word article.
Let me give you an idea about some of the other tools that we also have for tracing. Along with the local debugging tool of the Prompty trace viewer, we also have the Aspire Dashboard, an open-source, OpenTelemetry-based tool that gives you visibility into our tracing right here in the local debugging option. Our tracing is also OpenTelemetry-based, which means it adheres to the semantic conventions, so you'll be able to get consistent traces across your distributed systems.
Not only that, we are also able to attach model evaluations and human feedback to the traces so that later you can do user-driven improvements on your application. As this application is running, it will help us generate the new article, which will be updated as per the requirements. So as you can see here, it's kind of fetching the information from the Bing search API, and then the marketing agent will actually start correlating and creating embeddings on top of it.
Once that is done, then it will finally generate the article. Marlene Mhangami: Great. While it's generating as well and the updates are going through, I can talk about evaluations.
And evaluations are also a big part of making sure that the output of the LLM is high-quality. So Trupti mentioned before that the issue here was a low max tokens value: only 20 tokens were allowed for the writer Prompty's output.
Because of that, the article was being cut off. And so what you can see on the screen there is that each time the article is actually run, we have some evaluations that are also being run in the background. And the first group of evaluations that we have are evaluations of the actual text.
And we can see here on the screen that we have a relevance evaluation, a fluency evaluation, coherence, and groundedness. And all of these evaluations there, as you can see, are low. They're scoring fairly low values.
And the reason was probably that low number of tokens. We also have, new and improved in the newer evaluations SDK, some content safety evaluations as well. So is there any harmful content in your text?
And if there is, it will also show up on the screen. We can see that nothing harmful is being generated, but groundedness is high because the researcher agent did its job. But then the others are quite low.
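The pattern behind all of these evaluators is the same: each one maps a (query, response, context) triple to a score, and you run a batch of them over your outputs. Here is a toy stdlib sketch of that pattern; the real scores come from LLM-backed evaluators in the Azure AI evaluation SDK, and the keyword-overlap heuristics below are placeholders, not the actual scoring logic:

```python
def relevance(query: str, response: str, context: str) -> int:
    # Placeholder heuristic; the real evaluator asks an LLM for a 1-5 score.
    return 5 if any(w in response.lower() for w in query.lower().split()) else 1

def groundedness(query: str, response: str, context: str) -> int:
    # Placeholder: reward responses that reuse words from the retrieved context.
    overlap = set(response.lower().split()) & set(context.lower().split())
    return 5 if len(overlap) >= 3 else 2

EVALUATORS = {"relevance": relevance, "groundedness": groundedness}

def evaluate(query: str, response: str, context: str) -> dict[str, int]:
    """Run every registered evaluator over one (query, response, context) triple."""
    return {name: fn(query, response, context) for name, fn in EVALUATORS.items()}

scores = evaluate(
    query="winter camping trends",
    response="Winter camping trends this year favor insulated tents and sleeping bags.",
    context="Research notes: insulated tents, winter camping, sleeping bags trends",
)
print(scores)
```

In the real pipeline these per-output scores are what you see aggregated in the dashboard after each run.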
And I'm going to fix that. I'm going to just update that to 2,000. Then I'm going to rerun those evaluations by running the orchestrator, which puts everything together.
And Trupti can also -- I'll switch back so that she shows you the output. Trupti Parker: Yeah, as you can see here, it was just maybe an Internet glitch. The article being generated is 1,000 words, like we were expecting.
And we can see that it meets the expected standards. So here, you can quickly get started with your local debugging solution, find and fix the issues, and then move to the next phase of evaluations, like Marlene was talking about. Marlene Mhangami: Yes.
So I will switch back to my terminal. As we saw earlier, I'm running the evaluations in the background for content safety and for the text of the actual article, after increasing the token amount to 2,000 and hopefully generating a full article. And something else we can see is that we've also added image evaluations.
So one of the really cool implementations of this, I think, is that when we run image evaluations, you can also filter out the images that are added to your application. For example, I've added an "Upload image" button that allows you to upload any image you have, and it will evaluate that image based on a number of different criteria. Here, let me actually see if I -- yes, I ran this earlier on this image here, which is quite a cute camping image that I just generated.
And I saw that in the content safety scores, we have low violence and low harmful content. We also have these protected material evaluations as well. And what that does is it looks for any copyrighted material, any sort of, I don't even know if I can mention some of these bigger animation companies, but any of their content in your image.
And if it does, it will also notify you about that and let you know that that is in your content. And what you can also do is click the button there and it will direct you to AI Studio. And in AI Studio, you can also take a look at those evaluations in real time.
This is the old version, so I'm not going to spend too much time on that. I haven't fully updated there. But AI Studio will give you a good -- I'm going to wait for the other one to run and we'll take a look at it there once it's run.
Anyway, I'm going to just go back here to AI Studio. But this is another option for us to be able to view. I'm just going to change it to int.ai so we go to the updated version of AI Foundry. And what it allows you to do is have a really nice interface to see those evaluations in AI Studio, so you can see them in graph format. And so here we can see, as expected, we have very low content that does not meet our standards for that image.
And if we were to change that image and just upload any other image, we would probably get a different result. So that is what we have for evaluations, Trupti. So something to note is that not only do we have evaluation capabilities in your local environment, but we also have evaluation capability in places like GitHub Actions.
So if you want to run continuous improvement or CI/CD pipelines, we have that option as well to run quality checks in that way. So, Trupti, can you tell us a little bit more about how to move forward in the production process? Trupti Parker: Yes.
So as we all know, in the generative AI lifecycle, the crucial piece is iterating and experimenting again and again. So even though you are evaluating your applications in your local environment, you also want to have the same kind of evaluations when the applications are in the production environment. We provide those capabilities through Azure AI Foundry, through our local evaluations, as well as through our GitHub evaluations.
Now, when you're comfortable with your application and you want to take the next step and move to the production environments, we have an azd GitHub config through which you can set up your pipeline config and your GitHub workflows to deploy your applications right in VS Code. As you can see here, I have these different buttons, "Deploy", "Evaluate", "Validate Actions", which I can hit to get started with my deployment workflows right through VS Code. For the sake of time, I already started it, because it does take a couple of minutes to get going.
So once I've clicked through it, let me give you a glimpse of how the evaluate workflows actually look in GitHub. As you can see here, we are testing different variants of the application. Here we have provided some data for grounding purposes and then we kind of see how the application is performing in your production environments.
The same metrics that Marlene had been showing you in Azure AI Foundry earlier, we are able to see that in our GitHub workflows in the GitHub environment as well. You can also take a look at the image evaluation results and finally, you can also download those evaluations and artifacts so that you can use it for further offline collaboration. Now that we think about the evaluation piece and we finally move your application up to the production environment, the next phase that is crucial in generative AI lifecycle is continuous monitoring and continuous evaluation piece.
So as you think of continuous monitoring, we want to think about how the application is running in the production environment. How is it really doing? Are we finding any issues with respect to gen AI, or are there issues with respect to the other components in the application?
For that, we are also launching tracing in Azure AI Foundry. With that, you'll be able to connect your Application Insights data in the way I showed earlier and then start visualizing your data right here in our Azure AI Foundry tool. What's cool about it is that you will be able to get a magnified, gen AI-focused view of your application through Azure AI Foundry.
You'll also be able to filter your data: if you have attached any evaluation metrics, you'll be able to see those right here as well. If I click into any of these traces, let me show you how the flow works. You'll be able to see the entire flow intact, with the gen AI-specific calls, and you'll also get the input, output, and raw JSON associated with each one.
With that, you'll get better rendering of your traces through our Azure AI Foundry tool. You'll also be able to get the metadata associated with your application, so you can get an idea of the SDK versions and the other factors that went into developing the application.
Not just that, you'll also get a good view of the models being used in the application, the number of tokens that were used, and the latency of the application. Along with that, you'll be able to attach evaluations and see the traces with the evaluations right here. As a data scientist, it's very crucial to get an idea of how the application is running and what evaluations are happening, and to trace the application accordingly.
So as a user, you may put some user feedback into the application, and we will be able to perform more user-driven improvements by attaching that feedback right here. So once you get this view, that's not the end of the story. As we move up the application development lifecycle, it's crucial to get a larger-context view of the application beyond gen AI, as AI is not everything in your application, right?
From that perspective, we are also launching the AI Insights Workbook, which gives you a better picture of the flow of your entire application. Out of the box, you can take a quick look at the prompt tokens, completion tokens, total tokens, and the total calls generated for the application. You can also see the user feedback that was sent, so that you can make user-driven improvements.
Along with that, the thing I love about it is that you can see which traces are failing the most and what was happening with them, and if I click through, it redirects me back to Application Insights so I can get a larger-context view of my application. Remember I mentioned semantic conventions? As you can see here on the right, these attributes follow that semantic-convention format, which is what lets traces correlate across distributed systems.
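The attribute format referred to here is the OpenTelemetry gen AI semantic conventions (`gen_ai.*` attribute names). As a rough sketch of what a conforming span's attributes look like — the attribute names come from the still-incubating OTel spec and may evolve, and the values here are invented:

```python
# Build a span-attribute dict following the OpenTelemetry gen AI
# semantic conventions. Attribute names are from the incubating spec
# and may evolve; the values are purely illustrative.

def genai_span_attributes(model, operation, input_tokens, output_tokens):
    return {
        "gen_ai.system": "az.ai.inference",    # assumed system label
        "gen_ai.operation.name": operation,
        "gen_ai.request.model": model,
        "gen_ai.usage.input_tokens": input_tokens,
        "gen_ai.usage.output_tokens": output_tokens,
    }

attrs = genai_span_attributes("gpt-4o", "chat", 120, 48)
```

Because every instrumented component emits the same attribute names, a backend can join spans from different services into one coherent gen AI trace.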
So this gives you a much better picture of your entire application. Once you're comfortable with this view, the story doesn't end there, right? We also want to think about experimentation.
Once your evaluations are complete and the right monitoring is in place, we're also launching A/B experimentation, and customers can sign up for the private preview. Alongside your GitHub evaluations, you can now test different variants of your application, target the right audience, and see which features are really performing well. You can quantify and measure your data quality, understand whether your application is performing as expected, and then experiment even further.
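Under the hood, A/B experimentation needs a stable way to bucket each user into a variant. This is not the Foundry implementation — just the common minimal pattern, using a deterministic hash so the same user always lands in the same bucket for a given experiment:

```python
import hashlib

def assign_variant(user_id, experiment, variants=("control", "treatment")):
    """Deterministically map a user to a variant bucket.

    Hashing the experiment name together with the user id keeps the
    assignment stable per experiment, but independent across experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]
```

Logging the assigned variant alongside each trace is what lets you later compare evaluation scores per variant.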
Lastly, we also want to think about responsible AI and security best practices. With Azure AI, responsible AI is not a theory; it's a practice. We want to make sure we follow security, privacy, and compliance across all the products we build, with AI being no different.
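In practice, one concrete piece of this is a moderation gate on model inputs and outputs. A simplified stand-in for illustration only — this is not the Azure AI Content Safety API; the harm categories, the severity values, and the threshold here are all invented, and in a real system the scores would come from the moderation service:

```python
# A simplified content-safety gate. In practice the severity scores
# would come from a moderation service (e.g. Azure AI Content Safety);
# here they are supplied directly for illustration.

BLOCK_THRESHOLD = 4  # assumed policy: block at severity >= 4

def is_allowed(severity_scores, threshold=BLOCK_THRESHOLD):
    """severity_scores: mapping of harm category -> severity (0 = safe)."""
    return all(score < threshold for score in severity_scores.values())
```

The point of the pattern is that the policy (threshold per category) lives in one place and can be tightened without touching application logic.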
Throughout the GenAIOps life cycle, this is something that needs to be thought about while you are developing your application. You want to make sure you're following the right responsible AI practices and leveraging content safety, data labeling, and security and protection, so that you can be worry-free about the data in your application. So now that we've covered all of this, Marlene, do you want to summarize the GenAIOps life cycle?
Marlene Mhangami: Sure. So we saw a lot today. I just want to show you the final application.
It worked as expected. And to show you that our change actually worked: our scores and evaluations are much higher than they were initially. What we saw today were three different options for ideating.
That was the Chat Playground in Azure AI Foundry. We also saw the Chat Playground in the new and just recently released Azure AI Toolkit for VS Code. And we also had the ability to ideate in Prompty.
And then we also talked about using the azd CLI to launch our application and take it from the development phase into production. Trupti showed us how to debug our application using tracing, and how to monitor it in production using all of the newly launching tools she showed us today. And then finally, we talked about security and responsible AI.
And something that we're excited about is to actually see customers using this infrastructure in a real-world situation. And today, we have a really exciting spotlight from Dentsu who have leveraged Azure AI Foundry to be able to put into practice this tool chain for a business use case, a real-world business use case. So today, I'm going to welcome onto the stage Callum Anderson who is the Global Director of DevOps and SRE at Dentsu.
So welcome, Callum. You can come on stage. Please give him a round of applause.
Callum Anderson: Thank you very much. [ APPLAUSE ] Trupti Parker: Thank you so much, Callum, for joining us today. Hope you're enjoying Ignite.
Can you walk us through -- yeah, we'll get your slides on. Can you walk us through the business challenge that you're facing and how did your company navigate? Callum Anderson: Absolutely.
So at Dentsu, we do a lot of work in the machine learning and advanced analytics space. So clients like Microsoft rely on us to help them understand where's the best place to spend their next marketing dollar. And while this work is valuable, it can sometimes be quite time-consuming.
So sometimes the process to get an optimized budget can take a couple of weeks. Trupti Parker: That seems pretty daunting. How did your company navigate this situation?
Callum Anderson: So we looked at generative AI to help us solve this problem. We partnered with Microsoft to build a multi-agent application running Azure OpenAI models, a mix of GPT-3.5 and GPT-4o, and we used LangChain and Prompt Flow for orchestration.
So we already had quite a lot of maturity in Azure, so we were able to leverage quite a few existing systems in our organization. For example, our CI/CD and source code management were already happening in GitHub, with Actions and runners to build artifacts, so we could reuse that. And our application services were already being hosted in AKS, fronted by Azure Front Door and Application Gateway.
And everything is connected with Private Link and private endpoints. And then in Azure AI Foundry, we've built out our agents. The agents we created collaborate to replicate the workflow of the marketer, analyst, and data scientist.
And we're leveraging a few more Azure technologies for this application too, such as search and a vector database in Postgres. And all of our AI infrastructure is behind API management, which allows us to better manage tenancy and track things like token usage. Trupti Parker: Wow, that seems very exciting.
Do you want to show us in action how this actually works? Callum Anderson: Absolutely. Let me show you the application.
So let's say I've come to this platform with a question about how to allocate funds for an upcoming media campaign. The intent agent picks up my business needs and starts prompting for additional information. And now the data science agent is able to take that information and run the appropriate model and parameter set to generate an optimized budget.
Awesome. The data science agent has now been able to produce an optimized spending plan, including a breakdown of how and where to spend marketing dollars to maximize the clicks KPI. There are different outputs, such as the charts you're seeing here, which are all created on the fly by the agent based on the response they get from our ML backend.
And you can also chat to the output to ask further questions, such as, you know, why spend in a particular partner might have increased. So this means that we can go from that previous workflow that took up to two weeks to just a few minutes and help clients like Microsoft better spend their marketing dollars. Trupti Parker: Wow, that's amazing.
Can you actually tell us what are some of the key learnings that you learned along the process? What are some of the challenges that you faced? Callum Anderson: Absolutely.
So I think there are three key lessons we took away from this engagement. Let's start with developer practices. When you have a system like this where agents depend on each other's output, error handling and debugging can be really challenging.
Good engineering practices help you mitigate this challenge: testing your components thoroughly, as well as establishing good logging practices so you can see what's going on at each step. Trupti Parker: Right.
Callum Anderson: And you can achieve things like graceful degradation by introducing automated fallback prompts to handle common errors, which brings a bit more determinism to these kinds of systems. Secondly, data governance. At Dentsu, given the nature of our business, we often deal with client-owned data.
So data governance is vitally important to us. We were really fortunate that we had a governance framework already in place that we could use, and it requires us to tenant each of the client applications individually. Without that, I think our team would have spent a lot of time individually evaluating the right permissions across data sets.
And finally, tooling. There are some really great tools available to you in the AI Foundry and in the wider ecosystem. For example, in the AI Foundry, you can see logs, metrics and traces for your completed DAG run.
And what we found was that our team used the AI Foundry quite a lot for things like demoing and collaborative activities. But because we're a pro-code team, we spent most of our time in Visual Studio, which has an integration with GitHub and a really great extension for working with gen AI applications. Trupti Parker: Wow, that's really exciting.
So this is a perfect example of how you already had some MLOps practices, and then on top of it, you had the best developer AI practices and data governance that you can start leveraging. So thank you so much, Callum, for joining us today. This was really helpful.
Callum Anderson: Thanks very much. Trupti Parker: Hope you all had fun at Ignite today, and I hope you learned something new about how you can get started with GenAIOps.
Thank you so much.