All right, ladies and gentlemen, the moment you've been waiting for. Please welcome John to the stage. Thank you so much for the introduction, Sheamus. And it's nice to see a packed house for this agentic AI conference, or agentic AI workshop. It isn't necessarily an agentic AI conference, but it does feel like it a lot of the time, doesn't it? We're going to start off in module one with defining agents, and then, since you love coding exercises, Ed will lead a coding exercise in the second half of this module in which you will recreate deep research, OpenAI's Deep Research. You'll recreate that using the OpenAI Agents SDK. In module two, we'll talk about design principles for building effective agentic systems, and Ed will lead a coding session where you will create an engineering team of agents using CrewAI. And then finally, in the third module, we'll talk about how to be developing agents. We'll talk about MCP from Anthropic, which people are really excited about today; it seems to be one of the hottest topics in data science right now. You will develop alongside Ed a set of autonomous traders.
So you'll actually use the engineering team from module 2 to code up autonomous traders that will act on simulated financial market data. And we should point out right now that we are not financial advisers, and any advice that your agents provide you on financial transactions, Ed and I are not liable for. You did it with your agent; sue your agent. Okay. All right. So let's start off with module one on defining agents. It is a tricky thing. I just came from the keynote next door, and it was pointed out that there really isn't a clear definition, but for the purpose of being able to talk about this in a session like this on agentic AI, we are going to give you a definition, and this one actually comes from Anthropic: AI agents are programs where large language model outputs control the workflow. So hopefully that sounds kind of intuitively sensible to you. In practice, this describes an AI solution to some kind of problem that involves any or all of the following: multiple LLM calls; LLMs with the ability to use tools, and I'm going to talk about what tools mean; an environment where LLMs or agents can interact together; a planner that coordinates activities; and, critically, this is what makes it agentic: it has autonomy. So it may have some guardrails, but you're not dictating exactly everything that should happen. The agent has some autonomy to figure things out on its own. So again, any or all of these could be common in an agentic system. All right. Who knows this guy? Andrew Ng. All right. Yeah. Basically everyone knows who he is. So, Andrew Ng was recently on
my podcast, in episode number 841. And Andrew Ng believes that there's unprecedented opportunity in 2025, this year, to derive business value by focusing on building AI applications that use agentic workflows. Specifically, the anecdotal evidence at my consultancy, Y Carrot, suggests that he's right. I don't want to overstate this, and it is early days for the company, but 100% of conversations that I have with prospective clients lead to next steps. There isn't anyone who says, "Nah, this isn't what we're looking for right now," or "We'll get back to you in 3 months or 6 months." Everybody says, "Great, let's get into an NDA so I can explain more." And then once they do that, they say, "Let's get into a master service agreement so we can start getting to work and fleshing out a contract, and then let's get that contract going as quickly as possible. I can't wait to get this agentic solution, this AI solution, in my platform." Although, that said, not everyone is convinced. I should say that basically, except for this slide, we're going to be 100% gung-ho on agentic AI this whole time. Who knows Andriy Burkov? He wrote The Hundred-Page Machine Learning Book. Yeah, we've got some hands up. So Andriy Burkov was also on my show recently, in episode 867. And Andriy believes that LLMs and agentic systems, including the kinds of multi-agent systems that we'll be talking about in today's training, are overhyped. I'm personally more in Andrew Ng's camp. There is certainly some overhype, and the LLMs that power agentic systems today do have flaws and limited capabilities, but those are being overcome at a crazy pace. So here is a chart of some of the leading models
like GPT-4o, o1, Claude 3.5 Sonnet, and Gemini 1.5 from Google, showing their performance on different benchmarks. The key one that I want to highlight is Humanity's Last Exam, HLE, which is shown in white and black there. HLE is a multimodal benchmark. It has 2,500 very challenging questions across over 100 subject areas, curated by a thousand subject-matter experts across 500 institutions in 50 countries, and they maintain a private test set of held-out questions to assess model overfitting and, of course, prevent training to the test. Ah, I do see people taking photos of the slides.
So there's a GitHub repo that Ed will bring up shortly, which will have a link to the slides. You can also go to my website, jonkrohn.com/talks, and there you can also access the slides. If you're watching the recording in the future, you can make your way to today's date, which is May 14th, 2025, and there's a link to the slides from my website there. So the whole point of HLE, of Humanity's Last Exam, was to create a benchmark that is very hard for today's
LLMs and is supposed to take many years to overcome. And so you see that across all these leading models, the best performance you're getting is from o1, which is this chain-of-thought model that can reason, that can reflect on its responses and come up with better ones. It performs at 10%. All the others are around 5%. And so the idea was that this should last a long time, that Humanity's Last Exam should be difficult to conquer. And yet, when you contain an LLM within an agentic workflow that allows step-by-step chain-of-thought processing, checking accuracy of work, and real-time internet searches, like OpenAI Deep Research, which you guys will be recreating today using the OpenAI Agents SDK, boom: all of a sudden, you have over 25% accuracy. And there's lots of extra headroom in that, because you can extend your inference time; you can compute for longer on your responses. And so mere months after Humanity's Last Exam's release, it suddenly looks conquerable. And that's thanks to agents. So you have this wind at your back of LLM capabilities, and then, wrapped in agentic frameworks, they're particularly powerful. Here's another
chart, an exponential chart. If you check out the y-axis, you can see that it goes from 1 second to 4 seconds to 15 seconds, then a minute, and then you're quickly into hour ranges. The point of this chart is that cognitive tasks that would take a human just a few seconds, in the bottom-left corner, or a few minutes, around here, are being handled with a 50% success rate by today's models, and now we're starting to get closer and closer to an hour. So, we're inching closer to an hour. And so people will use this chart to justify that artificial general intelligence, an algorithm that has all the capabilities of a human, is merely months or years away. Now, I wouldn't take it that far, because there are a number of limitations here. We're talking about a 50% success rate; that's not useful in a lot of enterprise applications. And this is actually specific to tasks that you can easily create multi-step training data for. So, math problems, machine learning problems,
coding problems, those kinds of things. More generally, there's still a long way to go. I think we're at least years away from AGI, but nevertheless, the point of this chart should be exciting for you, because you're interested in building agentic systems. And so you have this wind at your back of all of this power that's getting better all the time. So my general advice to you is that it's never been a better time to develop AI agents. There is unprecedented opportunity for creativity and impact because of this. My overall guidance is that while agents aren't perfect or suitable for all enterprise use cases today, there are a ton of use cases that they are useful for. Especially on repetitive tasks, multi-agent systems can be inexpensive to design and deploy, thanks to the kinds of techniques that Ed will be showing you today. They allow you to substantially augment business processes, improving their outcomes, or even to fully automate tasks that today are done entirely by humans. It's like robotic process automation, RPA, on steroids. And it allows measurable operational-metric improvement and/or ROI, return on investment, over condensed time frames. And
so if you can find some low-hanging fruit, get that operational improvement, and demonstrate that ROI, that creates a flywheel, an AI-ROI flywheel, where management says, "Wow, look at this improvement, look at this ROI from that small agentic project that we invested in." And so then they'll invest more, and you can tackle bigger and bigger projects and get the AI-ROI flywheel spinning faster and faster. In terms of other broad reasons why it's an unprecedented time to be developing automated systems and projects: obviously, advances in LLMs, like I just talked about in the preceding slides, as well as the kinds of frameworks that we have access to, things like PyTorch Lightning and Hugging Face and the agentic frameworks that Ed will be showing today. Cloud compute and infrastructure like AWS, Google Cloud, and Azure allow us to scale more rapidly than ever before. Serverless computing and containerization techniques like Docker and Kubernetes simplify deployment and management of these automated systems. Data: massive open-source datasets are available in online repositories, and increasing digitization across industries means that we have way more data all the time. It's something like every 6 months the amount of data on the planet doubles, which is insane and provides you with lots of fodder for your AI systems. Open-source software: GitHub repositories and extensive documentation allow for easy sharing, reuse, and collaboration on automation projects. Low-cost hardware means that you can have sensors, microcontrollers, Raspberry Pis, and edge-computing devices that the agents are taking action with or recording data from; the landscape of potential is accelerated by this low-cost hardware. User-friendly development tools: drag-and-drop programming interfaces for
people who don't code are increasingly common, and even just the tools that we have, like Ed will be demoing today, make it so easy to build agentic systems, and it will get easier and easier as we develop more and more abstractions. Connectivity: high-speed internet, 5G networks, wireless networks. It won't be long before everyone on the planet has high-speed internet, allowing these kinds of agentic systems to make a big impact everywhere in the world. In some countries, in some regions, there's a supportive regulatory environment. In the US, where we are today, governments and regulatory bodies recognize and support technological innovation in particular. There's strong market demand, as I've already been describing; at Y Carrot, we are experiencing this. The market wants these kinds of solutions, and due to things like labor shortages in a lot of countries, the desire for productivity improvements, safety requirements, and cost-reduction pressures, this is a huge factor that's supporting you. And finally, availability of educational resources, like this training, like the Open Data Science Conference; through online courses and tutorials like Ed's Udemy course, you get free or very inexpensive
training on the most cutting-edge approaches. So these are my 10 reasons why it's never been a better time to develop AI agents. Collectively, these advancements and conditions make it an exceptionally favorable time to be implementing automated solutions with agentic AI. As for some examples, in case you're looking for some, you can talk to your favorite LLM to get ideas; you can describe your particular business and it will provide lots of agentic ideas for you. But to get your brain cells churning: code generation is a straightforward
example. So generating pseudocode, checking for architectural soundness, transforming that pseudocode into actual code: all of these things can be handled by agents. In fact, as you'll see in module 2 today, you can have agents be a whole team of software developers working for you on whatever software development task. Medical diagnosis is a cool one that potentially has a lot of impact. You can have specialized models for neurology, for dermatology, for whatever kind of situation, that are multimodal, can take images and video, can hear the patient's voice, and can be providing better medical diagnoses than ever before. Scientific literature review: you can have research papers analyzed by multiple LLMs focusing on different aspects of the paper, and then you can actually have an agentic AI system, a multi-agent system, design and run experiments for you. This has already been done, for example, in machine learning, where some budget is provided to the AI system to do a literature review, come up with experiments, run those experiments on cloud compute, and then write up the results, fully autonomously. And it won't be long before
that kind of thing is happening in wet labs as well. There's already talk of this; there are people working on it. Big pharmaceutical companies, for example, can have agentic systems designing experiments not just to run in silico, on computers, but to run in the real world, using robotic arms and these kinds of things, in order to run biological and chemical experiments in a wet lab fully autonomously and then write up the paper. As you're getting the impression, we can replace more and more tasks, starting with things like customer support. And so more and more tasks can be handled by agents. It may be possible in our lifetime, maybe in a decade or two, to have a billion-dollar firm with no employees. Seriously, it's possible. All right, that got a chuckle, but I wouldn't be shocked. Maybe we need someone to sue, so then there needs to be a human to sue, or maybe we'll have legislation that agents are people too. All right, so we already gave a broad definition of what agents are. Let's get into that in more detail. We're
going to get into even more detail as this training goes on. But let's start by bifurcating agentic systems into two specific types. The first one is relatively simple. We can call these workflows. These are systems where LLMs and tools are orchestrated through predefined code paths. So there's a relatively high level of control that the agentic-system designer has in this workflow case. In contrast, agents, and this is kind of funny because they're both under this agentic-systems umbrella, but within agentic systems you have workflows, which are relatively simple, relatively constrained, and then agents proper. And these agents proper are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. So there's a greater level of autonomy here that allows Anthropic to actually call these agents. And later on in this training, in module two, I'll start off by going into examples of these kinds of workflows and then go into agents in more detail. But you have this in your back pocket now as a frame of reference. So when Ed or I are talking about workflows, you have this in mind
as relatively constrained systems within the broader agentic-systems umbrella. Let's talk about tools. I've mentioned tools a number of times; let's concretely define what these are. They give LLMs autonomy, because they give an LLM the power to carry out actions like querying a database, or messaging other LLMs, or running the robotic arm in a biological wet lab. And this might sound spooky. It might sound like, oh, what if I arm the OpenAI Agents SDK with a tool that is accessing the database on my computer? OpenAI can reach right into my
computer. That might sound kind of scary, but as Ed has worded it, the reality is more mundane than that. The popular perception of tool calling is that an LLM can reach into your computer and access your file system if it needs to, so it would be something like this. This isn't what happens; don't memorize this diagram. This is the misconception diagram, where you have some code that provides a prompt to an LLM. So the code over here provides a prompt to the LLM, and the misconception is that the LLM then executes the tool use, say in the file system on your computer, gets a response, and then provides that response back to the code. In fact, what happens, the mundane reality, is that the LLM responds with the actions needed. So the code, or you, sends a prompt to the LLM. The LLM provides a response back to your code, and your code then executes the tool, so it is much more constrained. Here is an example that Ed came up with that I love. Right here, this is the entire chat
with ChatGPT. There are no other instructions in the background. You can just write a chat right now where you say: you are a support agent for an airline. You answer user questions. You also have the ability to query for ticket prices. That's a tool; you're arming it with a tool. Respond only with "use tool to fetch ticket price for London," or for whatever city the user names, to retrieve the ticket price. Here is the user question. User: I'd like to go to Paris. How much is a flight? So you are arming this simple agent with a tool here, with your explanation. And it replies back: "use tool to fetch ticket price for Paris." So that is like here in this diagram: that would be the response back to the code, and then the code uses the tool, some kind of ticket-price-checking API, in order to get that price for you. Now, despite these kinds of constraints, there are risks to agentic systems, even the workflows, which are relatively simple, but especially when we get into agents proper. Agents can be unpredictable.
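To make that mundane reality concrete, here is a minimal sketch of the loop in plain Python. Everything in it is hypothetical: the `fetch_ticket_price` tool, the prices, and the faked LLM reply are all invented for illustration. But the control flow matches the description: your code sends the prompt, the LLM merely names the tool it wants, and your own code is what actually executes it.

```python
# Minimal sketch of the tool-calling loop described above.
# The "LLM" here is faked so the control flow is easy to see;
# in real life this reply would come back from an API call.

def fetch_ticket_price(city: str) -> str:
    """The 'tool': ordinary local code that only OUR program can execute."""
    prices = {"London": "$799", "Paris": "$899"}  # made-up prices
    return prices.get(city, "unknown")

def call_llm(prompt: str) -> str:
    """Stand-in for an LLM API call. The model cannot touch our machine;
    it can only send text back suggesting which tool to use."""
    # A real model, given the airline-support prompt, replies something like:
    return "USE TOOL: fetch_ticket_price, city=Paris"

# 1. Our code sends the prompt (including the tool description) to the LLM.
reply = call_llm("You are an airline support agent... User: I'd like to go to Paris.")

# 2. The LLM's reply just NAMES the action; nothing has executed yet.
if reply.startswith("USE TOOL: fetch_ticket_price"):
    city = reply.split("city=")[1]
    # 3. OUR code runs the tool; the result could then be sent back to the
    #    LLM for a final natural-language answer.
    price = fetch_ticket_price(city)
    print(f"A flight to {city} is {price}")  # prints: A flight to Paris is $899
```

The point of the sketch is the division of labor: the model only ever produces text, and the surrounding code decides whether and how to act on it.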
So they can head down an unpredictable path, they can have unpredictable outputs, and they can have unpredictable costs. But thankfully, well-planned monitoring and guardrails effectively mitigate this risk. You can monitor your agentic systems in real time. You can even have agents monitoring your agents, making sure that they're not misbehaving in some way, or that users aren't misbehaving in some way. And on top of that, you can have guardrails that ensure your agents behave safely, consistently, and within your intended boundaries. So, agentic AI frameworks in particular, and I'm going to cover some now, and Ed will obviously get into them in detail, help us offset risks, including by setting up guardrails and by having structured outputs. New frameworks are coming up all the time, but on this slide I'll cover the most important ones today. These first two are the simplest of the options that I'll be discussing. Having no framework, obviously, is the simplest: there are no abstractions, you connect to LLMs directly via APIs, and the benefit of this is that you see exactly what's going on under the
hood, and you control every aspect of prompting and of using the agents. The other very simple option, which actually isn't a framework at all, is MCP, the Model Context Protocol from Anthropic. This isn't a framework, it's a protocol. It's extremely popular today: released in late 2024, but it exploded in terms of GitHub stars in the past couple of months. It's open-source, and it's a standard protocol for connecting agents with data sources and with tools. So it eliminates the need for glue code; you can simply conform to the MCP protocol. And the common analogy is that it's like USB-C, the one specific format that you use for connecting any kind of hardware. MCP is the same idea for agentic workflows and agentic systems, and in module three this afternoon, Ed will show you in the final project how to conveniently link lots of components together using MCP. The next layer of complexity in terms of frameworks, and these are actually frameworks: the OpenAI Agents SDK. This is
Ed's favorite. You'll be starting out with it in the project that's coming up in just a few slides; in fact, I think it might be in the very next slide that you'll be getting into this demo. After this slide, I'll be passing over to Ed, and he'll show you hands-on how you can use it to recreate OpenAI's Deep Research. The OpenAI Agents SDK is lightweight, it's simple, it's clean, it's flexible, and it's new. It actually came out after we created the syllabus for this course. And so, yeah,
brand new, but very popular and very useful. CrewAI is also a favorite of Ed's as well as mine. I'm friends with Rob Bailey, Tony, and João, who are in the leadership, or the co-founders in the case of Rob and João, of CrewAI. They're a burgeoning partner of my consultancy, Y Carrot. This framework has been around longer than the Agents SDK, and it's a bit heavier-weight than the SDK. It's designed, as the name suggests, especially for multi-agent systems; a crew of agents is where the name comes from. It allows a lot to be handled through configuration, so YAML files as opposed to code. It will be used today, again in module 2, to create a crew of complementary software engineers that can handle a software engineering project for you. In module three, you'll use that team of engineers, that crew of engineers, to execute trades and come up with trading guidance that, again, Ed and I are not financially or legally responsible for. Finally, at the top layer of complexity, we have LangGraph, which is from the same people that make LangChain, in the top left, and then Microsoft AutoGen. Both of these have a steeper learning curve but can be powerful. It feels less like you're interacting directly with LLMs than if you use any of these other frameworks, or non-frameworks. We won't have time to cover LangGraph or AutoGen in detail today, but Ed does cover them in his 17-hour complete AI agents course on Udemy.
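As a taste of the CrewAI configuration style mentioned a moment ago, agent definitions typically live in a YAML file with role, goal, and backstory fields. The specific agent below is invented for illustration, a sketch of the pattern rather than anything from today's project:

```yaml
# agents.yaml -- hypothetical CrewAI-style agent definition (illustrative only)
backend_engineer:
  role: >
    Senior Python Engineer
  goal: >
    Implement the module described in the design with clean, tested code
  backstory: >
    You are a pragmatic engineer who writes simple, well-documented Python
    and prefers small, composable functions.
```

Keeping agent definitions in YAML like this is what lets a crew be reshaped, adding or retuning agents, without touching the orchestration code.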
So you can check that out. But we only have four hours today, so we had to pick what we thought were the most valuable, most interesting, and most timely frameworks for you to be working with. So yeah: the OpenAI Agents SDK, CrewAI, and MCP we will be digging into today. And in fact, without further ado, Ed, come on up and get us going on the Agents SDK. Well, hello everybody. So, I want to start what I have to say by giving you all a piece of life advice, a
tip for your life. And that is: you always want to avoid a situation in life where you are up presenting in a room full of people like this and you are following this guy. This is a tough act to follow. John, that's amazing. That's a fabulous introduction. Thank you so much. As I say, a horrible act to follow. And if you're not already subscribed to Super Data Science, then honestly, you need to subscribe now. You need to watch all of the, how many episodes is it now? Coming up to almost 900 episodes. That will keep you busy for your journey back. But hello everybody, and let me get
that will keep you busy for your journey back. But hello everybody and let me get Started. But I'm going to do a little uh introduction first as well. Myself, my name is Ed. Uh I am, as John said, the co-founder CTO of an AI startup called Nebula, which I can tell you all about if you're interested later. Uh and I spent most of my career at JP Morgan. Uh anyone here from JP Morgan? Oh, lovely. Oh, yes. Wonderful. I see. Is that Mimmude? Yes, it is. is that at the back is one of uh sorry
is is a great former friend and colleague from JP Morgan. Wonderful. Uh so current friend former colleague. Thank you John. Uh and uh wonderful. And uh yeah so I started out in uh in London and then I moved to uh Tokyo and then ended up in New York. And before Nebula, I founded a company called Untapped, which was another AI startup. And Untapped was actually acquired a few years ago. And this is a picture of the announcement in Time Square a few years ago of our acquisition, which was a super magical Moment for me. And
I show you this because ju just to prove it, I want everyone to squint. And if you look really closely, you can see me. And my picture is right there. So take a look at that. And I also show this to you because not just me, but John is there too. That is John right next to me cuz John was part of this startup untapped as well. So we are both here on this picture. And yeah, you can bring up the slide on your computer and zoom it in if You don't trust me. Uh and
this is another place where I am: I'm on LinkedIn. Some people are quite coy about connecting on LinkedIn; they say, you know, connect with me, and I'll connect back if I recognize you. I am not coy. I welcome all LinkedIn connections. I love building a data science community. So you should feel free to connect with me, and if you send a message, I will reply. People are suspicious that I've written an AI agent that writes the replies for me, and I assure you I have not. If I reply, it is me. I will confess that I have mapped some keys, so some of it might be copied and pasted, but at least 50% of the content will be really from me. So please do reach out, do connect with me, and I will get back to you. And I also include the obligatory personal-life picture. This is me in front of a plane that I had just flown. You might be thinking this is me saying I'm really good at flying planes, but quite the opposite. I want to tell you that my great skills when
it comes to LLMs and agents are only surpassed by my complete inability to do anything that requires hand-eye coordination. So if at the end of this you find yourself on the flight back and you look in the cockpit and you see that it's me there with the stick, you want to be looking for a parachute. But if you find yourself in a conference and I'm talking about agents and LLMs, then that is my wheelhouse. You're in the right place. All right, that is my intro. It's finally time for
us to get to the detail and talk about the OpenAI Agents SDK, the first of the frameworks we're going to look at. And the great thing about it is that it's really lightweight. It's simple. It stays out of the way. It's not opinionated. It lets you do things the way you want to, and it doesn't require you to go through a big learning curve to get there. And yet, at the same time, it makes difficult things really easy, which is exactly what you want. For anyone that's used, say, OpenAI's chat completions API to use tools, you've had to write a lot of JSON gunk that you have to put together to tell OpenAI how to reach back and call the tool, as John described it. All of that is just taken care of for you by the OpenAI Agents SDK, and it tells you what it's doing, so you can see it. Nothing is hidden from you, but it makes it all super simple. And as John said, the OpenAI Agents SDK is definitely my favorite. Love it. As you will see, Crew is coming right up behind it. Crew is second
favorite, but this one is just the best. I love the fact that it has minimal terminology. So you have agents; that's one of its words. An agent is something that represents an autonomous interaction with an LLM. You have handoffs; that is the term for one agent passing control to another agent. And then you have things called guardrails. Super important: as John mentioned, this is one of the critical controls you need to put in place to be able to deploy agentic systems to production, and guardrails are built into the OpenAI Agents SDK. If you want to build an agent system in the OpenAI Agents SDK, there are three steps you have to follow, and we're going to do these three steps in a second, so keep them in mind. First of all, you create an instance of an object called Agent, which is going to be your agent. Secondly, you use a context manager called trace to describe what you're about to do, so that you'll be able to look at it in monitoring tools and see everything that's gone on. And then thirdly, you call Runner.run. That is how you kick off your agent
workflow. And so with that, it is time for us to do some coding. We're going to build a deep research project, just as John described, which is such a foundational use case of agentic AI. But I have to give you just a little bit of prep for this. First of all, I have some good news, which is why I'm showing you this picture of a cake here: John and I have some real treats in store for you. The three projects that we've lined up are delicious. We've baked something great, and I can't wait to show them to you. But there is some bad news. You know those cooking shows where they put a bunch of ingredients on a baking tray, mix them all together, put them in the oven, and then say, "and here's one that I put in the oven earlier," and they bring it out, and you say, hang on a second, that looks completely different? I am going to be doing a bit of that, I have to warn you. It's because this is a 4-hour session, and we're not going to be able to go through everything and do all of the coding in gory detail. And you'll have to trust me that I've been honest: I really have run this myself. But you don't have to trust me, because we put this all in GitHub, and we would like to ask you to go through and do this yourself, but do it afterwards. You could bring it up
here and watch while I'm doing it, but I don't think it's realistic to say try and execute it while I do, because some people might have problems, and it's going to be too hard. So I'd say: follow what I'm doing. I'm going to give you intuition for why I'm doing what I'm doing and what's going on, and then later, come back and do it yourself. But please do it yourself, and then contact me or contact John if you get stuck. We are here to help you. We come with a package: that's what you get for putting four or six hours of your time today into this. We are around to help you out. So connect with me, or send me a message on LinkedIn or wherever, and I will help you right away, because it's so cool to imagine everyone being able to take advantage of this and running it yourself. And as I say, we're going to be moving pretty fast through this, so don't worry if you don't pick everything up, because you'll be able to later. And
the final little caveat I'll give you is, again, to tie to something that John had explained: agentic systems are, by their very nature, somewhat unpredictable. And so out of all of the demos that I ever give, this is one of the hairiest, because I never quite know what's going to happen. So you have to bear with me. If things go crazy, if the agents don't behave, if our engineering team decides to go and build something completely different, we'll have to figure it out together. Maybe they'll build a bridge; we'll be there together. So be prepared for that. All right. And with that, we're going to get in and we're going to bring up some code. And this here, if you went to our GitHub and you cloned our GitHub, this is what you'll be faced with. So before anything, I want to do a quick visual check. This is going to be a bit challenging, but can I ask someone in the back row here? You are in the worst possible place. Can you see?
is this legible? That's a yes. Okay. Excellent. Thank you. If it ever stops being legible, then please shout and I'll make it bigger. So, when you've cloned the repo, this is what you'll get. You'll see a README, which is where we have a few details about this. And, yes, there is a link to my Udemy course — if you decide you want 17 more hours of this, then that's where you'll find it. And then there are setup instructions
for Windows, for Mac, and for Linux. And they're super clear. Now, I've done this in Cursor, so we're using an AI-assisted platform for coding, but you can also use VS Code or whatever you're comfortable with, of course. But we will be using Cursor for this today, and it's wonderful how often it fills in exactly the code I'm about to write. There's also a set of guides. There's a troubleshooting guide, too. So there's lots of stuff. But once you've got all of this set up, there are three folders: deep research, engineering team, and trading floor.
Come into deep research and then select the first lab, lab one. And that is where we will begin with the OpenAI Agents SDK. We're going to start with an agent equivalent of a hello-world example, which is asking an agent to tell us a joke — something which LLMs are perhaps not wonderfully skilled at yet. I don't think it's part of Humanity's Last Exam, but we will see. All right. So, you remember that there are three steps to the OpenAI Agents SDK. We're going to do them right now. I'm going to start
by running some imports. I'm also going to load in my secrets, my environment variables. If you're not familiar with load_dotenv, it's explained in the README. All right. And now this line here — I'm going to hide the files so hopefully everyone can see this. This is the first step, the key step: we are creating an instance of Agent. It's being called jokester. We give it a name. We give it something called instructions, and instructions is basically the system prompt that will always be used for this agent. So instructions is their name for a system
prompt, and we're saying you are a joke teller. And we give it the model that it should use under the covers — this is GPT-4.1 mini that we're going to use today, one of the latest models from OpenAI. And in the setup instructions there are guides for how you can switch this out for DeepSeek or Gemini or Grok or whichever models you most prefer working with. So we run that. We've just created an instance of Agent. That was pretty easy. So now we have the other two steps. We're going
to set up a trace, "telling a joke", so that we can monitor this. And now we call Runner.run. We pass in a reference to the agent, and we pass in what is effectively our user prompt: tell a joke about autonomous AI agents. And then we will print the result. And that is all it takes to get an agent running. So let's see what it says. "Why did the autonomous AI agent bring a ladder to work? Because it wanted to reach new levels of self-improvement." Okay, I
guess we're not ready for artificial superintelligence just yet — humans are still at the forefront here. But that is how easy it is to make your first request to an agent. And now I've got a link to bring this up in the tools that OpenAI provides, so that we can look at a trace of what happened there. So here we go. There's a line for telling a joke. You can see I told one or two jokes before this. And we now
click here. There's just one row here, and it's showing us on the right — that's probably hard to see, but let me zoom in a bit. There we go. You can see that it shows the system prompt and the user prompt, and, no surprise, this was a very trivial example: it was one call to an LLM. You see the system prompt and the user prompt, and you get to see it in traces. But never fear, we will have some more sophisticated traces coming very soon. All right. So let's go
back here again. That was a simple example. We are now going to go and build ourselves a deep research. I want to start by just saying again: this is such a huge example. This is so important. John showed you what happened to Humanity's Last Exam when we started to incorporate deep research. So this is a huge deal. It can be applied to any business area, and you can find ways to take this and apply it to yours. So you should be thinking about how you could make
a specialist deep research agent that could do deep research particularly in your field. So I'm going to do a bunch of imports, and now I have something to explain. The OpenAI Agents SDK comes out of the box with three special tools called hosted tools, which means they are provided by OpenAI and you can just use them out of the box — for a price. The first of them is called the web search tool, and that is a tool which allows you to ask OpenAI to run a web search on your
behalf. The second of them is called the file search tool, and that allows you to upload a bunch of files to OpenAI, which it will store in a vector data store and can look up — effectively running RAG for you on OpenAI's servers. And the third one is called the computer tool, which allows OpenAI to automate running a computer screen, taking screenshots, clicking in places. For now we're just going to use the first one, the web search tool. So keep that in mind. So, we're going to be making four different agents,
but following exactly the same process as our jokester a minute ago. Four agents. And I need you to pay attention to these four, because when we go through them, you'll need to remember how they fit together. So, the first agent, the simplest one, is the search agent. This is an agent which can take a query, and it will then use the OpenAI search tool to search for that on the internet. So a search agent is able to run a search query on the internet. The
second agent, the planner agent, is able to take a general question from a user. We're going to ask it to recommend agentic AI frameworks. It takes a question like that, and it's going to say, okay, can I come up with a bunch of things that I could search for on the internet that would help me answer that question? So it's taking a question and turning it into a bunch of search items. That's our planner agent. The third one, the report agent, is going to take tons of information that's been taken from lots of
searches and turn that into a summary report on the results to answer the original question. And then the fourth one is a fun extra one: the push agent, which will be able to send a push notification to a mobile phone, because that's just fun. So we're going to do that, too. All right. So, with that, our first agent: the search agent. Here it is. It's exactly the same as before, but now there's some more detail. We've got a beefier system prompt: you are a research assistant; given a search term,
you search the web for it. And then some more here. So then we create the search agent. We instantiate an instance of Agent. We give it a name. We pass in these instructions. And we have a new thing here: we're passing in some tools, and we are specifying a special tool — this is OpenAI's web search tool. We're giving it a model as before. And this little setting here is telling it: we need you to use this tool. It's not optional. You've got to call it at least once. So, we run that. Let's give this
a try. Let's just call Runner.run again. We're going to say, what are the most popular and successful AI agent frameworks in May 2025, and check that it can search for that on the internet and come back with a result. So this is now going off — it's effectively running a Google search on OpenAI's boxes. And this is the response it comes back with. It's saying that LangChain, LangGraph, and CrewAI are the three that came back from that search. And we could go in and look at that trace. We'll do it very quickly. You'll
see that it's done exactly what we expect. Here it is, in the OpenAI dashboard. You can see, if you click on it, that it used the web search tool right here. So, it's time for the second agent. This is where it gets real. This is the one that takes us from a normal LLM world to an agent world. We're going to tell it that we want it to do five different searches. The instructions: you're a helpful research assistant; given a query from the user, come up
with a set of web searches to answer the query. And we tell it how many. Now, what we're going to do is use a feature called structured outputs. Let me just check how many people here are familiar with Pydantic. Great — most people. If you're not familiar with Pydantic, there is a guide in the guides folder that will tell you about it. But Pydantic is basically a way that you can write Python objects which describe a schema of how you want information to be represented. And we use Pydantic to create some
Python classes here. So WebSearchItem is a subclass of BaseModel, and it's got two fields — fields that we're going to want OpenAI to respond with. One of them is query: an actual search query that we want it to tell us about. But before that we've got another thing, reason: your reasoning for why this search is important to the query. And here's the thing: we're not actually going to use this. This isn't anything that we need to know. The reason we put this in here is that we want
to force the model to start to tell us its thinking, and we want it to do that before it gives us the query. And that is a simple trick to turn a normal chat model into operating in a reasoning mode, where it has to think through what it's doing before it tells you the outcome. So it's this cool trick: when you're writing structured outputs, when you're describing how you want it to respond, you should always force it to tell you its reasoning before it gives you
the answer, and you'll just get better outcomes. It's still just about generating the most likely next tokens, but if it's told you its reasoning, it biases the model to be more likely to output tokens consistent with that reasoning, and so you get better outcomes. It's crazy. So that is the WebSearchItem with its reason. And then we have something called WebSearchPlan, which is simply a list of WebSearchItems. And here we have our planner agent. We give it its instructions. And here we have one new
thing for you to look at: output_type. We're telling our agent, we need you to output using a particular type of object — this Pydantic object that describes your output. So we run that. And now we're going to try this out. We're going to say, "What are the most popular and successful AI agent frameworks?" And what we're hoping for here is that it's going to come back with some things that it would like to search for. Here they are — the web search plan. It would like to search for things, but before
it tells us that, it's telling us its reason. And here you can see its reason, and then here is the query that it would search for. And because it's told us its reason, the search is going to be that little bit better. All right, everyone following me so far? Some nods — great, great. And a thumbs up. Oh, yes. The question was, what do the other search terms look like? What are the other reasons? Let's keep going. So, let's do some code. Look for the results in final_output. Hang on, we're
just going to make sure we get this right. Searches — so for each search, we're going to print out the rationale: we print the reason and the query. We're hoping for five; we're hoping for some good ones. Here we go. So we did indeed get five. "To find the most popular AI agent frameworks as of May 2025" was the reason, and there is simply the search. And you can see all of its reasons and searches there, and sure enough that does appear to be five. It has obeyed its instructions. Great question. Thank you.
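For anyone following along later, the planner's structured output can be sketched like this with Pydantic. Field names mirror what was shown on screen; this is an illustrative sketch, not the exact repo code:

```python
from pydantic import BaseModel

class WebSearchItem(BaseModel):
    # reason comes FIRST on purpose: forcing the model to emit its
    # rationale before the query biases it toward better searches.
    reason: str  # why this search helps answer the user's question
    query: str   # the actual term to search the web for

class WebSearchPlan(BaseModel):
    searches: list[WebSearchItem]  # the planner is asked for five of these

# The SDK turns this schema into structured output the model must follow;
# here we just validate a hand-written response to show the shape.
plan = WebSearchPlan.model_validate({
    "searches": [
        {"reason": "Surveys give a broad popularity ranking",
         "query": "most popular AI agent frameworks May 2025"},
    ]
})
for item in plan.searches:
    print(f"{item.reason} -> {item.query}")
```

Passing a class like this as the agent's output_type is what tells the SDK to enforce the schema on the model's response.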
All right, so now on to the third agent, which is the writer agent. This is a pretty simple one. You're a senior researcher tasked with writing a cohesive report, and we ask for a summary, and again we use structured outputs. We create a subclass of BaseModel. We ask for a short summary, then a full markdown report, and follow-up questions. And then we simply have an agent which takes those instructions — this time we'll use GPT-4o mini — and we'll output into that type. And then our final agent is going to use a tool that we
will build right now. It's called the push agent, and it's going to send a little push notification to my phone. So I'm going to turn my phone off mute and hope no one calls me. There we go — volume up. Okay. So, I'm first going to show you this nice little function here called push. And push is a function which uses a platform called Pushover — which is very simple and free and easy to set up — that can send push notifications to your phone. And I'm going to say push
"Hello, Ed". Here we go. It came through, but it didn't make a noise. It says hello Ed. Let me try it a second time and see if it will make a noise so you can believe me. There we go. Very nice. So, we will have that noise coming a few times. Just checking one more time — so satisfying. Okay. Lovely. All right. Now, this is a function called push. It could be any function. It could be a function that writes to your database. It could be a
function that writes to a flat file. And we are simply going to add a decorator, @function_tool, like that. And now I'm going to run this again. And now push is no longer a plain function — because of that decorator, push has been turned into something called a function tool. And if you look at this, you'll see that in here is a ton of JSON. For people that have worked with tools before and have had to write all of this JSON by hand, it's a real pain. This is so simple. I love the fact
that you can just turn any function into a tool this way through the OpenAI Agents SDK, but it still shows you what it's doing. It's not hidden from you — you can always get access to the JSON itself as well. But that is how you make a tool with the OpenAI Agents SDK. And this is where you pass it in: we just say tools=[push], like that. And now we are equipping this agent with the ability to send a push notification to my phone and make that cute noise. All right.
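To demystify what that decorator is doing — deriving a JSON schema from the function's signature and docstring — here is a toy stand-in built only with the standard library. This is not the SDK's implementation, just an illustration of the idea:

```python
import inspect

JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def toy_function_tool(fn):
    """Toy imitation of @function_tool: attach a JSON-schema-style
    description of the function, built from its signature."""
    props = {
        name: {"type": JSON_TYPES.get(param.annotation, "string")}
        for name, param in inspect.signature(fn).parameters.items()
    }
    fn.tool_schema = {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": props,
                       "required": list(props)},
    }
    return fn

@toy_function_tool
def push(message: str):
    """Send a push notification with this brief message."""
    print(f"(would send push: {message})")

print(push.tool_schema["name"])
print(push.tool_schema["parameters"]["properties"])
```

The real decorator produces much richer JSON and keeps the function callable by the agent, but the principle — signature in, schema out — is the same.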
We've basically done the work. Now we just have to thread this together and figure out which order to call the different agents in. These functions here are simply calling Runner.run. We're going to have something that is able to plan out the searches given a user's query — it will run our planner agent. These two here are going to end up calling the search agent for each of the search queries. And I'll let you look at this, but it is, as I say, just boilerplate code that calls Runner.run. And I've
also got one that calls the writer agent to write the report. And here is the one that sends the push notification — it's going to make my phone make a noise. All right. And that is it. It's time for showtime. We're going to ask this question: what are the most popular and successful AI agent frameworks in May 2025? We're going to put this in a trace, and we're going to run the code that will call each of those agents in turn. And it's telling us what's going on. Now it's started the research.
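The plan → search → write → push pipeline just described boils down to a few lines of async orchestration. Here's a sketch with stub coroutines standing in for the real Runner.run calls, so it runs without any API keys; the real lab awaits the agents instead:

```python
import asyncio

async def plan_searches(query: str) -> list[str]:
    # real version: the planner agent returns a WebSearchPlan of five items
    return [f"{query} (angle {i})" for i in range(1, 6)]

async def search(term: str) -> str:
    # real version: the search agent uses the hosted web-search tool
    return f"findings for {term}"

async def write_report(query: str, findings: list[str]) -> str:
    # real version: the writer agent emits a long markdown report
    return f"# Report: {query}\n" + "\n".join(f"- {f}" for f in findings)

async def send_push(message: str) -> None:
    # real version: the push agent calls the push tool
    print(f"(push) {message}")

async def deep_research(query: str) -> str:
    searches = await plan_searches(query)                    # 1. plan
    findings = await asyncio.gather(*map(search, searches))  # 2. search in parallel
    report = await write_report(query, list(findings))       # 3. write
    await send_push("Research complete")                     # 4. notify
    return report

report = asyncio.run(deep_research("most popular AI agent frameworks"))
print(report.splitlines()[0])
```

asyncio.gather is what makes the five searches run in parallel — exactly what shows up as five simultaneous spans in the trace.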
It's planning the searches. It's going to perform five searches. And now it's connected to OpenAI. It's done the searching, and it's now thinking about writing a report — a report to describe the latest, most popular and successful AI agent frameworks in May 2025. And writing the report, because it's got to output all of those tokens, is the longest part of the process. The steps we're seeing here are basically what OpenAI's deep research is doing when it's running, if you've used it before, when it's building.
You hear that ping — and here are the results: popular AI agent frameworks in May 2025. There's an outline, there's an introduction, an overview: LangChain, LangGraph, CrewAI, the OpenAI Agents SDK — it's noted itself as a popular agents framework; we hope so. And then some others. It's mentioned smolagents — I know a lot of people love smolagents from Hugging Face. We won't have time to cover it today, but it's a great one. And AutoGen, which John mentioned earlier, as well. There's a section on market trends. Look how substantive this is. Challenges and considerations,
future directions, and a conclusion. And at the bottom, it has cited its references. And let's have a quick look at the trace to see this. What you'll see now, of course, is that the trace is more interesting than the joke trace I showed you at the beginning. This is called a research trace. You'll see the planner agent was called first, then the search agent, and you'll see that it ran five searches at the same time, in parallel. Then the writer agent was the one that took all of the time — you
see here the timeline; this blue line is showing you how much time it was taking. And then finally the push agent — it shows it using the push tool right here, and the final response. So, I can't stress enough, it's so important when you run these that you then come back to the traces and look through them. It's tempting to just trust that your agent system is working as expected, but you should always come in and look at it and see for sure that it's
doing what you thought. So, an excellent question: okay, we've seen how you can run this, collect traces, and then go into OpenAI's tools to look at those traces — but what if you wanted to log that yourself in your own database, or maybe show it to the user in some situations, or have it in files? Can you do that? And the answer is yes. The OpenAI Agents SDK has made that super simple. There's an easy way to catch the tracing it's doing and write your own tracer. And in
fact, in module 3 we will do exactly that. When we have our live traders trading there, we're going to want to see what they're doing. It's going to be super boring if they're just sitting there thinking — we're going to want to see that on the screen. So we're going to write a custom tracer that's going to write to a database and then show us on the screen what each trader is doing. So, great question, and that's the answer. And you'll see, if you look in
the third module's code, a Python module traces.py that shows that happening. All right, we're almost at the end of the first module. I've got one more thing to show you, and then it's back to John. I want to show you that it's also super easy to put a little user interface around this. Now, I happen to be a huge fan of Gradio. Who here knows Gradio? Hands for people that know Gradio. Yeah, Gradio is wonderful. It's good to see. Gradio is now owned by Hugging Face. It's
a lovely platform that means that even horrible front-end developers like me — you never want me near a front end, except if I have Gradio, I can do it easily. And so with just a few lines of Gradio, which you can look at yourself, we can put a little user interface around this. So I can just bring that up, and we can have a look at a user interface around the deep research tool that we just built. Here it is. What topic would you like to research, it's going
to ask us. I'm going to say: what should I attend at ODSC East on Thursday, May the 15th, 2025? Something that might be useful for everyone here. Let's see. I press run. The first thing it does up here is it prompts me to show me the trace if I want to see that, so I can bring up and watch the trace while it's running. And it's now saying searches planned, started to search. I'll make this a bit bigger. And you'll find this in a folder called deep research, in the first
folder. And deep_research.py is the Gradio code to bring this to life. And you'll see it — it's insane how easy it is to take an agent workflow and just throw up a user interface to bring it to life. It's now writing the report, which is always the longest part of the process, but — oh, done. There's the ping. And here we go: ODSC East 2025, what to attend on Thursday, May the 15th. Introduction, conference overview, keynote sessions here. Looks like "Blueprint for Impactful Agentic AI in the Enterprise" — that sounds like a
good one. And workshop offerings, networking opportunities, recommended strategies for maximizing attendance, and a conclusion. How about that? So if you have a moment in the lunch break, perhaps, to bring up the code and run this yourself, you can plan out your day tomorrow with the help of your very own deep research agent. All right, and that wraps up the first module, and I will pass you back over to John. [Applause] Nicely done, Ed. I did notice that our session was mentioned there in that research. Did you see that?
Yeah. So, you know, there is not 100% accuracy, as we were talking about at the beginning of the session — it is not Thursday today — but we don't mind it giving us that support. Nice. All right. So thank you. I hope you enjoyed that first hands-on demo. Isn't Ed great? Yeah — another round of applause. That's organic. Fantastic. All right. So, brilliant. We've now finished module one, where I stood up here and defined agents a bit for you and hopefully made a case for why it's
never been a better time to be thinking about building AI agent systems — personally, in an enterprise, whatever. And Ed just completed our first coding session, where we recreated OpenAI's deep research functionality using the OpenAI Agents SDK. Now in module two, I'm going to start things off with a relatively short theory session on designing agents — best practices for developing these, the kinds of patterns that we have — and then Ed will come up, and I think module 2 is our most substantial coding session. So that might end up being
done in two parts, depending on how long it takes — one before lunch and one after lunch. Actually, now would be the perfect time for us to maybe do some audience questions. We've finished module one, and it seems like we're making good time. So, we have a microphone here — we can pass it to you, and we can do a few questions in a row so people can actually hear your questions. Are there any
pressing questions? Hi, thanks for the session, Ed. I know we quickly talked about structured outputs. I haven't hands-on used Pydantic objects before, but I did use JSON strings, I believe, earlier. Is Pydantic the recommended approach for structured outputs, or are there others that you would suggest tinkering with, for the OpenAI Agents SDK in particular? It does expect a Pydantic object — just a subclass of BaseModel — but it's super lightweight. It's like a data object; it's a way to define your fields. I think LangGraph is more
permissive — you can use a lot of different types with LangGraph — but with the OpenAI Agents SDK they do expect a subclass of BaseModel to go in there. Thank you. Great session. Great session. Super. One question: before the OpenAI Agents SDK came out, there was one pertinent question with regard to debugging. How do you debug multi-agent systems? That's been the pain point for developers. Not sure how the traces will help us — or maybe there is something you can talk about on that. Yeah — so the short answer is
that it's hard. Generally, in working with LLMs, I often find that people come at LLM engineering with a software engineering hat on, looking to build a lot of code and build lots of systems. A lot of the questions I get are about how do I attach to this database, or how do I use RAG in this way, and people forget that fundamentally it's a data science discipline, and we need to act like scientists first and foremost. And with agentic systems it's even more so: the
amount of research and development and experimentation you need to do to convince yourself that it is following a particular formula — and to start small and simple, with one agent, and make absolutely sure that you've worked with that agent in detail. You understand when it performs and when it doesn't perform before you start adding more agents, before you start growing. And then, for sure, working with traces, working with guardrails with the OpenAI Agents SDK, and just measuring and monitoring at every step of the way. But generally, working with LLMs
requires a lot more of that science hat than a lot of development, and working with agentic systems even more so. So there's no shortcut to experimentation, and starting small is also such a big trick. A lot of people send me solutions which involve many agents collaborating, and it's not doing what they want, and they say, why isn't this working? And the answer is: rewind. Start small. Build something very small, make sure that works, and gradually add to it. Thanks, Ed. Great session again. I have a question in terms
of pricing: how do you keep track of the costs that each of these agents incurs over a period of time? And then the other one: what about visualization of the structured data you're going to discuss — are there any integrations possible so business users can understand that data without having to actually look at it? When it comes to visualization, I do believe that Google's Agent Development Kit, which is very new, comes with a lot of really exciting visualization tools,
which is one way that it's really standing out. With the OpenAI Agents SDK, they probably put the onus on you to be building that kind of scaffolding and to be writing the code that presents that in a format that's going to be useful to the users. So again, in the third module, when we build our trading platform, we do want to be showing our users what the agents are thinking about. And the way that we do that is we intercept OpenAI's tracing logic and
we build our own, and then write that to the database and visualize it ourselves using colors and things like that. So it's engineering work that we've written to add to the environment. Wait — and what was the cost question? Oh, the cost question, of course. Yes. So that's another super important question, and it is a big risk. I think John mentioned that the risk with agentic systems is that the costs can be unpredictable, and particularly if they get into a loop, they can just keep chugging away and your costs
can keep building. So there are some controls that you put on in the agent framework. OpenAI has a maximum number of turns that any one agent is allowed to run for, which defaults to 10, and after that it will fail. And you can override that and make it a bigger number — in the trading example, I make it 30. But you have to do that cautiously, always being aware of your limits and what that might cost you. But by far the
most important is putting the constraints on the API itself. In OpenAI, you can of course set up limits. You can set up limits for your whole organization or for each individual project, and you can manage and track those limits and make absolutely sure that it will stop when it uses them up. So it's super important to monitor your spend and watch your limits. Now, the OpenAI limits monitoring system isn't foolproof: if you have a tight loop that calls it a lot, you can
spend more than the amount of credits you put on OpenAI, and you can go into a negative balance. And if you then top up with more money, it will use up that negative balance, which is something to watch out for — I've seen people be burnt by that before. So always monitoring, watching, and making sure you've got controls at each step is very important. I might add on to that that you can also — as Ed is doing today in the demo that we saw already — use a very
inexpensive model, 4.1 mini. And so that also reduces the stakes. It's like playing low-blind poker, where the bill you can run up is relatively small. Same thing with using a DeepSeek model. Whereas if you're using o1 or o3-mini-high or something like that, you could all of a sudden very quickly be spending dollars per second. So yeah, in module two here, I'll be going over key design patterns for designing agentic systems — workflows as well as agents proper. And
at the end of the module, you'll code up an engineering team with CrewAI. And that'll be a nice foundation for module three, when that engineering team will develop the software to have autonomous traders using Anthropic's MCP. Right? So, module two, on designing agents. Recall this slide from earlier — it's a recap, so you have already seen it. We talked about how Anthropic distinguishes between two types of agentic systems. One of them is workflows, which are relatively simple. These are systems, again, where LLMs and tools are orchestrated through predefined code
paths — predefined obviously being the key word. And then agents are systems where LLMs dynamically direct their own processes and tool usage, maintaining control over how they accomplish tasks. So more flexibility there, potentially more power, but also more risks, more to be aware of. So let's go through the workflows first. The first workflow that I'm going to explain is called prompt chaining. And all of this, by the way, comes from a December 2024 blog post from Anthropic called Building Effective Agents. So you can Google that easily and get
more information on any of the things that we're describing here. Building effective agents is the blog post from Anthropic. And so the first workflow design pattern that they describe in that blog post is called prompt Chaining. This is the simplest of all of them really. Uh and in this one we decompose into fixed subtasks. So you think about your overall problem that you're tackling. You break it into subtasks and have LLM handle those subtasks individually. So here's a diagram of this workflow. You've got the in on the far left and you'll see the in on
all of the workflows that I cover, all five of them. This is the beginning of the work Of the workflow. So you can imagine this is some kind of user input. Um and then out happens at the end. That's the end of the workflow. And so that could end with an agent taking some action or outputting some tokens uh for example to the user. The yellow boxes um these represent LLMs. So throughout all of these diagrams that I'm going to be describing, yellow is always an LLM and blue is code that you've written. So um
critically here, for the prompt chaining workflow, the blue box, the code box, isn't an essential component. You could actually have just LLMs tackling subtasks, but to give you an illustrative example that does include code, you can optionally include it. So an example flow with prompt chaining: your user input goes in to the first LLM, and then LLM 1, the leftmost yellow box, passes its output to code that you've written. That code acts as some kind of gate: it processes the output or passes it along, depending on what you need that code to do, to LLM 2. Then LLM 2 does its specialized processing and passes along some output as an input to LLM 3. Finally, LLM 3's output flows out of the system, for example to a user or to a database, or maybe as some kind of action. The number of LLM calls is arbitrary, but the key idea is that you are breaking down the overall task into subtasks and you are deciding exactly what each LLM does, what
its prompt is, or rather, what its instructions are, and so how it deals with a prompt. Why would we use this kind of workflow? Well, we have more control over each of the subtasks, so this could be a good place to start on some problem you're tackling. The reason you break it into subtasks is that you can get more effective responses each step of the way. You can check your traces to ensure that each of those LLMs is doing what it's supposed to be doing, as opposed to relying
on a single LLM, say, to handle all subtasks on its own. That is an option, but you have less control. By breaking it into subtasks each step of the way, we can have guardrails to ensure that things are going as we anticipated, making it easier to debug, following along with the question that we had in between modules. So, some real-world examples. A content creation pipeline is something you could use here. Actually, both of the real-world examples I'm going to give you happen to follow exactly
this schematic. But again, remember the number of subtasks the LLMs are handling is arbitrary; here we happen to have three, and we happen to have code in here. You don't need to have that. But both examples, just to make them easier to follow, will exactly match this diagram. The first one is a content creation pipeline. You can imagine a marketing agency could use LLM 1 to generate article outlines, and then you could have a gate to check whether the outline meets brand guidelines. So you could have some very
deterministic brand guidelines that need to be followed; you don't need an LLM for that. Then, if the content passes the gate, LLM 2 will write a full draft based on the outline the gate approved. And LLM 3 can take that draft, polish it, maybe add in SEO optimizations or something like that, and then output the finished article. As a second example, maybe close to the heart of the many hands-on practitioners here, is a code generation workflow. So here,
the user describes some code they would like this workflow to generate. LLM 1 generates pseudocode, so it converts requirements into pseudocode. Then the gate checks for architectural soundness. LLM 2 transforms the pseudocode into actual code. And finally, LLM 3 adds documentation and test cases to the code and provides all of that as an output for you. Cool, that's workflow number one, prompt chaining. Workflow number two is routing. Here, with routing, we direct an input into one specialized subtask, ensuring separation of concerns. It'll be easiest for me to
explain what that means by showing you a diagram. Here, your input goes to an LLM which triages, which routes the input to the one appropriate LLM for the subtask. Again, the number of specialized subtask LLMs you have is arbitrary; here we happen to have three. So LLM 1, LLM 2, LLM 3: each of these is specialized in a different subtask, and the LLM router is aware of the specializations of each of the three and routes the input to the most appropriate one. Just one
of those three executes its specialized subtask, and you get one output out of this pattern. This is a powerful and common approach. As a real-world example, medical diagnosis support: earlier I was talking about how agentic systems could revolutionize medical diagnostics, so here's a concrete example. A patient's symptoms are passed into the LLM router, and the router is aware that LLM 1 is a cardiology specialist, LLM 2 is specialized in neurological cases, and LLM 3 is specialized in dermatological cases. And so the LLM router gets some
query about acne, so it sends it to LLM 3 because that's the dermatological specialist. And then you just get one output for that particular request; LLM 1 and LLM 2 sat dormant, they didn't have anything to do. As a second example, legal document processing: the router could categorize legal documents, sending contracts to LLM 1, compliance documentation to LLM 2, and litigation materials to LLM 3. Each of the three LLMs is specialized in a different task: contracts, compliance, and litigation. And so one
of those would be selected and would then prepare a response, maybe prepare documents, and provide those as an output back to you. Cool, hopefully straightforward so far. That's routing. Our third workflow superficially looks a lot like routing, as you'll see. It's called parallelization. The key here is that unlike routing, we will actually use multiple LLMs in parallel as opposed to just choosing one. We break down our task and run multiple subtasks concurrently. Here's what that looks like: we have multiple LLMs running in parallel. But there is one
other key change here. You may remember from the preceding slide that we previously had an LLM acting as the router. We'll get back to a workflow like that momentarily, but in the meantime, to give us a bit more control, you can have a coordinator that's actually code, not an LLM. So we have code as the coordinator, and then we also have code as an aggregator. There are a number of different things the aggregator could be doing here. It could be just taking different pieces.
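To make that concrete, here's a minimal sketch of the parallelization pattern, with plain code as both the coordinator and the aggregator. The call_llm helper here is a hypothetical stand-in for a real LLM API call, and the specialist prompts are made up for illustration:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a real LLM API call; in practice this would
# call OpenAI, Anthropic, etc. with the specialist's own system prompt.
def call_llm(system_prompt: str, user_input: str) -> str:
    return f"[{system_prompt}] analysis of: {user_input}"

# The coordinator is plain code, not an LLM: it fans the same input out
# to several specialist LLMs concurrently.
def coordinator(user_input: str, specialist_prompts: list[str]) -> list[str]:
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(call_llm, p, user_input) for p in specialist_prompts]
        return [f.result() for f in futures]

# The aggregator is also code: here it just concatenates the pieces,
# but it could equally take a mean or a mode of the responses.
def aggregator(parts: list[str]) -> str:
    return "\n\n".join(parts)

specialists = ["methodology reviewer", "statistics reviewer", "results interpreter"]
review = aggregator(coordinator("the paper text", specialists))
```

Swapping the concatenating aggregator for one that takes a mean or a mode gives you the other variants described below.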
So, LLM 1, LLM 2, and LLM 3 could generate different parts of a report, and it could be up to the aggregator to simply concatenate those components and provide them as an output. But you could also end up in situations where all three LLMs are actually doing the same subtask. Like we saw with the deep research example Ed had in the demo: you did five different searches, but the same task was being asked for each time. And so you could have the same kind of thing here, where
all three LLMs, instead of being specialists in different tasks, are actually tackling the same task; they have the same specialization. And then the aggregator might take the mean in some way. Maybe you have it doing math and taking the mean of the responses, the mean of a forecast. Or maybe you could have a whole bunch of LLMs, say ten of them, and you're expecting a whole number as the result: you could actually take the mode. Or maybe for some reason it's picking colors out
of some different color options; again, you could have the aggregator select the mode, the most common response from those different LLMs. To make this concrete, one example would be a scientific literature review where research papers are analyzed by multiple LLMs that focus on different aspects. The first one evaluates the methodology of the research paper, the second one is specialized in evaluating the statistical analysis, and the third one is specialized in interpreting the results. And then the
aggregator can synthesize each of those specialized analyses into a comprehensive review. As a second and final example, you could have a financial investment analysis where market data are provided to the coordinator, which sends the data to all three of the LLMs. LLM 1 is specialized in tech analysis, the second in healthcare, the third in energy. Each of those LLMs provides different sector-specific insights on the raw market data, and then the aggregator combines those to create a diversified investment recommendation that balances risks and opportunities across all
three sectors. There you go, that's number three. The fourth workflow design pattern is the orchestrator-worker, which again is going to look superficially similar. Here, complex tasks are broken down dynamically and combined. This is a copy-paste of the preceding diagram, except that now the orchestrator and the synthesizer are LLMs. So we have more flexibility, more dynamism. These start to feel less concretely like a strict workflow and more and more like a proper agentic system, where agents are involved in more and more
of the decision-making; they have more and more autonomy around how the task should be accomplished. So yes, this is our most dynamic workflow so far. It has fewer constraints, and the same kinds of examples that I gave you for the preceding one apply; you could use those again. I had the examples of the scientific literature review and the financial investment analysis. I'm not going to repeat those, but again, you could have three LLMs specialized in three different areas. But instead of having your code
decide what goes to those LLMs, you have an LLM orchestrator do it. And similarly, you can imagine how much more useful this would be, especially for the synthesis, to have an LLM doing that instead of code you've written. Even in the examples I gave, the scientific literature review and the financial investment analysis, you don't just want to be concatenating results. It would be nice to have an intro added and an outro, maybe redundancies removed, and all of that could be handled very nicely by a large
language model. Finally, our fifth workflow design pattern is the evaluator-optimizer, sometimes also just called an evaluator or a validator. In this one, the LLM output is validated by another LLM. This is a super useful workflow; you should be thinking about using it all the time. I mentioned this even earlier, in module one: if you want to reduce error rates, if you want to improve the utility of some agentic application in an enterprise, you can have a second LLM validating outputs, which means that you're
going to dramatically reduce your hallucination rates. Let's say you're already working with a reliable LLM that has a 99% accuracy rate, and then you add a validator that also catches errors 99% of the time. For an error to make it through the whole workflow, both have to fail, so you can multiply the error rates together, 0.01 × 0.01, to give you a very small probability, 0.01%, of an error making it through. And so you could use this in a deep research
setting like we just had in module one, where you validate outputs and maximize the chances of an accurate response, resulting in things like that huge jump in accuracy we saw on Humanity's Last Exam. So here's what the workflow looks like. It's a very simple diagram: your input goes to an LLM generator, that solution is provided to an evaluator, and the evaluator either rejects it with feedback back to the generator or, if no changes are needed, the solution is accepted and goes out as the output. So here, now, we have a loop
for the first time in these workflows, and again, this is a step in the direction of a fully agentic system, because this could potentially go on for quite some time. And of course, it doesn't need to be this simple a diagram: you could have lots of LLMs and code components, combining lots of workflow design patterns together. But fundamentally, the idea of this evaluator workflow is that you have an evaluator checking the correctness of work. As an example, and this is probably an obvious one, code generation and review. You
could have a generator that's writing code while an evaluator checks for security vulnerabilities, performance issues, and adherence to your organization's coding standards. The generator could then refine based on feedback from the evaluator. So you could do some back and forth, kind of like pair programming with a more senior software developer advising you, until the evaluator is satisfied and the output is accepted. And as a final example, actually the final example I'm going to give entirely before we get to Ed
and go back to some concrete hands-on work: you can imagine educational content development, where a generator is creating lesson materials and the evaluator is assessing the effectiveness of those materials, their age appropriateness, and their alignment with learning objectives, iterating with the generator until it's satisfied and you get some output from the system. Okay, so those are our five workflow design patterns. Remember that in these workflow designs, we have decided on some overall flow, and that is what distinguishes them from proper agents. Agents are open-ended; you can't necessarily create this
specific flow of how information will go from input to output. You can have your agents deciding that autonomously, dynamically. There will be feedback loops: the preceding workflow was the only one that had a feedback loop, but proper agentic systems could have multiple feedback loops, with lots of processing happening multiple times in different places throughout the flow. This also means you can run up your costs much more easily, and
so there are more things you need to be careful about. As I've already stated, the most critical aspect of proper agents is that there is no fixed path. This means they are potentially much more powerful, but you also have risks around robustness, safety, and cost, and it's more challenging to ensure that this will run effectively in production like you want it to, and that it will meet the time and cost constraints you have for your particular application. Thankfully, the kinds of agentic frameworks that
Ed will continue to demonstrate today allow you to reduce your risks and optimize your chance of a successful agentic application. So, here is the diagram. It actually looks a lot like the final workflow I showed you, the evaluator one, but in this case you just have complete flexibility. It's actually kind of a boring, uninteresting diagram, because anything can happen in it; it basically shows that anything can be happening here. You have your input from a human, and instead of an output, we have an
environment here. This is some outside world that can be interacted with: it could be databases, it could be the internet that you're searching over, it could be hardware, an Internet of Things device that is being interacted with. So, infinite flexibility in terms of what the environment is, but the idea is that an LLM, or many LLMs, take actions within the environment, the environment provides feedback, which could be data, images, or video, and based on that environment feedback the LLM continues to take
actions. This can go on for some time, or it reaches some end state: it decides that it has met the criteria that were asked of it, and it stops and provides that output back to some code. That could be a Gradio app that then renders the outputs nicely for you, takes the markdown and converts it into something that looks pretty for human users. So there are no more specific design patterns to cover. It's kind of funny, because you're here to
learn about agents, and I've now talked about these different kinds of agentic systems, where you could have workflows or a proper, fully autonomous agent. For the workflows, I had five examples to go over, but for the agents, which are kind of the most exciting, there's nothing else to show. That's a reflection of the dynamism these systems have; you can't capture it in more specific kinds of diagrams. So the main idea, the final one I'll leave you with, is that with well-designed
agents, despite the flexibility, we can minimize risks, use agents safely, keep costs and execution times under control, and all the while make use of agents' tremendous flexibility and dynamism. So, to bring that to life, here again is Ed. [Applause] Such a treat hearing John's explanations. I love this; it makes me feel super excited to be showing you this, too. Wonderful. So now it's time for me to tell you about Crew, my second favorite agent framework. I do love Crew, and it has a lot in common with the OpenAI Agents SDK. And
it has some ways in which it's quite different, as you will see. But first of all, something it has in common: it also uses the word agent as its basic building block. An agent is an autonomous unit with an LLM and with access to tools. But here's the first difference. If you remember, with the OpenAI Agents SDK, when you defined an agent, you just gave it instructions, like a system prompt. With Crew, you give it a role, a goal, a backstory, and potentially memory. Now, the role, the goal, the backstory, that all
ends up just being part of a system prompt, but Crew encourages you to think about it this way and to break down the way you describe your agent into more steps. Then a task: this is a new kind of thing that doesn't have an analog in the OpenAI Agents SDK. A task is an assignment; it is something that you give to an agent. Every task has a one-to-one assignment to an agent, and it has a description and potentially an expected output. And a crew in Crew AI is just
what they call a bunch of agents and tasks. That is what a crew is, as John explained at the beginning. Crews can be set up to work in two different modes, a sequential mode and a hierarchical mode. The sequential mode feels rather like the workflows John was describing: it just runs tasks in the order you define them, assigned to their agents. The hierarchical mode runs a separate manager LLM, and that manager LLM gets to choose the order in which the tasks are executed. So there's more autonomy
in that mode. Generally speaking, I would think of Crew as similar to, but somewhat more opinionated than, the OpenAI Agents SDK. It has more terminology and it's a bit more prescriptive, and that has both pros and cons. I hope you're going to really get a sense of that; I'm going to make it tangible for you. And the great thing about it is that it's got what people like to call batteries included.
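To give a feel for what role, goal, and backstory buy you, here's an illustrative sketch of how those fields could be stitched into one system prompt. This is not Crew's actual template, just the idea; the role/goal/backstory strings are made up:

```python
# Illustrative only: Crew's real prompt template differs, but the idea
# is that role, goal, and backstory all end up in one system prompt.
def build_system_prompt(role: str, goal: str, backstory: str) -> str:
    return (
        f"You are {role}. {backstory}\n"
        f"Your personal goal is: {goal}"
    )

system_prompt = build_system_prompt(
    role="a Python engineer who can write code to achieve requirements",
    goal="meet the business requirements you are given",
    backstory="You are a seasoned Python engineer who writes clean, efficient code.",
)
```

The point is that the framework, not you, decides how these components are assembled, which is exactly the opinionated trade-off being described.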
You get a lot for investing a bit more time into it, and you'll see some good examples of that in just a second. But first of all, I want to explain the five steps. If you remember, with OpenAI I talked about three steps; here there are going to be five. We're going to do these five steps together for a couple of projects, culminating in our very own engineering team. I'm hoping to get that done by 12:30 so we can leave it running during lunch, which will be really cool. So what are the five steps?
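Before the walkthrough, here they are condensed into a sketch. The file paths follow Crew's generated scaffolding and may vary by version:

```shell
# Step 1: scaffold a new crew project
crewai create crew team

# Steps 2-4: fill in the generated files
#   src/team/config/agents.yaml  - define each agent (role, goal, backstory, llm)
#   src/team/config/tasks.yaml   - define each task (description, expected_output, agent)
#   src/team/crew.py             - stitch agents and tasks into a Crew
#   src/team/main.py             - set the inputs and kick everything off

# Step 5: run the crew
cd team
crewai run
```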
The first step is you create a project with Crew. You create your new project by running the command crewai create crew followed by the name of your project, and that sets up a whole directory structure with a lot of scaffolding. It puts in lots of template classes and configuration files that you can then go in and edit, so you should expect quite a lot to be generated when you do that. The second step is you go into this scaffolding, all of this code that's been generated, and you find a couple of configuration
files that use the YAML format, a nice human-readable format. There's one called agents.yaml and one called tasks.yaml that describe your agents and your tasks, and you fill them in to arm Crew with that information. There are other ways to do it, but this is the way that's recommended by Crew and is the easiest. The third thing you need to do is go to a module called crew.py, and this is where you write some code that stitches together your crew, built out of agents and tasks, and you
refer to the config, the YAML that you just wrote. The fourth step: there's another module called main.py, and this is where you do the final touches to get your crew ready, including specifying the inputs, the original requests that you've got for your crew. You write that in main.py, and then, fifth, you just run crewai run and the whole thing kicks off and you watch it with amazement. So that's the process; those are the five steps. Now we are going to be doing something quite interesting. We are going
to be giving some of our agents the ability to write Python code; we're going to make them into coders. And this is something that is challenging. It's complex, because we're going to want to build an agent that can write code, execute that code in a Docker container so that it's executed safely, and then come back, look at the output, and interact with the Docker container. So we're going to have quite a lot of work to do to make this work in a reliable way. This is
going to be quite hard. Except it's not; this is going to be really easy. When I say that Crew comes with batteries included, this is exactly what I mean. In Crew, you just pass these two parameters when you create an agent: allow_code_execution=True and code_execution_mode="safe". And it will handle everything else for us. As long as you're running Docker, it will create a Docker container, it will make sure that the code runs in there, and it will provide
the outputs back to your agent. It's amazing that you get all of that out of the box. Now, on the flip side, it's somewhat harder to debug when it doesn't do what you're expecting, because you have a bit less insight. It's not like the OpenAI Agents SDK, where you have to do every step yourself. So that's really the deal you're making with Crew: you get a lot for free, but you are signing up for this ecosystem, and it is a bit harder to debug in my
experience when it doesn't come up with what you're expecting. Now, there is one subtle point here. People tend to call these things coder agents, and I used to think that a coder agent meant an agent that outputs code. It means more than that. A coder agent is the expression for any agent that's able to write Python code and use that to meet its goal. And its goal might have nothing to do with
writing code. For example, you might have a customer support agent, and you can say to it, I'd like to know the 50th decimal place of pi. That's the kind of customer support agent it is, for some reason. And the way it answers that question is that it has the ability to write a little bit of Python code, run it, find out the answer, zero, and return that to you. So it's not returning Python code to you; it's returning the final answer. It's
just able to use Python to achieve it. That's what's called a coder agent, and the distinction is maybe worth understanding. But having said that, what we're going to do right now is make agents that write code, that kind of coder agent. And we're going to make a whole team of them. We're going to use Crew to build an engineering team with an engineering lead that can decide the design, a back-end engineer, a front-end engineer, and a test engineer that can validate it. And the team is going
to work together autonomously, and we're going to leave it doing its thing over lunch, which is awesome. So that's the plan for right now. And as I say, it's a little bit risky; this works in 80% of my practice runs. So keep your fingers and toes crossed for me, please. Here are our files again. I'll close the first folder and some of these open tabs, and we're going into the second folder, engineering team. Now, again, I suggest you don't try and do this with me. Take sort of
an insight into what I'm doing as I talk you through it, but then please try it later. I would love that; see whether you get the same results I get, or whether you're part of the unfortunate 20%. Hopefully not. All right. The first thing I'm going to do is bring up a terminal, which in Cursor or in VS Code is the same: Ctrl and backtick. Then I'm going to go into the second folder, and I am going to type crewai. This was step one of the five
steps: crewai create crew, and we're going to call our project team. It's going to be an engineering team. And we run this. Crew is now going to ask us which provider we're going to be working with. I will select OpenAI, but you can choose whichever one you wish. And I will go with a cheap model, GPT-4.1 mini, as John mentioned, to keep costs down. There's also a nano variant of this model, GPT-4.1 nano, which is super cheap, but unfortunately it doesn't
have the coherence to be able to deal with the challenge we'll do today. So mini is about as cheap as you can go for this challenge. Okay. I'm going to exit this terminal so we can see what's going on. It has now set up a new folder, team, that's appeared here in the file explorer. If I open this up, it's created a whole bunch of new folders and files for us. I'm going into the source folder, and I'm going into config. And here you'll
see two YAML files, as I promised you. There's one called agents.yaml and one called tasks.yaml, and I'm selecting agents.yaml. Up comes the template; it writes something for you, so you can see a default. And I hate those defaults, so I'm deleting it right away, starting fresh. What we're now going to do is type in here our YAML for our first agent. We're going to start with just one developer before we have a whole team of them; let's start by making one coder. And I'm now going
to type all of this in. Except, oh, hang on, not that, here. I've set up a shortcut key so I can just paste it all in; this is also how I respond on LinkedIn. So here is a YAML example. Let me hide the left side. This is how you define an agent in YAML in Crew. You give it a name, backend_engineer; a role, a Python engineer who can write code to achieve requirements; a goal, and here is the goal: we want it to meet some business
requirements. If you look here, you'll see that there is a templated tag in curly braces called requirements, and in fact there are some others: one called module_name and one called class_name. So there are three templated tags, and I want you to remember this, because at the very end we're going to be able to choose what they should be; that's going to be our inputs to our crew. So this is where we've defined our three tags, and as you'll see, there's also this backstory, which gives sort of a nice
background: you're a seasoned Python engineer with a knack for writing clean, efficient code. Now, as I say, Crew is going to use all of this to generate a system prompt. And again, this is both the pro and the con. The downside is that we have to conform to this way of thinking about it, and we're signing up for the fact that Crew has an idea of a really good way to make system prompts out of these components. But the good side is that Crew has put
a lot of work into this. They've done a lot of analysis, lots of data science, to figure out that this is a really effective way to have a system prompt that organizes an AI agent, and we're taking advantage of all of that. So that's why we do it that way. At the bottom here, we specify the LLM, and for this one, I'm going to use claude-3-7-sonnet-latest. For many of us, Claude is probably the favorite, but particularly when it comes to coding, Claude is really great. And
Claude 3.7, I'm getting some nods, it's amazing. I use many of them all the time, and Claude 3.7 always beats the others for me. So that's what we're going to use as the LLM for our engineer. All right, going to the tasks. Sorry, I'm going to delete the stuff that was already in there, and I'm going to paste in the task that we're going to have here, the coding task. Again, it has a description, and here again we've got this requirements template tag. It
has an expected output, which is where we tell it we want it to make a Python module. And you can see I've got an important line in here, which is the result of experimentation. Before I put in this note, it tended to output something which wasn't a well-formed Python module, because it included backticks and the word python at the top, which obviously broke things. So I had to add this in to make it behave itself, and this is the kind of experimentation you need to do with these agent systems. Here
is where I assign this task to an agent, and of course, I'm assigning it to the very agent we just defined. Then here is where we specify the file we want it to output to. I'm going to save this. So now we have one agent and one task. If you remember, the next step, the third step, is we have to go into a module called crew.py. It comes with a whole ton of stuff, and I'm deleting everything that was there. We're going to make our crew, and this is
what the code looks like. It's very simple: you tell it about the two config files we just wrote, and then you have a simple method for each of your agents and tasks. In our case, we have a simple function for our agent and one for our task, right here. And there's a decorator you use: @agent for your agent and @task for your task. I'm going on about this a bit, but you're probably getting a sense that this is the kind of framework that
you have to conform to. There's a bit of learning: you have to remember to use that decorator, and if you don't, you get a slightly obscure error which is a bit hard to track down. So again, it's the big trade-off: you get a lot in the box, but you have to follow the recipe to make sure your crew works well. For this agent, I'm telling it to look up the config for the backend_engineer we just wrote. And look here, look at this. These two lines are
the two lines that mean that this agent is going to be able to write code, run it in a Docker container and look at the results. We say allow code execution and do it safely in Docker. And that's it. And that has defined our agent and our task. And here is our crew. Uh we we have a bunch of agents and tasks. And here we're saying we want this to work in a sequential way, not in a hierarchical way. I'm going to change the maximum execution time to be a bit longer. And the the the
fourth step is main.py. I'm going to paste in the code here. main.py has a run method in it, and run takes some inputs. We have to set up a little dictionary of inputs, and these here, I hope you recognize them: these are the three template tags that we had put in the YAML. This is where we're specifying the business requirements, the module name, and the class name. And then this is where it comes together. We say we want our team, we want the crew, we want to kick it off, passing in our inputs, and that will then kick off our crew. And I should say all of this code, of course, is in the repo as well. So what you're hopefully thinking is, okay, what are the business requirements? What are we going to ask this thing to build? I'm going to kick it off, and then I'll show you the business requirements. So let's kick it off, and then we're going to go for lunch just as we get to 12:30. So let me bring up the terminal. I'm going
to go into that directory, then into the team directory, and hopefully you remember that the fifth step, the final step, is I just type crewai run and it should just run. And we see it does some installation, building the team, installing some packages. Now, while it's doing that, I'm going to scroll up and tell you the business requirements. We're going to ask our crew, or right now our single developer, to build a simple account management system for a trading simulation platform. The system should let users create an account, deposit funds, and withdraw them. It should allow them to record that they've bought or sold shares. It should be able to calculate the value of the portfolio and the profit and loss, list the transactions, and have some error checking. It should prevent users from withdrawing money they don't have, selling shares that they don't have, or buying shares that they can't afford. All of those checks should be written there in the code, and that is what we are assigning to our agent. And here it is. It's already thinking, and you'll
see, if we look higher up, that it is using a code interpreter tool and it has started to run code in a Docker environment. Our Python engineer is busy working and is going to be coding away. And we're going to leave our Python engineer coding while we go off and enjoy a fine lunch. [Applause] Okay, everybody, if I could get your attention, please. This time is what's known professionally in the industry as the death slot. This is the after-lunch slot. I have rolled up my sleeves because this is going to be sleeves-rolled-up, hands-on time. I'm going to compensate for the death slot by being even higher energy than usual, and hopefully keep everyone excited. All right. While you were out feasting on a delicious lunch, we had an agent working hard on writing some software. And the question is: was it successful? As I told you, this is not certain. There is a level of risk associated with these kinds of systems. So we shall now find out together whether or not it was successful. Let's have a look. This is us back in Cursor, and if I just close this here, we will see that there is a folder called output that has been created in our folder called team. And if I expand output, there is in fact a Python module called accounts.py sitting there. And if I open this, and I just close the window down here that's in our way, you will see that there is a ton of Python code in here. It's got comments and it's got type hints, which is a
good way to know that I didn't write it; it's been written by someone that's a much better coder than me. Look at how much code is written. All of this. And you'll see there are methods like print holdings report, get holdings, print transaction history. Basically everything that we asked for in our spec has been built by our agent. So that's pretty cool, but I want to take it further. I think it would be more interesting if we put together a team to build this rather than just our lone developer, and if we included a user interface, and a tester to be testing it as well, so that we don't just look at Python code but we actually get to see a result. So that is the next step. That's what we're going to do now. It was able to code, but can it actually build an application? Let's see. All right. And again, all of this code is in the folder number two, engineering team. You will find there's already a folder in there called engineering
team that has the complete solution as I coded it and as I ran it. But you could also replicate what I'm doing and do it yourself, if you want to see it work. So we're going back. You remember there were the two YAML files, a quick revision: agents.yaml and tasks.yaml. We're going back to this file again. This is where we defined our backend engineer. I'm going to delete that again, and we're going to write some new code. And here it is. I typed it fast. We now have these different agents. We have an engineering lead. The engineering lead is something that can review requirements, produce a detailed design, and write that to a design document, and it's going to use GPT-4o. We're going to mix it up and have a few different models. Then we have a backend engineer. The backend engineer is the same one as before, exactly the same person we had before, except there is a difference: it's going to expect to receive a design document. We're going to have a front-end engineer, and when I say front end, it's going to build a Gradio app, because I love Gradio. That's the way to do it. So we're going to try and have it build a Gradio app. And we're going to have a test engineer, a Python engineer who is able to write unit tests and then use those unit tests to test the product. So these are our four agents. And now I'm going to make some tasks. Let's bring back this, make sure I saved it, bring up tasks.yaml, delete what we had there before, and put in something new in here.
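As a reminder of how those template tags in the YAML get filled in: they are plain Python format fields, and CrewAI substitutes them from the inputs dictionary that main.py passes to kickoff. Here is a minimal sketch of that interpolation; the requirement text and names below are illustrative assumptions, not the exact values from the session:

```python
# The {placeholder} tags in a YAML task description are ordinary Python
# format fields. CrewAI fills them from the inputs dict given to kickoff().
description_template = (
    "Write a Python module named {module_name} containing a class "
    "{class_name} that meets these requirements: {requirements}"
)

# Hypothetical inputs dict, shaped like the one set up in main.py.
inputs = {
    "module_name": "accounts.py",
    "class_name": "Account",
    "requirements": "a simple account management system",
}

# This is essentially what happens to each task description at kickoff time.
filled = description_template.format(**inputs)
print(filled)
```

The same substitution happens for every task and agent description that contains a tag, which is why one inputs dictionary can steer the whole crew.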
There we go. So again, this is a fairly simple example. We've got four tasks, and each task maps to one agent. It doesn't need to be this way: you can have multiple tasks assigned to one agent, and you can have more complicated setups, but this is a very common and simple way of organizing it. So: four tasks, one task assigned to each agent. The design agent gets a design task. It says that it needs to produce a design document and output that design document. It's assigned to the engineering lead, it needs to be called design, and it's going to be in markdown. Okay. Now, this code task is the task we're assigning to our developer agent. And there's going to be something different here, and it's super important. We are adding this: we are saying that this task requires some context. It requires the output from the design task. So whatever comes out of the design task is going to be provided as context to carry out the code task. And this has now introduced a relationship between these tasks. And Crew will take care of all of that for
us. We don't need to write any code to deal with that context; it will come out of the box. That is the batteries-included part of Crew. All right. Next, a front-end task. This is going to build a front end, and it's going to do it based on the backend code that was written. So you can see what's happening here: again, I'm using context, and this time I'm saying that for the front-end developer, the context is the backend code, because it's going to write a UI to bring that to life, and it's going to output it to app.py. And then finally, the test task, which is going to write unit tests and output them. Now, actually, in the break Robert was talking to me and pointed out that a cooler way of doing this, which I should have thought of, would be to use test-driven development: have it write the tests first, and then have the backend developer write the back end that meets those tests. That's a really great idea, and if someone wants to try changing it around to do that, I would love to see it. That would be a cool exercise; I bet it would work great. We've got a bit of that going on, because the design document is written up front, but this would be even better. So that's an exercise. All right. With that, we now go to our crew module. Here it is. We're going to rewrite this, and there is our code for our crew module. It's very similar to before. It just has these agent methods for each of our four agents, and then the same
for each of the tasks. You'll see again we're just using this code execution. I'm going to give it some more time so that it doesn't time out, and if you're following along in the code, you might need to change this as well. I discovered when I did a trial run yesterday that it timed out, so 500 is a better number. Let it run for a few minutes. So that is our crew module. And now you'll remember the main module. The main module is where we give it the business requirements and we kick it off. And the cool thing is that we don't need to change this at all. It's exactly the same main module that we used for our single lone engineer; we can use the identical one now, no changes at all. This, as it stands right now, with team.crew.kickoff passing in our business requirements, is exactly what we need. So that is now an engineering team of our four agents, and we're just going to see whether, out of the box, this just works. So I'll come back here to the terminal. Actually, I
think I'll give myself a new terminal window. Hold on. There we go. We'll go into the second directory, then into team, which is the directory for the project we just built, and we type out that fifth step, the command to launch our crew, which is crewai run. And this is now going to run our whole crew. The first thing it's going to do is write a design document. So what we're hoping to see is that in this output directory, what's currently there, the old Python code, is going to be joined. There we go. Did you see that? It's been joined by a design document. Let's open this document up and see it in preview. Here it is. It's a design document that describes what's to go in our module, and it gives the method signatures and the dependencies, and it's all laid out here for our back-end developer to work on. And that's pretty fun. Now, what I'm actually going to do is what I warned you I was going to do at the start. I'm
going to do the baking maneuver, where I say we've put this in the oven, but I ran it earlier, because it does take about five minutes. I do assure you that this is legit; I really did run it earlier. And if any of you doubt me, then you should run it yourself, and you will see that it does something similar to what I did, although it's a bit different every single time. And it doesn't always work; it's about 90%. So you should definitely check that you are part of the good runs. But I have already put the output in there, and I'm going to show you what it came up with right now. Let's have a new terminal window. So I put the output in a folder which I also checked into git, so you can see this yourself, and it's just called output. You'll see in output there is a Python file, there is the same design document, there is an app.py, and there is a test_accounts.py. So I'm going to go in now to that folder, go into output, and run app.py. This is running the user interface for that account management application that we wrote. And I haven't touched a line of this code; this code came straight out of the agents. Let's see what it does. So it's running Gradio. I press shift-click here to bring it up. And here we go. This is a user interface. The screen that you're looking at here was built by our agents.
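To make the demo concrete, here is a minimal hand-written sketch of the kind of business logic the agents generated. This is my own illustrative version, not the actual generated code (which differs on every run), and the class and method names are assumptions:

```python
class Account:
    """Toy version of the agent-built account manager (illustrative only)."""

    def __init__(self, account_id: str, initial_deposit: float) -> None:
        self.account_id = account_id
        self.balance = initial_deposit
        self.holdings: dict[str, int] = {}

    def buy_shares(self, symbol: str, quantity: int, price: float) -> str:
        # Reject the purchase if the account can't afford it.
        cost = quantity * price
        if cost > self.balance:
            return "Error: insufficient funds to buy shares"
        self.balance -= cost
        self.holdings[symbol] = self.holdings.get(symbol, 0) + quantity
        return f"Successfully bought {quantity} share(s) of {symbol}"

    def sell_shares(self, symbol: str, quantity: int, price: float) -> str:
        # Reject the sale if the account doesn't hold enough shares.
        if self.holdings.get(symbol, 0) < quantity:
            return "Error: insufficient shares to sell"
        self.balance += quantity * price
        self.holdings[symbol] -= quantity
        return f"Successfully sold {quantity} share(s) of {symbol}"

account = Account("123", 1000.0)
print(account.buy_shares("AAPL", 1, 150.0))   # enough cash, succeeds
print(account.sell_shares("TSLA", 3, 800.0))  # no TSLA held, rejected
```

The error-checking branches are exactly the guardrails the business requirements asked for, which is why the demo below can show both a successful buy and a rejected sell.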
So you'll see right away it's got a create account section. I can say, like, one-two-three, put in an initial deposit of $1,000, and press create account. And it says down here, I'm going to zoom out a little bit, it says account created for 123 with initial deposit, and it's got some more information down below. You'll also see there are two tabs up here: a trading tab and a reports tab. So I can go to the trading tab and say, all right, I would like to buy one share of Apple. Buy. And it says, "Successfully bought one share of Apple at $150." And this is my account information and my holdings down here. I hope you're as amazed by this as I am. It's incredible. So I'm now going to buy a share of Tesla stock. It's not really buying a share of Tesla, but I'm going to buy a share of Tesla stock. And here we go: bought a share of Tesla at $800. And there we go, you can see that it's got the holdings down there. Okay. I'm now
going to come in and try to sell three shares of Tesla. I do that, and it says "error: insufficient shares to sell." So, just as we asked, it checks whether you've got the shares to sell before you do so. And I'm going to try to buy ten shares of Tesla, and it says "error: insufficient funds to buy shares." And now I'm going to go to the reports tab and press portfolio view, and it gives us the profit and loss, current holdings, and transaction history. It's showing us all of this information about our portfolio. And I've got to tell you, I think this is absolutely extraordinary. It's incredible. This was built, the whole thing, and I didn't touch a line of code: the business logic, the front end, this user interface with these different tabs, and the test code were written by a team of agents working autonomously, on their own, and I did nothing. As a project manager, I've asked engineers to build something like this, and it's the sort of thing that maybe they turn around in a day; you'd come back at the end of the day, and it would be done, and you'd say, great, that's what I asked for. This takes about five minutes, and you just leave it running. You come back and it's done. So I find this absolutely dramatic, and I hope you do too, and you should run this yourself. And the final little thing I'll leave you with, which is also spectacular, is that every time that
I run this, I get a slightly different user interface. Sometimes there aren't three tabs; one run I did last night when I was testing this just had two, and one time it did everything on one big screen. Sometimes they have different types of checks and different UI widgets. So it's really interesting: it is kind of choosing its own adventure. There is real autonomy that you see happening there. So again, I urge you to try this for yourself. See it come to life. Yes, a question. [Question about the tests.] So the tester wrote Python unit tests, for sure. Let's have a look. It would be under... not here. [Audience question about the percentage.] We could have had an agent come up with that. Here it is: test_accounts.py. And this is also checked into git, so you can see it yourself. Hang on, let me just close this window here. There we go. So this is the code written by the tester agent. Yeah, I imagine it was just the standard unittest, not pytest. The question was
which unit test library did it use? It picked just the standard unittest. So there you go. And check this code out; it is checked into the source code as well. It's a great question: for a project like this, would I recommend the OpenAI Agents SDK or Crew AI? And it's going to be a typical answer to these things, so don't hate me, but: it depends. It's very much a matter of preference. With Crew AI, you get a lot that comes in the package, in that batteries-included mindset. It was so quick to build that; this whole project was about 10 or 15 minutes' worth of work. It's really quick to get up and running. Now, I will admit that when I first did this, I made a couple of mistakes. One of them was that when I was writing the tasks YAML, I had a typo, so a name wasn't exactly the same as the one referenced in the agents code, and it failed with a stack trace where it wasn't immediately obvious what was going on. It took me a while to figure out that's what it was, and then it was easy, and of course now I'll know what's going on if I ever see that again. So you get a lot that comes in the package with Crew, but you are using a more opinionated framework, which has some negatives as well. If you feel more comfortable with a more lightweight framework, you have to put more work in, but you have a lot more control, finer-grained control, over exactly what's happening. And we were actually just talking about that: if you wanted to be able to debug the Python code it was making while it was doing it, then the OpenAI Agents SDK would give you a lot of flexibility to build in that kind of functionality. So it's a trade-off and a personal decision. [Audience] Hi, I had a question. If I used an agentic framework like LangGraph to do this, I would be defining the nodes as the agents, and specific edges to define the flow of
execution between the different agents: I want the engineering lead agent to compute first and then flow down to the backend engineer. So here, how does Crew AI know what the flow should be without specific edges? Yes. So it's a bit more simplistic with Crew than it is with a LangGraph or Google ADK. You do explicitly assign each task to an agent, and then you get to pick between two different modes, the sequential mode or the hierarchical mode. If you pick the sequential mode, then it literally runs each task one after another in the order that they're laid out. But if one of your tasks says that as its input it requires context from another task, then it won't start that task until all of the context has been satisfied. So it will go through in order, make sure it's run any prerequisites, and then run your task. It's a reasonably simple sequential process. If you choose the hierarchical process, then you also have to choose a manager, which can be an LLM or an agent, and then it delegates to
an LLM to decide which task to run in what order, and you have to use prompting to figure that out. So those are the two techniques, and it doesn't have quite the same level of flexibility as a LangGraph, where you can really map out your whole dependency tree and have that be really flexible. Thank you. [Audience] You had mentioned that each time you run the prompt that we just saw, you get a slightly different flavor to the GUI, and I was wondering, with each of those iterations, were you using the same exact prompt, letter for letter? Yep, I was. And you can too: if you run it twice just from the code, you'll get different answers. Now, of course, you probably know this, but when you're calling an LLM you can actually set the seed; you can pass in a random seed that's used to make an LLM behave in a more deterministic way. If you set a random seed, and people traditionally always set it to 42 as a nod to Douglas Adams, and you also set the temperature to zero, then in theory you will always get the same output, but it's not guaranteed. It's just that the provider will do their very best to always give the same output tokens given some input tokens. But the whole process of running an LLM has some uncertainty. For a start, you're running some processes in a very distributed way, and things can run in a different sequence when they're
running on the GPU in a big distributed setup. But there are also probabilities involved: if you have a nonzero temperature, then it's sampling the output tokens from a distribution, so you could get a different output each time. So the short answer is: you can use a random seed and a temperature of zero to almost guarantee that you get the same UI every time, but otherwise you should expect differences. [Audience] In this example, we see it kind of from the beginning, a proof-of-concept use case being implemented. Could you talk a little bit about the experience of taking an existing project, and how this might be implemented in that going forward? It's a great question, and I would say that agentic AI is a bit of a brave new world; a lot of these things are happening for the first time, so I'm not sure that there's yet a clear order of play. The thing that I would always stress to people is to start small. If you have an existing system and you want to introduce agentic elements, then certainly, from my experience, start by not being too ambitious. Choose a very narrowly defined, very clear part of the process where you think there's a compelling benefit to using agentic AI. Find a way to evaluate that benefit; have a metric that you're looking to improve. Then build an agentic solution that just tackles that very narrowly defined part of your problem and can demonstrate performance improvement and commercial improvement through that. And when that's been successful, you can gradually start to extend the solution, have more
flexibility, add more agents, and perhaps take more risks. All righty. So module one, on defining agents and getting used to the OpenAI Agents SDK, as well as module two, on designing agents and getting a whole remarkable engineering team going with Crew AI on whatever engineering task you want: both of those modules are in our rearview mirror. And so now it's time for the third module, on developing agents. And who's excited about MCP? That's one of the hottest topics right now. Yes, we have lots of hands and some actual cheers, which is nice. So, MCP: very, very cool right now. Let's get into it. I'm not going to be up here very long; I'm just going to give a bit of theory around how MCP works, and then Ed will show you hands-on, and we'll take advantage of our crew of engineers to build some autonomous traders with MCP. So MCP stands for Model Context Protocol. It is most commonly described, in seemingly every blog post that you find about MCP, as the USB-C for agentic applications. And you will notice that, due to hallucinations and errors, the gen AI model that created this pop art gave it a USB-A cable. MCP is nothing like USB-A; it is only like USB-C. All right. So, what is MCP not? Let's get some things out of the way. I mentioned this earlier when I had that slide with the six blocks of frameworks, and I said then that MCP is actually not a framework for building agents. It is not a big fundamental change to how agents work, and it's not even a way to code agents. So then
why is it so exciting? Why does everyone want to learn about it? Well, it's because it is a protocol. It's a standard, like USB-C, that tells you how you should set up communication in a standardized way, which makes it very easy, plug-and-play like USB-C, to connect to lots of different agentic tools that you might like. So it's a simple way to integrate tools, resources, and prompts, all under the same protocol. And so there we go, there's that quote again: the USB-C port for AI applications. So lots of people here are excited. Maybe you shouldn't be; there are reasons not to be excited. MCP is just a standard; it's not the tools themselves. The tools themselves are what really do the work, so maybe that's what should be exciting. LangChain actually already has a big tools ecosystem itself. And Ed already showed you earlier in today's training how you can take any function, add a decorator, and turn it into a tool. But of course there are reasons to be excited, and this follows on exactly; I had a hard time not just saying it out loud when I made the first bullet here about how it's just a standard, not the tools themselves. The key thing is that it means that now all tools become extremely fast, easy, and frictionless to integrate, and it's taking off. So yes, LangChain may already have a big tools ecosystem, but MCP is surely going to overtake it if it hasn't already. I've got a chart here of GitHub stars over time from star-history.com, which has lots of GitHub star charts if that's what you're into.
And shown here in red is MCP. Also perhaps of interest is Crew, the teal blue line running over the top. Crew is also obviously taking off, though more linearly, whereas there's this crazy MCP line hockey-sticking upwards. Who knows how high that will go. And similarly, this is a point that, well, all these points are really points that Ed makes and I am borrowing, but he makes a great point here that HTML likewise was only a standard, yet it is now ubiquitous. It's almost like the currency of everything that flows to us over the internet. So MCP could similarly be that kind of equivalently ubiquitous standard for the agentic world that we are now entering. Core concepts of MCP. I'm going to move a little bit slower on this, because we're getting into definitions. The last slides were all kind of fun; these are a bit more serious. We have some key terms here. There are three key components to MCP. The first one is something called a host. And
so you hear this word host so much in computer science and connectivity. In this case, to define it concretely, the host is an app. It's an application: something like, if you have Claude Desktop on your laptop, you can run MCP from that, so you can use that as your host. You could also use Cursor as your host; that's a software application. Critically, you can't use Claude on the web: if you go to claude.ai, that isn't a host, you can't use that as a host, at least not at the time of me standing up here. But the Claude Desktop app will allow you to do that. You can also have your own agent architecture be the host. So hopefully that's relatively clear; I have some charts coming up that will also explain how these terms connect to each other. The second key component is the MCP client. The MCP client lives inside the host: the host is an application, and it hosts MCP clients. And a key thing here is that a client connects one-to-one with an MCP server. So for every MCP server that you have, and I'm going to get to what that is in a second, but the server is basically what's actually doing the tool use, or providing a connection to the database: the MCP server provides tools, context, and prompts. For every MCP server that you connect to, you must also have an MCP client running inside your host. It's a simple one-to-one identification. So basically, these MCP servers have all kinds of abilities, and you're like, wow, I would love to
have that ability for my agent to be able to call on. Okay, this MCP server does that; I'll spin up an MCP client corresponding to that server inside of my host that's already running. As a concrete example, Google Maps is an MCP server with geolocation tools. And I thought there'd be no better way to understand that easily than to bring up the GitHub repo. So here we are in the GitHub modelcontextprotocol account, and in there, there are lots of different repos put there by tons of different people from all over the world, and it's growing all the time. As an example, I'll zoom in here a little bit: if you have the Google Maps MCP server running, you can use the maps geocode tool to convert an address into a coordinate, or the reverse, a reverse geocode, converting a coordinate into an address, plus all the kinds of other things that you might imagine Google Maps would allow you to do, even directions between two points. So that hopefully gives you a sense of a specific MCP server and the kinds of tools that you might want to have access to as a result of being able to access that server. So in this example, you have Google Maps as an MCP server with the geolocation tools that you want your agent to be able to access. You can then configure, say, Claude Desktop to be the host, to run an MCP client that corresponds to the Google Maps MCP server, and that launches the Google Maps MCP server, say, on your computer. Now, those final three words, on your computer, might have surprised you, because server sounds like something remote. And there are different options. So let's talk about architecture. On your computer, you have this host running; let's say this is the Claude Desktop app, or it could be Cursor. And let's say we have two MCP servers running, also on our computer, that we want to have access to. We can pretend that this one here in the bottom right corner is the Google Maps MCP server, and the one on the left is accessing your local file system.
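For a local setup like this, the host discovers which servers to launch from a configuration file. As a hedged sketch (the server package names follow the published modelcontextprotocol examples, but the file path and API key here are hypothetical), a Claude Desktop configuration registering a filesystem server and a Google Maps server might look like this, built as a Python dict so the shape is easy to see:

```python
import json

# Sketch of a claude_desktop_config.json entry (illustrative values only).
# For each server listed here, the host spins up one MCP client and launches
# the server as a local subprocess via the given command.
config = {
    "mcpServers": {
        "filesystem": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem",
                     "/Users/me/projects"],  # hypothetical directory
        },
        "google-maps": {
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-google-maps"],
            "env": {"GOOGLE_MAPS_API_KEY": "YOUR_KEY_HERE"},  # placeholder
        },
    }
}

print(json.dumps(config, indent=2))
```

The key point is the one-to-one rule from the slide: two entries under mcpServers means the host will run two MCP clients, one per server.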
So in this case, we happened to have both of these MCP servers; we spun them up and ran them on our own computer. That's one option, and I'm going to talk about the different options in a moment. Remember from the preceding slide that for every MCP server, we must have a corresponding MCP client running inside our host. So for the Google Maps MCP server, I have my Google Maps MCP client corresponding to it, and for the MCP server that's accessing my local file system, I have an MCP client for that as well.
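What actually flows across each client-server link is JSON-RPC 2.0 messages, which is the wire format the Model Context Protocol specifies. As a hedged sketch (the tool name and its argument are illustrative assumptions), a client asking a server to invoke a tool sends something shaped like this:

```python
import json

# A tools/call request in the JSON-RPC 2.0 shape that MCP uses on the wire.
# The tool name "maps_geocode" and its argument are illustrative.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "maps_geocode",
        "arguments": {"address": "1600 Amphitheatre Parkway"},
    },
}

# The server replies with a result carrying the same id, so the client can
# match responses to the requests it sent.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "a geocoded result"}]},
}

print(json.dumps(request))
```

Because every server speaks this same request/response shape, a host can talk to a filesystem server, a maps server, or anything else without custom glue code per tool.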
And they are connected by MCP, by the protocol. So the white arrows are really what MCP is: the protocol that makes it easy for the client to speak to the server, to have this painless way of connecting to any kind of tool that you might like. Separately, on a remote server, you could actually have another MCP server running. For whatever reason, workloads, some kind of security requirement, you might want an MCP server running remotely, and you can spin that up as well. It could be your own remote server, or it could be public cloud infrastructure like GCP, Azure, or AWS. And of course, if we want to connect to that MCP server running in the cloud, we need a corresponding MCP client, a third one, running inside our host, and again those are connected by the Model Context Protocol itself. Now, there's a key piece of the puzzle, which is that when you have that Google Maps server running here, it doesn't have all of Google Maps' information in it. There isn't an open-source repo that you can download where Google gives you all of their most valuable IP. So even though this server is running on your own machine, there can be circumstances, like with Google Maps, where it still needs to reach out over the internet to some remote server, which in this case Google is hosting, to access the key data: to be able to do that geolocation, to be able to find directions between two
points. So there are three different situations happening here in terms of local versus remote decisions that you have, and we'll go through them in a counterclockwise order. In the local file system situation, we have both the MCP server and the host running locally, and critically, the work of accessing your local file system doesn't involve anything being sent over the internet, no remote access; everything happens locally on your machine. In the second example, which I just went through in a fair bit of detail a moment ago, you have the client and the server running on your own computer locally, but in order for that server, say Google Maps, to be able to do its work, it has to grab information via some API over the internet. And then finally, the third situation is where you have the MCP server running remotely. And likewise, if you had the Google Maps server running remotely, then you'd have another line going off to yet another server calling the Google Maps API separately. But I think you get the point: there are these three different scenarios for how you set up your client to run. Now, the common misconception here is that MCP servers run remotely, and I think that's related to it being in the name. You have this natural association that an MCP server should be running on a remote server. But the most common reality is that you download an open source MCP server and you run it locally. And so then you're doing one of these two scenarios, where either everything happens locally, like when you're accessing your local file system, or you have the local server
calling out to an API. And my final slide, before we get to another code demo from Ed where he will actually be using MCP in code for you, is why you might want to consider making an MCP server yourself. There would be some labor associated with making a server, so why might you want to do it? Well, you could share. Lots of companies, organizations, and individuals today are creating MCP servers so that tools their company creates can be accessible over MCP, and that could potentially even be a revenue source for you. You could have a key associated with it: people have to provide you with their credit card details in order to use your company's tools. And maybe they're going to end up having tons of agents, maybe they'll write those agents poorly and get stuck in a loop, and give you tons of money. Another reason you might want to have an MCP server is consistency: if you have an agentic application that is calling on a bunch of MCP servers, you might just want to say, "Okay, I want my own tools to be called in the same way." So you have all that consistency, and you might just want to understand the plumbing of what you're doing. And the main reason why you would not want to make an MCP server is if it's just for you. There's no point in going to the effort of making an MCP server if you're the only one who's going to be using it, because if you want to be using tools, there are easier ways to do it than creating an MCP server. The function_tool decorator that Ed already showed us can trivially make any function into a tool, so that would be the place to start. All right, that is your MCP intro. Ed will get into the weeds. Marvelously explained as always. And now it is so exciting to be going into this project, which is the epic project. This is what it's all been building to. So
let me tell you first about the project, and then we're going to get started with it. We're going to lay some of the foundations, then we're going to have our afternoon coffee break, and then we're going to bring it home after that. This project is called autonomous traders. The idea is we are going to build a bunch of agents that are able to make decisions about investing in financial markets. Now, one reason that I picked this idea is that it's commercial. I find that a lot of the projects you see in the world of agents right now are all quite technical in nature, different ways that you could build a deep research tool or something else to do with technology. You don't often see true commercial solutions. So I wanted something that would feel concrete and tangible, something that could be applied to a real business like financial markets. It's going to involve six different MCP servers, and it's going to end up using 44 tools and two resources. So we're going to equip LLMs with a lot of power, and that's going to be fun; we'll see how they survive with that. There are going to be interactions between agents, so there's going to be communication between agents, and they are going to be autonomous: they're going to be able to pick their own adventure. And then I do want to say one more time something that John and I have said a couple of times, which is that you should not use this for trading decisions, please. We do not want to be sued for any financial losses that result from this. But I will say, of course, if you do use it for financial decisions and you make lots and lots of money, then we would love to be invited to your yacht launch party. So don't forget us. But no, do not use this for trading decisions. I do also want to say that this is quite a big project. It's quite meaty. So I am going to go through it relatively quickly, and I do ask that you use it to get intuition, use it as a way of taking down what's going on, and then explore it yourself later, remembering to contact me if something goes wrong and we'll get it fixed up. I will fix whatever's going on there, and afterwards I'll be around if people are trying these things; we can figure it out then. We are ready to start our project.
When I made this image, there were only three traders. There are now four, so this is slightly out of date, but you get the idea. We are going to be building autonomous agents that know how to trade the market, and they are going to be using real live market data. They're going to be using real stock prices from financial markets, but they're not going to be making real trades, luckily, at least not yet. With that, we're going back into Cursor and we are going into the third project now, the trading floor. You'll see we've got a bunch of labs to go through, and we're going to start with the first lab. Let me come up here and make this nice and big. Welcome to the start of our third project and to the Model Context Protocol. As I say, do consider this to be like a teaser, and take it as an exercise to explore in more detail later. Now, I do have a warning right away for Windows people. There is a current production problem with using MCP servers on Windows; it tends to come up with a bunch of connectivity problems, and the solution is to use something called WSL, the Windows Subsystem for Linux. It's a bit of a bore; it involves configuring a Linux process to run on your PC. A show of hands for the PC users here: how many of you have WSL already? Okay. Oh, no, not as many as I thought. All right. Well, for the others, unfortunately, you will need to install WSL, but it's pretty easy, and I've got a full set of instructions in the setup folder; let me know if that gives you any problems. On a Mac, you're fine, so the moral of the story is, well, not that I'm an Apple fanboy. All right. So we're starting with some import statements. We're going to start with our very first MCP server, one called fetch, which is an MCP server that Anthropic made available as one of their reference implementations. So the way that you use MCP, and I should have said before that we're going to be using the OpenAI Agents SDK, my very favorite: we're bringing back the OpenAI Agents SDK, and it makes it so easy to work with MCP servers. It's really great. But whether you're using the OpenAI Agents SDK or something like Claude Desktop as John described, one of the things you need to do is have parameters that describe what kind of MCP server you're working with. And here is one set of such
parameters. This describes the MCP server called fetch: mcp-server-fetch. It is in fact run with a command that uses the Python package manager uv, which is a super amazing package manager that I have instructions on in the guides, for people who are new to uv. The command downloads something called mcp-server-fetch, and basically it does the equivalent of a pip install behind the scenes; it installs it locally.
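To make that concrete, here is roughly what those parameters look like. This is a sketch: I'm assuming the Agents SDK's `MCPServerStdio` wrapper, which takes a command plus arguments for launching the server as a subprocess, and the reference `mcp-server-fetch` package, so treat the exact names as illustrative.

```python
# A sketch of the parameters just described, in the shape the OpenAI Agents
# SDK's MCPServerStdio wrapper expects (field names may vary by SDK version).
# "uvx" is uv's tool runner: it installs the package on first use, then runs it.
fetch_params = {
    "command": "uvx",
    "args": ["mcp-server-fetch"],
}

# Typical usage (requires `pip install openai-agents` and uv installed):
#
#   from agents.mcp import MCPServerStdio
#
#   async with MCPServerStdio(params=fetch_params) as server:
#       tools = await server.list_tools()   # one tool, named "fetch"
```

The key idea is that the parameters only say how to launch the server process; the SDK's context manager handles spawning it and speaking MCP to it.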
And this ties to what John was explaining: whilst it's called an MCP server, it sounds like we're going to be running something remotely, but we're not. It's doing the equivalent of a pip install, and that's going to run locally. And in some ways, you can think of pip as a really nice analogy for the whole MCP ecosystem. The way that PyPI has made it so simple to find a package and then just pip install it and have it running locally, it's like there's a sort of protocol there. There's a way that you can go to PyPI, search for a package, see how popular it is, and then do a pip install to bring it locally. That's sort of what MCP is: a way to take tools that people have written and made available, and do the equivalent of a pip install. And this is how you describe which one you're using. So we're describing the fetch tool, and what fetch does is go and fetch a web page. Sounds pretty simple: it's able to go and get a web page. But what's more ingenious is what fetch actually does. It runs a headless browser on your computer and then uses browser automation with Microsoft Playwright to go and request that web page, and then it reads its contents and returns them to you. So there's a lot of code in there. That's a lot of heavy lifting, and you don't need to worry about any of it. You just need to know these parameters. I've now got some OpenAI Agents SDK code here that is going to connect to this MCP server and ask it: what tools do you provide? So right now, that just got installed on my computer. It ran, it collected the tools, and it has only one tool. That tool is called fetch, and it fetches a URL from the internet. Nice. That's our first look at our first MCP server. We're going to look at another one here, and we're also going to specify the parameters. This is a more complicated one that's provided by Microsoft, and it runs
Playwright, Microsoft's browser automation, and it brings up a headless browser on your computer and gives you much more control over it. So if I look for its tools, you get all of these tools that will be provided to our LLM. Look at all of these: it can close a browser, resize it, upload a file, navigate forwards and backwards, save a PDF, lots of different things. So that's another MCP server with a ton of code behind it. I've got one more to show you, and then we're going to put them all into action. This one is another very simple one, the filesystem server, and it's a tool which allows your agent to read and write files on your local file system. We're going to choose a folder called sandbox, which I've got here on my computer, and give it access to read and write files in sandbox. Let's just see what tools this server has. It has tools like read file, write file, and list directory. So these are all the tools that will be given to our agent if it uses that MCP server. Okay, so we've got tools that allow us to fetch web pages, to navigate in a browser window, and to read and write files. Let's now put this together. There's a question, and it's a great question: is there a way to select which tools to give to the agent, and which to withhold? The answer is yes, but it involves going slightly off the beaten path: you can't use the default mechanism. So the default
mechanism with the OpenAI Agents SDK is just to list the servers here, and if you list an MCP server like this, it will look for that MCP server, get all of its tools, and provide all of those tools to the agent you're controlling. If you wanted to sub-select, to either reject or accept particular tools from the bunch, all you would do is get the tools using list tools and filter them: each one has a name, so you can filter by name, select the ones you want or reject the ones you don't, and then instead of saying MCP servers equals, you simply provide the tools themselves. You say tools equals, and then you'll be able to call just the tools you want out of the bunch. That's a great question. But if you go with the out-of-the-box approach like this, then it will give the agent all of the tools, and you can use the prompts as your way of telling the agent not to use some of them. Okay. So with this, I now wanted to have a useful, productive task to give our agent, and here's what I came up with for our first assignment. First of all, we have our instructions, which you'll remember is the OpenAI Agents SDK's version of the system prompt: you use the internet to accomplish your instructions; be persistent until you've solved your assignment. And then we have this code. We use a context manager: with the Playwright server and with the files MCP server, I'm going to create a new agent.
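In code, that pattern looks roughly like this. It's a hedged sketch assuming the OpenAI Agents SDK (`Agent`, `Runner`, and `MCPServerStdio` from `agents.mcp`); the npm package names for the Playwright and filesystem servers and the `./sandbox` path are the commonly published ones and may differ slightly from the course repo.

```python
# Sketch of the investigator pattern described above, assuming the OpenAI
# Agents SDK (`pip install openai-agents`) plus Node for the two npx-run
# servers. Actually running it also needs an OPENAI_API_KEY.
import asyncio

# Parameters for the two stdio MCP servers (package names as commonly published)
playwright_params = {"command": "npx", "args": ["@playwright/mcp@latest"]}
files_params = {
    "command": "npx",
    "args": ["-y", "@modelcontextprotocol/server-filesystem", "./sandbox"],
}

async def main():
    # Imports deferred so the sketch is readable without the SDK installed
    from agents import Agent, Runner
    from agents.mcp import MCPServerStdio

    # Each context manager spawns a server subprocess and an MCP client for it
    async with MCPServerStdio(params=playwright_params) as browser, \
               MCPServerStdio(params=files_params) as files:
        investigator = Agent(
            name="investigator",
            instructions=(
                "You use the internet to accomplish your instructions. "
                "Be persistent until you have solved your assignment."
            ),
            mcp_servers=[browser, files],  # all their tools become available
        )
        result = await Runner.run(
            investigator,
            "Find a great recipe for banoffee pie and save it as a markdown "
            "file in the sandbox folder.",
        )
        print(result.final_output)

# asyncio.run(main())  # uncomment to run the full browsing demo
```

The structure is what matters: list servers in `mcp_servers` and every tool they expose becomes available to the agent. To sub-select tools, as just discussed, you would call `list_tools()` yourself, filter by name, and pass `tools=` instead.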
It's called the investigator, and I'm going to give it those two MCP servers. That means this agent is going to be able to call any of the tools implemented by these MCP servers. So what is the task? Well, the task I've decided to assign it is to find a great recipe for banoffee pie, which happens to be my favorite dessert. Who here knows what banoffee pie is? I think it's a British thing. Look at that. It's a tragedy. But the good news is you're all about to find out the recipe for banoffee pie. You'll also find it in the GitHub repo as well, because it's already been created by this agent. It's like a mixture of banana and toffee and cream and chocolate, and there's nothing not to like about that. So, let's see what happens if I run this. And it is quite crazy. So, I've just executed this code, and while I wait here, here's what I believe is going to happen. Oh, wait. I don't want to disturb it while it's doing its thing. So currently, if you saw that, it just opened up a browser window. I didn't have my hands on the keyboard. It opened up a browser window, and it's now looking at recipes, controlling this browser. It's going to take time, because it's now, I believe, learning about banoffee pie, which everyone should know about. And at the end of this, bam, it finished. This is the pop-up you talked about. Sorry, John. Yes, that is annoying. But this is a recipe for banoffee pie that it just took.
So it launched a web browser, it searched for recipes for banoffee pie, it found this recipe, and it put it there. But also, if I show you the file browser, there's a folder called sandbox here where it has written a recipe for banoffee pie. And if I bring this up, you'll see it's been nicely formatted. A beautiful recipe. Now, banoffee pie: I can't cook anything, I have no cooking skills, consistent with my hand-eye coordination issues, but banoffee pie is one thing I can cook. And I can tell you that the recipe it's come up with is a legit recipe for banoffee pie. Strongly recommended. So anyway, a frivolous example, but the point was that it was so easy for us to equip an OpenAI agent with the ability to drive a browser, to bring up a browser window and do things in that browser window, and then to report on it and write that to a file. And honestly, if you think about this pattern and all of the things you can do with it: I've used this pattern to automate so many tasks. I had something where I had to go and collect a ton of information from a bunch of different websites, and I simply put that instruction into this very prompt, and it was able to bring up a browser, go off, collect the information, and save it down to a file. It works just like that. Yes, a question. It's a great question. The question was: is the agent that I initiated an MCP client? The answer is that the agent is really the host. The MCP client is actually coded for us by the OpenAI Agents SDK; in this context manager here, they've implemented an MCP client that this creates and attaches to the MCP server. So you don't have to do that. Now, they only came out with this about a month ago, and when I made the Udemy course that John has kindly plugged a couple of times, it was before they had done that. And so I went through
and built MCP clients myself for all of the different servers. And the day I completed the whole project, they announced that they had now built MCP clients. So I had to go back and redo it all. That's how much this field is changing, how quickly it's changing. So now you don't need to worry about building your own MCP client if you're using the OpenAI Agents SDK, because this context manager does it all for you. But in a moment we will be looking at an MCP client anyway, just to show you what they look like. Yes, an excellent question. This is essentially web scraping, and have I experienced using this with websites that have scraping protections? The answer is yes, and this particular example only gets through the simplest of those protections. It gets stuck. It can accept cookies with ease, but it can't solve many captchas. But there are many MCP servers out there. There are MCP server
marketplaces that you can go to. There's one at glama.ai. There's one at mcp.so, which has thousands of MCP servers written by anyone who has built some tools and published them, and many of them come with much more advanced kinds of code that will work around captchas by calling an LLM to solve the captcha and then putting in the answer. Of course, web scraping is a huge business now. I'm definitely an advocate for best-practice scraping principles; this is something that people should read up on and understand as good internet citizens: the kinds of web scraping we can and can't do, and doing things like honoring robots.txt and so on. Super important. But should you wish, there is an enormous set of MCP servers out there that will allow you to bend those rules in various ways, and lots of scraping is happening. Can you speak about security in MCP, OAuth, authentication, all of those flows? Yeah, so the question is about security in MCP, and this is very much a new, emerging area. It's something that causes a lot of concern, rightly so. Anthropic just released, a few weeks ago I think, an authentication component to MCP, but I haven't yet seen it taken up in a bigger way. I think that's probably going to be the next big thing. What I'll say is two things. First of all, if you use one of the MCP marketplaces like mcp.so, one of the ways these marketplaces differentiate themselves is by carrying out security testing themselves. They give a security grade to the different MCP servers published there that reflects the testing they've done; they've confirmed that a server stays within certain boundaries. And there's also, of course, user feedback gathered there as well. So that's a place you can go to for this. And then the other point is one people have made, which is that in many ways it ties back to that analogy I made with pip. You should
always consider that running an MCP server is very similar to pip installing a library on your computer, which by its nature is also not secure. You're running someone else's code on your computer, and it's incumbent on you as the engineer to do your research: to check it's got good feedback, to check that it doesn't depend on other packages that may have had bad stuff injected into them. That's a very important consideration. It's like npm as well, the same thing; people have enormous node dependency trees, and it's super important that engineers are aware of that and are checking their dependencies. Now, a concern people have raised is that one of the dangers of MCP is that it's so easy and so accessible that a lot of people who don't have engineering or data science credentials are using it to install all sorts of stuff on their computer. And whereas we can be trusted to do our research, go and look at the number of stars on GitHub, look at the reviews, read the security reviews, the general public may not be as well informed. So bridging that education gap, and making sure people understand the risks they're taking when they install someone else's MCP server on their computer: I think that's still an evolving story. Now, you're right to be feeling some buzz right now, because we're starting what I'd say is the most exciting part of this, but also the riskiest. So I'm a bit nervous, I have to admit, but we will see if things hold together. After our little introduction, after John gives such
a thorough explanation of what MCP servers and clients are, we looked at a sort of toy example of using one through the OpenAI Agents SDK, which makes it so very easy. We are now going to start working on our project to build a simulated trading floor. First of all, I'm going to be showing you some example code and taking you through some Python modules, because we're going to start to move away from notebooks and into proper Python modules. The first Python module I want to show you is called accounts.py, which you will find as a module in the folder for the third module. So this is accounts.py, and it's a bit of Python code that gives us a simulated environment for managing accounts for traders that want to buy and sell shares, deposit money, withdraw, calculate portfolio value, profit and loss, holdings, and so on. And I'm hoping that people are twigging, particularly if you notice the comments and the type hints here, that this code was not written by me. This code, of course, is the accounts code that was written by our agents in the second module. Now, I will confess it's not identical, because I did this a week ago when I got this repo ready. So this is what I ran a week ago and then put into this folder. But it's basically code written by agents to manage a simulated trading environment, following exactly the business rules that we set for our crew agents, with a couple of tweaks so it would be integrated here. That's what I have in accounts.py. So let's start by just using the code. We saw a user interface which was built on top of it, but let's just look at the code here. We're going to import Account; that's the name of the class our agents created for us. I'm going to get the account for Ed. We get back an account with a name and a balance, and it's got a strategy. And I'm going to buy some
shares. I specify a stock ticker and a number, and then I say why I'm buying an Amazon share. Rewinding the clock a little bit, I'm saying: because this bookstore website looks promising. All right. So, with that in mind, I bought a share of Amazon. How wise of me. And we can now call account.report, which is one of the functions it made that gives us details about the account. I can list transactions, and you can see this isn't the first Amazon share I've bought, because I've been rehearsing this a couple of times. You can see that indeed this is all working. So we have a Python class. What we now want to do is turn this into an MCP server. We're going to make our own MCP server. Now, as John explained before, if our only intention is to use these tools ourselves, then we don't really need to make an MCP server, because, you remember, you can just put the function_tool decorator on each of these methods and it will immediately become something we can use as a tool. So really, the reason you would go about creating an MCP server is if you wanted to share it with other people. Otherwise, there's not much reason. But we're going to do it for educational purposes now. Even though it's just for us, we're going to do it because I want to show you how easy it is. I've got a Python module called accounts server, and this is an MCP server. I had an interesting question in the break, which is: could you build an agent to write an MCP server like this? The answer is you certainly could, but it's super boilerplate. It's very light and simple, so it's just as easy to type it out yourself. But equally, we could have had our agents, a front-end developer, a backend developer, and an MCP server developer, turn out this code as well. So this is the code. We're using Anthropic's packages to create our MCP server. There's a class called FastMCP, and we create a new FastMCP server. You give
it a name, and then quite simply you write each of the tool functions that you want to make available as part of your MCP server, and you give each one this decorator, @mcp.tool. That's just saying: this is a tool. And the comments here are important. This is one time when you will see me writing comments, because these comments get used as the description of the tool automatically; Anthropic's package uses them when it's building the JSON that defines the tool. I'm simply delegating to the account business logic written by our agents for each of these, so it's just a bunch of wrapper functions wrapping each of our agents' functions. And at the end of it, this line here simply runs the MCP server. Anyone who's encountered MCP servers might know that there are in fact two different transport mechanisms you can choose for MCP servers. By far the most common is called stdio, standard input/output, which means that it connects over the standard input and standard output of a process.
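Here's a minimal sketch of a server in that style, using the official `mcp` package's FastMCP class (`pip install mcp`). The account logic below is a toy stand-in, not the Account class our agents wrote, and the fallback branch just keeps the sketch importable without the package installed.

```python
# Minimal sketch of an MCP server in the style of accounts_server.py, using
# the official `mcp` package's FastMCP. The account logic is a toy stand-in.
try:
    from mcp.server.fastmcp import FastMCP
    server = FastMCP("accounts_server")
    tool = server.tool()          # decorator that registers a function as a tool
except ImportError:               # keep the sketch importable without `mcp`
    server = None
    def tool(fn):
        return fn

_balances = {"ed": 10_000.0}      # toy in-memory store, standing in for accounts.py

@tool
def get_balance(name: str) -> float:
    """Get the cash balance of the account with the given name."""
    # This docstring becomes the tool's description in the MCP schema.
    return _balances.get(name.lower(), 0.0)

# To serve over standard input/output (the stdio transport):
#   server.run(transport="stdio")
```

Each tool is just a decorated function; the docstring travels with it as the tool description, which is why this is one place where comments genuinely matter.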
And this is actually the part that doesn't work on PCs, which is why you need to use WSL if you're on a PC. So that's the extent of writing an MCP server. You could equally just decorate the code itself in the accounts package, but I think it's more common to build something like this. So this is our accounts server. Now we can go back to our notebook and take a look at this code, and it looks identical to the kind of code we had when we were using real servers by third parties: we used one for Microsoft's Playwright and one for the file system. Now I'm saying that I want to use my own accounts server that we just wrote. I'm going to use this MCPServerStdio context manager here, pass in the parameters for my own server, launch that server, and ask it to list its tools. And there, it's too quick: it initiated that Python process, ran it, and launched an MCP server; the OpenAI Agents SDK created an MCP client, connected to it, and it came back and said that the tools it offers are get balance, get holdings, buy shares, and sell shares. And we've also got one here called change strategy, which I added in for fun. It allows an agent to decide if it wants to change the investment strategy of the investments it's making; it has the discretion, the autonomy, to do so. Okay. So these are tools provided by our homemade MCP server that we just created in just 10 or 20 lines of code. We're now going to use that with an LLM. I'm going to create a new agent; let me get more room here. Here we go. So the instructions, this is the system prompt: you are able to manage an account for a client and answer questions about the account. And what's the question? My name is Ed, my account's under my name, what's my balance and my holdings? So that is what we're going to do. This again is the same code as before with the OpenAI Agents SDK.
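A sketch of that step, under the assumption that the server above lives in a file called accounts_server.py next to this script (the exact launch command is an assumption; here I use the current Python interpreter to run it):

```python
# Sketch: pointing an agent at our own stdio MCP server, assuming the server
# code is saved as accounts_server.py in the working directory.
import asyncio
import sys

accounts_params = {"command": sys.executable, "args": ["accounts_server.py"]}

async def main():
    # Imports deferred so the sketch is readable without the SDK installed
    from agents import Agent, Runner
    from agents.mcp import MCPServerStdio

    async with MCPServerStdio(params=accounts_params) as accounts:
        # Inspect what our homemade server offers, e.g. get_balance, buy_shares
        print([t.name for t in await accounts.list_tools()])
        agent = Agent(
            name="account_manager",
            instructions=(
                "You are able to manage an account for a client "
                "and answer questions about the account."
            ),
            mcp_servers=[accounts],
        )
        result = await Runner.run(
            agent,
            "My name is Ed and my account is under my name. "
            "What is my balance and what are my holdings?",
        )
        print(result.final_output)

# asyncio.run(main())  # needs accounts_server.py present and an OPENAI_API_KEY
```

Nothing about the pattern changes: a homemade server is launched with exactly the same context manager as a third-party one.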
We are using our MCP server, we're equipping the agent with the ability to use the code we just wrote, and we're asking it this question. Let's see if it can now launch a Python process, launch a server, connect to it, run the right tools, and tell me what I hold; I think I have Amazon stock and then one other that I bought and forget. But let's see. It's running. And there we go: your current balance is... oh, that's nice. And for your holdings: 20 shares of Disney, and now nine shares of Amazon. So that just shows a homegrown MCP server that we just built and ran, and it was super easy. To the question someone asked me before: one of the things that makes it really easy is that we never had to build an MCP client, because the OpenAI Agents SDK does that for you. But you can if you want, and I did, because when I started out, the OpenAI Agents SDK didn't do it. I wrote a client, and again, it's actually really simple. Here is the code that does it. I'm not going to go through this, because it isn't really required these days, since OpenAI does it for you. But basically, you can write some code that uses some context managers, creates a session, connects to the server, and then can either call list tools or run different tools. Or, in this case, we're going to use it to read a resource, which is another thing you can do with MCP. So let's try this out and make sure it works. We're going to take this function, read accounts resource, which is something I just wrote in my MCP client. It's going to connect to an MCP server and try to read this resource. Let's see what comes back. And sure enough, it's come back with all sorts of details about my account. So that seems to work. Now, I realize I went through this part very quickly, and again, the idea is to get some intuition, get a flavor for what it means to build an MCP server and client, and
come back look at this code and see how to do it yourself. Bearing in mind that you probably won't need to write your own MCP client. We're now going to go and have a lot of fun With MCP servers. We're going to go and try out a ton of different MCP servers just because it's so easy and because we want to equip our agents with lots of functionality that they can use. So, we're going to add three new powers to our agents. First of all, we're going to give them memory. Someone asked me about this.
Memory is of course a super important capability to give agents, and we're going to give them a special kind of memory: a knowledge graph, where the agent can remember people and the relationships between them. Second, we're going to give them the ability to run a search online. I don't just mean looking up a web page, as we did with the Banoffee Pie example, but running a search like we did in the first module with deep research. And then, and this is the best one, we're going to give them access to live market data. We're going to use a service called Polygon.io, which is a super well-known financial market API. It has a free tier, which gets you delayed market data, and a paid tier, which gets you closer to real time. I've paid some money so that we could have some fun and get proper market data, but this exercise will all work without paying for the live data. So let's get started.
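Every server in this section gets wired up the same way, so here is a minimal sketch of the pattern, using the reference fetch server as a stand-in. The package name is real, but treat the exact parameters and the commented Agents SDK wiring as assumptions to check against the repo:

```python
# Each MCP server is described by launch parameters: a command to run
# plus arguments (and, for some servers, environment variables).
fetch_params = {
    "command": "uvx",               # run a Python package without installing it
    "args": ["mcp-server-fetch"],   # the reference web-fetch MCP server
}

# With the OpenAI Agents SDK, the wiring is then roughly this (sketch only;
# names like "researcher" are illustrative):
#
#   from agents import Agent, Runner
#   from agents.mcp import MCPServerStdio
#
#   async with MCPServerStdio(params=fetch_params) as server:
#       agent = Agent(name="researcher",
#                     instructions="You browse the web to answer questions.",
#                     mcp_servers=[server])
#       result = await Runner.run(agent, "Summarize https://example.com")
```

The same dict-plus-server-wrapper shape repeats for each of the servers below; only the command, arguments, and environment change.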
So the first new MCP server we're going to look at is a persistent, knowledge-graph-based memory. It's super interesting: we want to be able to store entities, observations about them, and relationships between them. That sounds like chunky functionality, a lot of engineering. But in MCP land, you simply set up some parameters and then equip your agent with them. And when we do, we find out what tools it gives us: create entities, search nodes, read graph, create relations, and delete entities and relations. So it's allowing our agent to set up these kinds of entities and the relations between them, and then recall them later. With that in mind, let's do some work with this. My system prompt: "You use your entity tools as persistent memory to store and recall information." And the message: "My name's Ed. I'm running a live workshop about AI agents right now, and I'm co-presenting with the legendary presenter John from the Super Data Science podcast." So that's a bit of information, and we're going to give it to our agent that is equipped with this memory tool. Off it goes. It says it has stored the information about me and John. All right, so now I'm going to ask this agent a question: "My name's Ed. What do you know about me?" Let's see what it says. "I know your name is Ed. You're currently running a workshop about AI agents. Additionally, you're co-presenting with John, a legendary presenter from the Super Data Science podcast." It doesn't know that it's the world's most listened-to data science podcast. Now let's change this to: "My name's John. What do you know about me?"
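While it thinks, it's worth seeing the shape of what the memory server has been storing. This is a hypothetical reconstruction of the entity and relation records behind the reference knowledge-graph server, not a dump of the real ones:

```python
# Hypothetical records matching what create_entities / create_relations
# would have stored for the conversation above (field names follow the
# reference memory server's schema, which you should verify).
entities = [
    {"name": "Ed", "entityType": "person",
     "observations": ["Running a live workshop about AI agents"]},
    {"name": "John", "entityType": "person",
     "observations": ["Legendary presenter",
                      "Hosts the Super Data Science podcast"]},
]

relations = [
    {"from": "Ed", "to": "John", "relationType": "co-presents with"},
]

# A later search-nodes call can match on names and observations,
# and a read-graph call returns the whole entity/relation set.
```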
Oh, well, at least one thing had to go wrong. Let's try debugging this. Let's give it the information a second time. It works 90% of the time. Okay: "My name's John. What do you know about me?" Have I got everything right here? Yes. Drum roll. There we go: "Here's what I know about you, John. You're a legend of super data science, and you're co-presenting with Ed." Very nice. And of course, this is a good illustration. It's good that something went wrong, because it shows you that these systems are not 100% reliable, and it answers the question you had: even with identical inputs, you can get different answers. I've run this quite a few times, and this is the first time it hasn't worked, so that's super interesting. We can actually check out the trace. If you remember the traces from the first part, we can go in, look through this trace, see that it did a search-nodes call, and see the responses it got back from that memory. It would be interesting to debug the run where it didn't get the right answer, figure out why, and see how we could improve the prompts so that it would be less likely to happen in the future. Okay, moving on. Our second MCP server is Brave Search. Brave is a company that offers an API service to run a sort of alternative to a Google search; they run it for you online.
I had to set up an API key. They give you 2,000 free searches, but beyond that you pay about half a cent per search. If we look at what tools we get from Brave, they give us two: one called brave_web_search, and one called brave_local_search, which looks for local businesses. And so now we're going to say, "You're able to search the web for information and briefly summarize the takeaways," and I'm going to ask for the latest news on Amazon's stock price. I also want to mention a little trick I'm doing here, which is very common: passing the current date and time into the prompt. That's just a good technique any time you're asking for something that involves time; it's worth specifying. An alternative is to create a tool that can look up the date and time, but that's more work than you need, because you might as well always provide the current date and time in your prompt and give it to the LLM right away. So that's a nice trick to know. We'll do that, and now we're going to ask for takeaways about Amazon's stock price. Here we go: a lot of stuff, a brief summary of the latest news regarding Amazon's stock price. Let's be bold and try something different: an update on the Super Data Science podcast, to make up for its lack of memory earlier. Please research the Super Data Science podcast and
briefly summarize it. And we'll run that. This is now using Brave's MCP server to connect to the Brave web search API, asking about the Super Data Science podcast. And here it is: Super Data Science, hosted by Dr. Jon Krohn, and lots of stuff about it, light-hearted interviews with renowned guests. Now we're on to our third MCP server, and this is an example of how easy it is. You can just go somewhere like mcp.so and search for MCP servers, and there are so many. There were actually more than 20 offering financial market data, but I already know Polygon; that's a very reputable provider of market data, so it made a lot of sense to use their official MCP server. I've got instructions here for you to use it yourself, and it's free if you go with the free plan. There's also a paid plan, which I'm using here. And this is the command that basically takes Polygon's official GitHub version of this MCP server and runs it locally. So even though we're looking at market data, the MCP server is going to be running on my box, which, as John explained, is by far the most common way to run MCP servers. And look at all of these tools. It provides a ton: tools to get the most recent trade, crypto trades if you want to go in that direction, details about tickers, technical analysis, and I believe somewhere in here financial reports. So there's a lot of stuff in there.
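For reference, the launch parameters for the Polygon server follow the same pattern as the other servers. This sketch pulls the official server straight from GitHub with uvx; the exact repository path and environment variable name are assumptions, so check Polygon's README before relying on them:

```python
import os

# Hypothetical parameters for running Polygon's official MCP server locally.
# The repo URL and POLYGON_API_KEY variable name are assumptions to verify.
polygon_params = {
    "command": "uvx",
    "args": ["--from", "git+https://github.com/polygon-io/mcp_polygon",
             "mcp_polygon"],
    "env": {"POLYGON_API_KEY": os.getenv("POLYGON_API_KEY", "")},
}
```

As before, this dict would be handed to the SDK's stdio server wrapper, and a free-tier API key is enough for the delayed market data.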
So let's use these tools. Again, it's just one line of code to add them to our agent. Let's ask "What's the share price of Apple?" and give that to an agent equipped with the Polygon MCP server. It's taking a while; let it do its thing. It will be interesting to see the trace. But here we go, we've got the results: the current share price is $212, apparently, along with more information about today's change, opening price, high and low, and volume. So it probably did a fair bit of looking up to gather the different pieces of information. We've now looked at a bunch of different MCP tools, and we're about to give all of that functionality, web searches, looking up market data, the other techniques we've tried, to some agents. Let's see in total how many we have. This is now gathering all those MCP servers, listing all their tools, and adding them up. In total, we've got six MCP servers covering 44 different tools, and we're now going to build our application that uses all 44 of them. So this is where the autonomous traders project comes to life. What I'm really going to be doing here is showing you the Python code that backs this up, how it makes decisions, and how you build a well-structured agent project. And then we'll be testing it out, giving it a shot. It's going to use all of the MCP servers we've talked about. And if you were doing the maths, you might have noticed that I mentioned six MCP servers but only showed you five of them. That's because I added one extra: the return of the push notification tool, because I love it when this thing goes ping, and that's just too cool; we have to end with that. So I now have to turn my phone off silent again, because we're hoping our agents are going to send us a push notification when they make trades, which is going to be fun. I want to start by taking you to a Python module called traders.py, and obviously all of this is in the repo for you to look at. This module creates the two types of agent we're using. One is a research agent that goes and does market analysis and research, and the other is a trader agent that actually makes trading
decisions. And at the end of it, there's something that creates each agent, passing in all of the MCP servers, and something that runs each agent. Otherwise, this is pretty short and boilerplate. The only thing that's new here is that I've also got some code that allows me to use not only OpenAI's models but also DeepSeek, Grok, and Gemini. And that answers a question someone had for me, which is: does the OpenAI Agents SDK only allow you to use OpenAI's models? The answer is not at all; it absolutely allows you to use different models, and this code shows you exactly how to do it. There's also a guide in the guides folder on how to hook up a model of your choosing to the OpenAI Agents SDK, including local models running with Ollama on your own computer. Now, one thing you might notice is that there are no prompt details here, no text. That's because I've taken a leaf out of Crew's book and separated my text out into a separate Python module, which is just good practice for agent engineering. So let me switch over to another module, one called templates. Basically, any of my code that produces prompts, whether it's the instructions or the prompt itself, I've separated out into a function here. This is a good practice to follow. It's a nod to the way Crew has you separate things out into YAML, but even if you're not using Crew, you can still follow this practice and design your agent systems so that your prompt templates are in one place. That allows you to change them in one place, and to manage and review them there. Now, the other thing here ties to a question that came up in module one, and that's the module traces.py. This is where I'm plugging into some functionality the OpenAI Agents SDK has that lets you write your own tracer. If
you don't want it to be tracing to OpenAI's dashboard, but would rather capture that information from traces and store it in a database, a file system, or log files, you can write a custom tracer and register it. That way your tracer will be used either instead of, or in addition to, OpenAI's tracers. And that is what I've done here. I'm not going to go through it in detail, because it's probably something only a few people would want to do, but if you'd like to, it's good to look at. And we will use it, because I want to surface it onto the user interface, so we can all see what the agents are thinking while they're thinking it. That's a cool way to do it. Those are the main classes we have for our agents. There are just a couple more before I give you the reveal, we bring up the application, and I have my moment of truth when this thing either works or doesn't. You're already in that 20%, so keep your fingers crossed. First of all, I want to explain that in our application we're going to create four instances of our trader agent, so that we have four traders, and I've named them Warren, George, Ray, and Kathy. Some of you are smiling and nodding, recognizing those names as a nod to their role models: Warren for the recently retired Warren Buffett, George for George Soros, Ray for Ray Dalio, and Kathy for Cathie Wood, the crypto and tech investor at ARK Invest. So these are four luminaries of the trading world. And I haven't just used their names; I've taken more than that. In this Python module, reset, which you run the first time, I set their trading strategies, and I've asked each one to pay homage to its namesake. So Warren is a value-oriented investor who prioritizes long-term wealth creation. George is an aggressive, bold macro trader who actively seeks market mispricings.
Ray is systematic, and Kathy is of course crypto- and new-tech-focused. So these are the framings of our four traders. But I also give them a tool that allows them to change their strategy over time, because I wanted them to be truly autonomous: they start this way, but they have the ability to change what they do. I want to show you just two more Python modules, and then we get to run this. mcp_params is one module where I have separated out all six MCP servers and the 44 tools that we use. You can look through it and add to it. One of the great to-dos for all of you, once you have this running, is to equip your agents with more powers so they can be better traders, and you do that simply in this mcp_params module. And then, last but not least, is the module trading_floor, which is where the trading floor actually runs. There are a few constants: how many minutes between trading runs, currently set to 60, though I'm going to change it to every minute so it trades aggressively while we watch. Whether to run even when the market is closed is set in the environment variables, but by default I have it stop trading when the market's closed, so it isn't trading all the way through the night. I forgot to do that once, went to sleep, and forgot it was still messaging my phone; I was annoyed when some midnight trading activity happened. So watch out for that. Then you can choose whether you're using many LLMs or just GPT-4. And that's really it; then there's the loop, which simply runs the traders. Okay, is everyone ready? We're now going to give this a try. I'm going to start by launching the user interface
so we can see how it looks. And then, when the user interface is up, I'm going to kick off the trading floor, let them do some trading, and we'll see how they perform. So we're going to bring up a terminal, go in here, and uv run app.py. This should bring up a user interface for our traders. They've been running for about a week now; I set them off a week ago, so we'll see how they're doing right now. Hold on a second, let me make this a bit bigger for you and zoom out. This will be a bit hard for people at the back to see, but you can run it on your own computers. Let me tell you what we're seeing. Our four traders are represented by these four columns: Warren, George, Ray, and Kathy. Warren is powered by GPT-4.1 mini, George by DeepSeek V3, and Ray by Gemini 2.5 Flash, the very latest but cheaper version of Gemini. And since Kathy is crypto-focused and high-tech, I thought Elon Musk's Grok 3 was the right choice for her, so that's who powers Kathy. What's kind of cool is that these have been running for a week, and as it stands right now, all four of them have made pretty healthy profits. You can see Kathy is in the lead. I should have explained: they were all given $10,000 to start with, and Kathy has made $1,300 of profit. Now, it's worth a caveat that the markets have been quite good to us in the last week, so they've had a bit of a tailwind, as you can see from the way this has gone up. And you can see they've had their bad days as well as their good days. But they have been running. And now that you've seen this, what I'm going to do is actually kick them off and see what happens. So I'm going to create a new terminal window, go into the trading floor, and say uv run trading_floor.py. Kick that off, and let it
do its thing. Oh, the market is closed. All right, stop that. We're going to have to tell it that it can trade even when the market is closed, after hours. We can do that: go to trading_floor, find "run even when market is closed", and make that true. Now try again. Off it goes. Let's go back and watch what happens. And they're off; things are happening. The red you see there, I know you can't read what it's saying, but I can, is when it's actually calling the MCP code, looking up our accounts. Each of the other colors represents a different action that you'll be able to look at: yellow is when it's generating, green is when it's running a web search, and the light cyan is when it's running something else. So all four of them are considering their options and deciding whether to make trades. You can see their holdings down below, and if we scroll down, there's also their transaction history at the very bottom. So we're going to let this run for a bit while they analyze the market. They'll be looking things up, using their different tools, all 44 of them, to research the market, look up stock prices, and make some decisions. What we're hoping to see at the end of this is that they make some trades. While they're doing that, we can also have a look in the OpenAI traces. You can see we've got four traces: Warren trading, Ray trading, George trading, and
Kathy trading. Did you hear that? That was them just making a trade; someone just did a trade. We'll go and have a look. If you look in here, each of the traces will show you all of the trading activity going on as they list their tools and find out what's happening. Let me go back. I'll read my push notification to tell you who did what. Oh, there's another one. Okay, so someone just sold five shares of AMD, and they're still going. All right. So, while that's still running, and while more trading happens and we watch what happens to their charts over time, I want to remind you of the big to-do that now sits with all of you. This is really cool, but it's only cool if you see it happening yourself, if you're part of it and understand what's going on. And of course, the trades are not actually executed; it's all just happening in our simulation, so it's great fun to watch. The to-do for all of you is, in your own time, to come back through this, run those two commands to kick it off, let it run, and see what happens. See if you can replicate this kind of success and whether it actually makes some money. At the same time, try to add some more tools yourself, and see what more functionality you can give it. If you're able to add in some great tools, then by all means submit a PR so I can add them to the repo, because I would love to include your tools there. Together we can build something with more and more functionality, so please do that and do send in updates with the functionality you give it. And so, well, thank you. [Applause] With that, I'm going to leave this up here trading for a little bit longer while I
hand back to John to wrap it up. Brilliant. Wasn't that great? You'll still have Ed up again, because we're going to have questions at the end. Now, I've been standing up here for five and a half hours telling you that there were three modules, and we just finished the third module. Isn't it time for me to move on? Well, there's a secret fourth module. Some people have been asking me about this over lunch; I got lots of questions, so I know it's something that's on people's minds. The secret fourth module is about the implications of all of this, looking forward at where we're going. Remember I had a slide up saying it's never been a better time to develop AI agents, and I talked about all the reasons why. After you have your own trader running and you're comfortable with agents and adding your own tools, hopefully you'll start thinking of ways you can transform your life personally, and transform enterprises, government organizations, and charities, making socially impactful use of this tremendous power we all have access to. You can be making a return on your business's investment in AI. You can be seriously changing the world with the technology we already have today. And every six months, we leap forward in what the LLMs powering these agentic frameworks can do. On top of that, we'll keep coming up with easier and easier abstractions, just as Ed pointed out that he used to have to write the MCP client code himself, and as of a month ago that changed. These abstractions will continue and continue, giving you, the people in this kind of hands-on workshop, more and more power to make a big impact in the world. Now, there will be workforce changes as a result. The good news is that, at least for the coming few years, more jobs will be created by AI than lost. Every wave of automation that has happened in
human history has displaced people, but it has created more jobs than it destroyed. Some people think this time is different, because all the previous waves of automation basically removed the need for physical labor, and now cognitive labor suddenly looks the most vulnerable, so maybe this wave will be different. But probably it won't. We'll probably figure out new things to do, and in the meantime work will probably continue to get more and more pleasant, as it has for centuries. Two centuries ago, 99% of people were subsistence farming, and that was a lot of work and a lot of stress, and it didn't always work out: one in every two children would die before the age of five. Thanks to automation, technology, laws, and democracy, we have this great wealth today, where instead of tilling the fields we have the option to spend a few days sitting at a data science conference learning about AI agents that can automatically do even more work, all the while sitting in comfortable air conditioning, checking our email when we want to, phoning our families when we want to. It's all getting nicer and nicer. So I think that trend will continue. But funding for reskilling programs is key, because the pace of change is going to get faster and faster. More and more specific tasks will be automated, and if enough specific tasks within a specific role are automated, then that whole role will be replaced. So funding for reskilling programs is key to success. Looking further down the line, maybe we'll need something like a universal basic income, but that's still some years away; right now we stand at basically historic unemployment lows in most western economies. So I think that's largely good news. When you have the chance to press your local elected official for a reskilling program, great. But now, what about you? You raised your hands at the beginning of the session; most of you are data scientists. When I showed you a chart right at the beginning of
how every seven months the length of cognitive task that can be automatically completed doubles, I told you that it was actually based on exactly the kinds of tasks, like running machine learning models and generating code, that a lot of us in this room do. So you might be wondering: given that this technology exists and is going to get better and better, what can I do to future-proof my own career? The first thing, of course, is to listen to a great podcast and stay on top of AI trends. That's something easy, and maybe fun, you can do, especially if you find a light-hearted program that makes you giggle every once in a while. Another great thing, as you saw today, is focusing on multi-agent orchestration and management. These are exactly the kinds of skills we taught today, and Ed has even more of them in his 17-hour Udemy course. If you can figure out how to get teams of agents working for you or your organization, that's going to be an invaluable skill. The next point sounds contrary to the previous two. On the one hand, we're moving faster and faster, there are more and more abstractions, and it seems ever more valuable to be at the cutting edge, focusing on those abstractions, agentic frameworks, MCP. But at the same time, don't underestimate the power of foundational subjects: linear algebra, calculus, probability, statistics, algorithms and data structures, optimization. Time and again I've been blown away by some of the most brilliant people
that Ed and I have had the pleasure to work with, people like Vince Petaccio and Shaan Khosla. These people were masters at taking their foundational knowledge and realizing that they could use fundamental linear algebra to increase the speed and decrease the cost, by orders of magnitude, of some AI functionality in production. Once you start pushing at the cutting edge, you can dig down into the weeds of how these things work, make huge performance optimizations, gain more flexibility, and literally do magic as a result of understanding foundational subjects. I do happen to have content online on all of those subjects, which will be easy to find; I'm not going to plug it directly, but a lot of it is on my YouTube channel. Next, develop domain expertise. This is specific to whatever you do. Maybe you work at an insurance company, or a healthcare provider: understanding that business and its needs, having not just deep expertise in data science but broad exposure across your domain, is going to be invaluable as a data scientist as well. Then, develop human-AI collaboration skills. This is related to multi-agent orchestration in a way, but it doesn't need to be multi-agent. What I mean by human-AI collaboration skills includes things like using LLMs wherever you can, having a conversation with them, and providing lots of context on the problems you're solving. You can accelerate your own abilities; with code generation, of course, you should be taking advantage of tools like Cursor, which Ed demoed today. And I think this is my final bullet, so I'll trepidatiously say: finally, honing your communication and influence skills within an organization is key. It's about being able to not only understand the tech and your domain and come up with world-changing, game-changing, enterprise-revolutionizing agentic ideas, but also to convince people to invest time, resources, and capital in those ideas. And, as I said near the beginning of this session, start small, get
that little bit of success, show a return on investment in a relatively small AI project, and then leverage that success to tackle grander and grander projects, so that you have a flywheel moving within your organization that lets you make a bigger and bigger impact. And if we can do that together, all of us in this room, and maybe the people watching this on YouTube in the future, hello, then we can put this magic to work. At any other time in history, if people could have seen that those of us sitting in this room, or watching this video, had the power we have today, it would have looked like magic. It's unbelievable. Before the GPT models came out, I didn't think it was a given that we would have capabilities like GPT-3.5's even in our lifetimes, and now all of a sudden this magic is with us. If you can harness it and make an impact with it, we can in our lifetimes be living in a world that is best described as a protopia. You know the idea of utopia: Thomas More, a British writer from centuries ago, coined the term, and it literally means "no place", an impossible place. Protopia, by contrast, is the idea of continuous iteration and improvement, and AI agents play a big part in that. The first thing we need to crack is abundant energy. If we have the potential for limitless intelligence and you combine that with effectively unlimited energy, that is a hugely multiplicative effect: abundant, super-cheap superintelligence multiplied by abundant, cheap energy is a formula for
crazy positive change in our world and AI can play a role in that energy transition making solar panels more efficient making battery storage more efficient uh allowing the liquid plasma to be contained by computational systems within a commercialgrade nuclear fusion reactor as opposed to the fision that we have today you know having a bunch of Suns on our planet providing abundant energy. That changes everything. And once we have that, then we can have high quality nutrition for everyone on the planet. Um, we can have extended lifespans, uh, potentially dramatically extending our lifespans. Something there's a
lot of people out there who think that aging is a disease like any other. Not every living animal ages. We don't necessarily have to resign ourselves to that fate. high quality Education in the local language for everyone who wants to learn it uh and learn anything they want. Freedom from violence, freedom of expression, sustainability, cultural preservation, a sense of exploration for individuals, um and a sense of community. All of these things are possible in our lifetimes. This kind of protopia protopia that I'm envisioning and it can begin with, you know, the decisions you make today.
the way that you harness the power, the magic, that you have. So, thank you. That's the end. [Applause] [Music] [Applause] Yeah, so thank you to Ed. I'm going to put my contact stuff up here, and then Ed can go through his as well. We'll stay on for questions until 4:30, and pass the mic around to get those as well. So, my name is John; at johncone.com you can sign up for my email newsletter. Pretty much all the content I create is free, so you can stay up to date on my latest with that. Connect with me on LinkedIn. Ed accepts everyone; I don't accept everyone, but if you mention that you were in this training today, or that we met at ODSC East, just mention ODSC in some way in your connect request and I will definitely accept it. My YouTube channel will have the video from today. We have a professional film crew here, so thank you to Lucy, who's our director of photography, and Amanda, our second camera. It seems like everything went super smoothly, so this is going to be a really nicely edited video, and I'll put it up on my YouTube channel. There's also material on linear algebra and calculus on my channel if you're interested in that. Why Carrot, my company: I'm not actually hiring today, but it would be nice to start pipelining. We specialize in things like generative AI and agentic solutions for enterprises, so do feel free to send me a resume on LinkedIn or at johnycare.com, or talk to me here at the drinks afterward. And of course, if you're actually looking for consulting, all the way from strategy through to implementation, we also have representatives here from a partner of ours, Biz Love, who are long-standing experts at managing organizational change with technologies like AI. So yeah, we can do basically anything for your business related to AI. The Super Data Science podcast: Ed already mentioned it many times, kindly. It's the world's most-listened-to data science podcast.
We do two episodes every week and have a lot of fun making it, so check out the podcast. And then I work as an ML practice fellow at Lightning AI, which means that you, sitting here in the audience, get a bonus for free just for staying all the way to the end. If you take a photo of that QR code, it allows you to skip the waitlist for Lightning AI Studios, which provides access to the largest amount of compute you could possibly need for training any kind of AI model, available immediately. And you get 15 free compute credits per month on the Lightning AI Studios platform. If you choose to, you don't need the paid version to get access to almost all the technology, but if you want more credits and some more bells and whistles, there's a Pro tier, and with this QR code you get 20% off your first month of Pro. So, those are all of my points. I'll hand over to Ed. Thank you. Once again, a tough act to follow. I do have to say that my phone keeps buzzing with trade decisions being made constantly; it's been going for the last few minutes. So, I'm Ed. I have this course on Udemy, in case I might not have mentioned it. This is my LinkedIn, and yes, I will immediately accept. I'll be taking the train home in a bit, and I will follow up with anyone who connects with me; that would be fabulous. I also have a website where I make a few posts and have some YouTube material, and this is a link to the Udemy course; it has a coupon in that link, exclusively for people here, and it's also in the GitHub repo, as the link there. It takes some of what John and I went through and expands on it, and it also covers LangGraph and AutoGen, and it goes through that same project at the end, but in a bit more detail and with some more updates to it as well. And I just want to again echo John: thank you all so much for staying till the bitter end. This was terrific fun, and it was a really, really enjoyable session. So thank you for spending basically the whole day with us, and we'll be around if you've got questions. [Applause]