Well, hello there. I am Ahmad, and I'm going to vibe code an AI agent built with AI primitives while I deliver this talk, right? So, let's start here.
One of the most common AI agents we've seen in production is: there's some data, there's a chatbot as an agent, and you're trying to chat with that data, because, well, all of these LLMs are self-attention algorithms anyway, right? So you go to chai.new and you say something like chat with PDF, and let's see what happens. Chai is now going to vibe code an AI agent for you.
It's going to build it on top of AI primitives instead of an AI framework. And while that is happening, it just takes like a minute, how about I tell you a little bit more about who I am and what we're talking about today?
Right. So look at Perplexity, Cursor, v0, Lovable, Bolt, and even now Chai. You will see one common theme across all these production-ready agents that millions of people are using: all these AI agents in production are actually not built on top of any AI frameworks, because, well, frameworks do not really add that much value. They're bloated.
They move super slowly, and they're filled with abstractions that nobody really needs. Instead, you should be building on top of AI primitives. This is what my entire talk is about today.
I am Ahmad. I've been around the block for quite a while. If you use WordPress, Next.js, Node.js, or React, you have probably been using my code, because I've contributed to all of these.
I have also built hundreds of open source packages, mostly automation CLIs with Node.js, and created the Shades of Purple code theme, all of which are downloaded some 40 to 50 million times a year. And I've gone about as technical as you can imagine: I've contributed to the NASA helicopter mission. In the past I have been a VP of developer tools, a VP of engineering, and a Google Developers advisory board member. All I'm trying to say is that I am deeply technical. I've gone through the phase of working with and building frameworks, so why do you think I am now talking about primitives? I think primitives have this native ability to work really, really well in production. Amazon S3 is a really good example here: S3 is a primitive where you can upload data and download data, and they scale it massively. They're not building a framework for object storage. It's a very simple, low-level primitive you can use to build lots of things, right? And that is what we are talking about here today.

My journey with LLMs actually started in 2020, when Greg Brockman himself gave me access to GPT-3. GPT-3 was maybe a month old at the time, and I had already started building something like GitHub Copilot, which launched in 2021, a year later. Even now, after building things and agents for, I don't know, five years, building, deploying, and scaling AI agents remains the biggest pain there is. Everybody has a different definition of AI agents, but this is my take at it: I think AI agents are just a new way of writing code. Everything we know about how we used to build code, coding projects, or SaaS, all of that is changing because of AI and because of agents. It's just a new way to write code, and it's big enough.
You know, if you try to put it inside of a framework, that abstraction might not be enough. Instead, how about we build small building blocks that are useful across the stack, something like threads? Every agent needs to store some sort of context or history of conversation.
So threads would be an awesome primitive, so why not build that? Here's why I think that is important; here's my belief.
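As a rough sketch of what a threads primitive's surface could look like, here's a hypothetical in-memory `Thread` class (the class and method names are my own stand-ins, not a real SDK; a production version would be backed by durable storage):

```typescript
// Hypothetical sketch of a "threads" primitive: an append-only
// conversation history an agent can read back as context.
type Message = { role: "user" | "assistant" | "system"; content: string };

class Thread {
  private messages: Message[] = [];
  constructor(public readonly id: string) {}

  append(message: Message): void {
    this.messages.push(message);
  }

  // Return the stored conversation, e.g. to prepend as LLM context.
  history(): Message[] {
    return [...this.messages];
  }
}

const thread = new Thread("flight-booking-1");
thread.append({ role: "user", content: "Book me a flight to SFO" });
thread.append({ role: "assistant", content: "Which dates work for you?" });
```

The point of the primitive is exactly this small surface: append context, read it back, and let the platform handle persistence and scale.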
I think most engineers are going to become AI engineers. You can already see this with full-stack engineers, web developers, and front-end developers: they're quickly transitioning into this AI engineering role because they are shipping a lot of product with AI, building stuff with different LLMs, vector stores, and so on. Even DevOps engineers and ML engineers are now shipping product.
So when everyone is building as an AI engineer, what we are trying to do here at Langbase is improve their experience. We want to become the fastest possible way for you to build a production-ready AI agent. A lot of people start with this painful way of building AI agents, where they pick up a framework filled with obscure abstractions which are really, really hard to debug, and then they have to figure out how to deploy and scale those agents.
We think the other way. If you are building on top of predefined, really good, highly scalable AI primitives, especially composable primitives that come with a piece of cloud in them, like memory, an AI primitive which has a vector store in it, you can throw, I don't know, terabytes of data inside of memory and it will automatically scale. So if you build an AI agent with that memory, or with that parsing, chunking, threads, or tools infrastructure, what you get is a serverless AI agent that can automatically do the heavy lifting for you. But what does that actually look like?
Let's take a look at what's happening in this talk today. And by the way, the agent Chai was vibe coding is probably already done; I'm going to deploy it before we move forward.
I am going to go over eight different AI agent architectures that are built purely with AI primitives instead of a framework. And look at this: I said chat with PDF, and it figured out that you need a memory for storing this PDF, which will have a vector store, and you need an AI agent, an LLM, which will be used to ask questions from this PDF. So it went ahead and did this: it created a memory, and it created a way for you to generate answers, an agent on top of that memory. The flow of this is very simple: retrieve some PDF content, then generate the answer. And it's already done. As you can see in these lines, the first step is to build that memory, and we are using a primitive here, langbase.memories. This primitive is where we will put the user input, the question the user is trying to ask, and the name of that memory, the PDF document memory. If you go here, you will find that memory. All of this can be built with the API as well.
What I'm going to do here is go to ahmadawais.com/about and ahmadawais.com/talks.
This page has information about me. This one has information about my talks, like the one I'm giving right now. And I'm going to throw all of these in as PDF files.
I have already saved them on my desktop, so I'm going to just quickly grab them and upload them here. I'm also going to do this with the Langbase API keys page.
So I'm also going to throw a PDF version of that particular page into the memory. As you can see, this is the doc about Langbase and how to get API keys from Langbase, my talks, and my about page. All of these files are right now being processed, which means a parser primitive is converting these files from PDF to text, and then a chunker primitive is going to chunk these into small pieces of context, which will be used in a similarity search.
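To make the chunker step concrete, here's a deliberately naive sketch of what a chunking primitive does: split extracted text into overlapping windows sized for embedding. The function and parameters are my own illustration; real chunkers respect sentence and token boundaries rather than raw character offsets:

```typescript
// Naive chunker sketch: split text into fixed-size, overlapping chunks
// that can be embedded and used in a similarity search.
function chunk(text: string, size = 100, overlap = 20): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}

// A 250-character document yields three overlapping chunks.
const doc = "a".repeat(250);
const pieces = chunk(doc);
```

The overlap is what keeps a sentence that straddles a chunk boundary retrievable from either side.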
If I refresh, you will see that all of these files are now ready. Now let's go to the agent code and see what is there. All right.
So we are going to ask a question from this memory, right? Which will return back some memories. And these memories are used in the second step.
As you can see here, this is what we are going to use as context. This is the generate-answer step, and in this step we are basically going to create an AI agent that is going to use that context to answer the questions you have.
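The retrieve-then-generate flow just described can be sketched in a few lines. Note that `retrieveMemories` and `runAgent` here are stubs standing in for the real primitive calls (a vector-store retrieval and an LLM call); this shows the shape of the pattern, not the actual SDK:

```typescript
// Two-step chat-with-PDF flow: (1) retrieve memories, (2) generate an answer.
type Memory = { text: string; score: number };

async function retrieveMemories(query: string): Promise<Memory[]> {
  // Stub: a real memory primitive would run embedding similarity search.
  return [{ text: "Ahmad founded Langbase.", score: 0.92 }];
}

async function runAgent(input: { query: string; context: string }): Promise<string> {
  // Stub: a real agent would call an LLM with the context prepended.
  return `Answer to "${input.query}" using context: ${input.context}`;
}

async function chatWithPdf(query: string): Promise<string> {
  const memories = await retrieveMemories(query);         // step 1: retrieve
  const context = memories.map((m) => m.text).join("\n"); // build context
  return runAgent({ query, context });                    // step 2: generate
}

const answer = await chatWithPdf("Who is the founder?");
```

Swap the two stubs for real retrieval and generation calls and you have the whole agent; that's the point of composing primitives.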
Right? Very simple. Chai has also built an agent app for you, which makes it really easy to use, but you can also go ahead and use the API with whatever programming language you prefer.
Let's try the agent app. Let's ask it something very simple: who's the founder, and what were his last three talks?
There you go. This information is in two different files in our memory, which were parsed, chunked, and embedded automatically using those primitives, and you can see the answer is already here: I'm the founder, and here are the last three talks I did. And how do I get an API key?
This should probably also give me an answer on how to get that API key, from that other file we had added in there. There you go.
That's how you get an API key. Oh, there's a bug. You can actually go to the app mode and vibe code a fix for this bug, and this app will be fixed.
You can also make it public or whatnot. What's happening behind the scenes here is most interesting. I think every agent needs all these primitives that we are building, right?
Primitives are things like memory, which is an autonomous RAG engine; a workflow engine purpose-built for multi-step agents; threads, where you store and manage context and conversation; a parser to extract the context; and a chunker to split it. Using all these primitives you can build almost any AI agent. We actually did a lot of research at stateofaiagents.com, where you can read about how people are building agents, what type of primitives they are using, and what type of LLM does really well in which type of industry. Let's now go over different AI primitives and the different agent architectures that use them. The first, most common architecture we see is an augmented LLM. An augmented LLM is basically an agent: it's going to get some input and generate some output, and it will have an LLM. It will have some way to automatically call tools, so it can connect to MCPs or call different APIs.
It will have access to threads. Threads are another AI primitive where you can store the conversation users are having with this agent, or the context of this agent, which might be completely asynchronous. A thread of context is like a scratchpad: for example, when you're booking a flight, there's a certain set of information you keep in your head, on a scratchpad, like I'm going to land here, I'm going to go do this and that, and that is useful while you are booking that particular flight. So you can use a thread to store that memory-like context. And then there's memory, the long-term memory of different events or whatnot.
This could be terabytes of data that you want to search when you're using this agent, which obviously requires more context and a vector store. So in this example, in this augmented LLM, we see tools as a primitive, threads as a primitive, and memory as a primitive. And you can basically build almost any type of AI agent with this augmented LLM architecture.
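Wiring those three primitives to one agent can be sketched like this. Everything here is a stub (the tool, the thread array, the `Map` as long-term memory) standing in for real primitives; the point is the composition, one agent reading and writing all three:

```typescript
// Augmented-LLM pattern sketch: one agent wired to tools, a thread
// (short-term context), and memory (long-term store). All stubs.
type Tool = { name: string; run: (args: string) => string };

const tools: Tool[] = [
  { name: "weather", run: (city) => `Sunny in ${city}` },
];
const thread: string[] = [];              // conversation history
const memory = new Map<string, string>(); // long-term key/value stand-in

function augmentedAgent(input: string): string {
  thread.push(`user: ${input}`);
  // Stub routing: a real LLM would decide which tool to call and with what.
  const tool = tools.find((t) => input.includes(t.name));
  const output = tool ? tool.run("SFO") : `echo: ${input}`;
  memory.set(input, output);              // persist for later retrieval
  thread.push(`assistant: ${output}`);
  return output;
}

const out = augmentedAgent("what is the weather?");
```

Each stub swaps cleanly for the real primitive because the agent only depends on their small surfaces, not on a framework.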
As you can see here, Langbase Pipes, our agents, provide you with that primitive. Now let's look at what type of agents and what type of architectures you can actually build using that augmented LLM I just showed you. Here's another one.
Prompt chaining and composition, where you use multiple agents, the purple blocks here, working together. You get an input, an agent creates an output, and based on that output you decide if you want to go forward, right? Maybe you got an email and you figured out whether it was spam or not.
And if it was not spam, then you used another agent to write a draft email in response. And you can take a look at what's happening here: there's a summary agent, there's a features agent, there's a marketing copy agent.
This code is plain JavaScript, or TypeScript if you will. You can see here a summary agent, this is a features agent, and this is a marketing copy agent. They're all going to work together to generate the output you need based on an input.
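The chaining pattern is just ordinary function composition. Here's a minimal sketch with deterministic stubs in place of the three LLM agents (the agent names mirror the ones on screen, but the bodies are stand-ins):

```typescript
// Prompt chaining sketch: each stub agent's output feeds the next.
const summaryAgent = (input: string) => `summary(${input})`;
const featuresAgent = (summary: string) => `features(${summary})`;
const marketingCopyAgent = (features: string) => `copy(${features})`;

function chain(input: string): string {
  const summary = summaryAgent(input);
  // A gate could go here: stop the chain early if the summary fails a check.
  const features = featuresAgent(summary);
  return marketingCopyAgent(features);
}

const copy = chain("product notes");
```

Because each step is a plain function call, adding a gate or branch between steps is a one-line `if`, not a framework feature.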
A more interesting one is something like an agent router, where an agent, an LLM router that you build, basically decides which other agent needs to be called next. Let's see what this architecture looks like in production.
We are going to build all of this with AI primitives; none of this is built with a framework. We're going to basically create three specialized agents for different tasks.
One is a summary agent for summarizing text, another is a reasoning agent for analysis and explanations, and then a coding agent for obvious reasons, for when you need to write code. And all of these are built with different LLMs: summary is with Gemini, reasoning is with DeepSeek Llama 70B, and coding is with Claude Sonnet.
Now let's look at this code. This code right here is only going to use AI primitives, and you will build your own AI framework on top of it instead of using a bloated AI framework. Right?
So as you can see, here's a routing agent. The router is being told what the job is. It has access to all these agents, and it is supposed to respond back with valid JSON, picking which of these agents is going to do the job for us.
Right? And here is the definition of a very simple set of agents: a summary agent, a reasoning agent right here, and a coding agent.
Right? Now all of these are going to run together. All of this, as you can see, is just plain old code.
There's nothing new here. You can easily write this. There's no magic.
There's nothing to learn. You're basically just calling these agents together, and one of them is going to respond back with which agent needs to run next for the task. So the task is very simple.
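A sketch of that routing step, with a keyword stub standing in for the routing LLM (a real router would be an LLM call constrained to return valid JSON; the agents here are stubs too):

```typescript
// Agent router sketch: one routing step returns a JSON-shaped decision
// naming which specialized agent should handle the task.
const agents = {
  summary: (t: string) => `summary of: ${t}`,
  reasoning: (t: string) => `reasoning about: ${t}`,
  coding: (t: string) => `code for: ${t}`,
};

function routerAgent(task: string): { agent: keyof typeof agents } {
  // Stub logic: a real LLM router would classify the task.
  if (/code|function|bug/i.test(task)) return { agent: "coding" };
  if (/why|explain/i.test(task)) return { agent: "reasoning" };
  return { agent: "summary" };
}

const task = "Why are days shorter in winter?";
const route = routerAgent(task);       // e.g. { agent: "reasoning" }
const result = agents[route.agent](task);
```

Typing the decision as `keyof typeof agents` means the dispatch is checked at compile time, with no registry abstraction needed.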
Why are days shorter in winter? Now let's go ahead and run this code. So the main agent, the router, has decided that it needs to run the reasoning agent for this particular task.
Right? Obviously you're not writing code here, and you're not writing a summary.
You can see how the decision maker made the right choice, and then you pick up the same reasoning agent we had built, throw that input in there, and here's the answer of why days are actually shorter in winter. Don't get distracted by the answer, by the way. Right?
So another very common architecture for building agents is running agents in parallel. This is absolutely simple to do; there's no abstraction needed beyond JavaScript.
In JavaScript, you can basically build a set of agents, as you can see here: sentiment, summary, and decision-maker agents, and you can Promise.all these agents and they will run in parallel. My favorite, however, is this agent orchestrator-worker, where an orchestrator agent basically decides to create any number of worker agents that are going to solve a problem, which is then synthesized by another agent.
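The parallel pattern really is just `Promise.all` over plain async functions. Here's the sketch, again with stubs where the LLM calls would go:

```typescript
// Parallelization sketch: independent stub agents fan out via Promise.all.
const sentimentAgent = async (text: string) => `sentiment(${text})`;
const summaryAgent = async (text: string) => `summary(${text})`;
const decisionAgent = async (text: string) => `decision(${text})`;

const input = "customer feedback";

// All three run concurrently; results come back in declaration order.
const [sentiment, summary, decision] = await Promise.all([
  sentimentAgent(input),
  summaryAgent(input),
  decisionAgent(input),
]);
```

With real LLM calls the wall-clock win is roughly the slowest single call instead of the sum of all three.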
Right? This is exactly what a deep research agent architecture looks like. So this one is really good.
What we are going to do is create an orchestrator agent that will plan and create subtasks for worker agents, and these worker agents are going to work on those subtasks. Let's see what that looks like.
So this is the orchestrator, and the entire thing it's going to do is give us back this type of response with subtasks, which are the things these worker agents are going to do. The worker agent is very simple: it gets a subtask and it completes it, right?
So let's look at the main input here. The input is write a blog post on benefits of remote work. I care about productivity, work life balance and environmental impact.
Let's see what happens here. So the orchestrator here is going to generate a bunch of subtasks. As you can see that is what it has done.
So: write the introduction, write a section on productivity, on work-life balance, on environmental impact, and then write a conclusion. As you can see, there are one, two, three, four, five subtasks, so this is going to take five worker agents to complete.
As you can see here, this is the first worker agent; second, third, fourth, and here's the fifth one writing the conclusion, right? And finally, we are going to synthesize all of this into one thing.
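The whole orchestrator-worker-synthesizer loop fits in a few lines. The orchestrator stub below returns a hard-coded plan where a real orchestrator LLM would generate one, and the worker and synthesizer are stand-ins too:

```typescript
// Orchestrator-worker sketch: plan subtasks, fan out workers, synthesize.
function orchestrator(goal: string): string[] {
  // Stub plan: a real orchestrator LLM would generate these subtasks.
  return [
    "introduction",
    "productivity",
    "work-life balance",
    "environmental impact",
    "conclusion",
  ].map((section) => `${section} for: ${goal}`);
}

const worker = async (subtask: string) => `section(${subtask})`;
const synthesize = (sections: string[]) => sections.join("\n");

const subtasks = orchestrator("benefits of remote work");
const sections = await Promise.all(subtasks.map(worker)); // workers in parallel
const post = synthesize(sections);
```

Note the worker count isn't fixed anywhere: it's just `subtasks.map(worker)`, so the orchestrator can plan two subtasks or twenty.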
Look how simple this is. This is like, what, 90 lines of code, and there's no framework here. You're basically building very simple worker agents and an orchestrator agent, Promise.all-ing them, and the data just flows through. There's no need to build a complicated abstraction layer on top of this, because maybe in a couple of months this agentic workflow will get even better, with LLMs understanding how to run these worker agents. If you're using an AI framework, you're kind of stuck with its abstraction, and when these LLMs become much better at these agentic workflows, you'll have a very hard time migrating off of it. So it's much better to build on top of primitives instead of building on top of a pre-built abstraction, right? We're also going to take a look at this evaluator-optimizer, where an agent is used to generate something, let's say a marketing copy, which is then evaluated by an LLM as a judge, which will either accept it or reject it with feedback. I'm just quickly going to run this. Here's a generator, which is generating some product description.
Here's an evaluator, which is going to either say accepted or provide feedback based on the prompt you write here. And what we are doing is writing a description for an eco-friendly water bottle for eco-conscious millennials. So the first agent takes a stab at it, and the evaluator actually says no, you are basically missing the point with this particular audience, the eco-conscious millennials, and provides very specific feedback on what to do and how to improve this. This evaluator should be built with probably the best possible LLM you can think of for the space you are trying to evaluate. If it is related to health, you're probably calling a memory and doing everything you need to do to make sure this evaluation is really good, right?
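The generate-judge-retry loop can be sketched like this. Both the generator and the evaluator are deterministic stubs (the real ones would be LLM calls, with the evaluator being your strongest model); the loop structure is the pattern:

```typescript
// Evaluator-optimizer sketch: generate, judge, retry with feedback.
type Verdict = { accepted: boolean; feedback?: string };

const generator = (prompt: string, feedback?: string) =>
  feedback ? `${prompt} [revised: ${feedback}]` : prompt;

const evaluator = (draft: string): Verdict =>
  // Stub judge: accepts any draft that incorporated feedback.
  draft.includes("revised")
    ? { accepted: true }
    : { accepted: false, feedback: "speak to eco-conscious millennials" };

function optimize(prompt: string, maxTries = 3): { draft: string; tries: number } {
  let feedback: string | undefined;
  let draft = "";
  for (let tries = 1; tries <= maxTries; tries++) {
    draft = generator(prompt, feedback);
    const verdict = evaluator(draft);
    if (verdict.accepted) return { draft, tries };
    feedback = verdict.feedback; // feed the critique back into generation
  }
  return { draft, tries: maxTries };
}

const run = optimize("eco-friendly water bottle copy");
```

The `maxTries` cap matters in production: without it, a picky judge and a weak generator can loop forever.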
And the second iteration, as you can see, is generally very well done based on that particular feedback. Finally, obviously, you can call tools with this. We can skip that.
I think you can probably easily understand that. The most interesting one is memory. With memory, you create a memory, upload data, and then retrieve that data and ask questions, which is exactly what we saw here in this code. We created a memory, uploaded the data manually, which you can also do through an API as you can see here, and then an agent was used to answer questions related to that data. This is also a very common pattern for building agents.
Now that you know all of these AI primitive patterns, you can build pretty much 80% of the most complicated AI agents out there. And I've done just that: I've basically asked Chai to build me a deep researcher like Perplexity, where it actually went ahead and built things to analyze the query, do a web search, consolidate the results, and then create a response. It is using Exa here, and all the code is here.
I can actually just go ahead and ask it something, I don't know, what's the latest with OpenAI, something like that, and it's probably going to send this basic query and start doing that deep research. I've also built other things. For example, I wanted a receipt checker which would do OCR, and since Chai, or Langbase, didn't have an OCR primitive, I found one from Mistral. So I asked it to use Mistral OCR and gave it an example of how to use it, and that is exactly what it did here. It's processing the image with OCR, using GPT-4.1 to extract from the user input whatever is needed, and also using the latest Mistral OCR model on top of that as a fallback, to really improve the OCR of the image we are going to send it. Let's see. I have an example here; I think this is Palo Alto, $15, delivery or something. Yeah, there you go: total paid, 15 bucks, and City of Palo Alto. It's a parking ticket, I guess. It's analyzing the result right now, and there you go: City of Palo Alto and 15 bucks. Right? Similarly, I built an agent where I can add an image URL and chat with that image.
Right? So, I just added this image URL: what is the expression of the person in this image?
Okay, let's give it a go and see what happens. There you go: eyebrow raised, quite skeptical and curious, which is what I generally look like in real life. The code for this is pretty easy: analyze the image using GPT-4o, a vision-capable model; it takes the image URL from here, passes the input, and that is pretty much it. A pretty simple flow, if you ask me. So, well, this is pretty much it. And it's May 30th, and you can see almost every other minute there is a new agent being built.
I saw somebody building a dental cost advisor, a business SWOT pro, a task refiner, all sorts of amazing agents being built with Chai, and Chai is building all of these on top of AI primitives instead of on top of a framework that would have some abstraction you probably don't even need when building agents. So the idea is very simple: all the production agents that we know of are not built on top of frameworks. What good would a framework do you in a fast-moving space where every other week there is a new paradigm, a new LLM, a new problem already being solved that used to take a lot of work? Why not build your AI agents on top of really good AI primitives?
You can either build those AI primitives yourself, use the pre-built AI primitives that we built, or use some that our friends are building over at different companies. And if you really enjoy vibe coding, give Chai a try. It will use the primitives it knows, and it will make it really, really easy for you to quickly build an agent, deploy it, and use it right away, instead of building a silly demo that may or may not scale. I am @MrAhmadAwais, and I am always hanging out on Twitter.
I would love your feedback. I would love to know what you folks think and what you build and ship with our AI primitives. Take care.
Ciao. Use your code for good. Peace.