Whether you're building an experimental prototype for your own personal use, or creating an application to power an entire organization, there are key components of the AI technology stack that you must get right to build AI systems that can do more than just generate answers and instead solve real, meaningful problems. Say, for instance, I'm building an AI-powered application to help drug discovery researchers understand and analyze the latest scientific papers in their domain. Maybe it starts with a model that I recently heard about that is supposed to handle highly complex tasks at the level of a PhD researcher.
The model is an important layer of the stack, but it's just one piece of the puzzle. There's also the infrastructure that that model will run on, because not all LLMs (large language models) can run on standard enterprise CPU-based servers, and not all are small enough to run on a laptop. So it matters what infrastructure you have access to and how you choose to deploy it.
Next is data because in this example, the whole point is to help scientists understand the latest papers in their field. And models typically have a knowledge cutoff date. So if we want to talk about papers from, say, the past three months, that means we have to provide the AI system with extra data.
That will be the data layer. Next would be the orchestration layer, because a complex task like this will probably require more than simply providing a large prompt to the AI system and getting a single output back. Instead, we'll want to break that user query up into different parts: plan how the AI solution is going to tackle the problem and what data it needs, then do the summarization and create an answer, and maybe even review that answer. Finally is the application layer.
And this is because at the end of the day, there's a user using this tool. So there will have to be an interface that defines what the inputs will be and what the outputs will be. It might not be as simple as text in and text out.
And there's also the issue of integrations. So, will the actual results of this be something that's integrated into other tools that this user uses? It's important to understand all the layers of the AI stack, whether you're building a solution from scratch or using solutions which might manage several of these layers for you as a service.
This is because across the stack, from the hardware all the way up to the user interface level, the choices you make will have important implications for your solution's quality, its speed, its cost, and its safety. When it comes to infrastructure, LLMs generally require AI-specific hardware, specifically GPUs, and these can be deployed in one of three ways. The first would be on premises, assuming you have the means and resources to buy this kind of infrastructure yourself.
Second option would be cloud, which allows you to rent this capacity and scale it up or down as needed. Finally would be local, which usually means on your laptop. Not all laptops can support LLMs of all sizes, but there are certainly LLMs on the smaller end of the range that can run on the kind of GPUs available in a standard laptop.
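To get a rough feel for which models fit locally, here's a back-of-the-envelope sketch. The rule of thumb and numbers are illustrative assumptions, not vendor specs: weight memory scales with parameter count times bytes per weight.

```python
# Rough rule of thumb: weight memory ≈ parameter count x bytes per weight.
# Illustrative assumptions only; real usage also needs room for
# activations and the KV cache, which add meaningful overhead.

def model_memory_gb(params_billions: float, bytes_per_weight: float) -> float:
    """Approximate weight memory in GB for a model of the given size."""
    return params_billions * 1e9 * bytes_per_weight / 1e9

fp16_gb = model_memory_gb(7, 2.0)   # a 7B model in 16-bit weights: ~14 GB
int4_gb = model_memory_gb(7, 0.5)   # the same model 4-bit quantized: ~3.5 GB
```

By this estimate, a 4-bit quantized 7B model fits within the memory of many consumer laptop GPUs, while the 16-bit version generally does not.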
The next layer is models. So AI builders have plenty of choice when it comes to what model they can use. One dimension to consider is whether the model is open versus proprietary.
Another dimension is the model size. We have large language models; we also have small language models that are lighter weight and able to fit on more lightweight hardware, but might not have the same thinking capacity as a large language model, instead being specialized for more specific things. Finally is specialization, which sometimes goes hand in hand with size. Some models might perform better on things like reasoning, tool calling, or generating code. Others might be stronger in particular languages.
There are plenty of models (over 2 million already in model catalogs like Hugging Face) that can serve any mix of these different needs that an AI builder might have. The next layer of the stack is data. This breaks up into a few different components, so the first would be the data sources themselves, used to supplement the model's knowledge.
This could also include the pipelines to do any processing, pre-processing, and post-processing of that data, as well as vector databases or retrieval systems used for retrieval-augmented generation (RAG). Vector databases are where that external data is vectorized into embeddings and saved, so your model can retrieve that context quickly and augment itself with this additional knowledge that the base model does not have.
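To make the vector database idea concrete, here's a self-contained toy sketch. The `toy_embed` function is a hypothetical stand-in for a real embedding model (a real system would call one through an API or library), but the store-and-retrieve flow has the same shape: embed documents, embed the query, rank by similarity.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 64) -> list[float]:
    # Hypothetical stand-in for a real embedding model: hashes words
    # into a fixed-size unit vector so the example stays self-contained.
    vec = [0.0] * dim
    for word in text.lower().split():
        h = int(hashlib.md5(word.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit-normalized, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))

class ToyVectorStore:
    def __init__(self) -> None:
        self.items: list[tuple[str, list[float]]] = []

    def add(self, text: str) -> None:
        self.items.append((text, toy_embed(text)))

    def retrieve(self, query: str, k: int = 1) -> list[str]:
        # Rank stored documents by similarity to the embedded query.
        q = toy_embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = ToyVectorStore()
store.add("Paper A: a new kinase inhibitor shows promise in early trials.")
store.add("Paper B: transformer models improve protein folding predictions.")
context = store.retrieve("Which paper discusses kinase inhibitor trials?")
```

The retrieved `context` would then be appended to the prompt, which is exactly the augmentation step described above.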
That's important because base models are usually trained on publicly available information, which might not always be complete to accomplish the task that you have. You might need to supplement with additional data. The next layer is orchestration, because building an AI system that does something more complex than just generating text or answering questions requires breaking the initial user input down into smaller tasks.
Those can start with things like thinking, using the model's reasoning ability to plan out how it will tackle the problem. That can also include things like execution, where the model does tool calling or function calling, as well as steps like reviewing, where an LLM can actually provide its own critique of the initial generated responses and initiate feedback loops to even improve those responses. This layer is very quickly evolving, with new protocols like MCP and new architectures for how to best orchestrate increasingly complex tasks.
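The thinking, execution, and review steps described here can be sketched as a simple loop. In this sketch, `call_llm` is a stub standing in for a real model API call, and the `PLAN`/`REVIEW` prompt formats are invented purely for illustration; real orchestration frameworks structure this differently.

```python
def call_llm(prompt: str) -> str:
    # Stub: a real system would send this prompt to an LLM endpoint.
    # Canned responses keep the example runnable without a model.
    if prompt.startswith("PLAN"):
        return "1. retrieve papers\n2. summarize findings"
    if prompt.startswith("REVIEW"):
        return "OK"
    return f"draft answer for: {prompt}"

def orchestrate(user_query: str, max_revisions: int = 2) -> str:
    # 1. Thinking: ask the model to break the query into steps.
    plan = call_llm(f"PLAN: {user_query}")
    steps = [line for line in plan.splitlines() if line.strip()]

    # 2. Execution: run each planned step (tool/function calls go here).
    results = [call_llm(f"EXECUTE {step} for: {user_query}") for step in steps]
    answer = " ".join(results)

    # 3. Review: let the model critique the draft; revise until it passes.
    for _ in range(max_revisions):
        verdict = call_llm(f"REVIEW: {answer}")
        if verdict == "OK":
            break
        answer = call_llm(f"REVISE: {answer} given critique: {verdict}")
    return answer
```

The feedback loop in step 3 is what lets the system improve its own responses rather than returning the first draft.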
Next is the application layer. The most widely used AI systems do follow a pretty simple design of text in and text out. But as we use these tools in our work and life, there are important features that become critical for the actual usability of AI, and these factors make up the application layer. The first factor is interfaces.
The most classic interface is text in and text out, but there are other modalities that can be very valuable for certain tasks too, like image, audio, numerical data sets, and plenty of other custom data formats. Also, in the interface, it's really important to keep in mind the ability to do things like revisions or citations, so that when the user sees what the model comes up with, they can edit it or inquire about it further. The second consideration is integrations, which go in both directions: allowing other tools the user works with to send inputs to the AI system, and taking the model outputs and automating how they get integrated into the tools they use in their day-to-day work.
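One way to sketch the interface idea of revisions and citations traveling together with the answer is a small response object. All names here are hypothetical, and the citation string is a placeholder, not a real reference.

```python
from dataclasses import dataclass, field

@dataclass
class AssistantResponse:
    # The answer carries its citations so the interface can render
    # sources, and `revise` models the user editing the output.
    answer: str
    citations: list[str] = field(default_factory=list)
    history: list[str] = field(default_factory=list)

    def revise(self, new_answer: str) -> None:
        # Keep prior drafts so the user can compare or roll back.
        self.history.append(self.answer)
        self.answer = new_answer

resp = AssistantResponse(
    answer="Compound X inhibits kinase Y in vitro.",
    citations=["Doe et al. 2024 (placeholder citation)"],
)
resp.revise("Compound X inhibits kinase Y in vitro at nanomolar concentrations.")
```

Keeping citations and revision history on the response object, rather than as loose text, is what makes the "edit it or inquire further" interaction possible in the UI.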
All together, these layers of the AI stack, from the hardware to the models, the data you use, how you orchestrate it, and the application and the usability of it, matter because when we have a clear understanding of how they fit together, we can see what's truly possible and make practical choices to design AI systems that are reliable, effective, and aligned to our real-world needs.