No matter which AI coding agent you're using, learning how to get the most out of your tokens can save you real money and significantly improve the results you get. In this video, I'll show you five practical ways to make better use of your tokens without sacrificing output quality. Most of the examples use Claude Code, but the same ideas apply to any AI coding agent.
You'll just need to adjust the syntax. Let's start from the basics. Every time you send a prompt, whether it's plain text, code, images, or attached files, that all gets converted into input tokens.
And whatever the model sends back, that's counted as output tokens. Now, you might be thinking, wait, I'm on a subscription. I'm not paying by the token.
And you are right, technically. But here's the catch. Those subscriptions are still limited by token usage.
Each plan gives you a certain amount of usage over a time interval. So even if you are not being billed per prompt, you are absolutely being capped. And once you hit that limit, you'll see this message.
The Pro plan gives you about 45 Claude messages and 10 to 40 Claude Code prompts every 5 hours, while the Max plan multiplies that by 5 or even 20, depending on the Max tier. So how do you actually reduce usage? By the way, I forgot to introduce myself.
I help developers turn AI into real workflows. So subscribe and like. It really helps me provide more value for you.
The first method is tricky because it's not widely known: LLMs are stateless. That means that every time you send a new prompt, it includes the entire conversation history unless you tell it otherwise. So if you had 20 back-and-forth messages with Claude Code and you didn't clear them, you're sending the full history again and again every single time.
So it means that you are paying for all those tokens repeatedly. So tip number one is to take control of the context you attach. Use the /clear command when switching tasks or when a chat gets messy.
You can also use the /compact command when the context is still helpful but you want it to be lighter. Claude Code actually compacts automatically when your thread gets too long. You've probably seen this message.
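To put rough numbers on that, here is a minimal sketch of how resent history adds up. It assumes every turn adds the same number of tokens to the history (500 is a made-up figure) and that the whole history goes out as input on every request. The function name and the `clear_every` parameter are hypothetical, just for illustration; real usage depends on the actual messages and on how the agent handles context.

```python
def total_input_tokens(tokens_per_turn, turns, clear_every=None):
    """Sum the input tokens sent over a conversation in which every
    request resends the whole history accumulated so far.

    clear_every: if set, the history is wiped after that many turns
    (a stand-in for clearing the context between tasks).
    """
    total = 0
    history = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn  # the new turn joins the history
        total += history            # the full history is sent as input
        if clear_every and turn % clear_every == 0:
            history = 0             # next turn starts with an empty context
    return total


# Assumed figures: 20 turns of 500 tokens each.
no_clear = total_input_tokens(tokens_per_turn=500, turns=20)
with_clear = total_input_tokens(tokens_per_turn=500, turns=20, clear_every=10)
print(no_clear, with_clear)  # prints: 105000 55000
```

Under these assumptions, 20 turns cost 105,000 input tokens in one long thread, versus 55,000 if the context is cleared once halfway through. The cost grows roughly with the square of the thread length, which is why clearing or compacting pays off.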
Tip number two is all about precision. Claude can understand large codebases, but if you let it explore freely, you burn through tokens fast. Instead of saying, "Here's my whole repo. Go find the bug," say something like, "Check the verifyUser function inside auth.js. That's where the issue probably is." Being specific cuts down token usage, speeds up responses, and gives you more focused answers. Don't let Claude Code explore; guide it instead.
Tip number three is to treat each 5-hour window like a sprint. Before you start, list all the tasks you want to tackle. Prioritize them.
Start with the most important ones and batch your work into one session. This simple shift helps you stay on target, avoid distractions, and reduce unnecessary token usage. If you are on the Max plan, you have access to the Opus model, which is awesome, but super expensive in token terms, as we saw.
Tip number four is all about switching models strategically. Use Opus for high-level planning, complex logic, and deep debugging. Then switch to the Sonnet model for build-out, follow-ups, and light edits.
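If you like to think of it as a rule, the split could be sketched like this. The task categories come straight from the tip; the `pick_model` helper and the "opus" and "sonnet" labels are hypothetical names for illustration, not part of any tool's API. Defaulting unknown tasks to the cheaper model is also just an assumption.

```python
# Task categories taken from the tip; the set names are illustrative only.
HEAVY_TASKS = {"planning", "complex logic", "deep debugging"}
LIGHT_TASKS = {"build-out", "follow-up", "light edit"}


def pick_model(task_type):
    """Return the model label to use for a given kind of task."""
    if task_type in HEAVY_TASKS:
        return "opus"    # reserve the expensive model for hard problems
    return "sonnet"      # assumption: everything else goes to the cheaper model


for task in ["planning", "light edit", "deep debugging", "follow-up"]:
    print(task, "->", pick_model(task))
```

In practice you make this choice by hand when you switch models in your agent; the point is to decide per task rather than leaving the most expensive model on all the time.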
The next method is related to MCPs. MCP stands for Model Context Protocol, and its goal is to provide an open, standardized way for models to connect with external tools. Just imagine that every time you wanted to connect to an external tool, you had to write specific code for integration.
This is not scalable. Great. But what does MCP look like?
For example, this is the official GitHub MCP. It has many tools. Each one is just an API call.
When we install an MCP, all these definitions are added to our model's context. And from now on, the model can use them as it wishes during development. This is called direct tool calling.
But the MCP approach is so easy and widely accepted. What's the problem? Imagine what will happen if you load dozens of MCPs, or even not that many but with a lot of definitions, like we have in the GitHub MCP. It will eat up your context window. Let's see it in action. If I add the Playwright MCP, it consumes 17.6K tokens. When I add the Supabase MCP, it jumps to 38.5K.
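A quick back-of-the-envelope check on those two readings, as a sketch. It treats 38.5K as the running total after both MCPs are loaded, and it assumes a 200K-token context window, which is my assumption and not a number from the video. The variable and function names are hypothetical.

```python
# Assumption: a 200K-token context window (not stated in the video).
CONTEXT_WINDOW = 200_000

# Readings from the demo, in tokens.
after_first_mcp = 17_600   # first MCP loaded
after_second_mcp = 38_500  # running total once the second MCP is added

# Tokens attributable to the second MCP alone.
second_mcp_added = after_second_mcp - after_first_mcp
print(second_mcp_added)  # prints: 20900


def share_of_window(tokens, window=CONTEXT_WINDOW):
    """Percentage of the context window taken up by tool definitions."""
    return 100 * tokens / window


print(f"{share_of_window(after_second_mcp):.2f}% of the window used before the first prompt")
```

Under that assumed window size, two MCPs already take close to a fifth of the context before you type anything, and that overhead rides along with every request.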