If Andrej Karpathy says he's never felt this behind as a programmer, which he did, all of us should be glad to know as much as we do and should not feel bad about trying to learn more. We're all in the same boat together. What has happened over the last year is a phase transition in technical leverage, and that's what Andrej spent a lot of his time talking about over the holiday week between Christmas and New Year's.

And that's what I want to talk about here in my executive briefing today. Fundamentally, what changed is what it means to be technical. And the new technical skill tree that's getting unlocked is no longer just for engineers.

As a leader, you need to think about anyone in your organization who needs the authority to tell probabilistic machines, namely large language models, how they can usefully generate work. And so today, I'm going to take Andrej's reflection seriously, reflect back on it, and try to lay out a useful skill tree that covers what's changing, why these new skills feel hard, and how, as org leaders, we can start to lay out skill levels and trees that feel useful not just for engineers, but for everyone in the business. Let's start by saying the obvious thing that most people tend to dance around.

This is hard. Andrej feels behind. It's because the job is genuinely being refactored while we are all working as quickly as we can.

Tools are changing every week. Capabilities keep jumping. Mental models decay quickly.

One of the things Andrej was observing that I think is really true is that if you haven't played with Opus 4.5 from a technical perspective in the last month, your world model is already outdated. And that's just four weeks ago.

And the emotional whiplash isn't just "it's a new tool." It's that the old way we anchored our sense of competence, what it meant to be skilled, what it means to have control over our tools and our craft, all of that has to change because it stopped matching reality. For most of modern engineering history, leverage came from writing more correct instructions faster than other people, sure, but really from writing more correct instructions on problems that mattered.

So if you really wanted leverage in the engineering discipline, you picked the problems that mattered, you wrote correct instructions, i.e. workflows, i.e. programs, and then you were able to leverage those at a wide scale in a very simplified fashion. That is the story of Bill Gates and Windows.

You internalize abstractions, you master your tools, you shape deterministic systems, and before you know it, you have Windows 9. You write logic, and the machine executes the logic the exact same way on every single CD copy that Windows hands out. When something breaks, the system can be inspected.

You can ship a bug patch. You can trace causality. You can step through the behavior end to end and find out what's wrong.

In that old world, authorship and authority were really tightly linked. If you were the person who wrote the program, you had the authority to fix it and also the knowledge to fix it. You wrote the behavior, so you owned the behavior.

The assumption of control is baked into the very rituals we have as software engineers, where we talk about the engineer who knows the code, the engineer who authored the code, the engineer who knows where all of the skeletons are in the codebase because that engineer touched it, because that engineer knows it.

This is why the conventional wisdom for the last several decades has been that no matter how frustrated you are with your founding engineers, you keep them around because they know the code. That whole system of assumptions is changing.

That regime is ending. The unit of leverage is shifting from writing code toward orchestrating intelligence. And that's not just a buzzword.

Intelligence here doesn't mean a magical AGI vibe. It means a very specific kind of component that has entered the stack that's net new. It's a probabilistic component.

It's stochastic. It's fallible. It's changing.

You can pick your adjectives, but you see the idea. The model is not a deterministic function. An LLM is a probabilistic token generator.

It produces a plausible sequence conditioned on inputs, and its internal reasoning is not something you can fully inspect the way that you inspect your code. You can't single-step through it. You can't rubber duck it.

If you're an engineer, you can't reliably reproduce it. You can't treat it like a clean abstraction. It's truly an alien component that has really weird ergonomics for traditional software engineers.

And that's why this moment feels like an earthquake. We didn't just get a better Python library, right? We got a new kind of machine in the loop.

And once you accept that, you can explain almost everything that people are feeling, especially the best engineers because the assumptions that engineers trained on are breaking in a number of specific ways. And if you're wondering as a leader why this matters for you, I got news. Your whole technical team is wrestling with this.

And now it's no longer a technical team issue. You're going to have to expand the blast radius and understand how the skill trees we talk about in this video touch all of your job families, not just the technical ones. But first, let's understand what has broken so that we can rebuild on top of it correctly.

The first thing that broke is that control is not the default anymore. In the old world, when you authored behavior, it was yours. And in the new world, you condition behavior, right?

You don't author it specifically. You can shape outcomes through prompts, through context windows, through memory structures or tool access. And the model responds probabilistically.

The same input can yield somewhat different outputs. The same workflow can drift when the underlying model changes. In fact, it's likely.

Mastery is then less about "I can make it do exactly what I want, every time, the same way" and more about "I can steer it toward X outcome reliably, I can detect when it's off, and I can correct it very quickly." The mental shift is from authorship to steering.

The second thing that broke: effort no longer clearly maps to output. In a deterministic world, being better meant you could, frankly, do more with your time, right? You were faster at typing, faster at debugging, better at recall, better at architecture.

As long as you worked on the right problem, you were going to have more leverage. In a probabilistic world, that bottleneck moves. Sometimes one person gets a 10x jump because they know how to set up a delegation loop while another person grinds away manually and gets less done despite being just as smart.

And that's what Andrej is calling a skill issue these days. The skill is new. It's unintuitive.

It's hierarchical. It's the need to develop a skill of delegation instead of a skill of execution, and failure to learn it means your effort just doesn't convert into leverage in the new AI economy. The third thing that broke is that the abstraction stack got inverted.

Historically, high-level reasoning collapsed downward into code very cleanly. Right? You have your intention and it collapses into implementation.

This is where the whole idea of product management and requirements comes from. But now low-level implementation often expands upward from intent. You end up in a place where you have intent and you jump straight to generated artifacts and then you verify the output.

In other words, the job shifts away from constructing something toward supervising a construction crew. You have a defined goal. You've defined constraints for your workspace.

You have evaluations the system needs to pass. And you have correction methodologies. And now the work moves from "write an instruction" to "can you design a system that self-evolves until it hits the correct behavior?"

So in that world, the intention shifts into: okay, I want a particular program that can do a particular task for me. Here are my evals. Here are my constraints. I'm going to let it generate artifacts, maybe codebases or pull requests, until it passes my evals, and then I have a verified output I can actually do something with.

That's a whole new way of doing things, and it changes the way we think about abstraction. Fourth, the old boundaries of engineering don't make sense anymore. The most important divide used to be between engineer and non-engineer, and now it's between someone who can delegate and someone who can't.

And that concept of preserving authority while delegating generation is at the core of the new skills we need in the AI world. And it's not limited to engineering. If you think about what we are doing with AI, we still have to be in charge.

We still have to preserve the authority. But we do have to delegate to get leverage. Authority used to come for free for engineers when they wrote the code.

Because if I write the code, I can justify the behavior of the system. I can point to this line and I can explain the root cause. But in a probabilistic world, the machine will generate behavior and you lose that natural chain of custody.

You're not automatically an authority on what the machine is doing. It is possible to ship something correct without fully understanding why. And you can also ship something wrong that looks correct.

So the center of the craft is changing to: how do you design a workflow where you can delegate a huge amount of the generation activity but ensure human authority over what is actually shipping? That is the new technical skill tree. But it turns out that's not just for technical people.

I don't know that I believe in technical people anymore. It's for everybody, because every profession is becoming some version of "orchestrate probabilistic components while keeping authority." That is the definition of knowledge work now.

Programming just ran into this first. So let's take a minute. Let's actually lay out what we mean by this new skill tree and why I think it's important for leaders to pay attention to it.

I'm going to describe this skill tree as a hierarchy of nodes. Every node is a capability that you can demonstrate. Every node has a failure mode if you skip it.

And the whole tree is built around this core idea that probabilistic models require you to separate your decisions from the act of token generation. And that's the root node we'll start with. If you don't understand that you have to separate generation from decisioning, everything else gets really chaotic, because a probabilistic model is incredibly good at generating.

It can generate drafts or options or code or summaries or transformations or hypotheses or structured outputs. What it is not allowed to do if you want reliability is to be the final authority. The workflow must decide, the system must decide or the human must decide.

But the model should not decide what's true, what's safe, and what's planned. I'm going to say that again because people might scratch their heads. When I say the workflow must decide or the system must decide, I mean you can architect models inside workflows in such a way that they produce extremely dependable, extremely accurate outputs, measured against definitions of correctness that humans hold, with humans then handling the edge cases.

When I say the model should not decide, I mean that the LLM by itself without that workflow harness around it can't reliably decide what is correct. It can't reliably decide what's safe. It can't reliably decide what is approved or what should ship.

When we get burned in the workplace with LLMs, it's almost always because we left a token generator to be the judge because I guess we got fooled by the hype. So the entire tree that I'm talking about here is really a set of skills that are required to do one thing. Let the model generate quickly while preserving human authority through the workflow.
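To make the split between generation and decisioning concrete, here is a minimal Python sketch. Everything in it is hypothetical: `fake_generate` stands in for a real model call, and `within_policy` is an invented refund rule. The only point is that the generator proposes drafts while a deterministic check, or a human on escalation, decides what ships.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Decision:
    accepted: Optional[str]  # the draft the workflow approved, if any
    escalate: bool           # True when no draft passed and a human must decide


def decide(drafts: List[str], is_acceptable: Callable[[str], bool]) -> Decision:
    """Return the first draft that passes the check; otherwise escalate."""
    for draft in drafts:
        if is_acceptable(draft):
            return Decision(accepted=draft, escalate=False)
    return Decision(accepted=None, escalate=True)


def fake_generate(prompt: str) -> List[str]:
    # Hypothetical stand-in for a model call: returns canned candidate drafts.
    return ["Refund approved for $5000", "Refund approved for $40"]


def within_policy(draft: str) -> bool:
    # Assumed business rule for illustration: refunds above $100 need a human.
    amount = int(draft.rsplit("$", 1)[1])
    return amount <= 100


decision = decide(fake_generate("Customer asks for a refund"), within_policy)
```

The generator never sees the policy and never gets the final word. If nothing passes, the workflow hands the case to a person instead of shipping a plausible draft.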

If you follow that through, I think the hierarchy is actually pretty intuitive. Level one is really about conditioning. And again, keep in mind as leaders, this is a way you can start to think about all of the different roles in your company that have to do with knowledge work.

It is not just for engineers. So, conditioning, steering a probabilistic component. The first node there might be intent specification.

In a deterministic system, ambiguous requirements are still going to cause you problems, but the system won't hallucinate what you meant. In a probabilistic system like we have today, ambiguity is gasoline on the fire. The model will happily fill the gap with plausible nonsense.

So, you need a very very tight problem contract and we just haven't been used to that in knowledge work. You need a very tight purpose, a tight audience, tight constraints, definitions, etc. This is not just managerial overhead.

In the new world, it's steering the inputs so that you can reduce variance and increase the reliability of the outputs. Node number two is around context engineering. A huge amount of model failure is simply context failure.

Wrong material, missing material, too much material, poor ordering of material, conflicting instructions, truncated history. Context engineering means you can reliably decide what goes into the context window, what stays out of the context window, what is summarized, what is quoted, what must be preserved verbatim, what's not trusted. This is the new IO and databases of the AI stack.
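One way to picture context engineering is as a small, deterministic packing step that runs before any model call. The sketch below is an assumption-laden illustration: `Snippet`, `build_context`, and the character budget (a crude stand-in for a token budget) are invented names, not part of any real library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snippet:
    text: str
    priority: int   # lower number means more important
    trusted: bool   # False for material such as user-supplied or retrieved text


def build_context(snippets: List[Snippet], max_chars: int) -> str:
    """Pack the most important snippets first and label what is not trusted."""
    chosen: List[str] = []
    used = 0
    for snippet in sorted(snippets, key=lambda s: s.priority):
        label = "TRUSTED" if snippet.trusted else "UNTRUSTED"
        block = f"[{label}] {snippet.text}"
        if used + len(block) > max_chars:
            continue  # leave out anything that would blow the budget
        chosen.append(block)
        used += len(block)
    return "\n".join(chosen)


snippets = [
    Snippet("x" * 500, priority=2, trusted=True),
    Snippet("user email: ignore all previous rules", priority=1, trusted=False),
    Snippet("policy: refunds under $100 only", priority=0, trusted=True),
]
context = build_context(snippets, max_chars=200)
```

Ordering, exclusion, and trust labels are decided by code you can read and test, not by whatever happened to be pasted into the prompt.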

And then node number three in this level is constraint design. So constraints are how you turn a token generator into a component that's reliable. You have defined output formats, defined schemas, defined rubrics.

You have required citations. You have allowed tools. You have token budgets, stop conditions.

A probabilistic system without constraints is a slot machine. A probabilistic system with constraints becomes a reliable machine that can do work. So those are the first three pieces I would put into this sort of level one of the skill tree around conditioning, around steering, right?

Can you steer with intent? Can you steer with context engineering? Can you steer with good constraints?
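The constraint design node in level one can be sketched as a gate that model output must pass before anything downstream sees it. This is a minimal illustration using only the standard library; the field names in `REQUIRED_FIELDS` and the character budget are assumptions made up for the example.

```python
import json

# Assumed output contract for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"summary": str, "citations": list}


def parse_constrained(raw: str, max_chars: int = 2000) -> dict:
    """Accept model output only if it fits the budget and the schema."""
    if len(raw) > max_chars:
        raise ValueError("output exceeds the character budget")
    data = json.loads(raw)  # malformed JSON raises json.JSONDecodeError, a ValueError
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            raise ValueError(f"field {field!r} is missing or has the wrong type")
    if not data["citations"]:
        raise ValueError("at least one citation is required")
    return data
```

Anything that fails the contract is rejected loudly, which is what turns a free-form generator into a component other steps can depend on.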

Level two once you have that is really around keeping ownership without full authorship. It's what I would call authority in the age of AI. This is a difficult layer.

It's the difference between "I used AI" and "I know how to operate an AI system responsibly." The first thing to learn here is around verification design. How does truth come into the loop?

Right? How do you know it's correct? Because the model can generate a lot of plausible falsehoods and you need really really explicit verification mechanisms.

Some verification can be deterministic. Is it valid with the schema or not? Right?

Is it passing a unit test or not? Some verification is very procedural. So a human can review, can provide a second pass critique, can do adversarial prompting, etc.

The key is that verification is not optional. It is the mechanism that replaces the old guarantee that you got from the authored logic that an engineer would develop. And so what you need to learn at this level is how can you design verifications that ensure tight alignment between system performance and correctness.

Next, provenance and chain of custody become a factor here. Remember, in the past chain of custody was implicit, right? You can see the workflow and you know who wrote it.

So you have chain of custody. But if authority requires provenance, how do we get that in the age of AI? If the output makes claims, you're going to need to be able to design a system that shows where those claims came from.

Sources, citations, quotes, retrieved documents. That is all part of designing a good probabilistic system today. In the deterministic world, you could get that straight out of the code with some testing.

In the probabilistic world, it's about evidence. It's about establishing systems that treat traceability as a first-class object. You design them to be audited from day one.
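As a rough sketch of provenance as a first-class object, the check below requires every claim to carry a source identifier and a quote that appears verbatim in that source. The `Claim` shape and the `unsupported_claims` helper are hypothetical, and exact substring matching is a deliberately strict simplification.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Claim:
    text: str        # what the output asserts
    source_id: str   # which retrieved document is supposed to support it
    quote: str       # the passage cited as evidence


def unsupported_claims(claims: List[Claim], sources: Dict[str, str]) -> List[Claim]:
    """Return claims whose quote cannot be found verbatim in the cited source."""
    bad: List[Claim] = []
    for claim in claims:
        document = sources.get(claim.source_id)
        if document is None or not claim.quote or claim.quote not in document:
            bad.append(claim)
    return bad


sources = {"doc-1": "Revenue rose 12% in Q3."}
claims = [
    Claim("Revenue grew", "doc-1", "rose 12%"),
    Claim("Costs fell", "doc-2", "fell 5%"),            # cites a missing source
    Claim("Margins doubled", "doc-1", "margins doubled"),  # quote not in source
]
flagged = unsupported_claims(claims, sources)
```

Here the second and third claims are flagged, so a reviewer sees exactly which statements lack evidence instead of rereading the whole output.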

And then the next piece is permissions. The model cannot be your security boundary. That's a disaster.

If the system is allowed to email customers, to move money, to change permissions, to merge code, you have to treat it like you treat permissioning in any other part of your system. The permissioning should be deterministic. It should be on a least privilege basis.

And you should go through all the usual tools you have: allow lists, scoped tools, approval steps, audit trails, etc. And this is where you can actually instantiate agents in a way that's useful. Right?
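A permission envelope can be sketched as a thin, deterministic gate that sits between an agent and its tools. The `ToolGate` class, the tool names, and the approval flow below are all invented for illustration. A real system would plug into whatever identity and audit infrastructure it already has.

```python
from typing import Any, Callable, Dict, List, Optional, Set, Tuple


class PermissionDenied(Exception):
    pass


class ToolGate:
    """Deterministic allow list with approval steps and an audit trail."""

    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 allowed: Set[str], needs_approval: Set[str]) -> None:
        self.tools = tools
        self.allowed = allowed                # least privilege: only these may run
        self.needs_approval = needs_approval  # these also need a named approver
        self.audit_log: List[Tuple[str, str]] = []

    def call(self, name: str, approved_by: Optional[str] = None, **kwargs: Any) -> Any:
        if name not in self.allowed or name not in self.tools:
            self.audit_log.append((name, "denied"))
            raise PermissionDenied(f"{name} is not on the allow list")
        if name in self.needs_approval and approved_by is None:
            self.audit_log.append((name, "needs_approval"))
            raise PermissionDenied(f"{name} requires human approval")
        result = self.tools[name](**kwargs)
        self.audit_log.append((name, "ok"))
        return result


# Hypothetical tools for the example.
tools = {
    "read_ticket": lambda ticket_id: f"ticket {ticket_id}",
    "send_email": lambda to, body: f"sent to {to}",
    "move_money": lambda amount: f"moved {amount}",
}
gate = ToolGate(tools, allowed={"read_ticket", "send_email"},
                needs_approval={"send_email"})
```

The model can ask for any tool it likes. Whether the call happens is decided by the gate, and every attempt, allowed or not, lands in the audit log.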

So if we look at these three, what I'm trying to get you to is an understanding of the elements of authority in the age of probabilistic machines. You have to be able to verify work. You have to be able to show provenance and chain of custody to have useful work.

And you need permission envelopes so that you can prove the agents you set loose are not over-permissioned. And that is going to be more and more of a security issue in 2026. So let's say you've learned about authority, you've learned about conditioning and intent steering.

What's next on the skill tree here? Level three of the skill tree is really around workflows. How can you take intelligence as a raw material and turn it into a scaled out factory?

This is where the compounding leverage comes into play. So one piece here is learning how to decompose into pipeline steps. This is where you stop treating the model like a chatbot and you start treating it as a piece in a pipeline.

You build intermediate artifacts. You create checkpoints. You keep the generator away from the final decision.

You make failures local instead of global. And you make the workflow runnable by someone else, not just you. And then that goes with failure mode taxonomy.

In deterministic systems, debugging is tracing logic. You can just trace it through the code, right? In probabilistic systems, debugging is really classifying failure modes and finding useful ways to address them.

Was the context missing? What should we do about it? Was retrieval wrong?

How do we adjust the context window? Did the tool fail? Did we declare the tool correctly?

Did constraints conflict? Did it hallucinate? Was the task underspecified?

Did the model just refuse to do it? Did it exceed budget? You basically need a complete taxonomy of errors so that you stop assuming that you can fiddle with the prompt every time and you start identifying the correct layer where the failure occurred and fixing it properly.
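Pipeline decomposition and a failure mode taxonomy fit together naturally, so here is one hedged sketch covering both. The `FailureMode` names mirror the questions above. `Step`, `run_pipeline`, and the toy retrieval and drafting steps are hypothetical and exist only to show a failure being pinned to a specific layer.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class FailureMode(Enum):
    MISSING_CONTEXT = "missing_context"
    BAD_RETRIEVAL = "bad_retrieval"
    TOOL_ERROR = "tool_error"
    CONSTRAINT_CONFLICT = "constraint_conflict"
    HALLUCINATION = "hallucination"
    UNDERSPECIFIED_TASK = "underspecified_task"
    REFUSAL = "refusal"
    OVER_BUDGET = "over_budget"


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]                      # produces an intermediate artifact
    check: Callable[[dict], Optional[FailureMode]]   # checkpoint after the step


@dataclass
class Result:
    state: dict
    failed_step: Optional[str] = None
    failure: Optional[FailureMode] = None


def run_pipeline(steps: List[Step], state: dict) -> Result:
    """Run steps in order and stop at the first checkpoint that fails."""
    for step in steps:
        state = step.run(dict(state))  # copy so a step cannot mutate earlier state
        failure = step.check(state)
        if failure is not None:
            return Result(state, failed_step=step.name, failure=failure)
    return Result(state)


def retrieve(state: dict) -> dict:
    state["docs"] = []  # simulate retrieval that comes back empty
    return state


def check_retrieval(state: dict) -> Optional[FailureMode]:
    return None if state["docs"] else FailureMode.BAD_RETRIEVAL


def draft(state: dict) -> dict:
    state["draft"] = f"Answer based on {len(state['docs'])} documents"
    return state


def check_draft(state: dict) -> Optional[FailureMode]:
    return None if state.get("draft") else FailureMode.REFUSAL


steps = [Step("retrieve", retrieve, check_retrieval), Step("draft", draft, check_draft)]
result = run_pipeline(steps, {"question": "What changed in Q3?"})
```

In this run the pipeline stops at the retrieve step with a retrieval failure, so the fix belongs in retrieval rather than in another round of prompt fiddling.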

And then the next piece here all still inside workflows is how you handle observability. How do you make the system legible? You cannot fully inspect the model's internal reasoning.

So you have to compensate by making the surrounding system extremely observable: traces of your tool calls, inputs used, documents retrieved, intermediate outputs, validations passed or failed, the timing, the cost. This is how you ensure that the system is legible all the way through your workflow.
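A minimal way to make the surrounding system observable is to wrap every step so that inputs, outputs, errors, and timing are recorded as structured events. The `Tracer` class below is an illustrative assumption, not a real tracing library. Cost tracking is left out because it depends on the provider.

```python
import time
from typing import Any, Callable, Dict, List


class Tracer:
    """Record one structured event per step, whether it succeeds or fails."""

    def __init__(self) -> None:
        self.events: List[Dict[str, Any]] = []

    def traced(self, name: str, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        event: Dict[str, Any] = {"step": name, "inputs": {"args": args, "kwargs": kwargs}}
        start = time.perf_counter()
        try:
            result = fn(*args, **kwargs)
            event.update(ok=True, output=result)
            return result
        except Exception as exc:
            event.update(ok=False, error=repr(exc))
            raise
        finally:
            # Runs on success and on failure, so every call leaves a trace.
            event["seconds"] = time.perf_counter() - start
            self.events.append(event)


tracer = Tracer()
shout = tracer.traced("uppercase", str.upper, "draft ready")
```

Because failed calls are logged too, the trace shows where a run broke and with which inputs, which is the raw material for classifying the failure.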

In a sense, you're taking the skills you learned about auditability at level two and you're scaling them to the workflow layer. And so the three pieces of the workflow layer together give you the room to extend your leverage. It's not just you doing things.

You're now building automated systems. You can decompose into steps. You can diagnose failure modes.

You can figure out observability on a complex system. You're starting to be able to scale LLMs. The final level is compounding.

This is where the leverage becomes more durable instead of something you just set up once and go with. Evaluation harnesses are so critical here. Without evals, it's difficult to compound.

You just end up improvising faster. Look, evals can be small. They can be a golden set of examples.

They can be regression tests for outputs. They can be scorecards or thresholds. But you do need a harness so that you can change your prompts, your models, and your retrieval methods or your tools without playing Russian roulette.
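A small eval harness can be as simple as a golden set plus a regression gate. In the sketch below, `baseline` and `candidate` are hypothetical stand-ins for two versions of a system (a changed prompt, model, or retrieval method), and exact-match scoring with a 0.9 threshold is an assumption chosen for brevity.

```python
from typing import Callable, List, Tuple

Golden = List[Tuple[str, str]]  # (input, expected output) pairs


def score(system: Callable[[str], str], golden: Golden) -> float:
    """Fraction of golden examples the system answers exactly right."""
    if not golden:
        raise ValueError("golden set is empty")
    hits = sum(1 for prompt, expected in golden if system(prompt) == expected)
    return hits / len(golden)


def safe_to_promote(baseline: Callable[[str], str], candidate: Callable[[str], str],
                    golden: Golden, min_score: float = 0.9) -> bool:
    """Promote only if the candidate clears the bar and does not regress."""
    candidate_score = score(candidate, golden)
    return candidate_score >= min_score and candidate_score >= score(baseline, golden)


ANSWERS = {"2+2": "4", "3+3": "6", "capital of France": "Paris"}
golden_set: Golden = list(ANSWERS.items())


def baseline(prompt: str) -> str:
    return ANSWERS.get(prompt, "")


def candidate(prompt: str) -> str:
    # Simulated regression: the new version gets one golden example wrong.
    return "5" if prompt == "2+2" else ANSWERS.get(prompt, "")
```

With this in place, swapping a prompt or model becomes a measured decision: the candidate above would be blocked because it regresses on the golden set.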

You also need better feedback loops so that the system corrects itself. The highest leverage comes from your agent operating effectively in a loop where it can draft, critique, revise, recheck, and ship. Or it can retrieve an answer, it can cite, it can verify, and it can finalize.
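The draft, critique, revise, recheck loop can be sketched as a bounded iteration. All three callables here are toy stand-ins: in practice the drafter and reviser would be model calls, while the critique could be deterministic checks, a second model, or a human. The round limit is an assumed safeguard so the loop cannot run forever.

```python
from typing import Callable, List, Tuple


def refine(draft_fn: Callable[[str], str],
           critique_fn: Callable[[str], List[str]],
           revise_fn: Callable[[str, List[str]], str],
           task: str, max_rounds: int = 3) -> Tuple[str, bool]:
    """Revise until the critique finds no problems or the round limit is hit."""
    draft = draft_fn(task)
    for _ in range(max_rounds):
        problems = critique_fn(draft)
        if not problems:
            return draft, True
        draft = revise_fn(draft, problems)
    return draft, not critique_fn(draft)  # final recheck after the last revision


def toy_draft(task: str) -> str:
    return "summary of " + task


def toy_critique(draft: str) -> List[str]:
    problems = []
    if not draft[0].isupper():
        problems.append("capitalize")
    if not draft.endswith("."):
        problems.append("period")
    return problems


def toy_revise(draft: str, problems: List[str]) -> str:
    # Fix only the first reported problem per round, to show several iterations.
    if problems[0] == "capitalize":
        return draft[0].upper() + draft[1:]
    return draft + "."


final, passed = refine(toy_draft, toy_critique, toy_revise, "q3")
```

The second return value matters as much as the text: when the loop runs out of rounds without passing, the workflow knows not to ship.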

The loop makes the generator less risky because errors are caught within the system before final shipment. And it also makes this skill more transferable for you. You don't need to be a genius prompter.

You just need to be able to build a really, really good evaluation loop. The last thing to keep in mind if you really want to scale this is learning the skill of drift management and governance. Models are going to change, data is going to change, teams are going to change, attackers are going to adapt. Governance means versioning, auditability, policies, and so on.

You need to start treating the work you do like production infrastructure, even if you're not used to thinking that way because you're not a technical person. And this is the final layer of authority, right?

It's the ability to operate under a condition of continuous change without losing control over the system. So there you go. That's the compounding piece.

Those are the four levels. I am just sketching in the loose, high-level skill tree. There's a ton of work to be done to fill that out and actually get to a detailed curriculum, a detailed rubric that suits particular job families in the new AI era.

But my goal here has been to share with you how you can take a reflection like Andrej's and say, "Yeah, you're right. There's a whole new set of skills to learn." And start to think about it strategically, not just from an engineering perspective.

So, I want you to notice this is not really about learning AI tools. It's learning how to operate probabilistic systems as a compute service across your entire business. It applies to everybody.

The lawyer building a contract review workflow and the engineer building a debugging agent are climbing the same skill tree today. They may have different artifacts, but they have the same hierarchy of skills. And this is where a very famous computer game becomes the perfect analogy.

Factorio is a game about climbing exactly this kind of skill tree. If you've never heard of it, I'll give it to you quickly. It's just a video game where you land on a new planet and your job is to build an automated factory.

So, you start by handcrafting really basic items, but the system quickly pushes you into automation. And so, you start to improve your mining. You start to install conveyor belts, you start to route more of your outputs into more factories, and eventually you automate more and more of the supply chain.

This is a great training metaphor for this era because it teaches us the instincts that actually scale. Decomposing problems scales, modularity scales, observability scales.

Understanding where bottlenecks live in a system, that scales, blast radius estimation, that scales. We don't have to be attached to the quality of manual authorship to find meaning in our work. Nobody cares if you personally crafted a gear that goes into the machine.

The thing that matters is that the system produces gears at scale that do useful work. We spent decades equating competence with authorship. Especially in the engineering world, we celebrated super engineers who were good at this.

But the world is now going to reward something else. It's going to reward anyone's ability, not just engineers', to design workflows that produce reliable outcomes, even when the LLM token generator at the heart of the system is stochastic, is probabilistic, is partially opaque.

That's not less skill. It's just different skill. And it's genuinely difficult because it forces you to replace the comfort of control with the discipline of systems.

So if you feel behind, it's not that you're failing. It means you're correctly perceiving that the stack is different. Now the way forward is not going to be frantic tool chasing.

It's obviously not going to be denial. It's choosing to understand that we have a different skill tree, that all of us in the knowledge work world are climbing that tree together, and that we do a better job of that when we climb it deliberately: when we intentionally separate generation from decisioning, when we intentionally learn to condition the behavior of the system with artifacts and constraints, when we learn how to preserve authority in the system, when we learn how to build workflows, not just prompts, and when we can actually make systems compound with evals, with good feedback loops, etc.

The new hierarchy won't be based on who codes the fastest. It will be based on who can orchestrate uncertainty without losing authority. And I think that's what technical means now. It's for everyone. And yes, it's absolutely hard, because we're learning to operate a new kind of machine while it's being invented. But for the organizations that recognize this as a challenge, these are the kinds of human skills that we all need to grow in order to move faster in the AI era.

This is the opportunity in front of us. The organizations that figure out how to take this understanding, appropriately detail it for their particular context, and scale that across their workforce are the ones that are going to realize 10x speedups.

The organizations that insist on the old hierarchies of technical versus non-technical, on "I have my job hat, and it's only product manager, and I only do product management stuff," are the ones that are not going to do well. So the choice is yours.

But I think that this is the end of the technical versus non-technical era and we need to start a skill tree for a new era. Best of luck.

Why Andrej Karpathy Feels "Behind" (And What It Means for Your Career)