Last week, a blog post called "The 8 Levels of Agentic Engineering" by Bassim Eledath tore through Hacker News with over 200 upvotes. I think it hit a nerve because it described something a lot of us have been feeling but could not quite articulate: the gap between what AI coding tools can do and what most teams are actually getting out of them.
The core argument goes like this: Anthropic’s team can ship a product in 10 days using AI agents. Another team using the same models cannot move past a broken proof-of-concept. The difference is not the model. It is the level of agentic engineering maturity the team has reached.
I have spent the last week trying to map my own experience (and the experience of the teams I advise) onto this framework. Some of it is brilliant. Some of it is aspirational to the point of being science fiction. Here is my honest take on each level and where most teams actually sit today.
Levels 1 and 2: Tab Completion and Agent IDEs
These are the entry levels and, honestly, where most professionals still are — even if they will not admit it at the next conference.
Level 1 is GitHub Copilot-style tab completion. You write some code, the AI suggests the next few lines, you hit tab. It is autocomplete on steroids. It works best for experienced developers who already know what they want to write and just want to type less of it.
Level 2 is the agent IDE era — Cursor, Windsurf, and similar tools that connect an AI chat to your entire codebase. You can describe what you want in natural language, and the AI makes multi-file edits. This is where most teams I work with are operating today, and honestly, it is a massive productivity boost over Level 1.
My friend Priya, who leads engineering at a mid-size startup, told me last Tuesday that her team’s velocity effectively doubled when they moved from Copilot to Cursor. “But we hit a ceiling almost immediately,” she added. “The AI was fast at writing code, but it kept missing the bigger picture. It would solve the problem in front of it while creating three new problems in files it had not seen.”
That ceiling is what separates Level 2 from Level 3.
Level 3: Context Engineering
This is where things start getting interesting, and where I think the real differentiation begins between teams that use AI as a toy versus teams that use it as infrastructure.
Context engineering is the practice of carefully controlling what information the AI can see. It sounds simple. It is not. It is about writing better system prompts, crafting .cursorrules or CLAUDE.md files that give the AI the right constraints, managing conversation history so a long-running agent does not lose the plot, and deciding which tools to expose in each interaction.
The mantra, as Eledath puts it: “Every token needs to fight for its place in the prompt.”
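To make one of the tactics above concrete, here is a toy sketch of conversation-history management in that spirit: keep a long-running transcript inside a token budget by dropping the oldest turns first while always preserving the system prompt. Everything here is hypothetical illustration, not any real agent framework's API, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
# Hypothetical sketch: trim chat history to a token budget, newest turns
# kept first, system prompt always preserved. Not a real library API.

def trim_history(messages, budget, count_tokens=lambda m: len(m["content"].split())):
    """Return messages trimmed to fit the budget; oldest turns are dropped first."""
    system, turns = messages[0], messages[1:]
    kept, used = [], count_tokens(system)
    for msg in reversed(turns):        # walk from the newest turn backwards
        cost = count_tokens(msg)
        if used + cost > budget:
            break                      # everything older than this gets dropped
        kept.append(msg)
        used += cost
    return [system] + list(reversed(kept))

history = [
    {"role": "system", "content": "You are a terse coding assistant"},
    {"role": "user", "content": "Refactor the payment module to use async handlers"},
    {"role": "assistant", "content": "Done, see the diff"},
]
print(trim_history(history, budget=12))
```

The point is not this particular eviction policy; it is that someone decided on a policy at all, instead of letting the context window fill with whatever happened to come first.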
I started taking context engineering seriously about six months ago, and the difference was immediate. A coding task that used to require 4-5 rounds of correction now lands correctly on the first or second attempt. Not because the model got smarter, but because I got better at telling it what it needed to know.
Here is a concrete example. I was building a payment integration last month. Without context engineering, I would tell the AI “add Stripe payment processing.” It would generate code that looked reasonable but used an outdated API version, ignored our error handling patterns, and put the webhook handler in the wrong directory.
With context engineering, my CLAUDE.md file included: our Stripe API version, our error handling conventions, our directory structure, our naming patterns, and three examples of existing integrations. The AI nailed it on the first try.
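For readers who have never written one, here is roughly what that kind of file can look like. The paths, names, and conventions below are invented for illustration; the structure mirrors the categories I listed above, not the actual file from my project.

```markdown
<!-- Illustrative CLAUDE.md fragment; all paths and names are hypothetical. -->
# Payments service conventions

## Stripe
- The API version is pinned in one place (`src/payments/stripe_client.ts`); never call the SDK directly.

## Error handling
- Wrap external calls in `withRetry()`; raise `PaymentError` subclasses, never swallow exceptions.

## Directory layout
- Webhook handlers live in `src/webhooks/`, one file per event type.
- Provider integration code lives in `src/payments/<provider>/`.

## Naming
- Handler files end in `_handler`; provider clients end in `_client`.

## Reference implementations
- Follow the house style in `src/payments/paypal/` and `src/webhooks/refund_handler.ts`.
```

Notice how little of this is about the AI. It is the same document a thoughtful team would hand a new hire.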
Tom, a senior engineer at a company I consult for, compared it to hiring. “You would not hire someone and say ‘build a payment system’ without an onboarding doc, a code style guide, and examples of existing work. Why would you do that with an AI?”
Fair point, Tom.
Levels 4 and 5: Agentic Coding and Background Agents
This is where the framework gets exciting and where most teams have not arrived yet.
Level 4 is about running AI agents that do not just complete tasks you describe but actually plan and execute multi-step workflows. You give the agent a goal, not a sequence of instructions, and it figures out how to get there. This is the difference between “refactor this function” and “improve the performance of this module” — the latter requires the agent to profile, identify bottlenecks, plan changes, implement them, and verify results.
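The shape of that loop can be sketched in a few lines. This is a toy model, not a real agent framework: every function here is a hypothetical stand-in, and the point is only the control flow, where the caller supplies a goal and capabilities, and the agent plans, executes, and verifies on its own.

```python
# Toy sketch of the Level 4 pattern: plan/execute/verify against a goal.
# All callables are hypothetical stand-ins, not a real agent API.

def run_agent(goal, plan, execute, verify, max_rounds=5):
    """Loop until verify() confirms the goal is met, re-planning each round."""
    for _ in range(max_rounds):
        steps = plan(goal)                     # the agent chooses its own steps
        results = [execute(step) for step in steps]
        if verify(goal, results):              # e.g. benchmarks confirm a speedup
            return results
    raise RuntimeError("goal not reached within the round budget")

# Usage with trivial stand-ins: a vague goal becomes concrete steps.
results = run_agent(
    goal="improve module performance",
    plan=lambda g: ["profile", "cache hot path", "re-run benchmarks"],
    execute=lambda step: f"did: {step}",
    verify=lambda g, r: len(r) == 3,
)
print(results)
```

Real systems add tool access, memory, and error recovery, but the structural shift is the one shown: you specify the destination, and the route becomes the agent's problem.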
Level 5 takes this further: background agents that work while you do other things. You start a task, switch to something else, and come back to find a pull request waiting for your review. Claude Code with background tasks, GitHub Copilot Workspace, and similar tools are pushing into this territory.
I have been experimenting with Level 5 workflows for about two months, and here is my honest assessment: it works about 60 percent of the time. The other 40 percent, the agent goes down a path that looks reasonable in isolation but fails because of context it never had access to. It is the "technically correct but practically wrong" problem.
The fix, as Eledath points out, is not better models (though those help). It is better guardrails. Better rules files. Better test suites that catch “practically wrong” before merge.
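One cheap flavor of guardrail is a convention check that runs before merge. The sketch below is hypothetical (the directory convention and file naming are invented for illustration): it flags a mistake of exactly the "practically wrong" kind, an agent placing a webhook handler outside the directory the team expects, even though the file itself compiles and its unit tests pass.

```python
# Minimal guardrail sketch; the convention and paths are hypothetical.
# Flags handler files that an agent's diff put in the wrong directory.

from pathlib import PurePosixPath

EXPECTED_HANDLER_DIR = PurePosixPath("src/webhooks")  # assumed team convention

def misplaced_handlers(changed_files):
    """Return handler files from a diff that landed outside the expected dir."""
    return [
        path for path in changed_files
        if path.endswith("_handler.py")
        and not PurePosixPath(path).is_relative_to(EXPECTED_HANDLER_DIR)
    ]

# An agent's overnight diff: one handler is in the right place, one is not.
diff = ["src/payments/stripe_handler.py", "src/webhooks/refund_handler.py"]
print(misplaced_handlers(diff))  # flags src/payments/stripe_handler.py
```

Checks like this encode the context the agent did not have, so the next run fails fast instead of shipping a plausible-looking mistake.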
Levels 6, 7, and 8: The Frontier
This is where the framework enters territory that most teams have not reached, and frankly, where I start being a little skeptical.
Level 6 is multi-agent orchestration — multiple AI agents working together, each specialized for different tasks. One agent writes code, another reviews it, a third handles testing. In theory, this creates a virtual engineering team. In practice, the coordination overhead is significant and the failure modes multiply.
Level 7 is the “agents raising PRs while you sleep” level — fully autonomous agents that work overnight, navigate ambiguity on their own, and present you with completed work each morning. Eledath notes this is where the “multiplayer effect” really kicks in: if you are operating at Level 7 but your teammate reviewing your PRs is at Level 2, your throughput is bottlenecked by their review speed.
Level 8 is speculative: AI systems that not only write code but help you decide what to build. Product strategy, architecture decisions, prioritization — all informed or driven by AI. This is more vision than reality at the moment.
Sandra, an engineering director I respect deeply, was characteristically blunt about Levels 7 and 8: “I am still trying to get my team to write good pull request descriptions. Let us maybe master the fundamentals before we hand the car keys to a robot.”
(She has a point, even if the robot is getting better at driving.)
Where Most Teams Actually Are Today
Based on conversations with about 30 engineering teams over the past three months, here is my rough distribution:
- Level 1-2 (Tab completion + Agent IDE): About 50 percent of teams. They use Copilot or Cursor but have not invested in context engineering or agentic workflows.
- Level 3 (Context engineering): About 30 percent. These teams have rules files, system prompts, and are thoughtful about what context they provide.
- Level 4-5 (Agentic coding + Background agents): About 15 percent. These teams are experimenting with autonomous coding agents and background tasks.
- Level 6+ (Multi-agent + autonomous): About 5 percent. Mostly AI-native companies that have built their entire workflow around agent orchestration.
The gap between Level 2 and Level 3 is where I see the biggest return on investment. It requires no new tools — just a mindset shift and some upfront work on configuration files. If you are on a team using an AI coding tool and you have not written a project-level context file, that is your next step. Not Level 7 autonomous agents. Just a good .cursorrules or CLAUDE.md.
The Multiplayer Problem
The insight from Eledath that resonated most with me is the multiplayer problem: your output depends on your teammates’ levels, not just your own.
If you are an engineer operating at Level 5 — raising PRs from background agents overnight — but your reviewer is at Level 2 and takes three days to manually review each one, your throughput is capped at their speed. The agent advantage evaporates at the review bottleneck.
This has massive implications for how teams structure code review, how organizations train engineers on AI tools, and — more broadly — how we think about engineering velocity. It is not enough for one person on the team to level up. The whole team needs to move together.
Marcus, who manages a platform team of about 20 engineers, is wrestling with this exact problem. “I have three engineers at Level 4 who are producing incredible output, and the rest of the team is at Level 2. The review queue is backed up, and the Level 4 engineers are frustrated. But I cannot just tell everyone to skip review — that is how you get production outages.”
There is no easy answer here. But being aware of the dynamic is the first step.
My Honest Take
The 8 Levels framework is useful because it gives us a vocabulary for something we could all feel but could not articulate. It turns “some teams are better at AI than others” into something more actionable: here is where you are, here is where you could be, and here is the specific thing to work on next.
But I also think it can create anxiety if taken too literally. Not every team needs to reach Level 7. Not every project benefits from autonomous overnight agents. Sometimes the right answer is a well-written context file and an engineer who knows their codebase deeply.
The models are going to keep getting better. The tools are going to keep evolving. But the engineers who understand their systems deeply and use AI as an amplifier rather than a replacement? Those are the ones shipping the best code. At every level.
(And if you are still at Level 1, using basic tab completion? That is fine. Really. Start with context engineering. Write one rules file. See what changes. You do not need to boil the ocean. You just need to help the AI help you.)
📚 Related reading:
The protocol debate is heating up too. Read why MCP went from messiah to pariah in six months — and why the enterprise version still has a future.