What is a harness

2026-04-08

by Uri Walevski

I was talking with a friend recently about AI agent architecture, and the conversation kept circling back to the same question: what should a harness actually be, and should there be a standard for how harnesses work? My answer is a resounding no.

A harness is the minimal thing you need to connect an LLM to the real world. A conversation loop, a way to call tools, a way for tools to report back. That's it. The moment you try to standardize it, you're implying there's more to it than that, and that's where things go wrong.

When you're building an AI agent framework, there's a natural temptation to add structure on top of that minimal core. You start with the harness, then you add plugins. Then hooks. Then lifecycle events, middleware, side effects. Each addition makes sense on its own, but before long the harness is doing everything and the agent is doing nothing.

This is inheritance thinking applied to agents, and I think it leads to the same dead ends inheritance usually does.

The plugin trap

The temptation is understandable. You want your agent to follow certain rules when writing code, check for code smells after every edit, maybe bundle skills, prompts, rules, tools, and hooks into a single package. Like npm packages but for agents.

The problem is that plugins make the harness load-bearing. Once you have a hook system, every new harness feature has to account for it. Add streaming? Hooks need to handle streaming. Add parallel tool calls? Hooks need to handle concurrency. Add a new model provider with different context limits? Every hook that injects content now needs to be aware of that. The hook system becomes a surface area that everything else in the harness has to accommodate. It's the same failure mode as deep inheritance hierarchies: implicit coupling between things that should be independent, and a system that gets more brittle with every new capability.

What it means to compose well

Two things compose well when you can combine them without either one needing to know about the other.

A skill and a tool compose well. You can add a new skill about frontend development without changing any tools, and you can add a new tool for database access without rewriting any skills. The conversation ties them together at runtime, but neither depends on the other at design time.

A plugin and a hook don't compose well. A plugin that runs a code review after every edit needs to know about the harness's edit lifecycle. A hook that injects a linting step needs to know about the plugin's output format. Add a second plugin and you need to think about ordering, conflicts, and shared state. Each new piece increases the cost of every other piece.

This is composition vs inheritance in a nutshell. Composition means independent pieces that combine freely. Inheritance means pieces that depend on a shared structure. The shared structure is the harness, and the more you put into it, the more tightly coupled everything becomes.

Tools are the composable primitive

Instead of hooks in the harness, you put behavior in the tools.

Say you want your agent to review code after each edit. You don't add a "post-edit hook" to the harness. Instead, the edit tool itself injects a thought into the conversation: "review what you just wrote." The harness exposes one simple API for this, an inject-context endpoint, and any tool can use it whenever it chooses.

This is composable because two tools can both inject thoughts without conflicting; each is just adding information to the conversation. It also keeps the harness thin, which is exactly where you don't want complexity to live.
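As a sketch of the pattern, assuming a hypothetical harness with a single inject API (the names Harness, inject_context, and edit_tool are illustrative, not a real framework):

```python
# Sketch of the inject-context pattern. `Harness`, `inject_context`, and
# `edit_tool` are hypothetical names, not a real API.

class Harness:
    """Minimal harness surface: a conversation history plus one inject API."""

    def __init__(self):
        self.history = []  # conversation events, in arrival order

    def inject_context(self, text):
        # Any tool may call this at any time. Injections just append to
        # the conversation, so two tools can never conflict.
        self.history.append({"role": "tool", "content": text})


def edit_tool(harness, path, new_text):
    # The edit itself is elided (write new_text to path); the point is
    # that the tool, not a harness hook, injects the follow-up thought.
    harness.inject_context(f"You just edited {path}. Review what you wrote.")


harness = Harness()
edit_tool(harness, "app.py", "print('hi')")
print(harness.history[0]["content"])
```

The review behavior lives entirely in the tool; the harness never learns that "post-edit review" exists as a concept.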

The same pattern works for something like VM provisioning. When your agent spins up a machine, the tool pings the conversation when it's done. That ping is just a history event that could arrive in the middle of anything, and the agent handles it like any other context, because that's what it is.
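The asynchronous version of the same idea might look like this, assuming the same hypothetical inject API; the ping is just another history event, whenever it happens to arrive:

```python
# Sketch of an async tool pinging the conversation. All names are
# illustrative; the sleep stands in for real provisioning work.
import threading
import time


class Harness:
    def __init__(self):
        self.history = []
        self._lock = threading.Lock()  # injections can come from any thread

    def inject_context(self, text):
        with self._lock:
            self.history.append({"role": "tool", "content": text})


def provision_vm(harness, name):
    # Returns immediately; the completion ping arrives later as an
    # ordinary history event the agent reads like any other context.
    def worker():
        time.sleep(0.01)  # stand-in for actual provisioning
        harness.inject_context(f"VM {name} is ready.")

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


harness = Harness()
thread = provision_vm(harness, "build-box")
thread.join()
print(harness.history[-1]["content"])
```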

The harness will change, so build around it

Here's the thing people get backwards: the harness itself is not future-proof, and it shouldn't try to be. Every new model generation changes what a harness needs to do. Context windows grow, tool calling conventions change, reasoning capabilities shift. The harness will keep getting rewritten, and that's fine, because it's small.

What matters is that everything around the harness is future-proof. The skills, the tools, the conversations, the channels where information flows. These are the things worth standardizing, worth building marketplaces around, worth investing in as shared infrastructure.

A marketplace of plugins is a bad bet because plugins are coupled to a specific harness, and the harness is going to change. When it does, every plugin in that marketplace breaks. But a marketplace of skills? Skills are just documents. They work with any harness, any model, any architecture, because they're written for the agent's understanding, which is really just human understanding. Same goes for channels, tools, or any other primitive that's defined by what it does rather than how it hooks into a particular system.

This is why the "where do we standardize" question matters so much. You want standards and ecosystems at the layer that won't change, not at the layer that will. The harness is the volatile layer. Skills, tools, and conversations are the stable layer. Build your markets there.

What "future-proof" actually means

Anything built for humans will survive. Anything built specifically for agents will not.

A conversation is a human interface. Text in, text out. That pattern existed before LLMs and will exist after whatever replaces them. When a tool injects a thought into a conversation, it's using a human interface, one that works today, will work with a different model tomorrow, and will work with a completely different agent architecture next year.

This is the real test for any abstraction in this space. Can a human use it? A conversation, yes. A skill file full of instructions, yes. A tool that does something and reports back, yes. A lifecycle hook that fires between the planning phase and the execution phase of an agent loop? That's machinery, and it'll get replaced when the machinery changes.

Why MCP will die and skills will live on

MCP is the plugin pattern applied to tools. It's trying to make tools into a standard interface that the harness manages, routes, and orchestrates, and that makes it an agent interface, not a human one.

Skills, tools, and conversations are human interfaces. A skill is just a document, text that tells the agent what it needs to know. A tool is just a function that does something in the real world and reports back. A conversation is just a channel where information flows in both directions. You could explain any of these to a non-technical person in one sentence, and they'd get it. Try that with a lifecycle hook or a middleware chain.

These primitives are also the ones that compose naturally. You can add a new skill without worrying about whether it conflicts with existing tools. You can build a new tool without knowing what skills are loaded. The conversation ties them together without any of them needing to know about each other. That's composition, and it's why these things will outlast any specific agent framework or protocol.

MCP, plugins, and hooks are built around assumptions about how a specific kind of agent works, what its lifecycle looks like, how it processes tool calls, what events it emits. Those assumptions change every few months. The protocol you standardize today won't transfer to the next architecture, because it was designed for the machinery, not for the human pattern underneath.

Intelligence and control are a tradeoff

There's a deeper issue with hooks and lifecycle events. "Always do X when Y happens" is not an agent pattern; it's CI, a git hook, a cron job.

The whole point of an agent is that it has judgment about when to do things. You can't hardcode when it's "correct" to review your own code. Sometimes it's after every file, sometimes after a whole feature, sometimes never because the change is trivial. That's intuition, and if you bypass it with a mandatory hook, you're paying for intelligence but not using it.

You can't get both full control and full intelligence. If you want the agent to always follow a rigid sequence, just write a script. If you want the agent to exercise judgment, you have to let it decide when to check itself.

Keep the harness minimal

A harness is a conversation loop, tool execution, and a way for tools to inject context back. Skills give the agent knowledge, tools give it capabilities, and the conversation gives it context.
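That whole description fits in a few lines. A minimal sketch, assuming a call_model function that stands in for whatever provider you use (nothing here is a real API):

```python
# A minimal harness, as a sketch. `call_model` stands in for an LLM
# provider; the loop below is essentially the entire harness.

def run(call_model, tools, history):
    """Conversation loop: ask the model, execute tool calls, repeat."""
    while True:
        reply = call_model(history)
        history.append({"role": "assistant", "content": reply["content"]})
        if "tool" not in reply:
            return history  # no tool call: the turn is done
        # Tool execution; the result flows back as ordinary context.
        result = tools[reply["tool"]](**reply.get("args", {}))
        history.append({"role": "tool", "content": str(result)})


# Toy model: calls a tool once, then finishes.
replies = iter([
    {"content": "checking", "tool": "add", "args": {"a": 2, "b": 3}},
    {"content": "done"},
])
history = run(lambda h: next(replies), {"add": lambda a, b: a + b}, [])
print(history[-1]["content"])  # "done"
```

Everything else, review habits, provisioning pings, domain knowledge, lives in the tools and skills that plug into this loop, not in the loop itself.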

Every layer you add beyond that is a layer that can break, conflict, or surprise you. Composition means each piece works independently and combines without coordination. Inheritance means each piece depends on a shared structure that everyone has to agree on. The harness that survives is the one with the least in it, and the ecosystem that survives is the one built around the things that don't change.

What the harness should still do

There are two things the harness should handle that aren't about conversation or tools.

Secret swapping. Your agent has credentials. API keys, database passwords, OAuth tokens. If the LLM can read them, they leave your machine. If the LLM can write them to a file or output them in a response, they leave your machine. The harness swaps secrets for placeholders before prompts go to the LLM, and restores them before tools execute locally. The model reasons about credentials it never sees; the shell runs with the real credentials the model never saw.
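A minimal sketch of the swap, assuming a simple string substitution (the key value and placeholder format are made up for illustration):

```python
# Sketch of secret swapping. The secret value and placeholder syntax
# are illustrative, not a real scheme.

SECRETS = {"sk-live-abc123": "{{API_KEY}}"}        # real value -> placeholder
PLACEHOLDERS = {v: k for k, v in SECRETS.items()}  # placeholder -> real value


def redact(text):
    # Before anything goes to the LLM: real secrets become placeholders,
    # so the model only ever reasons about "{{API_KEY}}".
    for secret, placeholder in SECRETS.items():
        text = text.replace(secret, placeholder)
    return text


def restore(text):
    # Before a tool executes locally: placeholders become real secrets,
    # so the shell runs with credentials the model never saw.
    for placeholder, secret in PLACEHOLDERS.items():
        text = text.replace(placeholder, secret)
    return text


prompt = redact("curl -H 'Authorization: Bearer sk-live-abc123' api.example.com")
command = restore("curl -H 'Authorization: Bearer {{API_KEY}}' api.example.com")
```

A real implementation would need to handle secrets that the model mangles or partially reproduces, but the shape is the same: two substitution passes at the harness boundary.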

LLM provider abstraction. The harness speaks to models through a clean abstraction. This keeps your skills, tools, and prompts portable across providers. A skill written for Claude works when you switch to Gemini or a local model. The harness handles the translation. Your content stays clean.
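The abstraction can be as small as one method. A sketch, with a stand-in backend rather than any real provider SDK:

```python
# Sketch of a provider abstraction: one tiny interface, many backends.
# `Model` and `EchoModel` are illustrative names, not real SDK classes.
from typing import Protocol


class Model(Protocol):
    def complete(self, history: list[dict]) -> str: ...


class EchoModel:
    """Stand-in backend; a real one would wrap Claude, Gemini, etc."""

    def complete(self, history):
        return "echo: " + history[-1]["content"]


def harness_turn(model: Model, history):
    # The harness only ever talks to `Model`. Skills, tools, and prompts
    # in `history` never change when the backend does.
    return model.complete(history)


reply = harness_turn(EchoModel(), [{"role": "user", "content": "hi"}])
```

Swapping providers means writing one new class that satisfies the interface; nothing upstream of the harness notices.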
