Harness vs Framework: Why 'Just Use LangChain' Stopped Being the Answer
Published: June 2026 | Author: David Daniel
Companion to the paper "Harness Engineering" (Paper 1). This article takes a single section of that paper (the harness-versus-framework taxonomy) and expands it into a standalone orientation for practitioners deciding what to build versus what to adopt. The paper develops the full discipline; this piece sharpens one distinction and hands you back to it.
The question changed underneath the answer
For roughly two years, the reflexive answer to "how do I build an agent" was a library. Just use LangChain. Wire up the loop, register the tools, glue the prompts together. The answer was so standard it barely registered as a decision.
By mid-2026, the answer practitioners give each other has shifted. The thing teams increasingly reach for first is not an assemblable framework but a harness: a pre-wired agent runtime where the loop, the tool registry, the context management, and the permission layer already exist, and you bring the model and the task.
Call that a thesis rather than a measurement. Nobody has published survey data on what "most teams" build with, and this article will not pretend otherwise. What the public record does support is something more useful than a market-share claim: the practitioner literature has converged on a vocabulary that splits the old question in two. "How do I build an agent" turns out to contain two different questions (what runs the agent and who builds the thing that runs it), and the harness/framework distinction is the line between them. Naming that line before you write any code is the point of this article.
The definitional frame that the public write-ups have settled on is Agent = Model + Harness, where the harness is "every piece of code, configuration, and execution logic that isn't the model itself." The primary public source for both the equation and that definition is LangChain's "The Anatomy of an Agent Harness" by Vivek Trivedy (March 10, 2026), which states them in exactly those words. The O'Reilly Radar write-up on agent harness engineering by Addy Osmani (May 15, 2026) is the cleanest secondary codification: it quotes Trivedy's one-liner directly and credits him with coining the term "harness engineering". The fact that it reached for the same formulation is part of the evidence that the framing has traveled. And the reframing carries the whole argument: if the agent is the model plus everything around it, then the build-versus-adopt question is really a question about that everything. A framework gives you parts for it. A harness gives you a running version of it. The shift from one default to the other is the story this article tells.
What a harness actually is
Strip an agent to its skeleton and you find a loop: call the model, execute the tools it asks for, feed the results back, repeat until the task is done. The MindStudio write-up on production agent harnesses treats that while-loop as the foundation: a loop engine at the core, a registry of tools and skills around it, a permissions layer enclosing it, with everything pre-wired rather than left to the developer to assemble. The same write-up carries a three-phase framing of the field's trajectory (which it credits to practitioner Akshay Pachaar) from competing on model weights, to context engineering, to harness engineering: each stage widening the scope of what the practitioner is responsible for shaping, from the model itself, to the model's working set, to the entire runtime around the model.
A harness, then, is that runtime shipped as a product. The products usually named as examples (Claude Code, OpenAI's Codex, Cursor, Windsurf) arrive with the loop already running, the tool registry already populated, the permission prompts already designed. (For the same distinction drawn from the evaluation and observability side, see Arize AI's explainer on agent harnesses.) You configure a harness; you do not construct it.
A framework (LangChain and its descendants, AutoGen, CrewAI) sits on the other side of the line. To be fair to frameworks: "assemblable" does not mean empty. Modern framework stacks ship real abstractions (agent classes, graph runtimes, memory primitives, integrations), and a team that wants them can absolutely build a production agent on top. The difference is not capability. You can build the same loop in LangChain that Claude Code ships with.
The difference is ownership of the hard parts. In a framework, the loop, the failure handling, the context management, and the permission boundary are yours: you choose them, wire them, and (the part that compounds) maintain them as models, tool protocols, and safety expectations change underneath you. In a harness, those subsystems ship as defaults, and you inherit a vendor's opinions about how an agent should behave. That trade (flexibility surrendered for a working, opinionated runtime) is the actual content of the harness/framework distinction. Everything else is detail.
The nine subsystems you would otherwise have to build
What makes the distinction consequential rather than academic is the surface area it covers. The enumeration of a production harness carried by MindStudio's write-up, which credits the breakdown to practitioner @engineerprompt, lists nine components: the while-loop itself (the loop engine), context management and compaction, a skills-and-tools registry, sub-agent management, built-in skills, session persistence, dynamic system-prompt assembly, lifecycle hooks, and a permissions and safety layer. (Arize AI's independently written explainer converges on essentially the same nine.) The taxonomy is theirs; the practical reading of it that follows is mine.
Read the list as a build-versus-adopt checklist, because that is what it functionally is. Each item is a subsystem with real engineering depth behind it:
- Context management and compaction is what keeps a long-running agent from drowning in its own history; without it, the loop degrades as it runs.
- Session persistence is what lets a crashed or interrupted run resume instead of starting over.
- Lifecycle hooks are the interception points: the difference between an agent you can govern and one you can only watch.
- The permissions and safety layer is the boundary between an agent that asks before it acts and one that quietly does whatever the model suggests.
A team that adopts a harness inherits a vendor's answer to each of these problems on day one. Exactly which of the nine any given product implements, and how well, varies. The enumeration is a map of the surface area, not a feature matrix for any specific tool, and this article makes no product-by-product coverage claim. But the asymmetry it exposes is the load-bearing point: on one side of the fork, nine subsystems arrive pre-built and pre-integrated; on the other, nine subsystems go on your team's roadmap, and stay there for the life of the system. That asymmetry (not any deficiency in the libraries) is, in my reading, why "just use LangChain" faded as the default answer. The libraries did not get worse. The checklist got visible.
The strongest public evidence the harness side is real
The distinction would be a curiosity if it had stayed blog vocabulary. What moved it past that is a vendor putting first-party weight behind it.
OpenAI published a first-party essay, "Harness engineering: leveraging Codex in an agent-first world," by Ryan Lopopolo, an MTS on OpenAI's Frontier team, describing harness engineering as a practiced discipline inside the company. The essay reports an internal product of over a million lines of code built with zero human-written code, a figure that is OpenAI's own self-report about its own internal project, not an independently audited number, and it should be read that way.
The operational details around that claim come from a separate source and deserve separate attribution. The Latent Space discussion of the work describes the project's code as shipping without human review before merge, and puts the spend at roughly $2,000–3,000 per day at around one billion output tokens per day, explicitly correcting the ~$1,000/day figure that circulated in some retellings. Those numbers are conversation-reported and likewise unaudited; treat them as an order-of-magnitude sketch of what running a serious harness-built project costs, not as a benchmark.
Strip the caveats and what survives is still significant, and it is the directional point that matters for the taxonomy: a frontier vendor is building real internal software on the harness model and publishing the practice under a name. That a first-party essay of this kind turns a label into a category is my interpretation, not OpenAI's claim. But it is hard to read the episode any other way.
The vocabulary has also demonstrably traveled beyond OpenAI. As of June 2026 the term is used independently by LangChain (a framework vendor), O'Reilly Radar (technical press), and Arize AI (an evaluation and observability vendor), each developing the harness/framework distinction in its own terms rather than echoing OpenAI's framing. When sources with no stake in each other's positioning reach for the same word and draw the same line, the taxonomy has stabilized enough to build decisions on.
Which side of the fork are you on
The harness/framework line is the first architectural fork an agent project hits, and it shapes nearly everything downstream: who maintains the loop, where the permission boundary lives, how much of the vendor's worldview you absorb, and what your team is signing up to operate in year two. What follows is decision guidance: author judgment, drawn from the taxonomy above, not from benchmark data.
You are probably on the harness side if the work is shaped like the work harnesses were built for (agentic coding and adjacent tasks against files, repositories, and shells), and the vendor's opinions about permissions, compaction, and session handling are opinions you can live with. The deal is straightforward: you give up architectural control over nine subsystems you were unlikely to build better, and you get an operationally mature runtime now instead of after a roadmap.
You are probably on the framework side if the agent is a component of your own product rather than a tool your team uses; if the loop you need is shaped differently from anything a coding-agent vendor imagined; or if compliance, data-handling, or governance constraints mean you must own the permission boundary outright rather than inherit it. Then the framework's flexibility is not a tax; it is the requirement, and the nine-subsystem to-do list is simply the cost of meeting it.
The expensive failures live in the conflations. A team that needed a framework's freedom but adopted a harness spends its time fighting the vendor's opinions at every hook and permission prompt. A team that needed a harness but reached for a framework spends its time rebuilding a worse version of one (re-deriving compaction, persistence, and safety boundaries that it could have inherited) before it writes a line of the thing it actually set out to build. Neither choice is wrong in general. They are different decisions, and the only reliable way to make the right one is to name the fork before you build.
The third path: owning an open harness
Naming the fork honestly also means admitting it is no longer strictly binary. As of mid-2026 a third option has taken recognizable shape: open-source, deliberately minimal harnesses that you adopt the way you adopt a vendor product, and then own outright, the way you own framework code. The cleanest statement of the idea is the homepage tagline of pi (pi.dev): "There are many agent harnesses, but this one is yours." pi is an MIT-licensed, terminal-based harness that is minimal on purpose. Its homepage lists what it deliberately left out as the pitch ("No MCP," "No sub-agents," "No permission popups," "No plan mode"), and the launch post by creator Mario Zechner (the libGDX author) explains the philosophy behind it: four default tools, a system prompt and tool definitions that together come in under a thousand tokens, and 15+ model providers per the homepage. One omission deserves emphasis under a taxonomy that treats the permission layer as a core harness subsystem: pi ships no built-in permission system at all; the repository is explicit about it, and points users who need boundaries to containers and sandboxes instead. Zechner joined Earendil, the company co-founded by Armin Ronacher, in April 2026, and the project moved to the earendil-works organization that May; the repository sits at 61.3k GitHub stars as of June 9, 2026. Note what kind of evidence this is: vendor self-description plus two recognized open-source authors' own blogs, with no major-press validation behind it yet.
The bigger entry is OpenCode (opencode.ai), an MIT-licensed open-source coding agent from Anomaly (the SST team), at roughly 169k GitHub stars on its repository as of June 9, 2026. Its positioning (vendor self-description, but a consistent one) is flexibility and no lock-in: an at-cost Zen model gateway whose stated goal is to pass price drops along "by selling at cost," a $10/month plan ($5 the first month) for open-weight coding models (Qwen, DeepSeek, Kimi, and others), and reuse of subscriptions you already pay for, including an official GitHub partnership under which paid Copilot subscribers authenticate into OpenCode with "no additional AI license needed" (GitHub Changelog, Jan 16, 2026).
Why teams pick this path matters, because the community shorthand ("it's cheaper") is not what the on-record sources say, and neither project's official pitch is a price war. There is no independent reporting of a billing-driven enterprise migration wave to open harnesses; treat any cost framing as unproven. The stated motivation is lock-in and control. Austin Vance of Focused Labs puts it bluntly: "The agent harness is the new lock-in layer. Better to own it on purpose" (Focused Labs, Jun 4, 2026; a consultancy blog, with the positioning interest that implies). Kai Waehner, Confluent's Global Field CTO, makes the structural version of the argument on his own blog: agentic lock-in "accumulates at multiple layers simultaneously" (Apr 6, 2026; named-expert blog, not press). And there is one well-documented enterprise deployment at scale: Cloudflare's internal AI engineering stack uses OpenCode as the primary engineer-facing tool (its AI coding tools reached 3,683 internal users between Feb 5 and Apr 15, 2026, and OpenCode carried 27.08 million of their messages, with Windsurf, the next tool in the table, under half a million), routed through Cloudflare's own AI Gateway for cost tracking and data-retention policy (Cloudflare engineering blog, Apr 20, 2026). Read that source as it is written: Cloudflare's stated motives are governance, flexibility, and dogfooding its own gateway, with cost as one factor among several, not the headline.
Two cautions before treating the third path as a free lunch, both of which are my reading of documented events, not anyone's official warning. First, the BYO-subscription lever belongs to the model vendor, and the vendor can revoke it: over early 2026 Anthropic moved in stages to block Claude Pro/Max subscriptions from third-party harnesses, OpenCode included, clarifying the prohibition in its terms in February (The Register, Feb 20, 2026) and enforcing the cutoff that April. Second, owning the harness at enterprise scale carries a mandate-backfire risk: Amazon pushed its engineers to prioritize its in-house Kiro for production code, then within months reversed course, approving Claude Code, with Codex to follow, after internal pushback (The New Stack, May 5, 2026); Paper 1 covers that case in full. Owning the harness removes the vendor's opinions from your loop; it does not remove the model vendor from your dependency graph, or organizational physics from your rollout.
Cost pressure makes that discipline more urgent, not less. The pricing turbulence of mid-2026 (metered billing, premium agentic model tiers) tempts leaders toward the build path as an escape from vendor pricing, and the instinct deserves the same scrutiny "just use LangChain" once escaped. Gartner predicts that more than 40% of agentic AI projects will be canceled by the end of 2027, citing escalating costs, unclear business value, and inadequate risk controls; that is an analyst forecast, but it is aimed squarely at builds undertaken without a workflow-level case. Practitioner decision frameworks have converged on the same advice, working the choice per workflow rather than as a company-wide posture; Nate B. Jones's Build-Buy-Hire-Wait matrix is a representative public example. The Amazon arc above is the enterprise-scale version of the lesson: the question is never whether your organization could build a harness. It is whether owning one is a decision your workflows justify, made deliberately at the fork this article has been naming.
Where this article stops and the paper begins
This piece has carried exactly one distinction: a harness is a pre-wired agent runtime you adopt; a framework is a kit of parts you assemble; the open harnesses split the difference for teams that need to adopt and own at once; and which one you need is the first question of any agent project, not an implementation detail to defer. The fork is where this article stops.
"Harness Engineering" (Paper 1) is where you walk down it. The paper develops the full discipline the term now names: the nine components in engineering depth, the invariants that govern long-running loops, and the operational patterns that production harnesses encode. If this article clarified which side of the fork you are on, the paper is the map of the territory on the other side.
Companion to: "Harness Engineering" (Paper 1), harness-vs-framework taxonomy section.
Sources: LangChain (Vivek Trivedy), O'Reilly Radar (Addy Osmani), MindStudio, Arize AI, OpenAI, Latent Space, pi.dev and the blogs of Mario Zechner and Armin Ronacher, OpenCode docs, GitHub Changelog, Cloudflare engineering blog, Focused Labs (consultancy), Kai Waehner (named-expert blog), Gartner (analyst forecast), Nate B. Jones (practitioner newsletter), The Register, and The New Stack. Vendor-reported metrics are labeled as self-report inline; GitHub star counts and pricing are snapshots verified June 9, 2026. Drafted with AI assistance; all sources verified and final judgments are the author's.
This article is part of an ongoing research project tracking AI tooling and software engineering practices at daviddaniel.tech/research.