Agent Harnesses: A Standard for Structuring Agentic Systems

The word “harness” is everywhere in AI agent development — but reputable sources define it differently. A new open standard proposes to fix that with a practical, opinionated definition and a concrete directory convention.

What Is a Harness? (The Definitional Problem)

Definition 1 — The agent wrapper:

agent = model + harness — LangChain

Under this view, a harness is the scaffolding that turns a raw LLM into an agent: tools, memory, a prompt loop.

Definition 2 — The task wrapper:

We developed a two-fold solution to enable the Claude Agent SDK to work effectively across many context windows: an initializer agent that sets up the environment on the first run, and a coding agent that is tasked with making incremental progress in every session. — Anthropic Engineering

Under this view, a harness wraps an already-agentic system and applies it to a specific repeatable role.

The Agent Harnesses project argues for the second definition:

A harness is information and tools that allow a general purpose agentic system to do specific, complex tasks in a repeatable and maintainable manner.

The reasoning: “agent” is already a well-defined concept. Redefining it as model + harness is redundant. The real unsolved problem is how to take a general-purpose agent (like Claude Code) and reliably constrain it to a specialized role across sessions.

Why a Standard Is Needed

Ad-hoc approaches to structuring agent context — LLM wikis, loose CLAUDE.md files, scattered documentation — run into predictable problems at scale:

Slow initialization — the agent spends too long understanding its environment before doing useful work
Context blindness — the agent skips documentation entirely and works without grounding
Wiki entropy — when an LLM maintains its own wiki, it forgets where it wrote things, producing duplicate and contradictory content
Configuration drift — adjusting agent behaviour in a large unstructured wiki requires finding and updating content in many places

For Claude specifically, there’s an additional structural constraint: skills must live in a flat .claude directory and can’t be nested within a larger harness directory. This makes it hard to package a role’s prompts and skills together.

The Agent Harnesses Standard

The standard is a directory convention for structuring harnesses. The key idea is structured progressive disclosure: the agent loads a minimal entry point, reads routing files to navigate, and only loads detailed content when a task actually requires it.

Directory Structure

my-harness/
  ├── HARNESS.md          ← entry point, loaded every session
  ├── tools/
  │   ├── TOOLS.md        ← routing file for the tools/ subtree
  │   └── query-db/
  └── data/
      ├── DATA.md         ← routing file for the data/ subtree
      └── schema.md

HARNESS.md is the only required file. It uses a short YAML frontmatter block (name, description) followed by a brief markdown body: what the agent’s role is, and where to look for capabilities and context. It’s kept minimal because it’s loaded on every session start.
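To make the shape of the entry point concrete, here is a minimal sketch of reading a HARNESS.md-style file. The example name, description, and body are hypothetical, and the parser is an assumption of this sketch: it handles only flat `key: value` frontmatter and is not taken from the standard or the ahar CLI.

```python
from typing import Dict, Tuple


def split_frontmatter(text: str) -> Tuple[Dict[str, str], str]:
    """Split a HARNESS.md-style document into (frontmatter fields, markdown body).

    Only flat `key: value` pairs are handled; this is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block present
    fields: Dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing delimiter found: everything after it is the body.
            return fields, "\n".join(lines[i + 1:]).strip()
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return {}, text  # opening delimiter never closed; treat as plain body


# Hypothetical entry point, kept short because it is loaded every session.
EXAMPLE = """---
name: analytics-harness
description: Answers reporting questions against the sales database.
---
You are a data analyst. See tools/TOOLS.md for capabilities and data/DATA.md for context.
"""

fields, body = split_frontmatter(EXAMPLE)
print(fields["name"])
print(body)
```

The body deliberately points at routing files rather than inlining their content, which is what keeps the per-session cost small.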

Routing files (TOOLS.md, DATA.md, etc.) are named after their parent directory in all-caps. They tell the agent what lives in that subtree without requiring it to scan every file. This convention propagates recursively through the directory tree.
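The naming rule is simple enough to express in a few lines. The sketch below assumes the routing file is the directory name upper-cased with a `.md` suffix, as in the tools/TOOLS.md and data/DATA.md examples. The function names are hypothetical, and how the standard treats directory names with hyphens or other punctuation is not covered here.

```python
from pathlib import Path
from typing import Optional


def routing_file_for(directory: Path) -> Path:
    """Return the expected routing file path for a directory (e.g. tools/ -> tools/TOOLS.md)."""
    return directory / f"{directory.name.upper()}.md"


def find_routing_file(directory: Path) -> Optional[Path]:
    """Return the routing file if it exists on disk, else None."""
    candidate = routing_file_for(directory)
    return candidate if candidate.is_file() else None


print(routing_file_for(Path("my-harness/tools")))
print(routing_file_for(Path("my-harness/data")))
```

Because the rule depends only on the directory's own name, the same lookup applies unchanged at every level of the tree.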

Leaf directories prevent the agent from recursing into implementation details (skill internals, large content stores). Two mechanisms mark a directory as a leaf:

.harnessleaf — a marker file that makes any directory explicitly a leaf
.leaf-detectors — a root-level config that defines patterns for automatic leaf detection (e.g., skill=SKILL.md marks any directory containing a SKILL.md as a skill leaf)
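The two mechanisms can be combined into a single check. This is a sketch under stated assumptions: the `.leaf-detectors` file is taken to hold one `kind=filename` pair per line (generalizing from the `skill=SKILL.md` example), blank lines and `#` comments are skipped as a convenience of this sketch, and the function names and the "explicit" label are hypothetical.

```python
from pathlib import Path
from typing import Dict, Optional


def load_detectors(root: Path) -> Dict[str, str]:
    """Read root/.leaf-detectors as `kind=filename` lines (assumed format)."""
    config = root / ".leaf-detectors"
    detectors: Dict[str, str] = {}
    if not config.is_file():
        return detectors
    for line in config.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        kind, marker = line.split("=", 1)
        detectors[kind.strip()] = marker.strip()
    return detectors


def leaf_kind(directory: Path, detectors: Dict[str, str]) -> Optional[str]:
    """Return why a directory is a leaf, or None if the agent may recurse into it."""
    if (directory / ".harnessleaf").exists():
        return "explicit"  # marker file wins regardless of contents
    for kind, marker in detectors.items():
        if (directory / marker).is_file():
            return kind  # e.g. "skill" when SKILL.md is present
    return None
```

Checking the marker file first means a directory can always be made a leaf by hand, even when no detector pattern matches it.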

Information Flow

Session start
    ↓
Load HARNESS.md  (always)
    ↓
Task arrives
    ↓
Read routing files to locate relevant context
    ↓
Load specific files only when needed
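The flow above can be sketched as two steps: load the entry point unconditionally, then collect only the routing files that sit between the root and the file a task needs. In practice the agent does this navigation itself by reading files; the code below is only an illustration, and both function names are hypothetical.

```python
from pathlib import Path
from typing import List


def load_entry_point(root: Path) -> str:
    """Step 1: HARNESS.md is read on every session start."""
    return (root / "HARNESS.md").read_text(encoding="utf-8")


def routing_files_toward(root: Path, target: Path) -> List[Path]:
    """Step 2: routing files read top-down on the way to `target`.

    Directories without a routing file are passed through silently.
    """
    found: List[Path] = []
    current = root
    # Walk each directory component between root and the target file.
    for part in target.relative_to(root).parts[:-1]:
        current = current / part
        candidate = current / f"{part.upper()}.md"
        if candidate.is_file():
            found.append(candidate)
    return found
```

For the example tree, reaching `data/schema.md` touches only `HARNESS.md` and `data/DATA.md`; nothing under `tools/` is read.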

This mirrors how Claude Code’s own skills standard (SKILL.md) works. The harnesses standard is directly inspired by it and extends the same progressive-disclosure principle to the full scope of a role.

Using the Standard with Claude Code

The project ships a CLI tool:

pip install agentharnesses-cli

This installs ahar (Agent Harnesses). To initialize a directory:

ahar init .

The default target is Claude Code. Running the command:

Creates the HARNESS.md entry point
Installs the Agent Harnesses “meta skill” into .claude/, which teaches Claude how to navigate the harness structure

Once initialized:

claude
# then: "load the harness"

Claude reads HARNESS.md, understands its role, and navigates the directory structure using routing files when it needs deeper context.

The Broader Debate

The underlying tension is that the AI agent tooling ecosystem is still converging on vocabulary. “Harness,” “agent,” “scaffold,” and “framework” are used interchangeably across blog posts, SDKs, and papers — making it hard to reason about what a tool actually does without reading the implementation.

The Agent Harnesses project is an attempt to claim and stabilize one of those terms with a practical, opinionated definition. Whether the definition wins adoption depends on how well the standard solves the real problems it targets: initialization speed, context discipline, and role maintainability.

Resources

Agent Harnesses Standard Docs
Original Reddit discussion (r/LangChain) — where the post was published and discussed
How to use Agent Harnesses with Claude — author’s companion article
GitHub: agentharnesses/agentharnesses — the standard spec
GitHub: agentharnesses/exampleharnesses — example harnesses
GitHub: agentharnesses/cli — ahar CLI source
LLM Agents: Intuitively and Exhaustively Explained — author’s prior work defining agents