Complete Bibliographic Citation

Bui, Nghi D. Q. (2026). “Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned.” arXiv preprint, arXiv:2603.05344 [cs.AI]. Available: https://arxiv.org/abs/2603.05344


Summary

This paper presents OPENDEV, an open-source Rust-based CLI coding agent, and provides formal academic definitions for core concepts in AI agent engineering. It is among the most technically rigorous sources on Harness Engineering, grounding concepts like context compaction, dual-agent design, and safety-layered orchestration in a working implementation.

The paper’s central thesis: effective autonomous AI coding assistance is not primarily a model problem — it is an engineering problem. The harness, scaffolding, and context management infrastructure determine whether a capable LLM becomes a reliable agent.

Formal Definitions

The paper establishes precise vocabulary:

  • Scaffolding: The construction phase that runs before the first user prompt. Covers system prompt compilation, tool schema building, and subagent registration. Every agent in OPENDEV is fully constructed before the conversation lifecycle begins.
  • Harness: The runtime orchestration infrastructure that transforms a stateless LLM into a persistent, tool-using agent. Responsible for tool execution, context management, safety enforcement, and session persistence.
  • Context Engineering: Treating context management as a first-class engineering concern, not an afterthought. OPENDEV implements four subsystems: System Reminders, Prompt Composer, Memory, and Compaction.
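The scaffolding idea (an agent fully constructed before the first user message) can be illustrated with a toy sketch. This is schematic Python, not OPENDEV's actual Rust code; the `ToolSpec`, `Agent`, and `scaffold` names are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolSpec:
    name: str
    description: str
    read_only: bool

@dataclass(frozen=True)
class Agent:
    system_prompt: str
    tools: tuple      # compiled tool schemas
    subagents: tuple  # registered subagents

def scaffold(base_prompt, reminders, tools, subagents=()):
    """Construction phase: compile the prompt, build tool schemas, and
    register subagents before the first user message arrives."""
    prompt = "\n\n".join([base_prompt, *reminders])
    return Agent(system_prompt=prompt, tools=tuple(tools), subagents=tuple(subagents))
```

The point mirrored here is that the agent object is immutable and complete before the conversation lifecycle begins; the harness then only executes against it.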

Dual-Agent Design

OPENDEV separates planning from execution through two agent modes:

  • Plan Mode: A read-only Planner subagent explores the codebase, analyzes patterns, and produces structured plans requiring user approval before any writes occur
  • Normal Mode: Full read-write tool access for implementation

The key insight: write operations are excluded from the Planner’s tool schema entirely — the LLM never sees tool definitions it cannot use. This architectural constraint eliminates write attempts during planning by design, not by instruction.
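A minimal sketch of this schema-level exclusion, with an invented tool table (the real tool set and schema format are OPENDEV's own):

```python
# Invented tool table: each entry records whether the tool can mutate state.
TOOLS = {
    "read_file":  {"writes": False},
    "grep":       {"writes": False},
    "write_file": {"writes": True},
    "run_shell":  {"writes": True},
}

def tool_schema(mode):
    """In plan mode, write-capable tools are omitted from the schema
    entirely, so the model never sees a definition it cannot use."""
    if mode == "plan":
        return {name: spec for name, spec in TOOLS.items() if not spec["writes"]}
    return dict(TOOLS)
```

Because the planner's schema simply has no write entries, no instruction-following is required to keep planning read-only.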

Adaptive Context Compaction

As token budgets approach limits, OPENDEV applies a five-stage compaction pipeline integrated directly into the Extended ReAct reasoning loop. Each stage applies progressively more aggressive reduction to older observations. This addresses context rot — the degradation of reasoning quality as older, lower-relevance content accumulates in the window.
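The pipeline shape can be sketched as a budget-driven pass over reduction stages. The three stages below are illustrative stand-ins (the paper describes five stages, whose exact transformations are not reproduced here), and token counts use a rough 4-characters-per-token estimate:

```python
def compact(history, budget, stages):
    """Apply progressively more aggressive stages until the history fits the budget."""
    def size(msgs):
        return sum(len(m) // 4 for m in msgs)  # rough token estimate
    for stage in stages:
        if size(history) <= budget:
            break
        history = stage(history)
    return history

def truncate_old(history):
    """Stage 1: truncate the body of all but the 4 most recent messages."""
    return [m[:200] for m in history[:-4]] + history[-4:]

def summarize_old(history):
    """Stage 2: replace all but the 4 most recent messages with a stub summary."""
    if len(history) <= 4:
        return history
    return ["<summary of %d older messages>" % (len(history) - 4)] + history[-4:]

def drop_old(history):
    """Stage 3: keep only the 4 most recent messages."""
    return history[-4:]
```

Recent observations survive every stage; only older, lower-relevance content is degraded, which is the mechanism aimed at context rot.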

Lazy Tool Discovery

External tools are discovered on-demand via MCP (Model Context Protocol) rather than loaded upfront. A search_tools mechanism provides keyword-scored tool discovery, keeping the initial system prompt lean while preserving access to a large tool ecosystem.
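A keyword-scored lookup in this spirit might look like the following; the scoring rule and catalog format are assumptions, not the actual search_tools implementation:

```python
def search_tools(query, catalog, top_k=3):
    """Rank catalog entries by keyword overlap with the query; the scoring
    rule here is a simple stand-in for the real mechanism."""
    terms = set(query.lower().split())
    def score(entry):
        words = (entry["name"] + " " + entry["description"]).lower().split()
        return sum(1 for w in words if w in terms)
    ranked = sorted(catalog, key=score, reverse=True)
    return [entry["name"] for entry in ranked[:top_k] if score(entry) > 0]
```

Only the names of matching tools need to enter the context; full schemas can be fetched on demand, which is what keeps the initial system prompt lean.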

Memory Accumulation

Cross-session continuity is achieved through two systems:

  • Episodic Memory: Summaries of historical conversations
  • Working Memory: Current session context

Together, these enable the agent to accumulate project-specific knowledge across sessions — patterns, strategies, and codebase-specific conventions that would otherwise be re-derived from scratch.
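A minimal model of the two stores, assuming a caller-supplied summarizer (the actual summarization and persistence mechanisms are OPENDEV's own):

```python
class Memory:
    """Two stores: episodic (cross-session summaries) and working (current session)."""

    def __init__(self):
        self.episodic = []  # summaries of historical conversations
        self.working = []   # notes from the current session

    def end_session(self, summarize):
        """Fold the working store into episodic memory via a summarizer."""
        if self.working:
            self.episodic.append(summarize(self.working))
            self.working = []

    def recall(self):
        """What the next session starts from."""
        return list(self.episodic)
```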

Safety: Defense-in-Depth

Five independent layers prevent harmful operations:

  1. Prompt-Level Guardrails: Security policies in the system prompt
  2. Schema-Level Restrictions: Plan-mode whitelist, per-subagent tool filtering
  3. Runtime Approval System: Manual, Semi-Auto, and Auto levels with persistent permissions
  4. Tool-Level Validation: DANGEROUS_PATTERNS blocklist, timeouts, stale-read detection
  5. Lifecycle Hooks: User-defined pre-execution blocking and argument mutation

Design principle: a failure in any single layer does not compromise the system, because each layer operates independently.
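The independence property can be sketched as a conjunction of guards, where any one guard can veto an operation. The patterns and approval levels below are illustrative, not the real DANGEROUS_PATTERNS list:

```python
import re

# Illustrative blocklist; the real DANGEROUS_PATTERNS set is OPENDEV's own.
DANGEROUS_PATTERNS = [r"\brm\s+-rf\s+/", r"\bmkfs\b", r">\s*/dev/sd"]

def pattern_guard(command):
    """Tool-level validation: reject commands matching a dangerous pattern."""
    return not any(re.search(p, command) for p in DANGEROUS_PATTERNS)

def approval_guard(command, level, approved):
    """Runtime approval: the auto level passes everything; otherwise the
    command must carry a persistent approval."""
    return level == "auto" or command in approved

def run_checked(command, guards):
    """Defense in depth: every independent guard must pass; any one can veto."""
    return all(guard(command) for guard in guards)
```

Each guard knows nothing about the others, so a bug or bypass in one layer leaves the rest of the checks intact.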

Compound AI Architecture

OPENDEV routes different workloads to different LLMs — five distinct model roles (normal execution, deliberation, self-critique, vision, fallback). This reflects a broader trend: state-of-the-art AI results increasingly come from systems that compose multiple models and tools rather than from a single model call.
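Routing can be reduced to a small dispatch table; the role names follow the paper, but the model identifiers and routing predicates here are invented:

```python
# Role names from the paper; model identifiers are placeholders.
MODEL_ROLES = {
    "normal": "model-a",
    "deliberation": "model-b",
    "self_critique": "model-c",
    "vision": "model-d",
    "fallback": "model-e",
}

def route(task):
    """Dispatch a workload to the model role suited to it; unknown roles
    fall through to the fallback model."""
    if task.get("has_image"):
        return MODEL_ROLES["vision"]
    if task.get("needs_plan"):
        return MODEL_ROLES["deliberation"]
    return MODEL_ROLES.get(task.get("role", "normal"), MODEL_ROLES["fallback"])
```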

Key Lessons

  • Context pressure is the central constraint: The context window is the limiting resource in long-running coding sessions, not model capability
  • Steer behavior over long horizons with explicit decision trees, not prompt instructions alone
  • Safety through architectural constraints is more reliable than safety through instructions
  • Design for approximate outputs: terminal environments require handling imperfect LLM outputs gracefully

Sources

  • Bui, Nghi D. Q. (2026). “Building Effective AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned.” arXiv preprint, arXiv:2603.05344 [cs.AI]. Available: https://arxiv.org/abs/2603.05344
    • Primary source: full system description (OPENDEV), formal definitions, experimental implementation, and lessons learned

Fair Use Notice

This note contains summaries and analysis of copyrighted material for educational and commentary purposes. This constitutes fair use/fair dealing under copyright law. The original work remains the property of its copyright holders. Full citation provided above.

Note

This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.