Context management for AI agents is not a single technique — it is a layered strategy. The naive expectation is that a capable model plus a clear task is sufficient. In practice, the information architecture surrounding the model determines whether it stays coherent, stays within budget, and produces reliable output across long or complex sessions. This note argues that robust context management requires four coordinated layers: controlling what enters the context, managing what accumulates, configuring what persists, and removing noise before it compounds.

1. The Central Problem: Context as a Finite Resource

AI agents operate on a bounded context window. The failure mode is not merely exhausting that window — it is filling it with the wrong content.

Context-Rot names this precisely: as an agent works through a long session, the window accumulates intermediate noise — failed attempts, redundant tool outputs, superseded reasoning artifacts — while relevant signal gets buried. Liu et al. (2023) empirically demonstrated the consequence: a U-shaped performance curve where information in the middle of long contexts is systematically underutilized, even in models designed for long-context tasks. Horthy (2026) calls this “the dumb zone” — a region where distractor effects intensify and performance drops more steeply than token count alone predicts.

The problem has two orthogonal dimensions:

  • What enters: loading too much at the start saturates the window before real work begins
  • What accumulates: intermediate session noise progressively displaces operative attention

Solving one dimension without the other is insufficient. A strategy that controls initial loading but ignores accumulation still degrades over long sessions. A strategy that aggressively compacts but front-loads everything still starts with a degraded signal-to-noise ratio.

2. What Enters: Progressive Disclosure

Progressive-Disclosure-Context is the proactive prevention layer — it controls what enters the context rather than waiting to manage what accumulates.

The principle: load information in stages, on demand, rather than all at once at session start. The three-level hierarchy:

  • Level 1 — Metadata (~100 tokens): Skill names and descriptions only; the agent knows what capabilities exist without paying to load them
  • Level 2 — Instructions (<5k tokens): The full skill body loads when triggered; activated expertise without front-loading everything
  • Level 3 — Resources: Individual reference files, documentation, and datasets load only when a specific step requires them

The practical impact is substantial. Five MCP servers can consume 50,000+ tokens — up to 40% of the context window — in tool definitions alone before the first prompt is processed. Progressive disclosure prevents this saturation from occurring.
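
The three-level staging can be sketched as a minimal in-memory registry. This is an illustrative sketch, not an implementation of any particular skill system; all names and structures below are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    description: str           # Level 1: always visible, a handful of tokens
    instructions: str          # Level 2: loaded only when the skill triggers
    resources: dict[str, str]  # Level 3: individual files, loaded per step

class SkillRegistry:
    """Progressive-disclosure sketch: only metadata() is paid for at
    session start; activate() and resource() load on demand."""

    def __init__(self, skills: list[Skill]):
        self._skills = {s.name: s for s in skills}

    def metadata(self) -> str:
        # Level 1: the only text entering context before work begins
        return "\n".join(f"- {s.name}: {s.description}"
                         for s in self._skills.values())

    def activate(self, name: str) -> str:
        # Level 2: the full skill body enters context on trigger
        return self._skills[name].instructions

    def resource(self, skill: str, key: str) -> str:
        # Level 3: one reference file, only when a step requires it
        return self._skills[skill].resources[key]
```

The point of the shape is that the cost of each level is only incurred when the corresponding call is made; an agent that never triggers a skill never pays for its body.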

Agent-Skills are the implementation mechanism for Levels 1 and 2. AGENTS-md-Files play a complementary role: as always-loaded ambient context that configures the agent for the repository, they can carry Level 1 metadata, but detailed instruction bodies belong in the skills themselves.

3. What Stays: Compaction and Isolation

Even with disciplined progressive disclosure, agents accumulate context over extended sessions. Two complementary mechanisms address what stays:

Adaptive compaction (Adaptive-Context-Compaction) applies variable compression based on recency and relevance:

  • Recent observations remain verbatim — full fidelity, maximum tokens
  • Intermediate history is lightly summarized — key decisions and outcomes retained
  • Oldest accumulated noise is aggressively condensed — failed attempts, redundant outputs, superseded reasoning

Critically, compaction differs from truncation: it preserves semantic continuity while shedding verbatim detail. Kang et al. (2025) demonstrated that optimized compression guidelines reduce peak token usage 26–54% while preserving 95%+ task accuracy — a compaction budget that makes long-session coherence achievable.
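
The tiered policy can be sketched as follows, assuming the history is a list of strings. The zone sizes are illustrative, and the "first line only" middle tier is a stand-in for a real summarizer; the 26–54% figures come from tuned compression guidelines, not from thresholds like these:

```python
def compact(history: list[str], verbatim: int = 5, light: int = 10) -> list[str]:
    """Tiered compaction sketch: newest entries stay verbatim, the middle
    zone is lightly summarized, the oldest zone collapses to a digest."""
    n = len(history)
    recent = history[max(0, n - verbatim):]                       # full fidelity
    middle = history[max(0, n - verbatim - light):max(0, n - verbatim)]
    oldest = history[:max(0, n - verbatim - light)]

    # Light summarization placeholder: keep only each entry's first line.
    light_summaries = [(e.splitlines() or [""])[0][:120] for e in middle]
    # Aggressive condensation: a single line noting what was elided.
    digest = [f"[compacted: {len(oldest)} earlier steps elided]"] if oldest else []
    return digest + light_summaries + recent
```

Note that the result still spans the whole session: the oldest material survives as a marker rather than vanishing, which is the semantic-continuity property that distinguishes compaction from truncation.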

Sub-agent isolation (Sub-Agents-Context-Isolation) prevents accumulation at source. Rather than managing noise after it enters the parent context, spawning a sub-agent means the intermediate work of the subtask — retries, tool call noise, exploratory dead-ends — never enters the parent’s context window at all. The parent receives only the final result. This is the architectural equivalent of function encapsulation: implementation details are hidden, interfaces stay clean.
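
The firewall can be sketched with contexts as plain lists of strings; the sub-agent below and its output are hypothetical:

```python
class ParentAgent:
    """Context-firewall sketch: a sub-agent accumulates its own scratch
    context, and only the final result crosses back to the parent."""

    def __init__(self):
        self.context: list[str] = []   # the parent's bounded window

    def delegate(self, task: str, subagent) -> str:
        sub_context: list[str] = [task]        # clean input, no parent history
        result = subagent(task, sub_context)   # retries and noise stay here
        self.context.append(f"subtask result: {result}")
        return result                          # sub_context is discarded

# Hypothetical sub-agent: three noisy attempts, one clean answer.
def flaky_search(task: str, ctx: list[str]) -> str:
    for attempt in range(3):
        ctx.append(f"attempt {attempt}: 500 lines of tool output")
    return "found it"
```

After `delegate` returns, the parent's context holds one summary line regardless of how many attempts the sub-agent burned; this is the encapsulation property described above.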

The two mechanisms are complementary: compaction manages what has accumulated; sub-agent isolation prevents accumulation from occurring. For tasks with clear input/output boundaries and significant tool use, isolation is the higher-leverage choice. For tasks that must run in the parent context, compaction is the fallback.

4. What to Configure Persistently: AGENTS.md Design

AGENTS-md-Files are the always-on ambient layer — repository-level markdown files injected into the agent’s context at the start of every session, before the agent sees any task.

This persistent baseline creates a category of its own: unlike progressive disclosure (which loads on demand) and compaction (which manages accumulation), AGENTS.md content enters every session unconditionally. This makes design discipline critical:

  • Keep it short (under 60 lines): every line costs tokens across every session
  • Universally applicable only: instructions that don’t apply to most tasks consume budget without benefit; edge cases belong in skills or task-specific prompts
  • Non-inferable content only: custom build commands, non-standard test runner flags, naming conventions the model cannot derive from the code are worth including; architecture overviews and directory listings are not
  • Never auto-generate: Gloaguen et al. (2026) found LLM-generated files reduced task resolve rates by ~3% while inflating inference cost by over 20% — a costly negative signal

This design principle dovetails with progressive disclosure: AGENTS.md should declare what capabilities exist (Level 1 metadata) and how to trigger them, deferring detailed how-to guidance to the referenced skills and prompt files.
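
As an illustration of these rules, a hypothetical AGENTS.md might look like the following; every command, path, and skill name here is invented for the example:

```markdown
## Build & test
- Build: `make build-fast` (skips codegen; the default `make build` takes ~20 min)
- Test: `./scripts/test.sh --changed-only` (plain `pytest` misses the fixtures)

## Conventions
- Migrations are hand-written SQL in `db/migrations/`; never edit an applied one

## Capabilities
- `release-notes` skill: drafts changelog entries from merged PRs
- `schema-check` skill: validates API changes against the published contract
```

Everything in the sketch is either non-inferable (custom commands, a hard convention) or Level 1 metadata (skill names with one-line descriptions); the skill bodies stay out of the ambient layer.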

5. The Integrated Strategy

The four layers are mutually reinforcing, not alternatives:

| Layer | Mechanism | Timing | Addresses |
|---|---|---|---|
| Progressive disclosure | Skills, on-demand resources | Before accumulation | What enters |
| Adaptive compaction | Context summarization | During accumulation | What stays |
| Sub-agent isolation | Context firewalls | Before accumulation | What accumulates per task |
| AGENTS.md | Ambient configuration | Always-on baseline | Persistent configuration |

Three integration principles complete the strategy:

Failure-only surfacing (Back-Pressure-Mechanisms): verification steps must swallow passing output and emit only failures. Every passing test that prints 200 lines of “OK” wastes context budget that could hold task-relevant content. Context discipline extends to the verification layer.
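
A minimal wrapper in this spirit, assuming commands are run via the standard library (the interface shape is an assumption, not a documented harness API):

```python
import subprocess

def run_checked(cmd: list[str], tail: int = 40) -> str:
    """Failure-only surfacing sketch: on success return a single token;
    on failure return only the last `tail` lines, where the error usually is."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return "OK"                          # passing output is swallowed
    lines = (proc.stdout + proc.stderr).splitlines()
    return "\n".join(lines[-tail:])          # only the failure enters context
```

A passing test run costs the agent two tokens instead of its full log; a failing run surfaces exactly the tail needed to diagnose it.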

External state as context relief: progress files, JSON feature lists, and git history allow agents to offload in-context state to persistent external storage. When state is externalized, compaction has less to preserve, sub-agents can be handed clean task inputs, and AGENTS.md doesn’t need to carry state that belongs in artifacts.

Team-level harness configuration: context management is not an individual practice — it is a team infrastructure concern. AGENTS.md is version-controlled and shared. Skills are collectively maintained. The quality of the shared harness determines the average quality of every agent session run against that repository.

Taken together, these layers transform context from an unmanaged resource that degrades silently into a deliberately engineered environment — the difference between agents that drift and agents that stay sharp.

Sources

  • Original synthesis based on combining Context-Engineering, Context-Rot, AGENTS-md-Files, Progressive-Disclosure-Context, Adaptive-Context-Compaction, and Sub-Agents-Context-Isolation

  • Liu, Nelson F., Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang (2023). “Lost in the Middle: How Language Models Use Long Contexts.” Transactions of the Association for Computational Linguistics (TACL). DOI: https://doi.org/10.48550/arXiv.2307.03172

    • Empirical evidence for the U-shaped performance curve in long contexts; foundational evidence for context rot as a quality problem, not merely a capacity problem
  • Kang, Minki, Wei-Ning Chen, Dongge Han, Huseyin A. Inan, Lukas Wutschitz, Yanzhi Chen, Robert Sim, and Saravan Rajmohan (2025). “ACON: Optimizing Context Compression for Long-horizon LLM Agents.” arXiv preprint, arXiv:2510.00615. Available: https://arxiv.org/abs/2510.00615

    • Empirical demonstration that optimized compression guidelines reduce peak token usage 26–54% while preserving 95%+ task accuracy; provides quantitative basis for adaptive compaction as a viable strategy
  • Gloaguen, Thibaud, Niels Mündler, Mark Müller, Veselin Raychev, and Martin Vechev (2026). “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?” arXiv preprint, arXiv:2602.11988. ETH Zurich SRI Lab. Available: https://arxiv.org/abs/2602.11988

    • Key empirical finding: LLM-generated AGENTS.md files reduce resolve rates by ~3% while inflating inference cost by >20%; provides evidence for design discipline requirements in ambient configuration
  • Horthy, Dex (2026). “Skill Issue: Harness Engineering for Coding Agents.” HumanLayer Blog. Available: https://www.humanlayer.dev/blog/skill-issue-harness-engineering-for-coding-agents

    • Introduces sub-agents as context firewalls; defines the dumb zone; covers the failure-only surfacing discipline for back-pressure mechanisms; integrates multiple context management techniques into a unified harness framework
  • Anthropic (2025). “Agent Skills Overview.” Claude Platform Documentation. Retrieved March 2026. Available: https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview

    • Defines the three-level progressive disclosure hierarchy with exact token costs; establishes skills as the primary implementation mechanism for on-demand context loading
  • Anthropic (2026). “Effective Harnesses for Long-Running Agents.” Anthropic Engineering Blog. Available: https://www.anthropic.com/engineering/effective-harnesses-for-long-running-agents

    • Practical patterns for session handoffs, external state files, and multi-session context management; demonstrates that compaction alone is insufficient for multi-session continuity

Note

This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.