Progressive disclosure of context is the principle of loading information into an AI agent’s context window in stages, on demand, rather than all at once at session start. Only what is needed for the current task enters the context; everything else remains dormant on the filesystem.

The Three Loading Levels

The skills model follows a three-level hierarchy, echoing the structure of a manual's table of contents:

  • Level 1 — Metadata (~100 tokens, always loaded): Skill name and description only. The agent knows what capabilities exist without paying to load them. Many skills can be installed with minimal overhead.
  • Level 2 — Instructions (<5k tokens, loaded when triggered): The full SKILL.md body loads when the agent determines the skill is relevant. This is the activated expertise — instructions, workflows, guidance.
  • Level 3 — Resources (effectively unlimited, loaded on-demand): Bundled reference files, documentation, scripts, datasets. The agent reads only the specific files needed for the current step. A skill can include dozens of reference files; if only one is needed, only one loads.
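The three levels can be sketched in code. This is a hypothetical illustration, not an actual agent-framework API; the `Skill` class and its methods are assumptions made for the example. The key property is that only Level 1 metadata is materialized at session start, while Levels 2 and 3 stay on disk until requested.

```python
# Hypothetical sketch of progressive disclosure: only metadata is loaded
# eagerly; instructions and resources are read from disk on demand.
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class Skill:
    name: str
    description: str                 # Level 1: always in context (~100 tokens)
    skill_md: Path                   # Level 2: loaded when the skill triggers
    resources: dict[str, Path] = field(default_factory=dict)  # Level 3

    def metadata(self) -> str:
        # The only part paid for up front.
        return f"{self.name}: {self.description}"

    def instructions(self) -> str:
        # Read only when the agent decides the skill is relevant.
        return self.skill_md.read_text()

    def resource(self, key: str) -> str:
        # Read only when a specific step needs this specific file.
        return self.resources[key].read_text()

def build_context(skills: list[Skill]) -> str:
    # Session start: Level 1 metadata only, regardless of how many
    # skills (or how many bundled reference files) are installed.
    return "\n".join(s.metadata() for s in skills)
```

Note that installing more skills grows only the `build_context` output, not the cost of the skills' bodies or bundled files.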

Why It Matters

The naive alternative — pre-loading all available instructions, documentation, and tools at session startup — fills the context window before any real work begins. Five MCP servers can consume 50,000+ tokens (up to 40% of the context window) in tool definitions alone, before the first prompt is processed.

Progressive disclosure addresses this directly:

  • Reduces extraneous context overhead dramatically
  • Keeps the context window available for task-relevant reasoning
  • Enables comprehensive capability sets without proportional token costs
  • Aligns with research showing attention quality degrades as context length increases

UX Origins

The principle borrows directly from interface design. Nielsen Norman Group formalized progressive disclosure as an interaction pattern, itself rooted in Carroll's (1984) "training wheels" research showing that hiding advanced features early improved both learning rates and final performance. The core insight transfers cleanly: cognitive load management in human interfaces and token cost management in agent systems follow the same logic.

Tradeoffs

  • Access overhead: Loading a skill or resource adds a small latency cost and an extra read operation
  • Trigger reliability: LLM-driven triggering introduces uncertainty; human-driven or deterministic lifecycle triggers are more predictable
  • Maintenance cost: Building and curating a layered skill library requires ongoing effort

Extension: Kiro Powers

Kiro’s Powers system applies progressive disclosure at the MCP level. Instead of loading all tool definitions from all servers upfront, keyword-triggered activation loads only the relevant Power when the agent detects the right signals in the conversation.
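A minimal sketch of keyword-triggered activation follows. It is loosely modeled on the idea described above, not on Kiro's actual implementation; the `Power` class, its fields, and the matching logic are all assumptions made for illustration.

```python
# Hypothetical sketch of keyword-triggered activation: tool definitions
# enter the context only when the conversation contains matching signals.
from dataclasses import dataclass

@dataclass
class Power:
    name: str
    keywords: set[str]        # signals that should activate this Power
    tool_definitions: str     # full tool schema, loaded only on activation

def activate_powers(message: str, powers: list[Power]) -> list[str]:
    """Return tool definitions only for Powers whose keywords appear
    in the message; everything else stays dormant."""
    words = set(message.lower().split())
    return [p.tool_definitions for p in powers if p.keywords & words]
```

With this shape, a message about SQL activates only a database Power's tool definitions, while the schemas of every other installed Power contribute zero tokens to the context.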

Sources

  • Anthropic (2025). “Agent Skills Overview.” Claude Platform Documentation. Retrieved March 2026. https://platform.claude.com/docs/en/agents-and-tools/agent-skills/overview

    • Defines three loading levels with exact token costs; source of the key quote “progressive disclosure ensures only relevant content occupies the context window at any given time”
  • Böckeler, Birgitta (2026, February 5). “Context Engineering for Coding Agents.” MartinFowler.com. https://martinfowler.com/articles/exploring-gen-ai/context-engineering-coding-agents.html

    • Identifies three loading trigger mechanisms (LLM-driven, human-driven, deterministic) and the core tradeoff of balancing context — not too little, not too much
  • Nielsen, Jakob (2006). “Progressive Disclosure.” Nielsen Norman Group. https://www.nngroup.com/articles/progressive-disclosure/

    • Traces UX origins of the concept; establishes progressive disclosure as cognitive load management, revealing information on demand to reduce extraneous cognitive load
  • Carroll, John M. (1984). “Training Wheels in a User Interface.” Communications of the ACM, 27(8), 800–806. https://www.researchgate.net/publication/220424654_Training_Wheels_in_a_User_Interface

    • Original research showing that hiding advanced functionality early improved both learning rates and eventual expert performance — foundational evidence for progressive disclosure as principle
  • Mason, Tony (2026). “The Missing Memory Hierarchy: Demand Paging for LLM Context Windows.” arXiv preprint, arXiv:2603.09023. https://arxiv.org/abs/2603.09023

    • Frames LLM context limits as “virtual memory problems wearing different clothes”; reports up to 93% context reduction through demand paging, providing quantitative evidence for progressive disclosure at the infrastructure layer

Note

This content was drafted with assistance from AI tools for research, organization, and initial content generation. All final content has been reviewed, fact-checked, and edited by the author to ensure accuracy and alignment with the author’s intentions and perspective.