How AI Agents Actually Implement Skills: Complete Cross-Platform Comparison

Introduction

Every AI agent framework faces the same fundamental question: how do you make an LLM good at something specific? The model has broad general knowledge, but to perform a code review, deploy infrastructure, or navigate Minecraft, it needs specialized instructions, tool access, and domain context.

This is the skill injection problem. And every major framework solves it differently.

Some platforms dump everything into the system prompt upfront. Others use lazy loading, only revealing capabilities when the agent needs them. A few use vector databases to retrieve relevant skills based on semantic similarity. The differences aren’t academic—they directly affect token costs, agent reliability, and how many skills an agent can realistically juggle.

We analyzed 11 major AI agent platforms to understand exactly where skills land in the prompt, when they load, what they cost in tokens, and how they survive when the context window fills up. This isn’t a surface-level feature comparison. We dug into source code, documentation, and architecture diagrams to map out the precise injection mechanics of each platform.

Master Comparison Table

Here’s the complete overview before we dive into the details.

Injection Mechanics: Where, When, and How

| Platform | Injection Point | When Loaded | Mechanism |
| --- | --- | --- | --- |
| Claude Code | System-reminder (metadata) + conversation message (body) | Metadata at session start; body on /command or auto-match | Framework injects metadata; Skill tool loads full body on activation |
| CrewAI | Task prompt (appended before LLM call) | Every task execution via _finalize_task_prompt() | format_skill_context() appends all skill bodies to prompt |
| LangChain Deep Agents | System prompt (metadata) + conversation history (body) | Metadata at startup; body when agent calls read_file() | SkillsMiddleware injects index; agent loads body via filesystem tool |
| OpenAI Responses API | User prompt context (platform-managed) | On skill_reference in API call | Platform appends metadata; model reads full SKILL.md on invocation |
| OpenAI Agents SDK | Tool definitions (deferred via ToolSearchTool) | Namespace names at creation; schemas on ToolSearchTool call | tool_namespace() + ToolSearchTool() for progressive discovery |
| AutoGen Teachability | Modified user message (retrieved memos injected) | Every turn — vector DB retrieval before each LLM call | Middleware intercepts message, queries ChromaDB, injects top-K matches |
| Semantic Kernel | Function-calling schemas + prompt template content | All schemas at startup; template content on function invocation | kernel.add_plugin() registers all; kernel.invoke() renders templates |
| MetaGPT | Action prompt template (rendered into LLM call) | When Role’s _act() fires for a specific Action | Action.run() formats PROMPT_TEMPLATE, sends via aask() |
| Voyager | Code generation prompt (retrieved skill code) | Before each code generation; embedding similarity search | SkillLibrary.retrieve_skills() injects top-5 as few-shot examples |
| DSPy | Compiled few-shot demos in Predict module prompts | Compiled offline by optimizer; fixed at runtime | BootstrapFewShot / MIPROv2 selects best demos; Predict renders into prompt |
| SuperAGI | Tool schemas in agent’s tool list | Agent creation — all toolkit tools registered upfront | BaseToolkit.get_tools() registers all as function-calling tools |
| CAMEL-AI | Function schemas + role system message | Agent creation — all tools registered upfront | ChatAgent(tools=[*toolkit.get_tools()]) loads everything at init |

Persistence, Token Cost, and Always-On Behavior

| Platform | Always Present? | Persistence | Token Cost |
| --- | --- | --- | --- |
| Claude Code | Metadata: YES. Body: only after activation | Session-scoped. On compaction: re-attached (5K/skill, 25K cap) | ~250 chars/skill metadata; 1% of context budget |
| CrewAI | YES — full body in every task prompt | Fresh injection per task; no cross-task persistence | Full body every call. 50K char soft limit |
| LangChain Deep Agents | Metadata: YES. Body: on-demand | Body stays in conversation history; subagent skills isolated | ~100 tokens/skill metadata; body paid once (~3,302 tokens) |
| OpenAI Responses API | Name+desc: YES. Full body: on invocation | Single API response only; no cross-call persistence | Platform-managed |
| OpenAI Agents SDK | Namespace list: YES. Schemas: on demand | Single run only; re-discover per session | Minimal until activated |
| AutoGen Teachability | NO — only relevant memos per turn | Cross-session via ChromaDB; persists indefinitely | ~3-5 memos per turn (variable) |
| Semantic Kernel | All schemas: YES. Templates: on invocation | In-memory per kernel instance; no cross-session | All schemas always present |
| MetaGPT | NO — only current Action’s template | Single action execution only | One template per turn |
| Voyager | NO — top-5 retrieved per task | Lifelong persistence in vector DB | ~500-2,000 tokens per skill example |
| DSPy | YES — compiled demos baked in | Serializable to JSON; persists across sessions | Fixed after compilation (3-8 demos/module) |
| SuperAGI | YES — all schemas always present | Within agent session | All schemas always present |
| CAMEL-AI | YES — all schemas + role prompt | Within conversation session | All schemas always present |

What “Skill Injection” Actually Means

Before diving into the comparison, let’s define the problem space. An AI agent’s context window—the total text the LLM sees on each call—has a fixed size. Every token of instruction, conversation history, tool definition, and retrieved data competes for space in that window.

A “skill” in the agent context is any structured package of expertise that changes how the agent behaves. This could be:

  • Instructions telling the agent how to approach a specific domain (code review guidelines, deployment checklists)
  • Tool definitions giving the agent callable functions (API integrations, file operations)
  • Few-shot examples showing the agent what good output looks like
  • Retrieved knowledge from vector databases or external documents

The injection mechanism—where and when this content enters the context—determines three critical properties:

  1. Token efficiency: How many tokens does the skill consume, and is that cost paid even when the skill isn’t needed?
  2. Reliability: Will the agent consistently use the skill when relevant, or might it miss the cue?
  3. Scalability: How many skills can the agent access before context bloat degrades performance?

Every framework makes different tradeoffs across these three dimensions. Let’s examine each one.

The Injection Spectrum: From Always-On to On-Demand

Across all 11 platforms, skill injection approaches fall along a spectrum from “everything loaded upfront” to “nothing loaded until explicitly needed.”

At one end, platforms like CrewAI, SuperAGI, and CAMEL-AI inject the full content of every activated skill into every LLM call. The agent always has its complete expertise available. Simple, reliable, but expensive in tokens.

At the other end, Claude Code, LangChain Deep Agents, and OpenAI’s Responses API use progressive disclosure—the agent sees only skill names and short descriptions at startup, and full content loads on-demand. Efficient, scalable, but requires the agent to recognize when it needs a skill.

In the middle, AutoGen Teachability and Voyager use semantic retrieval to inject only the most relevant skills per turn, creating a dynamic, context-sensitive injection pattern.

And then there are unique approaches: DSPy compiles optimized few-shot examples offline and bakes them permanently into module prompts. MetaGPT encodes skills as action templates that activate only when a specific role transitions to a specific action.

Let’s examine each in detail.

Claude Code: Three-Layer Progressive Disclosure

Claude Code three-layer progressive disclosure: always-on metadata, on-activation skill body, on-demand resources

Claude Code implements one of the most sophisticated skill injection architectures, using a three-layer progressive disclosure system that balances awareness with token efficiency.

Layer 1: Always in Context

At session start, every available skill’s name and description is injected into a system-reminder message—a metadata block the model always sees. This costs roughly 250 characters per skill, consuming about 1% of the context window budget for all skill descriptions combined (approximately 8K characters as a fallback budget, overridable via the SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable).

Similarly, deferred tools—tools whose full JSON schemas haven’t been loaded yet—appear as a name-only list in system-reminder blocks. As of Claude Code v2.1.69, even built-in system tools like Bash, Read, Edit, Write, Glob, and Grep are deferred behind ToolSearch, reducing system tool context from approximately 14–16K tokens to roughly 968 tokens.

The agent sees enough to know what’s available without paying the token cost of full definitions.

Layer 2: On Activation

When a user types a slash command (e.g., /commit) or the model auto-matches a skill based on its description, the full SKILL.md body is loaded as a conversation message via the Skill tool. This body contains the complete instructions—sometimes thousands of tokens of detailed guidance.

Key detail: Shell preprocessing runs first (any !command directives in the skill file execute and their output replaces the directive), and once loaded, the skill body stays in the conversation for the rest of the session.

Layer 3: On Demand

Additional resources—reference documents, scripts, asset files—are only read when the model explicitly decides to use the Read tool to access them. These never load automatically.

Context Compaction Behavior

When the conversation approaches the context limit and compaction triggers, Claude Code re-attaches the most recently invoked skills with a budget of 5K tokens per skill and a 25K combined maximum. Most-recently-invoked skills get priority. Older skills may be dropped entirely.
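The re-attachment policy described above amounts to a greedy, recency-ordered budget walk. Here is a minimal sketch of that logic; the per-skill and combined caps come from the behavior described here, while the function name and data shapes are hypothetical, not Claude Code's source:

```python
# Illustrative sketch of priority-based skill re-attachment on compaction.
# Budgets (5K tokens/skill, 25K combined) match the article; everything else
# is an assumed shape for illustration.

PER_SKILL_CAP = 5_000
TOTAL_CAP = 25_000

def reattach_skills(invoked_skills):
    """invoked_skills: list of (name, token_count), most recently invoked first."""
    kept, budget_used = [], 0
    for name, tokens in invoked_skills:
        cost = min(tokens, PER_SKILL_CAP)  # each skill is capped individually
        if budget_used + cost > TOTAL_CAP:
            break  # older skills past the combined cap are dropped entirely
        kept.append(name)
        budget_used += cost
    return kept, budget_used
```

With six recently invoked skills of ~6K tokens each, only the five most recent survive re-attachment, each truncated to the 5K cap.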

This three-layer architecture means an agent with 20+ available skills pays a minimal upfront cost but can access full expertise on any of them within a single turn.

CrewAI: Full Injection Into Every Task Prompt

CrewAI skill injection: full body appended to every task prompt via format_skill_context()

CrewAI takes the opposite approach from progressive disclosure. When a skill is activated for an agent, its full content is injected into every task prompt the agent executes.

How It Works

Skills in CrewAI are self-contained directories, each with a SKILL.md file containing YAML frontmatter (name, description, license, compatibility, allowed tools) and a markdown body. The skill system distinguishes between skills and tools: skills inject instructions and context that shape how the agent thinks, while tools provide callable functions for actions.

During agent initialization, Agent.set_skills() calls discover_skills() to scan skill directories at the metadata level, then activate_skill() to read full skill bodies. At task execution time, _finalize_task_prompt() calls format_skill_context() for each activated skill and appends all formatted skill content to the task prompt.

The LLM receives: [system message] + [task prompt + ALL skill bodies]
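That append step can be sketched roughly as follows. The helper names mirror the ones the article mentions (format_skill_context(), the finalize step), but the bodies are illustrative, not CrewAI's actual implementation:

```python
# Illustrative sketch of CrewAI-style full-body injection.
# The 50K-character figure is the soft limit described in the article.

SOFT_LIMIT = 50_000  # soft warning threshold, in characters per skill

def format_skill_context(skill):
    if len(skill["body"]) > SOFT_LIMIT:
        print(f"warning: skill '{skill['name']}' exceeds {SOFT_LIMIT} chars")
    return f"\n\n## Skill: {skill['name']}\n{skill['body']}"

def finalize_task_prompt(task_prompt, activated_skills):
    # Every activated skill's full body is appended on every task execution.
    return task_prompt + "".join(format_skill_context(s) for s in activated_skills)
```

Note that the cost is paid on every call: two activated skills of 10K characters each add ~20K characters to every single task prompt.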

Token Implications

CrewAI imposes a soft warning at 50,000 characters per skill but no hard limit. The documentation recommends keeping skills focused and concise because large prompt injections dilute the model’s attention—a real concern given research on context rot.

The tradeoff is straightforward: the agent always has full expertise available (high reliability), but token cost scales linearly with the number of skills per task (low efficiency). For agents with 1-2 focused skills, this works well. For agents needing broad capability sets, it becomes expensive fast.

No Cross-Task Persistence

Each task gets a fresh injection. There’s no accumulation of skill content across tasks—which is actually a feature, not a bug. It means each task starts with a clean context, avoiding the staleness problems that session-based persistence can create.

LangChain Deep Agents: Agent-Controlled Loading via SkillsMiddleware

LangChain Deep Agents three-tier skill loading: index via SkillsMiddleware, full content via read_file, deep dive on demand

LangChain Deep Agents implements a sophisticated middleware-based skill system where the agent itself decides when to load full skill content—a true progressive disclosure model where the agent controls activation.

The Three Tiers

Tier 1 (Index): SkillsMiddleware parses all SKILL.md frontmatter at startup and injects a lightweight index into the system prompt. This index contains only names and descriptions, costing approximately 278 tokens per skill versus 3,302 tokens for full content.

Tier 2 (Full Content): When the agent determines a skill is relevant, it calls read_file() on the skill’s SKILL.md path. This is a regular tool call—the framework doesn’t inject the body; the agent makes a deliberate decision to load it. The full content enters the conversation history as a tool result.

Tier 3 (Deep Dive): Supporting materials, reference docs, and scripts are only accessed when the agent explicitly reads them.
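The Tier-1 index is just frontmatter metadata assembled into a short listing. A toy version of that step, assuming flat `key: value` frontmatter (a real middleware would use a full YAML parser):

```python
# Minimal sketch of building a Tier-1 skill index from SKILL.md frontmatter.
# Only handles flat `key: value` lines between `---` markers; illustrative,
# not LangChain's SkillsMiddleware source.

def parse_frontmatter(text):
    lines = text.splitlines()
    assert lines[0].strip() == "---", "expected frontmatter delimiter"
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body below is never read here
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def build_index(skill_files):
    """skill_files: {path: SKILL.md text}. Returns the lightweight index string."""
    entries = []
    for path, text in skill_files.items():
        meta = parse_frontmatter(text)
        entries.append(f"- {meta['name']} ({path}): {meta['description']}")
    return "Available skills (read_file a path to load one):\n" + "\n".join(entries)
```

The key property: the full body below the closing `---` never enters the index, so the agent pays only for names and descriptions until it chooses to read a file.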

Token Efficiency in Practice

With 12 skills, progressive disclosure reduces context from approximately 30,000 tokens (all loaded) to roughly 600 tokens (index only), expanding to 2,000–5,000 when relevant skills are loaded for a specific task. That’s a potential 83–98% reduction in skill-related token consumption.

Multiple skill sources can be layered, and when names collide, the last source wins. Files over 10 MB are automatically skipped.

The Key Difference from Claude Code

While Claude Code uses a dedicated Skill tool to trigger loading, Deep Agents repurposes the agent’s existing read_file tool. This means the loading mechanism is transparent—the agent reads skill files the same way it reads any other file. The downside is that there’s no special compaction behavior: skill content that enters conversation history is subject to standard LangChain message trimming, with no priority treatment.

OpenAI Responses API and Agents SDK: Platform-Managed Deferred Loading

OpenAI deferred tool loading: three deferral strategies with platform-managed tool_search

OpenAI implements skill injection through two distinct but philosophically aligned mechanisms: the Responses API’s tool_search tool type and the Agents SDK’s ToolSearchTool.

The tool_search tool type (available on GPT-5.4+) allows developers to defer large tool surfaces until runtime. Three deferral strategies are available:

  • Individual function deferral: @function_tool(defer_loading=True) — the model sees the function name and description but the parameter schema is deferred. Saves parameter-level tokens.
  • Namespace deferral: tool_namespace(name=..., description=..., tools=[...]) — groups functions under a single namespace. The model sees only the namespace name and description, saving significantly more tokens.
  • MCP server deferral: HostedMCPTool(tool_config={..., "defer_loading": True}) — defers entire MCP server tool surfaces.

When the model determines it needs a specific tool, it issues a tool_search call. The API returns 3-5 relevant tool definitions, injected at the end of the context window to preserve prompt caching.

Agents SDK: ToolSearchTool

The Agents SDK provides a programmatic equivalent. Tool namespaces are registered but not loaded:

crm_tools = tool_namespace(
    name="crm",
    description="CRM management tools",
    tools=[...]
)
agent = Agent(tools=[*crm_tools, ToolSearchTool()])

At runtime, the agent sees only namespace names. It calls ToolSearchTool("crm") to discover and load the full schemas, then can call individual tools within that namespace.

No Cross-Request Persistence

Each API request is independent. Discovered tools don’t persist across calls. This is the most stateless approach in our comparison—clean, predictable, but requiring re-discovery on every request if tools change.

AutoGen Teachability: Per-Turn Semantic Retrieval

AutoGen Teachability per-turn retrieval loop: message intercept, ChromaDB query, memo injection, learning loop

AutoGen’s Teachability capability takes a fundamentally different approach from every other framework in this comparison. Instead of injecting static skill content, it dynamically retrieves relevant “memos” from a ChromaDB vector database on every single turn.

The Per-Turn Retrieval Loop

Teachability registers a hook on process_last_received_message that intercepts every incoming user message before the agent processes it:

  1. A TextAnalyzerAgent extracts key concepts from the incoming message
  2. These concepts are used to query ChromaDB (using Sentence Transformer embeddings by default)
  3. The top-K most relevant memos are retrieved (configurable via max_num_retrievals, default 10)
  4. Retrieved memos are appended to the message text before the agent sees it

Critically, the modified message does not propagate into stored conversation history—only the original message is stored. This prevents memo content from compounding across turns.
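The intercept-retrieve-inject loop can be sketched as below. AutoGen's real implementation queries ChromaDB with sentence-transformer embeddings; here a simple word-overlap score stands in for embedding similarity so the example stays self-contained, and all names are illustrative:

```python
# Toy sketch of Teachability's per-turn retrieval step (word overlap
# standing in for embedding similarity; not AutoGen's source).

def similarity(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def retrieve_memos(message, memo_db, k=3, threshold=0.1):
    scored = sorted(memo_db, key=lambda m: similarity(message, m["input"]), reverse=True)
    return [m for m in scored[:k] if similarity(message, m["input"]) >= threshold]

def inject_memos(message, memo_db):
    memos = retrieve_memos(message, memo_db)
    if not memos:
        return message  # nothing relevant: the message passes through unchanged
    notes = "\n".join(f"- {m['input']} -> {m['output']}" for m in memos)
    # Only the LLM sees this expanded text; stored history keeps the original.
    return f"{message}\n\nRelevant learnings:\n{notes}"
```

The threshold plays the role of AutoGen's recall_threshold: messages with no sufficiently similar memo pass through untouched, which keeps irrelevant knowledge out of the context.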

Learning Loop

After the LLM responds, a second hook analyzes the response for new learnings:

  1. TextAnalyzerAgent identifies new knowledge in the response
  2. New memos are extracted as key-value pairs (input text → output text)
  3. These memos are stored in ChromaDB, available for future turns and sessions

This creates a genuine learning loop where the agent accumulates expertise over time.

Cross-Session Persistence

AutoGen Teachability is one of only three platforms in our comparison (alongside Voyager and DSPy) that persists skills across sessions. The ChromaDB database lives on disk, meaning an agent can learn from interactions on Monday and apply that knowledge on Friday.

The recall_threshold parameter (default 1.5) controls how similar a message must be to a stored memo for retrieval, and reset_db can clear the entire memory when needed.

Token Efficiency

Since only relevant memos are injected per turn (typically 3-5), the token cost is naturally bounded regardless of how large the memo database grows. An agent with 10,000 stored memos still only pays for the handful most relevant to the current turn.

Semantic Kernel: Plugin Schemas as Always-Present Tool Definitions

Semantic Kernel two injection paths: function calling with all schemas always present and prompt template rendering

Microsoft’s Semantic Kernel takes a straightforward approach: plugins are collections of KernelFunction objects registered with the Kernel, and their schemas are exposed to the LLM as function-calling tool definitions.

Two Injection Paths

Function Calling: When ToolCallBehavior.AutoInvokeKernelFunctions is set, all registered functions are sent to the LLM as available tools in every API request. The LLM decides which to call; Semantic Kernel handles invocation and result routing.

Prompt Templates: Semantic Kernel’s template syntax ({{plugin.function}}, Handlebars, or Liquid) allows functions to be called inline during prompt rendering. Results are embedded directly in the prompt text before it reaches the LLM—a form of eager evaluation rather than lazy tool calling.
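To make the eager-evaluation point concrete, here is a toy renderer for the {{plugin.function}} style. Semantic Kernel's real template engine is far richer (arguments, Handlebars, Liquid); this sketch only substitutes zero-argument calls and is not SK's implementation:

```python
import re

# Toy illustration of eager prompt-template rendering: plugin functions run
# during rendering and their results are embedded in the prompt text.

def render(template, plugins):
    def substitute(match):
        plugin_name, func_name = match.group(1), match.group(2)
        # The function executes now, before the LLM ever sees the prompt.
        return str(plugins[plugin_name][func_name]())
    return re.sub(r"\{\{(\w+)\.(\w+)\}\}", substitute, template)
```

Usage: `render("Today is {{time.today}}.", {"time": {"today": lambda: "2025-01-01"}})` produces a prompt with the date already baked in, no tool call required.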

No Progressive Disclosure

Every registered plugin’s schema is included in every API call. There’s no built-in deferred loading, namespace grouping, or on-demand activation. The documentation explicitly recommends importing only the plugins needed for a specific scenario to reduce token consumption and miscalls.

This makes Semantic Kernel one of the most predictable platforms—you always know exactly what the agent has access to—but it limits scalability. An agent with 50 registered functions pays the full schema cost on every single call.

Persistence

Plugin registration is per-Kernel-instance and in-memory. There’s no built-in mechanism for cross-session skill persistence.

MetaGPT: Action Templates Within Role-Based SOPs

MetaGPT role-based SOP: Role with persona, react mode selection, active Action template, aask() LLM call

MetaGPT encodes skills not as standalone packages but as action templates embedded within Standard Operating Procedures (SOPs) that govern role behavior.

Role and Action Architecture

Each Role in MetaGPT has a persona prefix injected into prompts and a set of Action classes. Each Action contains an LLM proxy invoked via aask(), which uses natural language prompt templates to structure the LLM call.

When Role._act() fires, it supports three react modes:

  • "react": The LLM dynamically selects actions in think-act loops
  • "by_order": Actions execute sequentially in a predetermined order
  • "plan_and_act": The agent plans first, then executes actions according to the plan
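The Action-plus-template shape described above can be sketched like this. The class and attribute names follow the article's description (PROMPT_TEMPLATE, aask()); the LLM proxy is stubbed out and the whole thing is illustrative, not MetaGPT's source:

```python
# Illustrative shape of a MetaGPT-style Action: a prompt template rendered
# fresh per execution, sent through the Action's LLM proxy.

def aask(prompt):
    # Synchronous stand-in for MetaGPT's async LLM proxy.
    return f"[llm response to {len(prompt)}-char prompt]"

class WriteTests:
    PROMPT_TEMPLATE = "Write unit tests for the following code:\n{context}"

    def run(self, context):
        # Only this Action's template is in play; other Actions' templates
        # never enter the context for this call.
        prompt = self.PROMPT_TEMPLATE.format(context=context)
        return aask(prompt)
```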

Narrow Injection Window

Only the current Action’s prompt template is active at any given moment. The agent doesn’t see templates for other actions—it only sees its role prefix plus the specific action’s context. This is the narrowest injection window of any framework we examined.

Context parsing functions within Action classes extract relevant information from inputs, so each action receives a curated subset of available context rather than the full conversation history.

Single-Turn Persistence

The template is rendered fresh for each action execution. There’s no accumulation or cross-session persistence. This keeps each action focused but means the agent can’t build on previously loaded skill content within a single workflow.

Voyager: Embedding-Based Skill Retrieval for Lifelong Learning

Voyager skill library: curriculum proposes task, embedding search retrieves top-5 skills, code generation with lifelong learning loop

Voyager, the Minecraft exploration agent from NVIDIA and Caltech, implements one of the most elegant skill injection architectures: a growing library of verified programs retrieved by embedding similarity.

The Skill Library

When Voyager writes code that passes self-verification (the generated Mineflayer JavaScript actually works in the game), the code and its documentation string are stored in a vector database. The docstring embedding becomes the retrieval key.

Per-Task Retrieval

On each new task proposed by the automatic curriculum:

  1. The task description and environment feedback are embedded
  2. Cosine similarity search against all stored skill embeddings
  3. Top-5 most relevant skills are retrieved
  4. Retrieved skill code is included in the action agent’s prompt as few-shot examples

The prompt looks like this:

You are a Minecraft bot. Here are some relevant skills you've learned:

// Skill: mineWoodLog
async function mineWoodLog(bot) { ... }

// Skill: craftPlanks
async function craftPlanks(bot) { ... }

Now write code to: build a wooden pickaxe

The generated code can call retrieved skills by name, enabling compositional skill building—complex behaviors constructed from simpler, verified primitives.
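The retrieval-then-prompt pipeline above can be sketched as follows. Real Voyager embeds docstrings and runs cosine similarity against a vector DB; a word-overlap score stands in here so the sketch is self-contained, and the function names are illustrative:

```python
# Toy sketch of Voyager-style skill retrieval: docstrings act as retrieval
# keys and the top matches become few-shot code examples (word overlap
# standing in for embedding cosine similarity).

def score(query, docstring):
    q, d = set(query.lower().split()), set(docstring.lower().split())
    return len(q & d)

def retrieve_skills(task, library, k=5):
    ranked = sorted(library, key=lambda s: score(task, s["docstring"]), reverse=True)
    return ranked[:k]  # cost is bounded by k, not by library size

def build_prompt(task, library):
    examples = "\n\n".join(
        f"// Skill: {s['name']}\n{s['code']}" for s in retrieve_skills(task, library)
    )
    return (
        "You are a Minecraft bot. Here are some relevant skills you've learned:\n\n"
        f"{examples}\n\nNow write code to: {task}"
    )
```

Because only the top-k matches are formatted into the prompt, the token cost stays flat as the library grows, which is what makes lifelong accumulation viable.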

Lifelong Persistence

The skill library is the core “lifelong learning” mechanism. It grows across the agent’s entire lifetime, and new skills build on old ones. Unlike most frameworks where skills are authored by humans, Voyager’s skills are generated, verified, and stored by the agent itself.

Token cost is naturally bounded: regardless of whether the library contains 50 or 5,000 skills, each task only pays for the 5 most relevant retrievals.

DSPy: Compiled Few-Shot Examples as Frozen Skills

DSPy compilation: BootstrapFewShot and MIPROv2 optimizers compile frozen few-shot demos into Predict module prompts

DSPy takes a radically different approach from every other framework. Instead of injecting skills at runtime, DSPy compiles optimal few-shot demonstrations offline and bakes them permanently into module prompts.

The Compilation Process

Two main optimizers handle compilation:

BootstrapFewShot: Uses a teacher module to generate traces through the program. Traces that pass a user-defined metric are kept as demonstrations. Each dspy.Predict module within the program gets its own curated set of demonstrations.

MIPROv2 (Multi-prompt Instruction Proposal Optimizer v2): A three-phase process:

  1. Bootstrap: Generate candidate demonstration sets
  2. Propose: Generate candidate instruction texts that are aware of both the data distribution and the demonstrations
  3. Search: Bayesian optimization over the combined space of instructions × demonstrations across all modules

Parameters like max_bootstrapped_demos (generated examples) and max_labeled_demos (from training data) control how many examples end up in each module’s prompt.
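The core bootstrap idea — run a teacher, keep only traces that pass the metric — can be rendered as a toy loop. This is an illustration of the concept only, not DSPy's implementation:

```python
# Toy rendition of the bootstrap step behind BootstrapFewShot: traces that
# pass a user-defined metric survive as demonstrations; failing traces are
# discarded. Illustrative, not DSPy source.

def bootstrap_demos(teacher, metric, trainset, max_bootstrapped_demos=4):
    demos = []
    for example in trainset:
        prediction = teacher(example["question"])
        if metric(example, prediction):  # only verified traces become demos
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_bootstrapped_demos:
            break
    return demos  # frozen into the module's prompt after compilation
```

With a metric like exact-match against gold answers, a teacher that gets one training example right and one wrong yields exactly one demonstration: the verified trace.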

Fixed After Compilation

Once compiled, demonstrations are stored in each Predict module’s demos attribute and formatted into the prompt on every LLM call. They don’t change at runtime—the “skill” is frozen.

This means DSPy skills are the most predictable in our comparison: token cost is known after compilation, there’s no variance between turns, and the agent always sees the same demonstrations. The downside is inflexibility—to change skills, you must recompile.

Persistence

Compiled programs serialize to JSON, including all demonstrations. They’re fully persistent and loadable across sessions, making DSPy one of the most durable skill storage mechanisms.

SuperAGI: Toolkit-Based Upfront Registration

SuperAGI and CAMEL-AI upfront toolkit registration: all tool schemas loaded at agent initialization

SuperAGI uses a traditional toolkit pattern where all tools are registered at agent initialization.

Each toolkit extends BaseToolkit with:

  • name and description attributes
  • get_tools() method returning a list of BaseTool instances
  • get_env_keys() for required environment variables

Toolkits are installed from GitHub repositories via SuperAGI’s tool manager. At agent initialization, BaseToolkit.get_tools() returns all tools, and their complete schemas are exposed to the LLM as function-calling definitions.
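A minimal sketch of that pattern, with class names following the article's description (BaseTool, a toolkit with get_tools()) but illustrative bodies, not SuperAGI's source:

```python
# Minimal sketch of the upfront toolkit pattern: everything get_tools()
# returns is registered as a function-calling schema at agent init.

class BaseTool:
    def __init__(self, name, description, parameters):
        self.name, self.description, self.parameters = name, description, parameters

    def schema(self):
        return {"name": self.name, "description": self.description,
                "parameters": self.parameters}

class GitToolkit:  # hypothetical example toolkit
    name = "git"
    description = "Git repository operations"

    def get_tools(self):
        return [
            BaseTool("git_clone", "Clone a repository", {"url": "string"}),
            BaseTool("git_commit", "Commit staged changes", {"message": "string"}),
        ]

def register_agent_tools(toolkits):
    # No deferral: every tool's full schema goes into every LLM call.
    return [tool.schema() for tk in toolkits for tool in tk.get_tools()]
```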

There’s no deferred loading, no progressive disclosure, and no per-turn filtering. Every registered tool’s schema is present in every call. This is the simplest injection model and works well for agents with focused, small tool sets but doesn’t scale to agents needing dozens of capabilities.

CAMEL-AI: ChatAgent Tool Registration

CAMEL-AI follows a similar upfront registration pattern. Tools from various toolkits (e.g., MathToolkit, SearchToolkit) are passed as a list to ChatAgent(tools=[...]) at initialization.

The framework emphasizes that custom functions need clear argument names and comprehensive docstrings so the model can understand usage—the tool schema is the only “skill” content the model sees. There’s no separate instruction injection mechanism.

Recent additions include MCP (Model Context Protocol) support via MCPToolkit, allowing ChatAgent to connect to MCP servers and register external tools. This expands the available tool surface but doesn’t change the injection model—all discovered MCP tools are still registered upfront.

Cross-Platform Comparison

When Skills Are Injected

| Timing | Platforms | What’s Injected |
| --- | --- | --- |
| Always present (session start) | Claude Code, CrewAI, Deep Agents, Semantic Kernel, SuperAGI, CAMEL-AI, DSPy | Metadata (name + description) or full schemas |
| On activation (user or agent triggered) | Claude Code, Deep Agents, OpenAI | Full skill body |
| Every task/turn | CrewAI, AutoGen Teachability | Full body (CrewAI) or retrieved memos (AutoGen) |
| On LLM selection | Semantic Kernel, MetaGPT | Prompt template content |
| On similarity match | Voyager, AutoGen Teachability | Retrieved code or memos |
| Compiled/fixed | DSPy | Optimized few-shot examples |

Persistence Models

| Persistence | Platforms | Mechanism |
| --- | --- | --- |
| Single turn only | MetaGPT, Voyager | Template rendered per-action / per-generation |
| Within session | Claude Code, Deep Agents, OpenAI, Semantic Kernel | Body stays in message history |
| Re-injected every task | CrewAI, SuperAGI, CAMEL-AI | Appended fresh each task execution |
| Cross-session (persistent storage) | AutoGen Teachability, Voyager, DSPy | Vector DB / compiled modules / skill library |

Context Compaction Survival

| Platform | What Happens When Context Gets Full |
| --- | --- |
| Claude Code | Re-attaches most recent skills (5K tokens each, 25K cap). Older skills dropped |
| CrewAI | N/A — injected fresh per task, no accumulation |
| Deep Agents | Body in conversation history, subject to standard LangChain trimming |
| OpenAI | N/A — each API call is independent |
| AutoGen | Only relevant memos retrieved per-turn, naturally bounded |
| Voyager | Only top-K skills retrieved per task, naturally bounded |

The Progressive Disclosure Pattern

The most significant architectural trend across these platforms is the adoption of progressive disclosure—a concept borrowed from UI design where information is revealed incrementally based on need.

Why Progressive Disclosure Matters

A naive approach to skill injection—loading everything upfront—creates two problems:

  1. Token waste: Most skills aren’t relevant to most turns. Loading 20 full skill bodies when only 1-2 are needed per turn wastes 90%+ of skill-related tokens.
  2. Attention dilution: Research on context rot shows that LLMs perform worse when their context contains large amounts of irrelevant information. More skills in context can actually reduce the quality of skill application.

Progressive disclosure solves both problems by maintaining a lightweight index of available skills while loading full content only when needed.

Implementation Variations

Claude Code uses a dedicated system: skill metadata in system-reminder messages, a Skill tool for activation, and ToolSearch for deferred tool schemas. The framework manages injection automatically with priority-based compaction.

LangChain Deep Agents uses the agent’s existing file-reading capability: SkillsMiddleware injects the index, and the agent loads full content via read_file(). This is more transparent but offers less framework-level optimization.

OpenAI Responses API uses namespace-based grouping with platform-managed search: tool namespaces provide high-level descriptions, and tool_search returns relevant schemas. The platform handles the search logic entirely.

Token Savings in Practice

The numbers are compelling. With 12 skills:

  • Always-on injection (CrewAI/SuperAGI style): ~30,000 tokens
  • Progressive disclosure index only: ~600 tokens
  • Index + 2 activated skills: ~2,000–5,000 tokens

That’s an 83–98% reduction in skill-related token consumption per turn. Over a long session with hundreds of turns, the savings compound dramatically.

Architectural Patterns and Tradeoffs

Looking across all 11 platforms, four distinct architectural patterns emerge:

Pattern 1: Always-On Injection

Used by: CrewAI, SuperAGI, CAMEL-AI, Semantic Kernel

How it works: Full skill content or tool schemas are present in every LLM call.

Pros:

  • Maximum reliability—the agent always has full expertise available
  • Simplest implementation—no activation logic needed
  • Predictable token costs—same every turn

Cons:

  • Token cost scales linearly with number of skills
  • Attention dilution with many skills
  • Doesn’t scale beyond ~5-10 skills per agent

Best for: Focused agents with 1-3 core skills that are always relevant.

Pattern 2: Progressive Disclosure

Used by: Claude Code, LangChain Deep Agents, OpenAI Responses API/Agents SDK

How it works: Lightweight metadata always present; full content loaded on-demand.

Pros:

  • Scales to dozens or hundreds of available skills
  • Minimal token cost when skills aren’t needed
  • Preserves prompt cache when full schemas append at end

Cons:

  • Agent might miss the cue to activate a relevant skill
  • Additional latency from the activation step
  • More complex framework implementation

Best for: General-purpose agents that need access to many capabilities but use only a few per task.

Pattern 3: Semantic Retrieval

Used by: AutoGen Teachability, Voyager

How it works: Vector database queries surface relevant skills/knowledge based on semantic similarity to the current context.

Pros:

  • Naturally bounded token cost regardless of library size
  • Content relevance improves over time as the library grows
  • Cross-session learning and accumulation
  • No explicit activation needed—relevance is computed automatically

Cons:

  • Retrieval quality depends on embedding model quality
  • Risk of retrieving outdated or subtly wrong information
  • Requires vector database infrastructure
  • Less predictable—different turns load different content

Best for: Agents that learn from experience and need to accumulate domain knowledge over time.
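A toy sketch of the retrieval loop. Real systems (AutoGen Teachability, Voyager) embed with a learned model and store vectors in a database; here a bag-of-words cosine similarity stands in for both:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

library = {  # key: skill title (what gets matched), value: skill body
    "craft stone tools": "To craft a stone pickaxe, combine sticks and cobblestone...",
    "fight zombies": "Keep distance, strike when the zombie lunges...",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Surface the k most similar entries; only these enter the prompt.
    ranked = sorted(library, key=lambda title: cosine(embed(query), embed(title)),
                    reverse=True)
    return [library[title] for title in ranked[:k]]
```

Because `k` is fixed, the token cost stays bounded no matter how large `library` grows — which is exactly the property that makes this pattern attractive for accumulating knowledge.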

Pattern 4: Compiled/Static Injection

Used by: DSPy, MetaGPT

How it works: Skills are compiled into fixed prompt content (DSPy) or activated through rigid action templates (MetaGPT).

Pros:

  • Most predictable behavior—same content every time
  • Optimization can be done offline (DSPy’s compilation)
  • No runtime overhead for skill selection
  • Proven effective for well-defined, repeatable tasks

Cons:

  • Inflexible—changing skills requires recompilation (DSPy) or code changes (MetaGPT)
  • Can’t adapt to novel situations outside the compiled examples
  • DSPy’s compilation process itself requires many LLM calls

Best for: Production pipelines with well-defined tasks where reliability trumps flexibility.
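The essence of compiled injection is that skill selection happens once, offline. A loosely DSPy-shaped sketch (the real optimizer scores candidate demonstrations with LLM-driven evaluation; the scoring function and prompt layout here are purely illustrative):

```python
def compile_prompt(instruction: str, candidate_demos: list[tuple[str, str]],
                   score) -> str:
    """Offline 'compilation': pick the best-scoring demos once, freeze the prompt."""
    best = sorted(candidate_demos, key=score, reverse=True)[:2]
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in best)
    return f"{instruction}\n\n{demos}\n\nQ: {{question}}\nA:"

COMPILED = compile_prompt(
    "Answer concisely.",
    [("2+2?", "4"), ("Capital of France?", "Paris"), ("Sky color?", "blue")],
    score=lambda demo: len(demo[1]),  # stand-in metric; a real optimizer evaluates quality
)
# At runtime the prompt never changes; only the question slot is filled.
runtime_prompt = COMPILED.format(question="3+3?")
```

All the expensive selection work is paid once at compile time; runtime is a string fill with zero skill-selection overhead, which is the predictability the pattern trades flexibility for.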

Practical Implications for Agent Builders

Choosing the Right Pattern

The right skill injection architecture depends on your agent’s profile:

If your agent has a narrow, well-defined role (e.g., a code review bot, a customer support agent for one product), always-on injection (CrewAI/SuperAGI pattern) is simplest and most reliable. The token cost of 2–3 always-present skills is manageable, and you avoid the complexity of activation logic.

If your agent needs broad capabilities but uses only a few per interaction (e.g., a developer assistant, a general-purpose automation agent), progressive disclosure (Claude Code/Deep Agents pattern) is the clear winner. The 83–98% token savings at scale are too significant to ignore.

If your agent needs to learn and improve from interactions (e.g., a personal assistant, a domain expert that accumulates knowledge), semantic retrieval (AutoGen Teachability pattern) provides the learning loop other patterns lack. Just ensure you have quality controls on what enters the knowledge base.

If your agent runs well-defined pipelines (e.g., data processing, report generation, standardized workflows), compiled injection (DSPy pattern) gives you the most predictable, optimized behavior.

The Hybrid Approach

For production agent teams where agents need to work out of the box, we recommend a hybrid approach:

Core skills (1–2 per agent, defining their primary domain expertise): always injected into the system prompt, CrewAI-style. These are non-negotiable capabilities the agent needs on every turn.

Extended skills (additional capabilities the agent might need): metadata only in the system prompt, loaded via a search/load mechanism when needed, Deep Agents-style. These expand the agent’s capability set without paying the token cost when they’re not relevant.

Learned knowledge (accumulated domain expertise): stored in a vector database and retrieved semantically per-turn, AutoGen-style. This allows the agent to improve over time without manual skill authoring.

This layered architecture maps naturally to how a system prompt is built: date → persona → system instructions → core skills → skill index → role/team context. The core skills and index add a predictable, manageable token cost, while the full skill bodies only appear when needed.
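The layered assembly can be sketched as a single builder function (the layer names mirror the ordering above; everything else is illustrative):

```python
def build_system_prompt(date, persona, instructions, core_skills,
                        skill_index, team_context):
    """Layered assembly: stable, always-on layers first; the skill index and
    team context follow. Full extended-skill bodies are NOT here; they load
    on demand during the conversation."""
    layers = [
        f"Date: {date}",
        persona,
        instructions,
        "## Core skills (always loaded)\n" + "\n\n".join(core_skills),
        "## Extended skills (load on demand)\n" + skill_index,
        team_context,
    ]
    return "\n\n".join(layer for layer in layers if layer)

prompt = build_system_prompt(
    "2025-06-01", "You are a release engineer.", "Follow change-management policy.",
    core_skills=["Deploy runbook: ..."],
    skill_index="- rollback: revert a failed deploy\n- canary: staged rollout",
    team_context="You report results to the orchestrator agent.",
)
```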

Token Budget Best Practices Across Frameworks

Regardless of which injection pattern you use, these token management strategies apply universally:

Cache-Friendly Ordering

Stack unchanging context (system instructions, tool schemas) at the front of the prompt. On providers that support prompt caching, cached tokens are billed at a steep discount (typically 50–90% less than uncached input, depending on the provider). Claude Code and OpenAI both inject discovered tool schemas at the end of the context specifically to preserve cache hits on the static prefix.
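The ordering rule amounts to a trivial assembly convention: static, cacheable material first, per-turn material last (the strings and names here are illustrative):

```python
STATIC_PREFIX = "\n".join([
    "You are a build assistant.",       # system instructions
    "## Tools\nrun_tests, read_file",   # stable tool schemas
])  # byte-identical every turn, so provider-side caching keeps hitting it

def assemble_prompt(turn_context: list[str], discovered_schemas: list[str]) -> str:
    # Anything that changes per turn goes AFTER the static prefix,
    # so the cached prefix is never invalidated.
    return "\n".join([STATIC_PREFIX, *turn_context, *discovered_schemas])

p1 = assemble_prompt(["User: fix the flaky test"], [])
p2 = assemble_prompt(["User: now deploy"], ["## Tool: deploy (schema...)"])
```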

Offloading

Summarize tool responses rather than keeping full results in context. Store the complete data in external references that the agent can read on demand. This is especially important for agents that make many tool calls per session.
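A sketch of the offloading move: the context keeps a one-line summary plus a reference, while the full payload lives in external storage the agent can re-read via a tool (the artifact store and reference scheme are invented for illustration):

```python
import uuid

ARTIFACT_STORE: dict[str, str] = {}  # stand-in for files/object storage

def offload(tool_name: str, full_result: str, summary: str) -> str:
    """Keep a short summary in context; park the full payload under a reference
    the agent can re-read on demand (e.g., via a read_artifact tool)."""
    ref = f"artifact://{uuid.uuid4().hex[:8]}"
    ARTIFACT_STORE[ref] = full_result
    return f"[{tool_name}] {summary} (full output: {ref})"

msg = offload("run_tests", "...5,000 lines of pytest output...",
              "312 passed, 2 failed: test_auth, test_rate_limit")
```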

Reduction

Compact conversation history through summarization. Extract key facts from long exchanges into condensed representations. Every framework with session-based persistence benefits from aggressive history management.
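A minimal compaction sketch: old turns collapse into one summary message while recent turns stay verbatim. The `summarize` callable would normally be an LLM call; here it is injected so the structure is clear:

```python
def compact_history(messages: list[dict], keep_last: int, summarize) -> list[dict]:
    """Replace old turns with a single summary message; keep recent turns verbatim."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(old)  # in practice, an LLM summarization call
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"},
            *recent]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact_history(history, keep_last=3,
                            summarize=lambda msgs: f"{len(msgs)} earlier turns elided")
```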

Retrieval Over Pre-Loading

Dynamically fetch relevant information at runtime rather than loading everything upfront. This applies to skills, knowledge bases, and even conversation history. In practice this can cut prompt sizes substantially, with reductions of up to 70% reported.

Isolation

Use sub-agents for specific tasks so each agent’s context stays focused. Rather than giving one agent 20 skills, create a team of 5 agents with 4 skills each. Each agent maintains a lean context window, and the team collectively covers the full capability set.
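The budget math for isolation, using the same illustrative ~2,500 tokens per skill body assumed earlier (the gain is largest under always-on injection; with progressive disclosure the token delta shrinks, but the attention-focus benefit remains):

```python
SKILL_TOKENS = 2_500  # assumed average full skill body (illustrative)

monolith = 20 * SKILL_TOKENS    # one agent carrying all 20 skills, per turn
specialist = 4 * SKILL_TOKENS   # each of five 4-skill agents, per turn
print(monolith, specialist)     # prints "50000 10000"
```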

Conclusion

The way AI agent frameworks inject skills into context is one of the most consequential architectural decisions in agent design—yet it’s rarely discussed at this level of detail.

The field is clearly converging on progressive disclosure as the preferred pattern for general-purpose agents, with Claude Code, LangChain Deep Agents, and OpenAI all independently arriving at similar three-tier architectures. Meanwhile, specialized patterns like semantic retrieval (AutoGen, Voyager) and compiled injection (DSPy) serve important niches that progressive disclosure alone doesn’t address.

For practitioners building agent systems today, the key insight is that skill injection isn’t a one-size-fits-all problem. The right approach depends on your agent’s role, the number of skills it needs, whether it needs to learn over time, and your tolerance for token costs versus reliability tradeoffs.

The most robust production systems will likely combine multiple patterns—always-on for core capabilities, progressive disclosure for extended skills, and semantic retrieval for accumulated knowledge—creating agents that are both efficient and expert.

Yasha Boroumand
CTO, FlowHunt

Yasha is a talented software developer specializing in Python, Java, and machine learning. Yasha writes technical articles on AI, prompt engineering, and chatbot development.
