How AI Agents Actually Implement Skills: Complete Cross-Platform Comparison

Introduction

Every AI agent framework faces the same fundamental question: how do you make an LLM good at something specific? The model has broad general knowledge, but to perform a code review, deploy infrastructure, or navigate Minecraft, it needs specialized instructions, tool access, and domain context.

This is the skill injection problem. And every major framework solves it differently.

Some platforms dump everything into the system prompt upfront. Others use lazy loading, only revealing capabilities when the agent needs them. A few use vector databases to retrieve relevant skills based on semantic similarity. The differences aren’t academic—they directly affect token costs, agent reliability, and how many skills an agent can realistically juggle.

We analyzed 11 major AI agent platforms to understand exactly where skills land in the prompt, when they load, what they cost in tokens, and how they survive when the context window fills up. This isn’t a surface-level feature comparison. We dug into source code, documentation, and architecture diagrams to map out the precise injection mechanics of each platform.

Master Comparison Table

Here’s the complete overview before we dive into the details.

Injection Mechanics: Where, When, and How

| Platform | Injection Point | When Loaded | Mechanism |
| --- | --- | --- | --- |
| Claude Code | System-reminder (metadata) + conversation message (body) | Metadata at session start; body on /command or auto-match | Framework injects metadata; Skill tool loads full body on activation |
| CrewAI | Task prompt (appended before LLM call) | Every task execution via _finalize_task_prompt() | format_skill_context() appends all skill bodies to prompt |
| LangChain Deep Agents | System prompt (metadata) + conversation history (body) | Metadata at startup; body when agent calls read_file() | SkillsMiddleware injects index; agent loads body via filesystem tool |
| OpenAI Responses API | User prompt context (platform-managed) | On skill_reference in API call | Platform appends metadata; model reads full SKILL.md on invocation |
| OpenAI Agents SDK | Tool definitions (deferred via ToolSearchTool) | Namespace names at creation; schemas on ToolSearchTool call | tool_namespace() + ToolSearchTool() for progressive discovery |
| AutoGen Teachability | Modified user message (retrieved memos injected) | Every turn — vector DB retrieval before each LLM call | Middleware intercepts message, queries ChromaDB, injects top-K matches |
| Semantic Kernel | Function-calling schemas + prompt template content | All schemas at startup; template content on function invocation | kernel.add_plugin() registers all; kernel.invoke() renders templates |
| MetaGPT | Action prompt template (rendered into LLM call) | When Role’s _act() fires for a specific Action | Action.run() formats PROMPT_TEMPLATE, sends via aask() |
| Voyager | Code generation prompt (retrieved skill code) | Before each code generation; embedding similarity search | SkillLibrary.retrieve_skills() injects top-5 as few-shot examples |
| DSPy | Compiled few-shot demos in Predict module prompts | Compiled offline by optimizer; fixed at runtime | BootstrapFewShot / MIPROv2 selects best demos; Predict renders into prompt |
| SuperAGI | Tool schemas in agent’s tool list | Agent creation — all toolkit tools registered upfront | BaseToolkit.get_tools() registers all as function-calling tools |
| CAMEL-AI | Function schemas + role system message | Agent creation — all tools registered upfront | ChatAgent(tools=[*toolkit.get_tools()]) loads everything at init |

Persistence, Token Cost, and Always-On Behavior

| Platform | Always Present? | Persistence | Token Cost |
| --- | --- | --- | --- |
| Claude Code | Metadata: YES. Body: only after activation | Session-scoped. On compaction: re-attached (5K/skill, 25K cap) | ~250 chars/skill metadata; 1% of context budget |
| CrewAI | YES — full body in every task prompt | Fresh injection per task; no cross-task persistence | Full body every call. 50K char soft limit |
| LangChain Deep Agents | Metadata: YES. Body: on-demand | Body stays in conversation history; subagent skills isolated | ~100 tokens/skill metadata; body paid once (~3,302 tokens) |
| OpenAI Responses API | Name+desc: YES. Full body: on invocation | Single API response only; no cross-call persistence | Platform-managed |
| OpenAI Agents SDK | Namespace list: YES. Schemas: on demand | Single run only; re-discover per session | Minimal until activated |
| AutoGen Teachability | NO — only relevant memos per turn | Cross-session via ChromaDB; persists indefinitely | ~3-5 memos per turn (variable) |
| Semantic Kernel | All schemas: YES. Templates: on invocation | In-memory per kernel instance; no cross-session | All schemas always present |
| MetaGPT | NO — only current Action’s template | Single action execution only | One template per turn |
| Voyager | NO — top-5 retrieved per task | Lifelong persistence in vector DB | ~500-2,000 tokens per skill example |
| DSPy | YES — compiled demos baked in | Serializable to JSON; persists across sessions | Fixed after compilation (3-8 demos/module) |
| SuperAGI | YES — all schemas always present | Within agent session | All schemas always present |
| CAMEL-AI | YES — all schemas + role prompt | Within conversation session | All schemas always present |

What “Skill Injection” Actually Means

Before diving into the comparison, let’s define the problem space. An AI agent’s context window—the total text the LLM sees on each call—has a fixed size. Every token of instruction, conversation history, tool definition, and retrieved data competes for space in that window.

A “skill” in the agent context is any structured package of expertise that changes how the agent behaves. This could be:

  • Instructions telling the agent how to approach a specific domain (code review guidelines, deployment checklists)
  • Tool definitions giving the agent callable functions (API integrations, file operations)
  • Few-shot examples showing the agent what good output looks like
  • Retrieved knowledge from vector databases or external documents

The injection mechanism—where and when this content enters the context—determines three critical properties:

  1. Token efficiency: How many tokens does the skill consume, and is that cost paid even when the skill isn’t needed?
  2. Reliability: Will the agent consistently use the skill when relevant, or might it miss the cue?
  3. Scalability: How many skills can the agent access before context bloat degrades performance?

Every framework makes different tradeoffs across these three dimensions. Let’s examine each one.

The Injection Spectrum: From Always-On to On-Demand

Across all 11 platforms, skill injection approaches fall along a spectrum from “everything loaded upfront” to “nothing loaded until explicitly needed.”

At one end, platforms like CrewAI, SuperAGI, and CAMEL-AI inject the full content of every activated skill into every LLM call. The agent always has its complete expertise available. Simple, reliable, but expensive in tokens.

At the other end, Claude Code, LangChain Deep Agents, and OpenAI’s Responses API use progressive disclosure—the agent sees only skill names and short descriptions at startup, and full content loads on-demand. Efficient, scalable, but requires the agent to recognize when it needs a skill.

In the middle, AutoGen Teachability and Voyager use semantic retrieval to inject only the most relevant skills per turn, creating a dynamic, context-sensitive injection pattern.

And then there are unique approaches: DSPy compiles optimized few-shot examples offline and bakes them permanently into module prompts. MetaGPT encodes skills as action templates that activate only when a specific role transitions to a specific action.

Let’s examine each in detail.

Claude Code: Three-Layer Progressive Disclosure

Claude Code three-layer progressive disclosure: always-on metadata, on-activation skill body, on-demand resources

Claude Code implements one of the most sophisticated skill injection architectures, using a three-layer progressive disclosure system that balances awareness with token efficiency.

Layer 1: Always in Context

At session start, every available skill’s name and description is injected into a system-reminder message—a metadata block the model always sees. This costs roughly 250 characters per skill, consuming about 1% of the context window budget for all skill descriptions combined (approximately 8K characters as a fallback budget, overridable via the SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable).

Similarly, deferred tools—tools whose full JSON schemas haven’t been loaded yet—appear as a name-only list in system-reminder blocks. As of Claude Code v2.1.69, even built-in system tools like Bash, Read, Edit, Write, Glob, and Grep are deferred behind ToolSearch, reducing system tool context from approximately 14–16K tokens to roughly 968 tokens.

The agent sees enough to know what’s available without paying the token cost of full definitions.

Layer 2: On Activation

When a user types a slash command (e.g., /commit) or the model auto-matches a skill based on its description, the full SKILL.md body is loaded as a conversation message via the Skill tool. This body contains the complete instructions—sometimes thousands of tokens of detailed guidance.

Key detail: Shell preprocessing runs first (any !command directives in the skill file execute and their output replaces the directive), and once loaded, the skill body stays in the conversation for the rest of the session.

Layer 3: On Demand

Additional resources—reference documents, scripts, asset files—are only read when the model explicitly decides to use the Read tool to access them. These never load automatically.

Context Compaction Behavior

When the conversation approaches the context limit and compaction triggers, Claude Code re-attaches the most recently invoked skills with a budget of 5K tokens per skill and a 25K combined maximum. Most-recently-invoked skills get priority. Older skills may be dropped entirely.
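The re-attachment policy described above amounts to a greedy, recency-ordered budget walk. Here is a minimal sketch of that logic; the per-skill and combined caps come from the behavior described here, while the function name and data shapes are hypothetical, not Claude Code's source:

```python
# Illustrative sketch of priority-based skill re-attachment on compaction.
# Budgets (5K tokens/skill, 25K combined) match the article; everything else
# is an assumed shape for illustration.

PER_SKILL_CAP = 5_000
TOTAL_CAP = 25_000

def reattach_skills(invoked_skills):
    """invoked_skills: list of (name, token_count), most recently invoked first."""
    kept, budget_used = [], 0
    for name, tokens in invoked_skills:
        cost = min(tokens, PER_SKILL_CAP)  # each skill is capped individually
        if budget_used + cost > TOTAL_CAP:
            break  # older skills past the combined cap are dropped entirely
        kept.append(name)
        budget_used += cost
    return kept, budget_used
```

With six recently invoked skills of ~6K tokens each, only the five most recent survive re-attachment, each truncated to the 5K cap.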

This three-layer architecture means an agent with 20+ available skills pays a minimal upfront cost but can access full expertise on any of them within a single turn.

CrewAI: Full Injection Into Every Task Prompt

CrewAI skill injection: full body appended to every task prompt via format_skill_context()

CrewAI takes the opposite approach from progressive disclosure. When a skill is activated for an agent, its full content is injected into every task prompt the agent executes.

How It Works

Skills in CrewAI are self-contained directories, each with a SKILL.md file containing YAML frontmatter (name, description, license, compatibility, allowed tools) and a markdown body. The skill system distinguishes between skills and tools: skills inject instructions and context that shape how the agent thinks, while tools provide callable functions for actions.

During agent initialization, Agent.set_skills() calls discover_skills() to scan skill directories at the metadata level, then activate_skill() to read full skill bodies. At task execution time, _finalize_task_prompt() calls format_skill_context() for each activated skill and appends all formatted skill content to the task prompt.

The LLM receives: [system message] + [task prompt + ALL skill bodies]
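That append step can be sketched roughly as follows. The helper names mirror the ones the article mentions (format_skill_context(), the finalize step), but the bodies are illustrative, not CrewAI's actual implementation:

```python
# Illustrative sketch of CrewAI-style full-body injection.
# The 50K-character figure is the soft limit described in the article.

SOFT_LIMIT = 50_000  # soft warning threshold, in characters per skill

def format_skill_context(skill):
    if len(skill["body"]) > SOFT_LIMIT:
        print(f"warning: skill '{skill['name']}' exceeds {SOFT_LIMIT} chars")
    return f"\n\n## Skill: {skill['name']}\n{skill['body']}"

def finalize_task_prompt(task_prompt, activated_skills):
    # Every activated skill's full body is appended on every task execution.
    return task_prompt + "".join(format_skill_context(s) for s in activated_skills)
```

Note that the cost is paid on every call: two activated skills of 10K characters each add ~20K characters to every single task prompt.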

Token Implications

CrewAI imposes a soft warning at 50,000 characters per skill but no hard limit. The documentation recommends keeping skills focused and concise because large prompt injections dilute the model’s attention—a real concern given research on context rot.

The tradeoff is straightforward: the agent always has full expertise available (high reliability), but token cost scales linearly with the number of skills per task (low efficiency). For agents with 1-2 focused skills, this works well. For agents needing broad capability sets, it becomes expensive fast.

No Cross-Task Persistence

Each task gets a fresh injection. There’s no accumulation of skill content across tasks—which is actually a feature, not a bug. It means each task starts with a clean context, avoiding the staleness problems that session-based persistence can create.

LangChain Deep Agents: Agent-Controlled Loading via SkillsMiddleware

LangChain Deep Agents three-tier skill loading: index via SkillsMiddleware, full content via read_file, deep dive on demand

LangChain Deep Agents implements a sophisticated middleware-based skill system where the agent itself decides when to load full skill content—a true progressive disclosure model where the agent controls activation.

The Three Tiers

Tier 1 (Index): SkillsMiddleware parses all SKILL.md frontmatter at startup and injects a lightweight index into the system prompt. This index contains only names and descriptions, costing approximately 278 tokens per skill versus 3,302 tokens for full content.

Tier 2 (Full Content): When the agent determines a skill is relevant, it calls read_file() on the skill’s SKILL.md path. This is a regular tool call—the framework doesn’t inject the body; the agent makes a deliberate decision to load it. The full content enters the conversation history as a tool result.

Tier 3 (Deep Dive): Supporting materials, reference docs, and scripts are only accessed when the agent explicitly reads them.
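The Tier-1 index is just frontmatter metadata assembled into a short listing. A toy version of that step, assuming flat `key: value` frontmatter (a real middleware would use a full YAML parser):

```python
# Minimal sketch of building a Tier-1 skill index from SKILL.md frontmatter.
# Only handles flat `key: value` lines between `---` markers; illustrative,
# not LangChain's SkillsMiddleware source.

def parse_frontmatter(text):
    lines = text.splitlines()
    assert lines[0].strip() == "---", "expected frontmatter delimiter"
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the body below is never read here
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def build_index(skill_files):
    """skill_files: {path: SKILL.md text}. Returns the lightweight index string."""
    entries = []
    for path, text in skill_files.items():
        meta = parse_frontmatter(text)
        entries.append(f"- {meta['name']} ({path}): {meta['description']}")
    return "Available skills (read_file a path to load one):\n" + "\n".join(entries)
```

The key property: the full body below the closing `---` never enters the index, so the agent pays only for names and descriptions until it chooses to read a file.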

Token Efficiency in Practice

With 12 skills, progressive disclosure reduces context from approximately 30,000 tokens (all loaded) to roughly 600 tokens (index only), expanding to 2,000–5,000 when relevant skills are loaded for a specific task. That’s a potential 83–98% reduction in skill-related token consumption.

Multiple skill sources can be layered, and when names collide, the last source wins. Files over 10 MB are automatically skipped.

The Key Difference from Claude Code

While Claude Code uses a dedicated Skill tool to trigger loading, Deep Agents repurposes the agent’s existing read_file tool. This means the loading mechanism is transparent—the agent reads skill files the same way it reads any other file. The downside is that there’s no special compaction behavior: skill content that enters conversation history is subject to standard LangChain message trimming, with no priority treatment.

OpenAI Responses API and Agents SDK: Platform-Managed Deferred Loading

OpenAI deferred tool loading: three deferral strategies with platform-managed tool_search

OpenAI implements skill injection through two distinct but philosophically aligned mechanisms: the Responses API’s tool_search tool type and the Agents SDK’s ToolSearchTool.

The tool_search tool type (available on GPT-5.4+) allows developers to defer large tool surfaces until runtime. Three deferral strategies are available:

  • Individual function deferral: @function_tool(defer_loading=True) — the model sees the function name and description but the parameter schema is deferred. Saves parameter-level tokens.
  • Namespace deferral: tool_namespace(name=..., description=..., tools=[...]) — groups functions under a single namespace. The model sees only the namespace name and description, saving significantly more tokens.
  • MCP server deferral: HostedMCPTool(tool_config={..., "defer_loading": True}) — defers entire MCP server tool surfaces.

When the model determines it needs a specific tool, it issues a tool_search call. The API returns 3-5 relevant tool definitions, injected at the end of the context window to preserve prompt caching.

Agents SDK: ToolSearchTool

The Agents SDK provides a programmatic equivalent. Tool namespaces are registered but not loaded:

crm_tools = tool_namespace(
    name="crm",
    description="CRM management tools",
    tools=[...]
)
agent = Agent(tools=[*crm_tools, ToolSearchTool()])

At runtime, the agent sees only namespace names. It calls ToolSearchTool("crm") to discover and load the full schemas, then can call individual tools within that namespace.

No Cross-Request Persistence

Each API request is independent. Discovered tools don’t persist across calls. This is the most stateless approach in our comparison—clean, predictable, but requiring re-discovery on every request if tools change.

AutoGen Teachability: Per-Turn Semantic Retrieval

AutoGen Teachability per-turn retrieval loop: message intercept, ChromaDB query, memo injection, learning loop

AutoGen’s Teachability capability takes a fundamentally different approach from every other framework in this comparison. Instead of injecting static skill content, it dynamically retrieves relevant “memos” from a ChromaDB vector database on every single turn.

The Per-Turn Retrieval Loop

Teachability registers a hook on process_last_received_message that intercepts every incoming user message before the agent processes it:

  1. A TextAnalyzerAgent extracts key concepts from the incoming message
  2. These concepts are used to query ChromaDB (using Sentence Transformer embeddings by default)
  3. The top-K most relevant memos are retrieved (configurable via max_num_retrievals, default 10)
  4. Retrieved memos are appended to the message text before the agent sees it

Critically, the modified message does not propagate into stored conversation history—only the original message is stored. This prevents memo content from compounding across turns.
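The intercept-retrieve-inject loop can be sketched as below. AutoGen's real implementation queries ChromaDB with sentence-transformer embeddings; here a simple word-overlap score stands in for embedding similarity so the example stays self-contained, and all names are illustrative:

```python
# Toy sketch of Teachability's per-turn retrieval step (word overlap
# standing in for embedding similarity; not AutoGen's source).

def similarity(a, b):
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(1, len(wa | wb))

def retrieve_memos(message, memo_db, k=3, threshold=0.1):
    scored = sorted(memo_db, key=lambda m: similarity(message, m["input"]), reverse=True)
    return [m for m in scored[:k] if similarity(message, m["input"]) >= threshold]

def inject_memos(message, memo_db):
    memos = retrieve_memos(message, memo_db)
    if not memos:
        return message  # nothing relevant: the message passes through unchanged
    notes = "\n".join(f"- {m['input']} -> {m['output']}" for m in memos)
    # Only the LLM sees this expanded text; stored history keeps the original.
    return f"{message}\n\nRelevant learnings:\n{notes}"
```

The threshold plays the role of AutoGen's recall_threshold: messages with no sufficiently similar memo pass through untouched, which keeps irrelevant knowledge out of the context.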

Learning Loop

After the LLM responds, a second hook analyzes the response for new learnings:

  1. TextAnalyzerAgent identifies new knowledge in the response
  2. New memos are extracted as key-value pairs (input text → output text)
  3. These memos are stored in ChromaDB, available for future turns and sessions

This creates a genuine learning loop where the agent accumulates expertise over time.

Cross-Session Persistence

AutoGen Teachability is one of only three platforms in our comparison (alongside Voyager and DSPy) that persists skills across sessions. The ChromaDB database lives on disk, meaning an agent can learn from interactions on Monday and apply that knowledge on Friday.

The recall_threshold parameter (default 1.5) controls how similar a message must be to a stored memo for retrieval, and reset_db can clear the entire memory when needed.

Token Efficiency

Since only relevant memos are injected per turn (typically 3-5), the token cost is naturally bounded regardless of how large the memo database grows. An agent with 10,000 stored memos still only pays for the handful most relevant to the current turn.

Semantic Kernel: Plugin Schemas as Always-Present Tool Definitions

Semantic Kernel two injection paths: function calling with all schemas always present and prompt template rendering

Microsoft’s Semantic Kernel takes a straightforward approach: plugins are collections of KernelFunction objects registered with the Kernel, and their schemas are exposed to the LLM as function-calling tool definitions.

Two Injection Paths

Function Calling: When ToolCallBehavior.AutoInvokeKernelFunctions is set, all registered functions are sent to the LLM as available tools in every API request. The LLM decides which to call; Semantic Kernel handles invocation and result routing.

Prompt Templates: Semantic Kernel’s template syntax ({{plugin.function}}, Handlebars, or Liquid) allows functions to be called inline during prompt rendering. Results are embedded directly in the prompt text before it reaches the LLM—a form of eager evaluation rather than lazy tool calling.
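To make the eager-evaluation point concrete, here is a toy renderer for the {{plugin.function}} style. Semantic Kernel's real template engine is far richer (arguments, Handlebars, Liquid); this sketch only substitutes zero-argument calls and is not SK's implementation:

```python
import re

# Toy illustration of eager prompt-template rendering: plugin functions run
# during rendering and their results are embedded in the prompt text.

def render(template, plugins):
    def substitute(match):
        plugin_name, func_name = match.group(1), match.group(2)
        # The function executes now, before the LLM ever sees the prompt.
        return str(plugins[plugin_name][func_name]())
    return re.sub(r"\{\{(\w+)\.(\w+)\}\}", substitute, template)
```

Usage: `render("Today is {{time.today}}.", {"time": {"today": lambda: "2025-01-01"}})` produces a prompt with the date already baked in, no tool call required.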

No Progressive Disclosure

Every registered plugin’s schema is included in every API call. There’s no built-in deferred loading, namespace grouping, or on-demand activation. The documentation explicitly recommends importing only the plugins needed for a specific scenario to reduce token consumption and miscalls.

This makes Semantic Kernel one of the most predictable platforms—you always know exactly what the agent has access to—but it limits scalability. An agent with 50 registered functions pays the full schema cost on every single call.

Persistence

Plugin registration is per-Kernel-instance and in-memory. There’s no built-in mechanism for cross-session skill persistence.

MetaGPT: Action Templates Within Role-Based SOPs

MetaGPT role-based SOP: Role with persona, react mode selection, active Action template, aask() LLM call

MetaGPT encodes skills not as standalone packages but as action templates embedded within Standard Operating Procedures (SOPs) that govern role behavior.

Role and Action Architecture

Each Role in MetaGPT has a persona prefix injected into prompts and a set of Action classes. Each Action contains an LLM proxy invoked via aask(), which uses natural language prompt templates to structure the LLM call.

When Role._act() fires, it supports three react modes:

  • "react": The LLM dynamically selects actions in think-act loops
  • "by_order": Actions execute sequentially in a predetermined order
  • "plan_and_act": The agent plans first, then executes actions according to the plan
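The Action-plus-template shape described above can be sketched like this. The class and attribute names follow the article's description (PROMPT_TEMPLATE, aask()); the LLM proxy is stubbed out and the whole thing is illustrative, not MetaGPT's source:

```python
# Illustrative shape of a MetaGPT-style Action: a prompt template rendered
# fresh per execution, sent through the Action's LLM proxy.

def aask(prompt):
    # Synchronous stand-in for MetaGPT's async LLM proxy.
    return f"[llm response to {len(prompt)}-char prompt]"

class WriteTests:
    PROMPT_TEMPLATE = "Write unit tests for the following code:\n{context}"

    def run(self, context):
        # Only this Action's template is in play; other Actions' templates
        # never enter the context for this call.
        prompt = self.PROMPT_TEMPLATE.format(context=context)
        return aask(prompt)
```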

Narrow Injection Window

Only the current Action’s prompt template is active at any given moment. The agent doesn’t see templates for other actions—it only sees its role prefix plus the specific action’s context. This is the narrowest injection window of any framework we examined.

Context parsing functions within Action classes extract relevant information from inputs, so each action receives a curated subset of available context rather than the full conversation history.

Single-Turn Persistence

The template is rendered fresh for each action execution. There’s no accumulation or cross-session persistence. This keeps each action focused but means the agent can’t build on previously loaded skill content within a single workflow.

Voyager: Embedding-Based Skill Retrieval for Lifelong Learning

Voyager skill library: curriculum proposes task, embedding search retrieves top-5 skills, code generation with lifelong learning loop

Voyager, the Minecraft exploration agent from NVIDIA and Caltech, implements one of the most elegant skill injection architectures: a growing library of verified programs retrieved by embedding similarity.

The Skill Library

When Voyager writes code that passes self-verification (the generated Mineflayer JavaScript actually works in the game), the code and its documentation string are stored in a vector database. The docstring embedding becomes the retrieval key.

Per-Task Retrieval

On each new task proposed by the automatic curriculum:

  1. The task description and environment feedback are embedded
  2. Cosine similarity search against all stored skill embeddings
  3. Top-5 most relevant skills are retrieved
  4. Retrieved skill code is included in the action agent’s prompt as few-shot examples

The prompt looks like this:

You are a Minecraft bot. Here are some relevant skills you've learned:

// Skill: mineWoodLog
async function mineWoodLog(bot) { ... }

// Skill: craftPlanks
async function craftPlanks(bot) { ... }

Now write code to: build a wooden pickaxe

The generated code can call retrieved skills by name, enabling compositional skill building—complex behaviors constructed from simpler, verified primitives.
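The retrieval-then-prompt pipeline above can be sketched as follows. Real Voyager embeds docstrings and runs cosine similarity against a vector DB; a word-overlap score stands in here so the sketch is self-contained, and the function names are illustrative:

```python
# Toy sketch of Voyager-style skill retrieval: docstrings act as retrieval
# keys and the top matches become few-shot code examples (word overlap
# standing in for embedding cosine similarity).

def score(query, docstring):
    q, d = set(query.lower().split()), set(docstring.lower().split())
    return len(q & d)

def retrieve_skills(task, library, k=5):
    ranked = sorted(library, key=lambda s: score(task, s["docstring"]), reverse=True)
    return ranked[:k]  # cost is bounded by k, not by library size

def build_prompt(task, library):
    examples = "\n\n".join(
        f"// Skill: {s['name']}\n{s['code']}" for s in retrieve_skills(task, library)
    )
    return (
        "You are a Minecraft bot. Here are some relevant skills you've learned:\n\n"
        f"{examples}\n\nNow write code to: {task}"
    )
```

Because only the top-k matches are formatted into the prompt, the token cost stays flat as the library grows, which is what makes lifelong accumulation viable.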

Lifelong Persistence

The skill library is the core “lifelong learning” mechanism. It grows across the agent’s entire lifetime, and new skills build on old ones. Unlike most frameworks where skills are authored by humans, Voyager’s skills are generated, verified, and stored by the agent itself.

Token cost is naturally bounded: regardless of whether the library contains 50 or 5,000 skills, each task only pays for the 5 most relevant retrievals.

DSPy: Compiled Few-Shot Examples as Frozen Skills

DSPy compilation: BootstrapFewShot and MIPROv2 optimizers compile frozen few-shot demos into Predict module prompts

DSPy takes a radically different approach from every other framework. Instead of injecting skills at runtime, DSPy compiles optimal few-shot demonstrations offline and bakes them permanently into module prompts.

The Compilation Process

Two main optimizers handle compilation:

BootstrapFewShot: Uses a teacher module to generate traces through the program. Traces that pass a user-defined metric are kept as demonstrations. Each dspy.Predict module within the program gets its own curated set of demonstrations.

MIPROv2 (Multi-prompt Instruction Proposal Optimizer v2): A three-phase process:

  1. Bootstrap: Generate candidate demonstration sets
  2. Propose: Generate candidate instruction texts that are aware of both the data distribution and the demonstrations
  3. Search: Bayesian optimization over the combined space of instructions × demonstrations across all modules

Parameters like max_bootstrapped_demos (generated examples) and max_labeled_demos (from training data) control how many examples end up in each module’s prompt.
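The core bootstrap idea — run a teacher, keep only traces that pass the metric — can be rendered as a toy loop. This is an illustration of the concept only, not DSPy's implementation:

```python
# Toy rendition of the bootstrap step behind BootstrapFewShot: traces that
# pass a user-defined metric survive as demonstrations; failing traces are
# discarded. Illustrative, not DSPy source.

def bootstrap_demos(teacher, metric, trainset, max_bootstrapped_demos=4):
    demos = []
    for example in trainset:
        prediction = teacher(example["question"])
        if metric(example, prediction):  # only verified traces become demos
            demos.append({"question": example["question"], "answer": prediction})
        if len(demos) >= max_bootstrapped_demos:
            break
    return demos  # frozen into the module's prompt after compilation
```

With a metric like exact-match against gold answers, a teacher that gets one training example right and one wrong yields exactly one demonstration: the verified trace.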

Fixed After Compilation

Once compiled, demonstrations are stored in each Predict module’s demos attribute and formatted into the prompt on every LLM call. They don’t change at runtime—the “skill” is frozen.

This means DSPy skills are the most predictable in our comparison: token cost is known after compilation, there’s no variance between turns, and the agent always sees the same demonstrations. The downside is inflexibility—to change skills, you must recompile.

Persistence

Compiled programs serialize to JSON, including all demonstrations. They’re fully persistent and loadable across sessions, making DSPy one of the most durable skill storage mechanisms.

SuperAGI: Toolkit-Based Upfront Registration

SuperAGI and CAMEL-AI upfront toolkit registration: all tool schemas loaded at agent initialization

SuperAGI uses a traditional toolkit pattern where all tools are registered at agent initialization.

Each toolkit extends BaseToolkit with:

  • name and description attributes
  • get_tools() method returning a list of BaseTool instances
  • get_env_keys() for required environment variables

Toolkits are installed from GitHub repositories via SuperAGI’s tool manager. At agent initialization, BaseToolkit.get_tools() returns all tools, and their complete schemas are exposed to the LLM as function-calling definitions.
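A minimal sketch of that pattern, with class names following the article's description (BaseTool, a toolkit with get_tools()) but illustrative bodies, not SuperAGI's source:

```python
# Minimal sketch of the upfront toolkit pattern: everything get_tools()
# returns is registered as a function-calling schema at agent init.

class BaseTool:
    def __init__(self, name, description, parameters):
        self.name, self.description, self.parameters = name, description, parameters

    def schema(self):
        return {"name": self.name, "description": self.description,
                "parameters": self.parameters}

class GitToolkit:  # hypothetical example toolkit
    name = "git"
    description = "Git repository operations"

    def get_tools(self):
        return [
            BaseTool("git_clone", "Clone a repository", {"url": "string"}),
            BaseTool("git_commit", "Commit staged changes", {"message": "string"}),
        ]

def register_agent_tools(toolkits):
    # No deferral: every tool's full schema goes into every LLM call.
    return [tool.schema() for tk in toolkits for tool in tk.get_tools()]
```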

There’s no deferred loading, no progressive disclosure, and no per-turn filtering. Every registered tool’s schema is present in every call. This is the simplest injection model and works well for agents with focused, small tool sets but doesn’t scale to agents needing dozens of capabilities.

CAMEL-AI: ChatAgent Tool Registration

CAMEL-AI follows a similar upfront registration pattern. Tools from various toolkits (e.g., MathToolkit, SearchToolkit) are passed as a list to ChatAgent(tools=[...]) at initialization.

The framework emphasizes that custom functions need clear argument names and comprehensive docstrings so the model can understand usage—the tool schema is the only “skill” content the model sees. There’s no separate instruction injection mechanism.

Recent additions include MCP (Model Context Protocol) support via MCPToolkit, allowing ChatAgent to connect to MCP servers and register external tools. This expands the available tool surface but doesn’t change the injection model—all discovered MCP tools are still registered upfront.

Cross-Platform Comparison

When Skills Are Injected

| Timing | Platforms | What’s Injected |
| --- | --- | --- |
| Always present (session start) | Claude Code, CrewAI, Deep Agents, Semantic Kernel, SuperAGI, CAMEL-AI, DSPy | Metadata (name + description) or full schemas |
| On activation (user or agent triggered) | Claude Code, Deep Agents, OpenAI | Full skill body |
| Every task/turn | CrewAI, AutoGen Teachability | Full body (CrewAI) or retrieved memos (AutoGen) |
| On LLM selection | Semantic Kernel, MetaGPT | Prompt template content |
| On similarity match | Voyager, AutoGen Teachability | Retrieved code or memos |
| Compiled/fixed | DSPy | Optimized few-shot examples |

Persistence Models

| Persistence | Platforms | Mechanism |
| --- | --- | --- |
| Single turn only | MetaGPT, Voyager | Template rendered per-action / per-generation |
| Within session | Claude Code, Deep Agents, OpenAI, Semantic Kernel | Body stays in message history |
| Re-injected every task | CrewAI, SuperAGI, CAMEL-AI | Appended fresh each task execution |
| Cross-session (persistent storage) | AutoGen Teachability, Voyager, DSPy | Vector DB / compiled modules / skill library |

Context Compaction Survival

| Platform | What Happens When Context Gets Full |
| --- | --- |
| Claude Code | Re-attaches most recent skills (5K tokens each, 25K cap). Older skills dropped |
| CrewAI | N/A — injected fresh per task, no accumulation |
| Deep Agents | Body in conversation history, subject to standard LangChain trimming |
| OpenAI | N/A — each API call is independent |
| AutoGen | Only relevant memos retrieved per-turn, naturally bounded |
| Voyager | Only top-K skills retrieved per task, naturally bounded |

The Progressive Disclosure Pattern

The most significant architectural trend across these platforms is the adoption of progressive disclosure—a concept borrowed from UI design where information is revealed incrementally based on need.

Why Progressive Disclosure Matters

A naive approach to skill injection—loading everything upfront—creates two problems:

  1. Token waste: Most skills aren’t relevant to most turns. Loading 20 full skill bodies when only 1-2 are needed per turn wastes 90%+ of skill-related tokens.
  2. Attention dilution: Research on context rot shows that LLMs perform worse when their context contains large amounts of irrelevant information. More skills in context can actually reduce the quality of skill application.

Progressive disclosure solves both problems by maintaining a lightweight index of available skills while loading full content only when needed.

Implementation Variations

Claude Code uses a dedicated system: skill metadata in system-reminder messages, a Skill tool for activation, and ToolSearch for deferred tool schemas. The framework manages injection automatically with priority-based compaction.

LangChain Deep Agents uses the agent’s existing file-reading capability: SkillsMiddleware injects the index, and the agent loads full content via read_file(). This is more transparent but offers less framework-level optimization.

OpenAI Responses API uses namespace-based grouping with platform-managed search: tool namespaces provide high-level descriptions, and tool_search returns relevant schemas. The platform handles the search logic entirely.

Token Savings in Practice

The numbers are compelling. With 12 skills:

  • Always-on injection (CrewAI/SuperAGI style): ~30,000 tokens
  • Progressive disclosure index only: ~600 tokens
  • Index + 2 activated skills: ~2,000–5,000 tokens

That’s an 83–98% reduction in skill-related token consumption per turn. Over a long session with hundreds of turns, the savings compound dramatically.

Architectural Patterns and Tradeoffs

Looking across all 11 platforms, four distinct architectural patterns emerge:

Pattern 1: Always-On Injection

Used by: CrewAI, SuperAGI, CAMEL-AI, Semantic Kernel

How it works: Full skill content or tool schemas are present in every LLM call.

Pros:

  • Maximum reliability—the agent always has full expertise available
  • Simplest implementation—no activation logic needed
  • Predictable token costs—same every turn

Cons:

  • Token cost scales linearly with number of skills
  • Attention dilution with many skills
  • Doesn’t scale beyond ~5-10 skills per agent

Best for: Focused agents with 1-3 core skills that are always relevant.

Pattern 2: Progressive Disclosure

Used by: Claude Code, LangChain Deep Agents, OpenAI Responses API/Agents SDK

How it works: Lightweight metadata always present; full content loaded on-demand.

Pros:

  • Scales to dozens or hundreds of available skills
  • Minimal token cost when skills aren’t needed
  • Preserves prompt cache when full schemas append at end

Cons:

  • Agent might miss the cue to activate a relevant skill
  • Additional latency from the activation step
  • More complex framework implementation

Best for: General-purpose agents that need access to many capabilities but use only a few per task.

Pattern 3: Semantic Retrieval

Used by: AutoGen Teachability, Voyager

How it works: Vector database queries surface relevant skills/knowledge based on semantic similarity to the current context.

Pros:

  • Naturally bounded token cost regardless of library size
  • Content relevance improves over time as the library grows
  • Cross-session learning and accumulation
  • No explicit activation needed—relevance is computed automatically

Cons:

  • Retrieval quality depends on embedding model quality
  • Risk of retrieving outdated or subtly wrong information
  • Requires vector database infrastructure
  • Less predictable—different turns load different content

Best for: Agents that learn from experience and need to accumulate domain knowledge over time.
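A toy sketch of the retrieval loop. Real systems (AutoGen Teachability, Voyager) embed with a learned model and store vectors in a database; here a bag-of-words cosine similarity stands in for both:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; real systems use a learned embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

library = {  # key: skill title (what gets matched), value: skill body
    "craft stone tools": "To craft a stone pickaxe, combine sticks and cobblestone...",
    "fight zombies": "Keep distance, strike when the zombie lunges...",
}

def retrieve(query: str, k: int = 1) -> list[str]:
    # Surface the k most similar entries; only these enter the prompt.
    ranked = sorted(library, key=lambda title: cosine(embed(query), embed(title)),
                    reverse=True)
    return [library[title] for title in ranked[:k]]
```

Because `k` is fixed, the token cost stays bounded no matter how large `library` grows — which is exactly the property that makes this pattern attractive for accumulating knowledge.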

Pattern 4: Compiled/Static Injection

Used by: DSPy, MetaGPT

How it works: Skills are compiled into fixed prompt content (DSPy) or activated through rigid action templates (MetaGPT).

Pros:

  • Most predictable behavior—same content every time
  • Optimization can be done offline (DSPy’s compilation)
  • No runtime overhead for skill selection
  • Proven effective for well-defined, repeatable tasks

Cons:

  • Inflexible—changing skills requires recompilation (DSPy) or code changes (MetaGPT)
  • Can’t adapt to novel situations outside the compiled examples
  • DSPy’s compilation process itself requires many LLM calls

Best for: Production pipelines with well-defined tasks where reliability trumps flexibility.
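The essence of compiled injection is that skill selection happens once, offline. A loosely DSPy-shaped sketch (the real optimizer scores candidate demonstrations with LLM-driven evaluation; the scoring function and prompt layout here are purely illustrative):

```python
def compile_prompt(instruction: str, candidate_demos: list[tuple[str, str]],
                   score) -> str:
    """Offline 'compilation': pick the best-scoring demos once, freeze the prompt."""
    best = sorted(candidate_demos, key=score, reverse=True)[:2]
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in best)
    return f"{instruction}\n\n{demos}\n\nQ: {{question}}\nA:"

COMPILED = compile_prompt(
    "Answer concisely.",
    [("2+2?", "4"), ("Capital of France?", "Paris"), ("Sky color?", "blue")],
    score=lambda demo: len(demo[1]),  # stand-in metric; a real optimizer evaluates quality
)
# At runtime the prompt never changes; only the question slot is filled.
runtime_prompt = COMPILED.format(question="3+3?")
```

All the expensive selection work is paid once at compile time; runtime is a string fill with zero skill-selection overhead, which is the predictability the pattern trades flexibility for.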

Practical Implications for Agent Builders

Choosing the Right Pattern

The right skill injection architecture depends on your agent’s profile:

If your agent has a narrow, well-defined role (e.g., a code review bot, a customer support agent for one product), always-on injection (CrewAI/SuperAGI pattern) is simplest and most reliable. The token cost of 2–3 always-present skills is manageable, and you avoid the complexity of activation logic.

If your agent needs broad capabilities but uses only a few per interaction (e.g., a developer assistant, a general-purpose automation agent), progressive disclosure (Claude Code/Deep Agents pattern) is the clear winner. The 83–98% token savings at scale are too significant to ignore.

If your agent needs to learn and improve from interactions (e.g., a personal assistant, a domain expert that accumulates knowledge), semantic retrieval (AutoGen Teachability pattern) provides the learning loop other patterns lack. Just ensure you have quality controls on what enters the knowledge base.

If your agent runs well-defined pipelines (e.g., data processing, report generation, standardized workflows), compiled injection (DSPy pattern) gives you the most predictable, optimized behavior.

The Hybrid Approach

For production agent teams where agents need to work out of the box, we recommend a hybrid approach:

Core skills (1–2 per agent, defining their primary domain expertise): always injected into the system prompt, CrewAI-style. These are non-negotiable capabilities the agent needs on every turn.

Extended skills (additional capabilities the agent might need): metadata only in the system prompt, loaded via a search/load mechanism when needed, Deep Agents-style. These expand the agent’s capability set without paying the token cost when they’re not relevant.

Learned knowledge (accumulated domain expertise): stored in a vector database and retrieved semantically per-turn, AutoGen-style. This allows the agent to improve over time without manual skill authoring.

This layered architecture maps naturally to how a system prompt is built: date → persona → system instructions → core skills → skill index → role/team context. The core skills and index add a predictable, manageable token cost, while the full skill bodies only appear when needed.
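The layered assembly can be sketched as a single builder function (the layer names mirror the ordering above; everything else is illustrative):

```python
def build_system_prompt(date, persona, instructions, core_skills,
                        skill_index, team_context):
    """Layered assembly: stable, always-on layers first; the skill index and
    team context follow. Full extended-skill bodies are NOT here; they load
    on demand during the conversation."""
    layers = [
        f"Date: {date}",
        persona,
        instructions,
        "## Core skills (always loaded)\n" + "\n\n".join(core_skills),
        "## Extended skills (load on demand)\n" + skill_index,
        team_context,
    ]
    return "\n\n".join(layer for layer in layers if layer)

prompt = build_system_prompt(
    "2025-06-01", "You are a release engineer.", "Follow change-management policy.",
    core_skills=["Deploy runbook: ..."],
    skill_index="- rollback: revert a failed deploy\n- canary: staged rollout",
    team_context="You report results to the orchestrator agent.",
)
```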

Token Budget Best Practices Across Frameworks

Regardless of which injection pattern you use, these token management strategies apply universally:

Cache-Friendly Ordering

Stack unchanging context (system instructions, tool schemas) at the front of the prompt. On providers that support prompt caching, cached tokens are billed at a steep discount (typically 50–90% less than uncached input, depending on the provider). Claude Code and OpenAI both inject discovered tool schemas at the end of the context specifically to preserve cache hits on the static prefix.
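The ordering rule amounts to a trivial assembly convention: static, cacheable material first, per-turn material last (the strings and names here are illustrative):

```python
STATIC_PREFIX = "\n".join([
    "You are a build assistant.",       # system instructions
    "## Tools\nrun_tests, read_file",   # stable tool schemas
])  # byte-identical every turn, so provider-side caching keeps hitting it

def assemble_prompt(turn_context: list[str], discovered_schemas: list[str]) -> str:
    # Anything that changes per turn goes AFTER the static prefix,
    # so the cached prefix is never invalidated.
    return "\n".join([STATIC_PREFIX, *turn_context, *discovered_schemas])

p1 = assemble_prompt(["User: fix the flaky test"], [])
p2 = assemble_prompt(["User: now deploy"], ["## Tool: deploy (schema...)"])
```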

Offloading

Summarize tool responses rather than keeping full results in context. Store the complete data in external references that the agent can read on demand. This is especially important for agents that make many tool calls per session.
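A sketch of the offloading move: the context keeps a one-line summary plus a reference, while the full payload lives in external storage the agent can re-read via a tool (the artifact store and reference scheme are invented for illustration):

```python
import uuid

ARTIFACT_STORE: dict[str, str] = {}  # stand-in for files/object storage

def offload(tool_name: str, full_result: str, summary: str) -> str:
    """Keep a short summary in context; park the full payload under a reference
    the agent can re-read on demand (e.g., via a read_artifact tool)."""
    ref = f"artifact://{uuid.uuid4().hex[:8]}"
    ARTIFACT_STORE[ref] = full_result
    return f"[{tool_name}] {summary} (full output: {ref})"

msg = offload("run_tests", "...5,000 lines of pytest output...",
              "312 passed, 2 failed: test_auth, test_rate_limit")
```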

Reduction

Compact conversation history through summarization. Extract key facts from long exchanges into condensed representations. Every framework with session-based persistence benefits from aggressive history management.
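A minimal compaction sketch: old turns collapse into one summary message while recent turns stay verbatim. The `summarize` callable would normally be an LLM call; here it is injected so the structure is clear:

```python
def compact_history(messages: list[dict], keep_last: int, summarize) -> list[dict]:
    """Replace old turns with a single summary message; keep recent turns verbatim."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    summary = summarize(old)  # in practice, an LLM summarization call
    return [{"role": "system", "content": f"Summary of earlier turns: {summary}"},
            *recent]

history = [{"role": "user", "content": f"turn {i}"} for i in range(10)]
compacted = compact_history(history, keep_last=3,
                            summarize=lambda msgs: f"{len(msgs)} earlier turns elided")
```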

Retrieval Over Pre-Loading

Dynamically fetch relevant information at runtime rather than loading everything upfront. This applies to skills, knowledge bases, and even conversation history. In practice this can cut prompt sizes substantially, with reductions of up to 70% reported.

Isolation

Use sub-agents for specific tasks so each agent’s context stays focused. Rather than giving one agent 20 skills, create a team of 5 agents with 4 skills each. Each agent maintains a lean context window, and the team collectively covers the full capability set.
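The budget math for isolation, using the same illustrative ~2,500 tokens per skill body assumed earlier (the gain is largest under always-on injection; with progressive disclosure the token delta shrinks, but the attention-focus benefit remains):

```python
SKILL_TOKENS = 2_500  # assumed average full skill body (illustrative)

monolith = 20 * SKILL_TOKENS    # one agent carrying all 20 skills, per turn
specialist = 4 * SKILL_TOKENS   # each of five 4-skill agents, per turn
print(monolith, specialist)     # prints "50000 10000"
```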

Conclusion

The way AI agent frameworks inject skills into context is one of the most consequential architectural decisions in agent design—yet it’s rarely discussed at this level of detail.

The field is clearly converging on progressive disclosure as the preferred pattern for general-purpose agents, with Claude Code, LangChain Deep Agents, and OpenAI all independently arriving at similar three-tier architectures. Meanwhile, specialized patterns like semantic retrieval (AutoGen, Voyager) and compiled injection (DSPy) serve important niches that progressive disclosure alone doesn’t address.

For practitioners building agent systems today, the key insight is that skill injection isn’t a one-size-fits-all problem. The right approach depends on your agent’s role, the number of skills it needs, whether it needs to learn over time, and your tolerance for token costs versus reliability tradeoffs.

The most robust production systems will likely combine multiple patterns—always-on for core capabilities, progressive disclosure for extended skills, and semantic retrieval for accumulated knowledge—creating agents that are both efficient and expert.

Yasha Boroumand
CTO, FlowHunt

Yasha is a talented software developer specializing in Python, Java, and machine learning. Yasha writes technical articles on AI, prompt engineering, and chatbot development.
