
Deep dive into the injection mechanics of 11 AI agent platforms: where skills land in the prompt, when they load, what they cost in tokens, and how they survive context compaction.
Every AI agent framework faces the same fundamental question: how do you make an LLM good at something specific? The model itself has broad general knowledge, but when you need it to perform a code review, deploy infrastructure, or navigate Minecraft—it needs specialized instructions, tool access, and domain context.
This is the skill injection problem. And every major framework solves it differently.
Some platforms dump everything into the system prompt upfront. Others use lazy loading, only revealing capabilities when the agent needs them. A few use vector databases to retrieve relevant skills based on semantic similarity. The differences aren’t academic—they directly affect token costs, agent reliability, and how many skills an agent can realistically juggle.
We analyzed 11 major AI agent platforms to understand exactly where skills land in the prompt, when they load, what they cost in tokens, and how they survive when the context window fills up. This isn’t a surface-level feature comparison. We dug into source code, documentation, and architecture diagrams to map out the precise injection mechanics of each platform.
Here’s the complete overview before we dive into the details.
| Platform | Injection Point | When Loaded | Mechanism |
|---|---|---|---|
| Claude Code | System-reminder (metadata) + conversation message (body) | Metadata at session start; body on /command or auto-match | Framework injects metadata; Skill tool loads full body on activation |
| CrewAI | Task prompt (appended before LLM call) | Every task execution via _finalize_task_prompt() | format_skill_context() appends all skill bodies to prompt |
| LangChain Deep Agents | System prompt (metadata) + conversation history (body) | Metadata at startup; body when agent calls read_file() | SkillsMiddleware injects index; agent loads body via filesystem tool |
| OpenAI Responses API | User prompt context (platform-managed) | On skill_reference in API call | Platform appends metadata; model reads full SKILL.md on invocation |
| OpenAI Agents SDK | Tool definitions (deferred via ToolSearchTool) | Namespace names at creation; schemas on ToolSearchTool call | tool_namespace() + ToolSearchTool() for progressive discovery |
| AutoGen Teachability | Modified user message (retrieved memos injected) | Every turn — vector DB retrieval before each LLM call | Middleware intercepts message, queries ChromaDB, injects top-K matches |
| Semantic Kernel | Function-calling schemas + prompt template content | All schemas at startup; template content on function invocation | kernel.add_plugin() registers all; kernel.invoke() renders templates |
| MetaGPT | Action prompt template (rendered into LLM call) | When Role’s _act() fires for a specific Action | Action.run() formats PROMPT_TEMPLATE, sends via aask() |
| Voyager | Code generation prompt (retrieved skill code) | Before each code generation; embedding similarity search | SkillLibrary.retrieve_skills() injects top-5 as few-shot examples |
| DSPy | Compiled few-shot demos in Predict module prompts | Compiled offline by optimizer; fixed at runtime | BootstrapFewShot / MIPROv2 selects best demos; Predict renders into prompt |
| SuperAGI | Tool schemas in agent’s tool list | Agent creation — all toolkit tools registered upfront | BaseToolkit.get_tools() registers all as function-calling tools |
| CAMEL-AI | Function schemas + role system message | Agent creation — all tools registered upfront | ChatAgent(tools=[*toolkit.get_tools()]) loads everything at init |
| Platform | Always Present? | Persistence | Token Cost |
|---|---|---|---|
| Claude Code | Metadata: YES. Body: only after activation | Session-scoped. On compaction: re-attached (5K/skill, 25K cap) | ~250 chars/skill metadata; 1% of context budget |
| CrewAI | YES — full body in every task prompt | Fresh injection per task; no cross-task persistence | Full body every call. 50K char soft limit |
| LangChain Deep Agents | Metadata: YES. Body: on-demand | Body stays in conversation history; subagent skills isolated | ~100 tokens/skill metadata; body paid once (~3,302 tokens) |
| OpenAI Responses API | Name+desc: YES. Full body: on invocation | Single API response only; no cross-call persistence | Platform-managed |
| OpenAI Agents SDK | Namespace list: YES. Schemas: on demand | Single run only; re-discover per session | Minimal until activated |
| AutoGen Teachability | NO — only relevant memos per turn | Cross-session via ChromaDB; persists indefinitely | ~3-5 memos per turn (variable) |
| Semantic Kernel | All schemas: YES. Templates: on invocation | In-memory per kernel instance; no cross-session | All schemas always present |
| MetaGPT | NO — only current Action’s template | Single action execution only | One template per turn |
| Voyager | NO — top-5 retrieved per task | Lifelong persistence in vector DB | ~500-2,000 tokens per skill example |
| DSPy | YES — compiled demos baked in | Serializable to JSON; persists across sessions | Fixed after compilation (3-8 demos/module) |
| SuperAGI | YES — all schemas always present | Within agent session | All schemas always present |
| CAMEL-AI | YES — all schemas + role prompt | Within conversation session | All schemas always present |
Before diving into the comparison, let’s define the problem space. An AI agent’s context window—the total text the LLM sees on each call—has a fixed size. Every token of instruction, conversation history, tool definition, and retrieved data competes for space in that window.
A “skill” in the agent context is any structured package of expertise that changes how the agent behaves. This could be:

- Markdown instruction files (such as SKILL.md) that shape how the agent reasons about a domain
- Tool or function schemas the model can call
- Few-shot demonstrations baked into prompts
- Retrieved knowledge, memos, or verified code snippets
The injection mechanism—where and when this content enters the context—determines three critical properties:

- Token cost: how much of the context window skills consume on each call
- Reliability: whether the agent actually has the right expertise available when it needs it
- Scalability: how many skills an agent can realistically juggle before costs or attention degrade
Every framework makes different tradeoffs across these three dimensions. Let’s examine each one.
Across all 11 platforms, skill injection approaches fall along a spectrum from “everything loaded upfront” to “nothing loaded until explicitly needed.”
At one end, platforms like CrewAI, SuperAGI, and CAMEL-AI inject the full content of every activated skill into every LLM call. The agent always has its complete expertise available. Simple, reliable, but expensive in tokens.
At the other end, Claude Code, LangChain Deep Agents, and OpenAI’s Responses API use progressive disclosure—the agent sees only skill names and short descriptions at startup, and full content loads on-demand. Efficient, scalable, but requires the agent to recognize when it needs a skill.
In the middle, AutoGen Teachability and Voyager use semantic retrieval to inject only the most relevant skills per turn, creating a dynamic, context-sensitive injection pattern.
And then there are unique approaches: DSPy compiles optimized few-shot examples offline and bakes them permanently into module prompts. MetaGPT encodes skills as action templates that activate only when a specific role transitions to a specific action.
Let’s examine each in detail.
Claude Code implements one of the most sophisticated skill injection architectures, using a three-layer progressive disclosure system that balances awareness with token efficiency.
At session start, every available skill’s name and description is injected into a system-reminder message—a metadata block the model always sees. This costs roughly 250 characters per skill, consuming about 1% of the context window budget for all skill descriptions combined (approximately 8K characters as a fallback budget, overridable via the SLASH_COMMAND_TOOL_CHAR_BUDGET environment variable).
Similarly, deferred tools—tools whose full JSON schemas haven’t been loaded yet—appear as a name-only list in system-reminder blocks. As of Claude Code v2.1.69, even built-in system tools like Bash, Read, Edit, Write, Glob, and Grep are deferred behind ToolSearch, reducing system tool context from approximately 14–16K tokens to roughly 968 tokens.
The agent sees enough to know what’s available without paying the token cost of full definitions.
When a user types a slash command (e.g., /commit) or the model auto-matches a skill based on its description, the full SKILL.md body is loaded as a conversation message via the Skill tool. This body contains the complete instructions—sometimes thousands of tokens of detailed guidance.
Key detail: Shell preprocessing runs first (any !command directives in the skill file execute and their output replaces the directive), and once loaded, the skill body stays in the conversation for the rest of the session.
Additional resources—reference documents, scripts, asset files—are only read when the model explicitly decides to use the Read tool to access them. These never load automatically.
When the conversation approaches the context limit and compaction triggers, Claude Code re-attaches the most recently invoked skills with a budget of 5K tokens per skill and a 25K combined maximum. Most-recently-invoked skills get priority. Older skills may be dropped entirely.
This three-layer architecture means an agent with 20+ available skills pays a minimal upfront cost but can access full expertise on any of them within a single turn.
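To make the budget concrete, here is back-of-envelope arithmetic using the figures above. The 4-characters-per-token ratio is a rough rule of thumb, not a Claude Code constant:

```python
# Rough cost model for Claude Code's metadata layer.
# Figures from above: ~250 chars per skill, ~8K char fallback budget.
CHARS_PER_SKILL = 250
CHAR_BUDGET = 8_000
CHARS_PER_TOKEN = 4  # common approximation, not an exact value

def metadata_cost(num_skills: int) -> int:
    """Characters consumed by skill name/description metadata."""
    return min(num_skills * CHARS_PER_SKILL, CHAR_BUDGET)

# 20 skills fit comfortably under the budget...
print(metadata_cost(20))  # 5000 chars
# ...while 40 skills would be clipped to the 8K cap.
print(metadata_cost(40))  # 8000 chars
# In tokens, the full budget is roughly 2K -- about 1% of a 200K window.
print(CHAR_BUDGET // CHARS_PER_TOKEN)  # 2000
```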
CrewAI takes the opposite approach from progressive disclosure. When a skill is activated for an agent, its full content is injected into every task prompt the agent executes.
Skills in CrewAI are self-contained directories, each with a SKILL.md file containing YAML frontmatter (name, description, license, compatibility, allowed tools) and a markdown body. The skill system distinguishes between skills and tools: skills inject instructions and context that shape how the agent thinks, while tools provide callable functions for actions.
During agent initialization, Agent.set_skills() calls discover_skills() to scan skill directories at the metadata level, then activate_skill() to read full skill bodies. At task execution time, _finalize_task_prompt() calls format_skill_context() for each activated skill and appends all formatted skill content to the task prompt.
The LLM receives: [system message] + [task prompt + ALL skill bodies]
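The always-on pattern is simple enough to sketch in a few lines. This mirrors what the article describes for _finalize_task_prompt() and format_skill_context(); it is not CrewAI's actual implementation, and the formatting details are assumptions:

```python
# Minimal sketch of the always-on pattern: every activated skill's full
# body is appended to the task prompt before each LLM call.
# NOT CrewAI source -- an illustration of the injection shape only.
SOFT_LIMIT = 50_000  # chars per skill, per the documented soft warning

def finalize_task_prompt(task_prompt: str, skills: dict[str, str]) -> str:
    parts = [task_prompt]
    for name, body in skills.items():
        if len(body) > SOFT_LIMIT:
            print(f"warning: skill '{name}' exceeds {SOFT_LIMIT} chars")
        parts.append(f"## Skill: {name}\n{body}")
    return "\n\n".join(parts)

prompt = finalize_task_prompt(
    "Review this pull request for security issues.",
    {"secure-review": "Check for injection, secrets, unsafe deserialization..."},
)
# Token cost scales linearly with the number of activated skills.
```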
CrewAI imposes a soft warning at 50,000 characters per skill but no hard limit. The documentation recommends keeping skills focused and concise because large prompt injections dilute the model’s attention—a real concern given research on context rot.
The tradeoff is straightforward: the agent always has full expertise available (high reliability), but token cost scales linearly with the number of skills per task (low efficiency). For agents with 1-2 focused skills, this works well. For agents needing broad capability sets, it becomes expensive fast.
Each task gets a fresh injection. There’s no accumulation of skill content across tasks—which is actually a feature, not a bug. It means each task starts with a clean context, avoiding the staleness problems that session-based persistence can create.
LangChain Deep Agents implements a sophisticated middleware-based skill system where the agent itself decides when to load full skill content—a true progressive disclosure model where the agent controls activation.
Tier 1 (Index): SkillsMiddleware parses all SKILL.md frontmatter at startup and injects a lightweight index into the system prompt. This index contains only names and descriptions, costing approximately 278 tokens per skill versus 3,302 tokens for full content.
Tier 2 (Full Content): When the agent determines a skill is relevant, it calls read_file() on the skill’s SKILL.md path. This is a regular tool call—the framework doesn’t inject the body; the agent makes a deliberate decision to load it. The full content enters the conversation history as a tool result.
Tier 3 (Deep Dive): Supporting materials, reference docs, and scripts are only accessed when the agent explicitly reads them.
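Tier 1 can be sketched as a frontmatter scan that emits a compact index. The toy parser below assumes simple key/value frontmatter and is illustrative only, not the real SkillsMiddleware code:

```python
# Sketch of the Tier-1 index: parse each SKILL.md's YAML frontmatter and
# emit a names+descriptions index for the system prompt.
def parse_frontmatter(skill_md: str) -> dict:
    meta = {}
    block = skill_md.split("---")[1]  # text between the --- fences
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta

def build_index(skill_files: list[str]) -> str:
    lines = ["Available skills (read a skill's SKILL.md to load it):"]
    for md in skill_files:
        meta = parse_frontmatter(md)
        lines.append(f"- {meta['name']}: {meta['description']}")
    return "\n".join(lines)

skill = """---
name: pdf-report
description: Generate branded PDF reports from CSV data
---
Full instructions go here...
"""
print(build_index([skill]))
```

The index costs a line per skill; the full body only enters context if the agent later reads the file.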
With 12 skills, progressive disclosure reduces context from approximately 30,000 tokens (all loaded) to roughly 600 tokens (index only), expanding to 2,000–5,000 when relevant skills are loaded for a specific task. That’s a potential 83–98% reduction in skill-related token consumption.
Multiple skill sources can be layered, and when names collide, the last source wins. Files over 10 MB are automatically skipped.
While Claude Code uses a dedicated Skill tool to trigger loading, Deep Agents repurposes the agent’s existing read_file tool. This means the loading mechanism is transparent—the agent reads skill files the same way it reads any other file. The downside is that there’s no special compaction behavior: skill content that enters conversation history is subject to standard LangChain message trimming, with no priority treatment.
OpenAI implements skill injection through two distinct but philosophically aligned mechanisms: the Responses API’s tool_search tool type and the Agents SDK’s ToolSearchTool.
The tool_search tool type (available on GPT-5.4+) allows developers to defer large tool surfaces until runtime. Three deferral strategies are available:
- @function_tool(defer_loading=True) — the model sees the function name and description, but the parameter schema is deferred. Saves parameter-level tokens.
- tool_namespace(name=..., description=..., tools=[...]) — groups functions under a single namespace. The model sees only the namespace name and description, saving significantly more tokens.
- HostedMCPTool(tool_config={..., "defer_loading": True}) — defers entire MCP server tool surfaces.

When the model determines it needs a specific tool, it issues a tool_search call. The API returns 3-5 relevant tool definitions, injected at the end of the context window to preserve prompt caching.
The Agents SDK provides a programmatic equivalent. Tool namespaces are registered but not loaded:
```python
crm_tools = tool_namespace(
    name="crm",
    description="CRM management tools",
    tools=[...]
)

agent = Agent(tools=[*crm_tools, ToolSearchTool()])
```
At runtime, the agent sees only namespace names. It calls ToolSearchTool("crm") to discover and load the full schemas, then can call individual tools within that namespace.
Each API request is independent. Discovered tools don’t persist across calls. This is the most stateless approach in our comparison—clean and predictable, but it requires re-discovery on every request if tools change.
AutoGen’s Teachability capability takes a fundamentally different approach from every other framework in this comparison. Instead of injecting static skill content, it dynamically retrieves relevant “memos” from a ChromaDB vector database on every single turn.
Teachability registers a hook on process_last_received_message that intercepts every incoming user message before the agent processes it:
1. TextAnalyzerAgent extracts key concepts from the incoming message
2. The concepts are used to query the ChromaDB vector store for similar memos
3. The closest matches (up to max_num_retrievals, default 10) are appended to the message before it reaches the LLM

Critically, the modified message does not propagate into stored conversation history—only the original message is stored. This prevents memo content from compounding across turns.
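The retrieval step can be sketched as follows. The real implementation embeds text and queries ChromaDB; here a toy word-overlap score stands in for embedding distance so the example stays self-contained, and the threshold semantics are simplified:

```python
# Sketch of per-turn memo retrieval (NOT AutoGen source).
def similarity(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1)

def inject_memos(message: str, memos: list[str],
                 max_num_retrievals: int = 10, threshold: float = 0.1) -> str:
    scored = sorted(((similarity(message, m), m) for m in memos), reverse=True)
    hits = [m for score, m in scored[:max_num_retrievals] if score >= threshold]
    if not hits:
        return message  # nothing relevant: message passes through unchanged
    return message + "\n\nRelevant memories:\n" + "\n".join(f"- {m}" for m in hits)

memos = ["The user prefers metric units", "Deploys happen on Fridays"]
out = inject_memos("What units should I use in the report?", memos)
# Only the original message (not `out`) would go into stored history.
```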
After the LLM responds, a second hook analyzes the response for new learnings:
1. TextAnalyzerAgent identifies new knowledge in the response
2. New learnings are stored as memos in ChromaDB for future retrieval

This creates a genuine learning loop where the agent accumulates expertise over time.
AutoGen Teachability is one of only three platforms in our comparison (alongside Voyager and DSPy) that persists skills across sessions. The ChromaDB database lives on disk, meaning an agent can learn from interactions on Monday and apply that knowledge on Friday.
The recall_threshold parameter (default 1.5) controls how similar a message must be to a stored memo for retrieval, and reset_db can clear the entire memory when needed.
Since only relevant memos are injected per turn (typically 3-5), the token cost is naturally bounded regardless of how large the memo database grows. An agent with 10,000 stored memos still only pays for the handful most relevant to the current turn.
Microsoft’s Semantic Kernel takes a straightforward approach: plugins are collections of KernelFunction objects registered with the Kernel, and their schemas are exposed to the LLM as function-calling tool definitions.
Function Calling: When ToolCallBehavior.AutoInvokeKernelFunctions is set, all registered functions are sent to the LLM as available tools in every API request. The LLM decides which to call; Semantic Kernel handles invocation and result routing.
Prompt Templates: Semantic Kernel’s template syntax ({{plugin.function}}, Handlebars, or Liquid) allows functions to be called inline during prompt rendering. Results are embedded directly in the prompt text before it reaches the LLM—a form of eager evaluation rather than lazy tool calling.
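The eager-evaluation behavior is easy to illustrate with a toy renderer. Semantic Kernel's real template engine handles arguments, Handlebars, and Liquid syntax; this sketch only shows the substitute-before-send idea, and the plugin functions are invented:

```python
# Toy renderer for the {{plugin.function}} pattern: function results are
# substituted into the prompt text before the LLM ever sees it.
import re

plugins = {
    "time.today": lambda: "2026-02-03",
    "weather.current": lambda: "light rain",
}

def render(template: str) -> str:
    # Replace each {{plugin.function}} with the function's return value.
    return re.sub(
        r"\{\{(\w+\.\w+)\}\}",
        lambda m: plugins[m.group(1)](),
        template,
    )

prompt = render("Today is {{time.today}} and the weather is {{weather.current}}.")
print(prompt)  # Today is 2026-02-03 and the weather is light rain.
```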
Every registered plugin’s schema is included in every API call. There’s no built-in deferred loading, namespace grouping, or on-demand activation. The documentation explicitly recommends importing only the plugins needed for a specific scenario to reduce token consumption and miscalls.
This makes Semantic Kernel one of the most predictable platforms—you always know exactly what the agent has access to—but it limits scalability. An agent with 50 registered functions pays the full schema cost on every single call.
Plugin registration is per-Kernel-instance and in-memory. There’s no built-in mechanism for cross-session skill persistence.
MetaGPT encodes skills not as standalone packages but as action templates embedded within Standard Operating Procedures (SOPs) that govern role behavior.
Each Role in MetaGPT has a persona prefix injected into prompts and a set of Action classes. Each Action contains an LLM proxy invoked via aask(), which uses natural language prompt templates to structure the LLM call.
When Role._act() fires, it supports three react modes:
- "react": The LLM dynamically selects actions in think-act loops
- "by_order": Actions execute sequentially in a predetermined order
- "plan_and_act": The agent plans first, then executes actions according to the plan

Only the current Action’s prompt template is active at any given moment. The agent doesn’t see templates for other actions—it only sees its role prefix plus the specific action’s context. This is the narrowest injection window of any framework we examined.
Context parsing functions within Action classes extract relevant information from inputs, so each action receives a curated subset of available context rather than the full conversation history.
The template is rendered fresh for each action execution. There’s no accumulation or cross-session persistence. This keeps each action focused but means the agent can’t build on previously loaded skill content within a single workflow.
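The pattern can be sketched as follows; the class and template are hypothetical illustrations of the shape described above, not MetaGPT source:

```python
# Sketch of the narrow injection window: only the current Action's
# template is rendered into the LLM call.
class WriteTests:
    PROMPT_TEMPLATE = (
        "You are a QA engineer.\n"
        "Write pytest unit tests for the following code:\n{code}\n"
    )

    def run(self, code: str) -> str:
        prompt = self.PROMPT_TEMPLATE.format(code=code)
        # In MetaGPT this prompt would go to the LLM via aask();
        # here we return it to show what the model would see.
        return prompt

prompt = WriteTests().run("def add(a, b): return a + b")
# The model sees only this template plus the role's persona prefix --
# never the templates of the role's other Actions.
```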
Voyager, the Minecraft exploration agent from NVIDIA and Caltech, implements one of the most elegant skill injection architectures: a growing library of verified programs retrieved by embedding similarity.
When Voyager writes code that passes self-verification (the generated Mineflayer JavaScript actually works in the game), the code and its documentation string are stored in a vector database. The docstring embedding becomes the retrieval key.
On each new task proposed by the automatic curriculum:

1. The task description is embedded with the same embedding model
2. The vector database returns the top-5 most similar skills
3. The retrieved code is injected into the code generation prompt as few-shot examples
The prompt looks like this:
```
You are a Minecraft bot. Here are some relevant skills you've learned:

// Skill: mineWoodLog
async function mineWoodLog(bot) { ... }

// Skill: craftPlanks
async function craftPlanks(bot) { ... }

Now write code to: build a wooden pickaxe
```
The generated code can call retrieved skills by name, enabling compositional skill building—complex behaviors constructed from simpler, verified primitives.
The skill library is the core “lifelong learning” mechanism. It grows across the agent’s entire lifetime, and new skills build on old ones. Unlike most frameworks where skills are authored by humans, Voyager’s skills are generated, verified, and stored by the agent itself.
Token cost is naturally bounded: regardless of whether the library contains 50 or 5,000 skills, each task only pays for the 5 most relevant retrievals.
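The retrieve-and-inject step might look like this sketch, with a toy word-overlap score standing in for Voyager's docstring embeddings and all skill entries invented for illustration:

```python
# Sketch of skill retrieval: rank stored skills by similarity to the task
# and inject the top-K as few-shot examples (NOT Voyager source).
def score(task: str, doc: str) -> float:
    a, b = set(task.lower().split()), set(doc.lower().split())
    return len(a & b) / max(len(a | b), 1)

library = {  # docstring -> verified code
    "mine a wooden log": "async function mineWoodLog(bot) { /* ... */ }",
    "craft planks from a wooden log": "async function craftPlanks(bot) { /* ... */ }",
    "fight a zombie at night": "async function fightZombie(bot) { /* ... */ }",
}

def build_prompt(task: str, top_k: int = 5) -> str:
    ranked = sorted(library, key=lambda d: score(task, d), reverse=True)
    snippets = "\n".join(library[d] for d in ranked[:top_k])
    return (f"Here are some relevant skills you've learned:\n"
            f"{snippets}\nNow write code to: {task}")

prompt = build_prompt("build a wooden pickaxe", top_k=2)
print(prompt)
```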
DSPy takes a radically different approach from every other framework. Instead of injecting skills at runtime, DSPy compiles optimal few-shot demonstrations offline and bakes them permanently into module prompts.
Two main optimizers handle compilation:
BootstrapFewShot: Uses a teacher module to generate traces through the program. Traces that pass a user-defined metric are kept as demonstrations. Each dspy.Predict module within the program gets its own curated set of demonstrations.
MIPROv2 (Multi-prompt Instruction Proposal Optimizer v2): A three-phase process:

1. Bootstrap candidate few-shot demonstrations by running the program and keeping traces that pass the metric
2. Propose candidate instructions for each module, grounded in properties of the program and training data
3. Search over instruction/demonstration combinations with Bayesian optimization, evaluating against the metric
Parameters like max_bootstrapped_demos (generated examples) and max_labeled_demos (from training data) control how many examples end up in each module’s prompt.
Once compiled, demonstrations are stored in each Predict module’s demos attribute and formatted into the prompt on every LLM call. They don’t change at runtime—the “skill” is frozen.
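The runtime side of compiled injection reduces to a module carrying a frozen demos list. This is a minimal stand-in for the pattern, not DSPy's actual rendering code:

```python
# Sketch of compiled few-shot injection: after optimization, each module
# renders the same fixed demonstrations into every prompt.
class Predict:
    def __init__(self, signature: str):
        self.signature = signature
        self.demos: list[dict] = []  # filled by the optimizer, then frozen

    def render(self, **inputs) -> str:
        lines = [f"Task: {self.signature}"]
        for d in self.demos:
            lines.append(f"Q: {d['question']}\nA: {d['answer']}")
        lines.append("Q: " + inputs["question"] + "\nA:")
        return "\n".join(lines)

qa = Predict("question -> answer")
# An optimizer such as BootstrapFewShot would select these; they are
# hard-coded here to show the runtime behavior.
qa.demos = [{"question": "2 + 2?", "answer": "4"}]
prompt = qa.render(question="3 + 5?")
# Token cost is fixed after compilation: same demos on every call.
```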
This means DSPy skills are the most predictable in our comparison: token cost is known after compilation, there’s no variance between turns, and the agent always sees the same demonstrations. The downside is inflexibility—to change skills, you must recompile.
Compiled programs serialize to JSON, including all demonstrations. They’re fully persistent and loadable across sessions, making DSPy one of the most durable skill storage mechanisms.
SuperAGI uses a traditional toolkit pattern where all tools are registered at agent initialization.
Each toolkit extends BaseToolkit with:
- name and description attributes
- A get_tools() method returning a list of BaseTool instances
- A get_env_keys() method for required environment variables

Toolkits are installed from GitHub repositories via SuperAGI’s tool manager. At agent initialization, BaseToolkit.get_tools() returns all tools, and their complete schemas are exposed to the LLM as function-calling definitions.
There’s no deferred loading, no progressive disclosure, and no per-turn filtering. Every registered tool’s schema is present in every call. This is the simplest injection model and works well for agents with focused, small tool sets but doesn’t scale to agents needing dozens of capabilities.
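The upfront-registration pattern looks roughly like this; the toolkit and its tools are hypothetical, following the description above rather than SuperAGI source:

```python
# Sketch of upfront registration: every tool's full schema is exposed
# to the LLM on every call -- no deferral, no filtering.
class BaseToolkit:
    name = "base"
    description = ""

    def get_tools(self) -> list[dict]:
        raise NotImplementedError

class EmailToolkit(BaseToolkit):
    name = "email"
    description = "Send and read email"

    def get_tools(self) -> list[dict]:
        return [
            {"name": "send_email", "parameters": {"to": "string", "body": "string"}},
            {"name": "read_inbox", "parameters": {"limit": "integer"}},
        ]

# At agent init, ALL schemas are collected and sent with every request.
all_tools = [t for tk in [EmailToolkit()] for t in tk.get_tools()]
print(len(all_tools))  # 2
```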
CAMEL-AI follows a similar upfront registration pattern. Tools from various toolkits (e.g., MathToolkit, SearchToolkit) are passed as a list to ChatAgent(tools=[...]) at initialization.
The framework emphasizes that custom functions need clear argument names and comprehensive docstrings so the model can understand usage—the tool schema is the only “skill” content the model sees. There’s no separate instruction injection mechanism.
Recent additions include MCP (Model Context Protocol) support via MCPToolkit, allowing ChatAgent to connect to MCP servers and register external tools. This expands the available tool surface but doesn’t change the injection model—all discovered MCP tools are still registered upfront.
| Timing | Platforms | What’s Injected |
|---|---|---|
| Always present (session start) | Claude Code, CrewAI, Deep Agents, Semantic Kernel, SuperAGI, CAMEL-AI, DSPy | Metadata (name + description) or full schemas |
| On activation (user or agent triggered) | Claude Code, Deep Agents, OpenAI | Full skill body |
| Every task/turn | CrewAI, AutoGen Teachability | Full body (CrewAI) or retrieved memos (AutoGen) |
| On LLM selection | Semantic Kernel, MetaGPT | Prompt template content |
| On similarity match | Voyager, AutoGen Teachability | Retrieved code or memos |
| Compiled/fixed | DSPy | Optimized few-shot examples |
| Persistence | Platforms | Mechanism |
|---|---|---|
| Single turn only | MetaGPT, Voyager | Template rendered per-action / per-generation |
| Within session | Claude Code, Deep Agents, OpenAI, Semantic Kernel | Body stays in message history |
| Re-injected every task | CrewAI, SuperAGI, CAMEL-AI | Appended fresh each task execution |
| Cross-session (persistent storage) | AutoGen Teachability, Voyager, DSPy | Vector DB / compiled modules / skill library |
| Platform | What Happens When Context Gets Full |
|---|---|
| Claude Code | Re-attaches most recent skills (5K tokens each, 25K cap). Older skills dropped |
| CrewAI | N/A—injected fresh per task, no accumulation |
| Deep Agents | Body in conversation history, subject to standard LangChain trimming |
| OpenAI | N/A—each API call is independent |
| AutoGen | Only relevant memos retrieved per-turn, naturally bounded |
| Voyager | Only top-K skills retrieved per task, naturally bounded |
The most significant architectural trend across these platforms is the adoption of progressive disclosure—a concept borrowed from UI design where information is revealed incrementally based on need.
A naive approach to skill injection—loading everything upfront—creates two problems:

1. Token cost: every skill consumes context window space on every call, whether or not it’s relevant
2. Attention dilution: large volumes of irrelevant instruction content degrade the model’s focus on the task at hand (the “context rot” problem)
Progressive disclosure solves both problems by maintaining a lightweight index of available skills while loading full content only when needed.
Claude Code uses a dedicated system: skill metadata in system-reminder messages, a Skill tool for activation, and ToolSearch for deferred tool schemas. The framework manages injection automatically with priority-based compaction.
LangChain Deep Agents uses the agent’s existing file-reading capability: SkillsMiddleware injects the index, and the agent loads full content via read_file(). This is more transparent but offers less framework-level optimization.
OpenAI Responses API uses namespace-based grouping with platform-managed search: tool namespaces provide high-level descriptions, and tool_search returns relevant schemas. The platform handles the search logic entirely.
The numbers are compelling. With 12 skills:

- All skills loaded upfront: approximately 30,000 tokens
- Metadata index only: roughly 600 tokens
- Index plus relevant skills loaded for a task: 2,000–5,000 tokens
That’s an 83–98% reduction in skill-related token consumption per turn. Over a long session with hundreds of turns, the savings compound dramatically.
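The reduction figures follow directly from those 12-skill numbers:

```python
# Savings arithmetic for progressive disclosure (figures from above).
all_loaded = 30_000  # every skill body in context
index_only = 600     # metadata index alone
working = 5_000      # index plus a few loaded bodies (upper end)

best_case = 1 - index_only / all_loaded
worst_case = 1 - working / all_loaded
print(f"{worst_case:.0%} to {best_case:.0%}")  # 83% to 98%
```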
Looking across all 11 platforms, four distinct architectural patterns emerge:
Used by: CrewAI, SuperAGI, CAMEL-AI, Semantic Kernel
How it works: Full skill content or tool schemas are present in every LLM call.
Pros:

- Maximum reliability: the agent always has its full expertise available
- Simple to implement and reason about, with no activation logic to debug

Cons:

- Token cost scales linearly with the number of skills
- Large injections dilute the model’s attention (context rot)
Best for: Focused agents with 1-3 core skills that are always relevant.
Used by: Claude Code, LangChain Deep Agents, OpenAI Responses API/Agents SDK
How it works: Lightweight metadata always present; full content loaded on-demand.
Pros:

- Minimal upfront token cost; scales to dozens of skills
- Full expertise still available within a single turn when needed

Cons:

- The agent must recognize when a skill is relevant from its short description
- Loading adds a tool-call round trip before the skill can be used
Best for: General-purpose agents that need access to many capabilities but use only a few per task.
Used by: AutoGen Teachability, Voyager
How it works: Vector database queries surface relevant skills/knowledge based on semantic similarity to the current context.
Pros:

- Token cost stays bounded no matter how large the knowledge base grows
- Enables genuine learning: the agent accumulates expertise across sessions

Cons:

- Retrieval quality depends on embedding similarity, which can miss relevant skills
- Requires vector database infrastructure and curation of what gets stored
Best for: Agents that learn from experience and need to accumulate domain knowledge over time.
Used by: DSPy, MetaGPT
How it works: Skills are compiled into fixed prompt content (DSPy) or activated through rigid action templates (MetaGPT).
Pros:

- Fully predictable token cost and behavior after compilation
- Demonstrations are optimized against a metric rather than hand-authored

Cons:

- Inflexible at runtime: changing skills requires recompilation
- Poorly suited to open-ended tasks outside the compiled distribution
Best for: Production pipelines with well-defined tasks where reliability trumps flexibility.
The right skill injection architecture depends on your agent’s profile:
If your agent has a narrow, well-defined role (e.g., a code review bot, a customer support agent for one product), always-on injection (CrewAI/SuperAGI pattern) is simplest and most reliable. The token cost of 2-3 always-present skills is manageable, and you avoid the complexity of activation logic.
If your agent needs broad capabilities but uses only a few per interaction (e.g., a developer assistant, a general-purpose automation agent), progressive disclosure (Claude Code/Deep Agents pattern) is the clear winner. The 83-98% token savings at scale are too significant to ignore.
If your agent needs to learn and improve from interactions (e.g., a personal assistant, a domain expert that accumulates knowledge), semantic retrieval (AutoGen Teachability pattern) provides the learning loop other patterns lack. Just ensure you have quality controls on what enters the knowledge base.
If your agent runs well-defined pipelines (e.g., data processing, report generation, standardized workflows), compiled injection (DSPy pattern) gives you the most predictable, optimized behavior.
For production agent teams where agents need to work out of the box, we recommend a hybrid approach:
Core skills (1-2 per agent, defining their primary domain expertise): always injected into the system prompt, CrewAI-style. These are non-negotiable capabilities the agent needs on every turn.
Extended skills (additional capabilities the agent might need): metadata only in the system prompt, loaded via a search/load mechanism when needed, Deep Agents-style. These expand the agent’s capability set without paying the token cost when they’re not relevant.
Learned knowledge (accumulated domain expertise): stored in a vector database and retrieved semantically per-turn, AutoGen-style. This allows the agent to improve over time without manual skill authoring.
This layered architecture maps naturally to how a system prompt is built: date → persona → system instructions → core skills → skill index → role/team context. The core skills and index add a predictable, manageable token cost, while the full skill bodies only appear when needed.
Regardless of which injection pattern you use, these token management strategies apply universally:
Stack unchanging context (system instructions, tool schemas) at the front of the prompt. On providers that support prompt caching, cached tokens cost 75% less. Claude Code and OpenAI both inject discovered tool schemas at the end of the context specifically to preserve cache hits on the static prefix.
Summarize tool responses rather than keeping full results in context. Store the complete data in external references that the agent can read on demand. This is especially important for agents that make many tool calls per session.
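One way to sketch this pattern (the file location, naming, and summary format are arbitrary choices, not from any particular framework):

```python
# Sketch of summarize-and-reference: keep a short summary in context and
# park the full tool result on disk for on-demand reads.
import json, os, tempfile

def store_and_summarize(tool_name: str, result: dict, max_chars: int = 200):
    path = os.path.join(tempfile.gettempdir(), f"{tool_name}_result.json")
    with open(path, "w") as f:
        json.dump(result, f)
    preview = json.dumps(result)[:max_chars]
    summary = f"{tool_name}: full result stored at {path}; preview: {preview}"
    return summary, path

summary, path = store_and_summarize("fetch_orders", {"orders": list(range(500))})
# Only `summary` enters the conversation; the agent reads `path` later
# if it actually needs the raw data.
```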
Compact conversation history through summarization. Extract key facts from long exchanges into condensed representations. Every framework with session-based persistence benefits from aggressive history management.
Dynamically fetch relevant information at runtime rather than loading everything upfront. This applies to skills, knowledge bases, and even conversation history. Studies show this can reduce prompt sizes by up to 70%.
Use sub-agents for specific tasks so each agent’s context stays focused. Rather than giving one agent 20 skills, create a team of 5 agents with 4 skills each. Each agent maintains a lean context window, and the team collectively covers the full capability set.
The way AI agent frameworks inject skills into context is one of the most consequential architectural decisions in agent design—yet it’s rarely discussed at this level of detail.
The field is clearly converging on progressive disclosure as the preferred pattern for general-purpose agents, with Claude Code, LangChain Deep Agents, and OpenAI all independently arriving at similar three-tier architectures. Meanwhile, specialized patterns like semantic retrieval (AutoGen, Voyager) and compiled injection (DSPy) serve important niches that progressive disclosure alone doesn’t address.
For practitioners building agent systems today, the key insight is that skill injection isn’t a one-size-fits-all problem. The right approach depends on your agent’s role, the number of skills it needs, whether it needs to learn over time, and your tolerance for token costs versus reliability tradeoffs.
The most robust production systems will likely combine multiple patterns—always-on for core capabilities, progressive disclosure for extended skills, and semantic retrieval for accumulated knowledge—creating agents that are both efficient and expert.
Yasha is a talented software developer specializing in Python, Java, and machine learning. Yasha writes technical articles on AI, prompt engineering, and chatbot development.
