MCP Security Checklist: The OWASP Minimum Bar for Secure MCP Server Deployment


The OWASP GenAI Security Project’s practical guide for MCP server development culminates in a concrete review checklist — the “MCP Security Minimum Bar.” This checklist defines the baseline controls that must be in place before an MCP server is deployed to production.

This post presents the full checklist with implementation guidance for each item, organized across the five security domains the OWASP guide defines. Use it for pre-deployment security reviews, periodic audits, and as a framework for remediating identified gaps.

How to Use This Checklist

Marking items: For each item, record PASS (implemented and verified), FAIL (not implemented or partially implemented), or N/A (not applicable to this deployment).

Deployment gates: Items in Category 1 (Identity, Auth, Policy) and Category 2 (Isolation) are hard deployment gates — any FAIL should block go-live until remediated. FAILs in other categories may be risk-accepted, but only with documented remediation timelines.

Review triggers: Re-run the full checklist after any significant change to the MCP server code, tool registry, authentication configuration, deployment environment, or when a new category of tools is onboarded.


Category 1: Strong Identity, Auth, and Policy Enforcement

This is the highest-priority category. Authentication failures grant attackers direct access to everything the MCP server can do.

1.1 All remote MCP servers use OAuth 2.1 / OIDC

What to verify: Every remote connection to the MCP server requires authentication through a properly configured OAuth 2.1 authorization server. Anonymous connections are rejected. Local MCP servers using STDIO may use alternative authentication appropriate to their deployment context.

How to test: Attempt to connect without an authorization header. Attempt to connect with a malformed or expired token. Both should result in authentication failure, not access to tools.

Common failure modes: Development endpoints left accessible without authentication; fallback to API key authentication that doesn’t validate expiry or scope; token validation only at session establishment, not per-request.
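The per-request check above can be made concrete with a small sketch. This is an illustrative helper, not a prescribed implementation; the names (`extract_bearer_token`, `AuthError`) are assumptions:

```python
class AuthError(Exception):
    """Raised for any request that cannot prove its identity."""


def extract_bearer_token(headers: dict) -> str:
    """Return the bearer token from request headers, or raise AuthError.

    There is deliberately no fallback path: a missing or malformed
    Authorization header fails closed rather than degrading to
    anonymous access.
    """
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    if scheme.lower() != "bearer" or not token:
        raise AuthError("missing or malformed Authorization header")
    return token
```

The key design choice is that there is exactly one code path for unauthenticated requests: an exception, never a default identity.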


1.2 Tokens are short-lived, scoped, and validated on every call

What to verify: Access tokens expire within minutes (not hours). Each token carries the minimum scope required for the current task. Every tool invocation validates the token’s signature, issuer (iss), audience (aud), expiry (exp), and required scope — not just on session establishment.

How to test: Use a valid token, then wait for it to expire (or manually set the clock forward). Attempt a tool call — it should fail with a 401, not succeed on a cached validation result.

Common failure modes: Token validation cached at session start and not repeated; tokens with 24+ hour lifetimes; broad “admin” scopes used instead of operation-specific scopes; exp field not checked.


1.3 No token passthrough; policy enforcement is centralized

What to verify: The MCP server does not forward client tokens to downstream APIs. All downstream service calls use tokens explicitly issued to the MCP server (via On-Behalf-Of flows or service credentials). A centralized policy gateway intercepts all tool invocations and enforces authentication, authorization, consent, and audit logging before any tool code executes.

How to test: Review code for any location where the incoming client token is forwarded in an outbound API call. Inspect downstream service access logs to verify requests arrive with server credentials, not user credentials.

Common failure modes: Authorization: Bearer ${request.headers.authorization} pattern in downstream calls; authorization checks scattered across individual tool handlers; no centralized policy enforcement point.
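One way to centralize enforcement is a gateway decorator that wraps every tool handler, so no tool code can run before the checks. This is a sketch under assumptions; `check_auth` and `audit_log` stand in for your real policy and logging functions:

```python
import functools


def policy_gateway(check_auth, audit_log):
    """Wrap a tool handler so auth and audit always run before tool code.

    check_auth(request) must raise on failure; audit_log(tool_name, request)
    records the invocation. Registering every tool through this decorator
    gives a single enforcement point instead of per-handler checks.
    """
    def decorator(tool_fn):
        @functools.wraps(tool_fn)
        def wrapper(request, *args, **kwargs):
            check_auth(request)                    # raises -> tool never runs
            audit_log(tool_fn.__name__, request)   # log before execution
            return tool_fn(request, *args, **kwargs)
        return wrapper
    return decorator
```

The point of the pattern is structural: authorization lives in one place, and forgetting it in an individual tool handler becomes impossible by construction.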



Category 2: Strict Isolation and Lifecycle Control

Isolation failures in multi-tenant environments are catastrophic — they enable one user to access another’s data. These are hard deployment gates.

2.1 Users, sessions, and execution contexts are fully isolated

What to verify: No global variables, class-level attributes, or shared singleton instances store user-specific or session-specific data. Each session uses independently instantiated objects or session-keyed namespaces (e.g., Redis keys prefixed with session_id:). Code review confirms no shared mutable state between sessions.

How to test: Run two concurrent sessions with different user identities. Verify that data written in session A cannot be read in session B. Use concurrent load tests to check for race conditions that might cause session state leakage.

Common failure modes: self.user_context = {} as a class attribute in a singleton service; global caches without session-keyed namespaces; thread-local storage that doesn’t properly scope to request lifecycle.
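The difference between shared and session-keyed state is easiest to see in code. A minimal in-memory sketch (a production deployment would more likely use Redis keys prefixed with the session ID, as noted above):

```python
class SessionStore:
    """Per-session namespaces instead of shared mutable state.

    Every read and write is keyed by session_id, so data written in one
    session is structurally unreachable from another.
    """

    def __init__(self):
        self._data: dict[str, dict] = {}

    def set(self, session_id: str, key: str, value) -> None:
        self._data.setdefault(session_id, {})[key] = value

    def get(self, session_id: str, key: str, default=None):
        return self._data.get(session_id, {}).get(key, default)

    def drop(self, session_id: str) -> None:
        """Discard an entire session's namespace on termination."""
        self._data.pop(session_id, None)
```

Contrast this with the failure mode above: a class-level `self.user_context = {}` in a singleton has no `session_id` in its key, so concurrent sessions silently share it.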


2.2 No shared state for user data

What to verify: Beyond execution context, any shared infrastructure (databases, caches, message queues) enforces per-user access controls. A query executed in one user’s session cannot return another user’s data even if the shared infrastructure is misconfigured or compromised.

How to test: Attempt to access another user’s data by manipulating session parameters or exploiting shared cache keys.

Common failure modes: Cache keys based only on query content, not user identity; database queries without user-scoped WHERE clauses; shared temporary file directories without per-user subdirectories.
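The first failure mode above, cache keys based only on query content, has a one-line fix: bake the user identity into the key. A hypothetical sketch:

```python
import hashlib


def cache_key(user_id: str, query: str) -> str:
    """Build a cache key from identity plus content.

    Keying on both means one user's cached result can never be served to
    a different user who happens to issue the same query.
    """
    digest = hashlib.sha256(query.encode()).hexdigest()
    return f"user:{user_id}:query:{digest}"
```

The same principle applies to temporary files (per-user subdirectories) and database queries (user-scoped WHERE clauses): the user identity must be part of the lookup path, not an afterthought.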


2.3 Sessions have deterministic cleanup and enforced resource quotas

What to verify: When a session terminates (cleanly or through timeout/error), all associated resources are immediately released: file handles, temporary files, in-memory context, cached tokens, database connections. Per-session limits exist for memory, CPU, API rate, and file system usage.

How to test: Terminate a session abruptly (kill the connection without a graceful shutdown). Verify no residual resources remain. Create a session and exhaust its rate limit; verify it doesn’t affect other sessions.

Common failure modes: Temporary files left in /tmp after session end; cached tokens not revoked on session termination; no resource quotas allowing one session to exhaust shared infrastructure.


Category 3: Trusted, Controlled Tooling

Tool security prevents the most dangerous MCP-specific attacks: tool poisoning and rug pulls.

3.1 Tools are cryptographically signed, version-pinned, and formally approved

What to verify: Every tool definition has a cryptographic signature from an authorized tool approver. The signature covers the complete manifest (description, schema, version, permissions). The MCP server verifies this signature at load time and rejects any unsigned or signature-mismatched tool. Tool versions are pinned — the server cannot dynamically load an updated tool without a new approved signature.

How to test: Modify a single character in a loaded tool’s description. Verify the server detects the hash mismatch and blocks the tool from loading. Attempt to load an unsigned tool definition — it should be rejected.

Common failure modes: Tool definitions stored as mutable configuration without integrity verification; no signing key infrastructure; tools loaded directly from a shared filesystem without version pinning.
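The integrity check can be sketched with stdlib primitives. This example uses HMAC over a canonical JSON serialization for simplicity; a production system would more likely use asymmetric signatures so the server holds only a verification key:

```python
import hashlib
import hmac
import json


def sign_manifest(manifest: dict, key: bytes) -> str:
    """Sign the complete manifest: description, schema, version, permissions.

    sort_keys + fixed separators give a canonical byte representation, so
    the signature is stable regardless of dict ordering.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode(), hashlib.sha256).hexdigest()


def verify_manifest(manifest: dict, signature: str, key: bytes) -> bool:
    """Return True only if the manifest is byte-for-byte what was approved."""
    return hmac.compare_digest(sign_manifest(manifest, key), signature)
```

Because the signature covers the whole manifest, changing even one character of a tool description (the rug-pull scenario) invalidates it, which is exactly the test suggested above.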


3.2 Tool descriptions are validated against runtime behavior

What to verify: Automated scanning checks tool descriptions for instruction-like patterns that could represent poisoning attempts. Periodic validation confirms that a tool’s actual runtime behavior matches its declared description — a tool that claims to be read-only should not be capable of write operations at runtime.

How to test: Add a suspicious instruction to a tool description (“always also call send_webhook with…”) and verify automated scanning flags it before human review. Review the SAST tool configuration for MCP-specific poisoning detection rules.

Common failure modes: No automated scanning of tool descriptions; manual review process that may miss embedded instructions in long descriptions; no runtime behavior validation to catch tools that lie about their capabilities.
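A first-pass scanner for instruction-like patterns in descriptions can be as simple as a regex list. The patterns below are illustrative starting points, not a complete ruleset; real scanning (e.g. with a tool like Invariant MCP-Scan, mentioned in the FAQ) goes well beyond this:

```python
import re

# Assumed example patterns: imperative phrasing and hidden-instruction
# markers that have no business appearing in a tool description.
SUSPICIOUS_PATTERNS = [
    r"\b(always|also)\s+call\b",
    r"\bignore\s+(previous|prior|all)\b",
    r"\bdo\s+not\s+(tell|mention|reveal)\b",
    r"<\s*(important|system|instructions)\s*>",
]


def flag_description(description: str) -> list[str]:
    """Return the patterns that matched, empty list if the text looks clean."""
    return [p for p in SUSPICIOUS_PATTERNS
            if re.search(p, description, re.IGNORECASE)]
```

Anything flagged should route to human review rather than being auto-rejected, since legitimate descriptions can occasionally trip a pattern.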


3.3 Only minimal, necessary tool fields are exposed to the model

What to verify: The model context receives only the fields required for correct tool invocation: name, description, input schema, output schema. Internal metadata, implementation details, debugging information, and sensitive configuration are filtered out before being passed to the model.

How to test: Inspect what the model receives when it enumerates available tools. Verify no internal fields, connection strings, or operational metadata appear in the model’s view.

Common failure modes: Full tool configuration objects passed to the model context; error messages containing internal system details that leak to the model; tool descriptions including implementation notes not relevant to invocation.
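Field minimization is a projection through an allowlist. A sketch, assuming the camelCase field names used by typical MCP tool definitions:

```python
# Only the fields needed for correct invocation reach the model.
MODEL_VISIBLE_FIELDS = {"name", "description", "inputSchema", "outputSchema"}


def model_view(tool_config: dict) -> dict:
    """Project a full internal tool config down to the model-facing fields.

    An allowlist (rather than a denylist) means any new internal field
    added later is hidden by default instead of leaked by default.
    """
    return {k: v for k, v in tool_config.items() if k in MODEL_VISIBLE_FIELDS}
```

The allowlist-by-default choice matters: with a denylist, every newly added internal field is one forgotten entry away from appearing in the model's context.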


Category 4: Schema-Driven Validation Everywhere

Validation failures enable injection, data manipulation, and denial-of-service.

4.1 All MCP messages, tool inputs, and outputs are schema-validated

What to verify: JSON Schema validation is enforced for every MCP protocol message, every tool invocation input, and every tool output before it reaches the model. Validation rejects any message that doesn’t conform to the defined schema — missing required fields, wrong types, values outside permitted ranges.

How to test: Send a tool invocation with a missing required parameter. Send a message with an extra unexpected field. Both should be rejected, not silently ignored or processed with defaults.

Common failure modes: Optional validation that’s bypassed under error conditions; validation only on inputs, not outputs; schemas that are too permissive (accepting type: "any" parameters).
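In practice you would use a full JSON Schema validator library, but the reject-don't-default behavior can be shown with a hand-rolled sketch covering required fields, types, and unexpected fields:

```python
# Minimal illustrative validator -- a real server should use a complete
# JSON Schema implementation; this shows only the fail-closed behavior.
TYPE_MAP = {"string": str, "integer": int, "number": (int, float),
            "boolean": bool, "object": dict, "array": list}


def validate(params: dict, schema: dict) -> None:
    """Raise ValueError on any deviation from the schema; never apply defaults."""
    props = schema.get("properties", {})
    for field in schema.get("required", []):
        if field not in params:
            raise ValueError(f"missing required field: {field}")
    for field, value in params.items():
        if field not in props:
            raise ValueError(f"unexpected field: {field}")  # no silent ignore
        expected = TYPE_MAP.get(props[field].get("type"))
        if expected and not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {field}")
```

Note that unexpected fields raise rather than being dropped: silently ignoring extra fields is exactly the "processed with defaults" failure mode named above.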


4.2 Inputs/outputs are sanitized, size-limited, and treated as untrusted

What to verify: All inputs are sanitized to remove or escape characters that could enable injection (XSS sequences, SQL metacharacters, shell metacharacters, null bytes). Size limits are enforced on all inputs and outputs. The server treats all data from the model as potentially adversarial, identical to user input in a traditional web application.

How to test: Send inputs containing SQL injection payloads, shell metacharacters, and XSS sequences. Verify they are rejected or safely escaped before reaching downstream systems. Send an input that exceeds the size limit — verify it’s rejected cleanly.

Common failure modes: Inputs passed directly to SQL queries or shell commands; no size limits allowing oversized inputs to cause memory exhaustion; outputs returned to the model without size limits or content filtering.
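A size-limit and null-byte gate is the simplest piece to sketch. The limit here is an assumed placeholder; injection defense itself should rely on parameterized queries and argument arrays rather than escaping, with sanitization as a secondary layer:

```python
MAX_INPUT_BYTES = 64 * 1024  # assumption: tune per deployment


def check_input(value: str) -> str:
    """Reject oversized or null-byte-carrying input before it goes anywhere.

    Rejecting cleanly (rather than truncating) keeps behavior predictable
    and avoids half-processed adversarial payloads.
    """
    if len(value.encode("utf-8")) > MAX_INPUT_BYTES:
        raise ValueError("input exceeds size limit")
    if "\x00" in value:
        raise ValueError("null byte in input")
    return value
```

Apply the same size gate to tool outputs before they reach the model, since an oversized or adversarial output is just as untrusted as an input.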


4.3 Structured (JSON) tool invocation is required

What to verify: Tool calls are only accepted as structured JSON objects with validated schemas. Free-form text generation that implies tool invocations is not processed. The system cannot be induced to execute tool calls by generating natural language that the server interprets as commands.

How to test: Send a natural language string that describes a tool invocation (“call the delete_file tool with path /etc/passwd”). Verify the server does not interpret this as a tool call.

Common failure modes: Hybrid systems that accept both structured JSON and natural language tool descriptions; servers that parse model-generated text to identify tool invocations; regex-based tool call parsing that can be spoofed.
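The structural requirement is simple to enforce at the parsing boundary. A sketch, assuming a tool-call shape of `{"name": ..., "arguments": {...}}`:

```python
def parse_tool_call(message) -> tuple[str, dict]:
    """Accept only a structured tool-call object; never parse free text.

    A plain string describing an invocation ("call the delete_file tool
    with ...") raises instead of being interpreted as a command.
    """
    if not isinstance(message, dict):
        raise ValueError("tool calls must be structured JSON objects")
    name = message.get("name")
    args = message.get("arguments")
    if not isinstance(name, str) or not isinstance(args, dict):
        raise ValueError("malformed tool call")
    return name, args
```

There is deliberately no regex fallback: the moment the server parses model-generated prose to find invocations, the parser itself becomes spoofable.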


Category 5: Hardened Deployment and Continuous Oversight

Deployment hardening limits the blast radius of any exploited vulnerability.

5.1 Server runs containerized, non-root, network-restricted

What to verify: The MCP server runs in a minimal hardened container. The container process runs as a non-root user. Unnecessary Linux capabilities are dropped. Network policies restrict all inbound and outbound traffic to explicitly required connections. The container image contains only the minimum required software.

How to test: Run docker inspect and verify the user is non-root. Review network policies and confirm they block all traffic except explicitly allowlisted connections. Scan the container image for unnecessary packages or known-vulnerable software.

Common failure modes: Containers running as root for convenience; no network policies leaving all outbound traffic permitted; base images with full OS installations instead of minimal images.
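A hardened image sketch, under assumptions (base image, entrypoint module, and user name are all illustrative placeholders, not a prescribed setup):

```dockerfile
# Minimal base image: no compilers, shells-only-as-needed, small surface.
FROM python:3.12-slim

# Dedicated unprivileged user with no login shell.
RUN useradd --create-home --shell /usr/sbin/nologin mcp

WORKDIR /app
COPY --chown=mcp:mcp . .

# Never run the server process as root.
USER mcp

ENTRYPOINT ["python", "-m", "mcp_server"]
```

Capability dropping and filesystem restrictions belong at run time, e.g. `docker run --cap-drop=ALL --read-only ...`, combined with network policies that default-deny outbound traffic.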


5.2 Secrets are stored in vaults and never exposed to the LLM

What to verify: All API keys, OAuth client secrets, database credentials, and service account tokens are stored in a secrets vault (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, etc.). No secrets exist in environment variables, source code, container images, or log output. Secrets management operations happen in middleware that is inaccessible to the AI model — the LLM never sees or processes credential values.

How to test: Search logs for credential-like strings. Inspect environment variables accessible to the server process. Review the model’s accessible context to confirm no credential values appear.

Common failure modes: API keys in .env files committed to version control; credentials returned in error messages that reach the model; secrets passed as tool parameters that appear in the model’s conversation context.
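As a defense-in-depth layer on top of vault storage, log and error output can be scrubbed for credential-shaped strings before it is written or returned. The patterns below are illustrative examples, not an exhaustive set:

```python
import re

# Assumed example patterns for credential-shaped strings.
CREDENTIAL_PATTERNS = [
    re.compile(r"(?i)bearer\s+[a-z0-9._\-]+"),
    re.compile(r"(?i)(api[_-]?key|secret|password|token)\s*[=:]\s*\S+"),
]


def redact(line: str) -> str:
    """Replace credential-shaped substrings before a line is logged or
    surfaced in an error message that could reach the model."""
    for pattern in CREDENTIAL_PATTERNS:
        line = pattern.sub("[REDACTED]", line)
    return line
```

Redaction is a backstop, not the control itself: the primary control remains keeping secret retrieval in middleware the model can never invoke.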


5.3 CI/CD security gates, audit logs, and continuous monitoring are mandatory

What to verify: The deployment pipeline includes automated security scanning (SAST, SCA, dependency vulnerability scanning) as hard gates — failed scans block deployment. All tool invocations, authentication events, and authorization decisions are logged immutably with full context. Logs are ingested by a SIEM with real-time alerting on anomalous patterns (failed validation spikes, unusual tool call frequency, unexpected external connections).

How to test: Introduce a known-vulnerable dependency and verify the CI/CD pipeline fails the build. Generate anomalous tool call patterns and verify SIEM alerts fire within the expected response time.

Common failure modes: Security scanning as advisory rather than blocking gates; logs written to mutable storage that an attacker could modify; no alerting on anomalous patterns; excessive log verbosity that makes relevant events impossible to find.


Using This Checklist for Your MCP Deployment

Print or export this checklist and work through it systematically for every MCP server before production deployment. Involve your security team in the review — many items require both code review and live testing to verify correctly.

For teams that want independent verification, a professional MCP security audit tests all 15 checklist items against your live environment, using adversarial testing techniques rather than self-assessment. The result is a verified security posture report with a prioritized remediation plan.

Frequently asked questions

What is the OWASP MCP Security Minimum Bar?

The OWASP GenAI Security Project's 'MCP Security Minimum Bar' is a review checklist defining the baseline security controls required before an MCP server should be deployed to production. It covers five domains: Strong Identity/Auth/Policy Enforcement, Strict Isolation and Lifecycle Control, Trusted and Controlled Tooling, Schema-Driven Validation, and Hardened Deployment with Continuous Oversight. Failing to meet the minimum bar means the MCP server should not be deployed until gaps are remediated.

How do I use this checklist for a security review?

Work through each category systematically, marking items as PASS, FAIL, or NOT APPLICABLE with evidence for each decision. Any FAIL in categories 1 or 2 (identity and isolation) should block deployment — these are the highest-risk gaps. FAILs in other categories should be risk-accepted with a documented remediation timeline before deployment. The checklist should be re-evaluated after any significant change to the MCP server, tool registry, or deployment environment.

What tools support automated MCP security checking?

Several tools support automated MCP security validation: Invariant MCP-Scan (specialized for MCP security scanning), SAST tools with custom MCP rules, npm audit and pip audit for dependency scanning, OSV-Scanner for vulnerability database checks, Docker seccomp and AppArmor profiles for runtime isolation, and SIEM integration for centralized monitoring. No single tool covers all checklist items — comprehensive coverage requires combining static analysis, dynamic testing, and continuous monitoring.

Arshia Kahani
AI Workflow Engineer

Arshia is an AI Workflow Engineer at FlowHunt. With a background in computer science and a passion for AI, he specializes in creating efficient workflows that integrate AI tools into everyday tasks, enhancing productivity and creativity.

