Kokoro TTS MCP Server
Kokoro TTS MCP Server brings natural-sounding, customizable text-to-speech to your AI applications, with support for local and cloud audio storage, ideal for accessibility, automation, and content creation.

What does “Kokoro TTS” MCP Server do?
The Kokoro Text-to-Speech (TTS) MCP Server is a Model Context Protocol (MCP) server that lets AI assistants and clients generate high-quality speech audio from text input. By connecting AI workflows to this server, users can convert text to .mp3 files and optionally upload them to Amazon S3 or compatible storage. Kokoro TTS uses Kokoro models (via Hugging Face Spaces and ONNX weights) to provide configurable voices, speeds, and languages, so text-to-speech can be integrated into development environments, chatbots, or automation pipelines. This MCP server is especially valuable where synthesized speech is needed for accessibility, notifications, or content creation.
List of Prompts
No explicit prompt templates are documented in the repository.
List of Resources
No explicit resources are documented in the repository files or README.
List of Tools
- Text-to-Speech Generation
Converts input text into an .mp3 audio file using Kokoro TTS models. Offers configuration for voice, speed, and language.
- S3 Upload
Optionally uploads generated .mp3 files to a specified Amazon S3 bucket/folder if enabled in configuration.
- Local MP3 Management
Stores generated .mp3 files in a designated local folder and can automatically delete them after upload or a retention period.
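The retention behaviour described under Local MP3 Management can be pictured with a small sketch. This is not the server's actual code: the function name `delete_expired_mp3s` and its parameters are hypothetical, and it assumes that "retention period" means file age measured from the last modification time.

```python
import time
from pathlib import Path
from typing import Optional


def delete_expired_mp3s(folder: str, max_age_seconds: float,
                        now: Optional[float] = None) -> list:
    """Delete .mp3 files in `folder` older than `max_age_seconds`.

    Hypothetical helper for illustration only. Returns the names of the
    deleted files, sorted alphabetically.
    """
    current = time.time() if now is None else now
    deleted = []
    for path in sorted(Path(folder).glob("*.mp3")):
        # Age is measured from the file's last modification time (an assumption).
        age = current - path.stat().st_mtime
        if age > max_age_seconds:
            path.unlink()
            deleted.append(path.name)
    return deleted
```

Calling `delete_expired_mp3s("/path/to/mp3", 3600)` would remove generated files older than one hour while leaving newer files and non-.mp3 files untouched.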
Use Cases of this MCP Server
- Accessibility Solutions:
Integrate Kokoro TTS into applications to provide speech feedback for visually impaired users or to read content aloud.
- Voice Notifications:
Automate voice alerts in monitoring or IoT systems by converting event messages to speech audio.
- Content Creation:
Generate voiceovers for videos, podcasts, or interactive media directly from written scripts.
- Conversational AI/Chatbots:
Enable chatbots to respond with spoken output, enhancing user engagement in customer support or virtual assistant scenarios.
- Audio Archiving & Compliance:
Create audio records of text-based communications for compliance or archival purposes.
How to set it up
Windsurf
- Ensure you have uv and all Kokoro model files downloaded.
- Clone the Kokoro TTS MCP repository to your local machine.
- Edit your Windsurf configuration file to add the Kokoro TTS MCP server.
- Add the following JSON snippet to your mcpServers object:
{
  "kokoro-tts-mcp": {
    "command": "uv",
    "args": [
      "--directory",
      "/path/toyourlocal/kokoro-tts-mcp",
      "run",
      "mcp-tts.py"
    ],
    "env": {
      "TTS_VOICE": "af_heart",
      "TTS_SPEED": "1.0",
      "TTS_LANGUAGE": "en-us",
      "AWS_ACCESS_KEY_ID": "",
      "AWS_SECRET_ACCESS_KEY": "",
      "AWS_REGION": "us-east-1",
      "AWS_S3_FOLDER": "mp3",
      "S3_ENABLED": "true",
      "MP3_FOLDER": "/path/to/mp3"
    }
  }
}
- Save your configuration and restart Windsurf.
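If you prefer to script the configuration edit rather than paste JSON by hand, the following sketch shows one way to merge the server entry into a config file that has an mcpServers object. The function name `add_kokoro_server` is hypothetical, the config file location is left as a parameter because it depends on your client, and setting S3_ENABLED to "false" to keep files local is an assumption rather than documented behaviour.

```python
import json
from pathlib import Path


def add_kokoro_server(config_path: str, repo_dir: str, mp3_folder: str) -> dict:
    """Insert or overwrite the "kokoro-tts-mcp" entry under "mcpServers".

    Hypothetical helper: existing server entries are preserved, and the
    file is created if it does not exist yet.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["kokoro-tts-mcp"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "mcp-tts.py"],
        "env": {
            "TTS_VOICE": "af_heart",
            "TTS_SPEED": "1.0",
            "TTS_LANGUAGE": "en-us",
            # Assumption: "false" disables the S3 upload so no AWS keys are needed.
            "S3_ENABLED": "false",
            "MP3_FOLDER": mp3_folder,
        },
    }
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

The same merge applies to any client whose configuration uses an mcpServers object; only the file path differs.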
Claude
- Install prerequisites (Node.js, uv, Kokoro models).
- Add the Kokoro TTS MCP server in Claude’s mcpServers section.
- Insert the JSON configuration as above.
- Save and restart the Claude environment.
Cursor
- Download the repository and required model files.
- Update the cursor.json or equivalent config to include the Kokoro TTS MCP server.
- Copy the provided JSON snippet, updating paths as needed.
- Save changes and restart Cursor.
Cline
- Clone the repository and configure environment variables.
- Edit the Cline configuration, adding the Kokoro TTS MCP server as shown.
- Save and restart the Cline client.
Securing API Keys
Always use environment variables to store sensitive information like AWS credentials. Example:
"env": {
  "AWS_ACCESS_KEY_ID": "${AWS_ACCESS_KEY_ID}",
  "AWS_SECRET_ACCESS_KEY": "${AWS_SECRET_ACCESS_KEY}",
  ...
}
Set these variables in your system or CI environment; never hard-code secrets in your configuration files.
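Whether a given MCP client expands `${...}` placeholders is worth verifying for your setup. As an alternative, a launcher script can read the credentials from the process environment and fail early when they are missing. The sketch below is illustrative only; `aws_env_from_environment` is a hypothetical helper, not part of the Kokoro TTS MCP repository.

```python
import os
from typing import Mapping, Optional

# The two credential variables named in the configuration above.
REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def aws_env_from_environment(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return the AWS credential variables taken from the environment.

    Raises RuntimeError listing any variable that is unset or empty, so a
    missing secret is caught before the server starts.
    """
    source = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_AWS_VARS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    return {name: source[name] for name in REQUIRED_AWS_VARS}
```

The returned dictionary can be merged into the "env" block at launch time, so the secrets never need to be written to a configuration file.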
How to use this MCP inside flows
Using MCP in FlowHunt
To integrate MCP servers into your FlowHunt workflow, start by adding the MCP component to your flow and connecting it to your AI agent.

Click on the MCP component to open the configuration panel. In the system MCP configuration section, insert your MCP server details using this JSON format:
{
  "kokoro-tts-mcp": {
    "transport": "streamable_http",
    "url": "https://yourmcpserver.example/pathtothemcp/url"
  }
}
Once configured, the AI agent can use this MCP server as a tool with access to all its functions and capabilities. Remember to change “kokoro-tts-mcp” to the actual name of your MCP server and replace the URL with your own MCP server URL.
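To avoid typos when filling in the server name and URL, the JSON above can be generated rather than typed. The sketch below is a convenience helper under stated assumptions: `flowhunt_mcp_config` is a hypothetical name, and the only validation it performs is a basic check that the URL uses http or https and has a host.

```python
import json
from urllib.parse import urlparse


def flowhunt_mcp_config(server_name: str, url: str) -> str:
    """Build the MCP configuration JSON for one server.

    Hypothetical helper: raises ValueError if the URL is not an absolute
    http(s) URL, then returns the JSON text in the format shown above.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("Expected an absolute http(s) URL, got: " + repr(url))
    config = {server_name: {"transport": "streamable_http", "url": url}}
    return json.dumps(config, indent=2)
```

The returned string can be pasted into the system MCP configuration section as-is.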
Overview
Section | Availability | Details/Notes |
---|---|---|
Overview | ✅ | Text-to-speech server for AI workflows |
List of Prompts | ⛔ | No prompt templates found |
List of Resources | ⛔ | No explicit MCP resources documented |
List of Tools | ✅ | TTS, S3 upload, local file management |
Securing API Keys | ✅ | Documented use of env vars for AWS and config |
Sampling Support (less important in evaluation) | ⛔ | No mention of LLM sampling feature |
Our opinion
Kokoro TTS MCP Server is focused and practical, offering a specialized tool for text-to-speech tasks with cloud integration. It lacks prompt and resource primitives, but is open source, well-configured, and supports secure key management. Sampling and Roots support are not mentioned, limiting advanced agentic capabilities. For TTS use cases, it is robust and useful, though not as feature-rich as more generalized MCP servers.
MCP Score
Has a LICENSE | ✅ (Apache-2.0) |
---|---|
Has at least one tool | ✅ |
Number of Forks | 7 |
Number of Stars | 39 |
Frequently asked questions
- What is the Kokoro TTS MCP Server?
Kokoro TTS MCP Server is a Model Context Protocol server that enables AI agents and clients to convert text input into high-quality speech audio, with options for voice, speed, language, and cloud storage. It’s ideal for adding text-to-speech to chatbots, accessibility tools, and automation workflows.
- What are the main features of Kokoro TTS MCP?
It supports customizable voices, speeds, and languages using Hugging Face models and ONNX weights. Audio can be stored locally or uploaded to Amazon S3. It can be integrated into development environments, chatbots, and automation pipelines.
- How do I secure my AWS credentials for S3 upload?
Never hard-code credentials in configuration files. Use environment variables to securely pass sensitive information like AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the Kokoro TTS MCP Server.
- What are the typical use cases?
Use cases include accessibility solutions (speech for visually impaired users), voice notifications, content creation (voiceovers for media), conversational AI, and audio archiving for compliance.
- Can I use Kokoro TTS with FlowHunt?
Yes, you can add Kokoro TTS as an MCP component in your FlowHunt workflow, enabling your agents to generate audio responses and use all supported tools and configurations.
- Does Kokoro TTS support advanced LLM sampling or prompt templates?
No, Kokoro TTS is focused on high-quality text-to-speech and does not provide prompt primitives or LLM sampling features.
Integrate Kokoro TTS into Your AI Workflow
Add natural, high-quality speech synthesis to your chatbots and automation with Kokoro TTS MCP Server. Try it in FlowHunt or connect with your own infrastructure.