What is the Kokoro TTS MCP Server?

Kokoro TTS MCP Server is a Model Context Protocol server that enables AI agents and clients to convert text input into high-quality speech audio, with options for voice, speed, language, and cloud storage. It’s ideal for adding text-to-speech to chatbots, accessibility tools, and automation workflows.

What are the main features of Kokoro TTS MCP?

It supports customizable voices, speeds, and languages using HuggingFace models and ONNX weights. Audio can be stored locally or uploaded to Amazon S3. It’s easy to integrate into development environments, chatbots, and automation pipelines.

How do I secure my AWS credentials for S3 upload?

Never hard-code credentials in configuration files. Use environment variables to securely pass sensitive information like AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the Kokoro TTS MCP Server.

What are the typical use cases?

Use cases include accessibility solutions (speech for visually impaired users), voice notifications, content creation (voiceovers for media), conversational AI, and audio archiving for compliance.

Can I use Kokoro TTS with FlowHunt?

Yes, you can add Kokoro TTS as an MCP component in your FlowHunt workflow, enabling your agents to generate audio responses and use all supported tools and configurations.

Does Kokoro TTS support advanced LLM sampling or prompt templates?

No, Kokoro TTS is focused on high-quality text-to-speech and does not provide prompt primitives or LLM sampling features.

Kokoro TTS MCP Server

Kokoro TTS MCP Server brings natural-sounding, customizable text-to-speech to your AI applications, with support for local and cloud audio storage, ideal for accessibility, automation, and content creation.



What does “Kokoro TTS” MCP Server do?

The Kokoro Text to Speech (TTS) MCP Server is a Model Context Protocol (MCP) server that lets AI assistants and clients generate speech audio from text input. Clients connected to the server can convert text to .mp3 files and optionally upload them to Amazon S3 or compatible storage. Kokoro TTS uses its models (via Hugging Face Spaces and ONNX weights) to provide customizable voices, speeds, and languages, so text-to-speech can be added to development environments, chatbots, or automation pipelines. The server is especially useful where synthesized speech is needed for accessibility, notifications, or content creation.

List of Prompts

No explicit prompt templates are documented in the repository.

List of Resources

No explicit resources are documented in the repository files or README.

List of Tools

Text-to-Speech Generation:
Converts input text into an .mp3 audio file using Kokoro TTS models. Offers configuration for voice, speed, and language.
S3 Upload:
Optionally uploads generated .mp3 files to a specified Amazon S3 bucket/folder if enabled in configuration.
Local MP3 Management:
Stores generated .mp3 files in a designated local folder and can automatically delete them after upload or a retention period.
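
To make the retention behaviour described under Local MP3 Management more concrete, here is a minimal Python sketch of age-based cleanup. It is not the server's actual code: the function name purge_old_mp3s and its parameters are hypothetical, and it assumes that retention is measured from each file's modification time.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_old_mp3s(folder: str, max_age_seconds: float,
                   now: Optional[float] = None) -> List[str]:
    """Delete .mp3 files in `folder` older than `max_age_seconds`.

    Hypothetical helper, not part of Kokoro TTS MCP. `now` can be
    injected for testing; it defaults to the current time.
    Returns the names of the deleted files in sorted order.
    """
    current = time.time() if now is None else now
    deleted = []
    for path in sorted(Path(folder).glob("*.mp3")):
        # Age is computed from the file's last modification time.
        if current - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path.name)
    return deleted
```

Calling purge_old_mp3s("/path/to/mp3", 24 * 3600) would remove MP3 files older than one day from the folder configured as MP3_FOLDER and leave all other files untouched.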

Use Cases of this MCP Server

Accessibility Solutions:
Integrate Kokoro TTS into applications to provide speech feedback for visually impaired users or to read content aloud.
Voice Notifications:
Automate voice alerts in monitoring or IoT systems by converting event messages to speech audio.
Content Creation:
Generate voiceovers for videos, podcasts, or interactive media directly from written scripts.
Conversational AI/Chatbots:
Enable chatbots to respond with spoken output, enhancing user engagement in customer support or virtual assistant scenarios.
Audio Archiving & Compliance:
Create audio records of text-based communications for compliance or archival purposes.

How to set it up

Windsurf

Ensure you have uv and all Kokoro model files downloaded.
Clone the Kokoro TTS MCP repository to your local machine.
Edit your Windsurf configuration file to add the Kokoro TTS MCP server.

Add the following JSON snippet to your mcpServers object:

{
  "kokoro-tts-mcp": {
    "command": "uv",
    "args": [
      "--directory",
      "/path/toyourlocal/kokoro-tts-mcp",
      "run",
      "mcp-tts.py"
    ],
    "env": {
      "TTS_VOICE": "af_heart",
      "TTS_SPEED": "1.0",
      "TTS_LANGUAGE": "en-us",
      "AWS_ACCESS_KEY_ID": "",
      "AWS_SECRET_ACCESS_KEY": "",
      "AWS_REGION": "us-east-1",
      "AWS_S3_FOLDER": "mp3",
      "S3_ENABLED": "true",
      "MP3_FOLDER": "/path/to/mp3"
    }
  }
}

Save your configuration and restart Windsurf.
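
If you prefer to script this step rather than edit the file by hand, the following Python sketch merges the server entry into an already-parsed configuration. The helper name add_kokoro_server is hypothetical, and the sketch assumes the client configuration is a JSON object with a top-level mcpServers key, as in the snippet above; the location of the configuration file is not covered here.

```python
import json
from copy import deepcopy


def add_kokoro_server(config: dict, repo_dir: str, env: dict) -> dict:
    """Return a copy of `config` with a kokoro-tts-mcp entry added.

    Hypothetical helper: `repo_dir` is the local clone of the
    repository and `env` holds the TTS/S3 settings shown above.
    Existing servers under mcpServers are preserved.
    """
    updated = deepcopy(config)
    servers = updated.setdefault("mcpServers", {})
    servers["kokoro-tts-mcp"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "mcp-tts.py"],
        "env": dict(env),
    }
    return updated


def render(config: dict) -> str:
    """Serialize the configuration as indented JSON for writing to disk."""
    return json.dumps(config, indent=2)
```

For example, render(add_kokoro_server({}, "/path/to/kokoro-tts-mcp", {"TTS_VOICE": "af_heart"})) produces a JSON document that can be saved as the client configuration; the path is a placeholder for your own clone.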

Claude

Install prerequisites (Node.js, uv, Kokoro models).
Add the Kokoro TTS MCP server in Claude’s mcpServers section.
Insert the JSON configuration as above.
Save and restart the Claude environment.

Cursor

Download the repository and required model files.
Update Cursor's MCP configuration file (typically .cursor/mcp.json in the project, or ~/.cursor/mcp.json for a global setup) to include the Kokoro TTS MCP server.
Copy the provided JSON snippet, updating paths as needed.
Save changes and restart Cursor.

Cline

Clone the repository and configure environment variables.
Edit the Cline configuration, adding the Kokoro TTS MCP server with the same JSON snippet shown for Windsurf.
Save and restart the Cline client.

Securing API Keys

Always use environment variables to store sensitive information like AWS credentials. Example:

"env": {
  "AWS_ACCESS_KEY_ID": "${AWS_ACCESS_KEY_ID}",
  "AWS_SECRET_ACCESS_KEY": "${AWS_SECRET_ACCESS_KEY}",
  ...
}

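Set these variables in your system or CI environment; never hard-code secrets in your configuration files. Whether ${VAR} placeholders are expanded depends on the MCP client, so confirm that your client supports them before relying on this form.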
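
As an illustration of reading the credentials from the environment instead of a file, here is a small Python sketch that fails early when a variable is missing. The function name load_aws_env is hypothetical; only the two variable names come from the configuration above.

```python
import os
from typing import Mapping, Optional

# Variable names taken from the configuration snippet in this guide.
REQUIRED_SECRETS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def load_aws_env(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Collect AWS credentials from the environment (hypothetical helper).

    Raises RuntimeError naming every missing or empty variable, so a
    misconfigured deployment fails before any upload is attempted.
    """
    source = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_SECRETS if not source.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: source[name] for name in REQUIRED_SECRETS}
```

The returned dictionary can be merged into the env block passed to the server at launch time, so the secrets never need to be written into a configuration file on disk.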

How to use this MCP inside flows

Using MCP in FlowHunt

To integrate MCP servers into your FlowHunt workflow, start by adding the MCP component to your flow and connecting it to your AI agent.

Click on the MCP component to open the configuration panel. In the system MCP configuration section, insert your MCP server details using this JSON format:

{
  "kokoro-tts-mcp": {
    "transport": "streamable_http",
    "url": "https://yourmcpserver.example/pathtothemcp/url"
  }
}

Once configured, the AI agent can use this MCP server as a tool, with access to all of its functions and capabilities. Remember to change “kokoro-tts-mcp” to the actual name of your MCP server and to replace the URL with your own MCP server URL.
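
If you manage several MCP servers, generating this JSON programmatically can help avoid typos. The Python sketch below builds the same structure for a given server name and URL. The function name flowhunt_mcp_config is hypothetical, and the URL check is an added safeguard rather than something FlowHunt is documented to require.

```python
import json
from urllib.parse import urlparse


def flowhunt_mcp_config(server_name: str, url: str) -> str:
    """Build the MCP configuration JSON shown above (hypothetical helper).

    Rejects values that are not absolute HTTP(S) URLs so an obviously
    malformed address is caught before it is pasted into the panel.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute HTTP(S) URL: {url!r}")
    config = {server_name: {"transport": "streamable_http", "url": url}}
    return json.dumps(config, indent=2)
```

The string returned by flowhunt_mcp_config("kokoro-tts-mcp", "https://yourmcpserver.example/pathtothemcp/url") can be pasted into the system MCP configuration section.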

Overview

Section	Availability	Details/Notes
Overview	✅	Text-to-speech server for AI workflows
List of Prompts	⛔	No prompt templates found
List of Resources	⛔	No explicit MCP resources documented
List of Tools	✅	TTS, S3 upload, local file management
Securing API Keys	✅	Documented use of env vars for AWS and config
Sampling Support (less important in evaluation)	⛔	No mention of LLM sampling feature

Our opinion

Kokoro TTS MCP Server is focused and practical, offering a specialized tool for text-to-speech tasks with cloud integration. It lacks prompt and resource primitives, but is open source, well-configured, and supports secure key management. Sampling and Roots support are not mentioned, limiting advanced agentic capabilities. For TTS use cases, it is robust and useful, though not as feature-rich as more generalized MCP servers.

MCP Score

Has a LICENSE	✅ (Apache-2.0)
Has at least one tool	✅
Number of Forks	7
Number of Stars	39


