Kokoro TTS MCP Server

Kokoro TTS MCP Server brings natural-sounding, customizable text-to-speech to your AI applications, with support for local and cloud audio storage, ideal for accessibility, automation, and content creation.

What does the “Kokoro TTS” MCP Server do?

The Kokoro Text to Speech (TTS) MCP Server is a Model Context Protocol (MCP) server that lets AI assistants and clients generate speech audio from text input. Connected to an AI workflow, it converts text to .mp3 files and can optionally upload them to Amazon S3 or compatible storage. Kokoro TTS uses models obtained through Hugging Face (Spaces and ONNX weights) and exposes settings for voice, speed, and language, so text-to-speech can be added to development environments, chatbots, or automation pipelines. It is most useful where synthesized speech is needed for accessibility, notifications, or content creation.

List of Prompts

No explicit prompt templates are documented in the repository.

List of Resources

No explicit resources are documented in the repository files or README.

List of Tools

  • Text-to-Speech Generation
    Converts input text into an .mp3 audio file using Kokoro TTS models. Offers configuration for voice, speed, and language.
  • S3 Upload
    Optionally uploads generated .mp3 files to a specified Amazon S3 bucket/folder if enabled in configuration.
  • Local MP3 Management
    Stores generated .mp3 files in a designated local folder and can automatically delete them after upload or a retention period.
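The retention behaviour described for local MP3 management can be pictured with a short Python sketch. This is not the server's own code: the function name prune_old_mp3s, its parameters, and the choice of file modification time as the age measure are all assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import List, Optional


def prune_old_mp3s(folder: str, max_age_seconds: float,
                   now: Optional[float] = None) -> List[str]:
    """Delete .mp3 files in `folder` older than `max_age_seconds`.

    Hypothetical helper: age is measured from each file's last-modified
    time. Returns the names of the files that were deleted.
    """
    now = time.time() if now is None else now
    deleted = []
    for path in sorted(Path(folder).glob("*.mp3")):
        # Only .mp3 files are considered; anything else is left alone.
        if now - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            deleted.append(path.name)
    return deleted
```

For example, prune_old_mp3s("/path/to/mp3", 24 * 3600) would remove files last modified more than a day ago from the folder named in MP3_FOLDER.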

Use Cases of this MCP Server

  • Accessibility Solutions:
    Integrate Kokoro TTS into applications to provide speech feedback for visually impaired users or to read content aloud.
  • Voice Notifications:
    Automate voice alerts in monitoring or IoT systems by converting event messages to speech audio.
  • Content Creation:
    Generate voiceovers for videos, podcasts, or interactive media directly from written scripts.
  • Conversational AI/Chatbots:
    Enable chatbots to respond with spoken output, enhancing user engagement in customer support or virtual assistant scenarios.
  • Audio Archiving & Compliance:
    Create audio records of text-based communications for compliance or archival purposes.

How to set it up

Windsurf

  1. Ensure uv is installed and all Kokoro model files are downloaded.
  2. Clone the Kokoro TTS MCP repository to your local machine.
  3. Edit your Windsurf configuration file to add the Kokoro TTS MCP server.
  4. Add the following JSON snippet to your mcpServers object:
    {
      "kokoro-tts-mcp": {
        "command": "uv",
        "args": [
          "--directory",
          "/path/to/your/local/kokoro-tts-mcp",
          "run",
          "mcp-tts.py"
        ],
        "env": {
          "TTS_VOICE": "af_heart",
          "TTS_SPEED": "1.0",
          "TTS_LANGUAGE": "en-us",
          "AWS_ACCESS_KEY_ID": "",
          "AWS_SECRET_ACCESS_KEY": "",
          "AWS_REGION": "us-east-1",
          "AWS_S3_FOLDER": "mp3",
          "S3_ENABLED": "true",
          "MP3_FOLDER": "/path/to/mp3"
        }
      }
    }
    
  5. Save your configuration and restart Windsurf.
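If the client configuration already lists other MCP servers, editing the JSON by hand risks overwriting them. The steps above could be scripted roughly as follows. This is a sketch under assumptions: add_kokoro_server and its parameters are hypothetical, the location of the configuration file depends on your client and is passed in rather than guessed, and the "false" value for S3_ENABLED (to keep audio local) is assumed from the "true" value shown in the snippet.

```python
import json
from pathlib import Path


def add_kokoro_server(config_path: str, repo_dir: str, mp3_folder: str) -> dict:
    """Add a kokoro-tts-mcp entry under "mcpServers", keeping other servers.

    Hypothetical helper; `config_path` is wherever your client keeps its
    MCP configuration file.
    """
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["kokoro-tts-mcp"] = {
        "command": "uv",
        "args": ["--directory", repo_dir, "run", "mcp-tts.py"],
        "env": {
            "TTS_VOICE": "af_heart",
            "TTS_SPEED": "1.0",
            "TTS_LANGUAGE": "en-us",
            # Assumption: "false" disables the S3 upload step.
            "S3_ENABLED": "false",
            "MP3_FOLDER": mp3_folder,
        },
    }
    path.write_text(json.dumps(config, indent=2))
    return config
```

Because the function only touches the "kokoro-tts-mcp" key, running it twice simply rewrites the same entry.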

Claude

  1. Install prerequisites (uv and the Kokoro model files).
  2. Add the Kokoro TTS MCP server in Claude’s mcpServers section.
  3. Insert the same JSON configuration shown in the Windsurf section, updating paths as needed.
  4. Save and restart the Claude environment.

Cursor

  1. Download the repository and required model files.
  2. Update Cursor's MCP configuration file (typically ~/.cursor/mcp.json, or .cursor/mcp.json inside a project) to include the Kokoro TTS MCP server.
  3. Copy the provided JSON snippet, updating paths as needed.
  4. Save changes and restart Cursor.

Cline

  1. Clone the repository and configure environment variables.
  2. Edit the Cline configuration, adding the Kokoro TTS MCP server as shown in the Windsurf section.
  3. Save and restart the Cline client.

Securing API Keys

Always use environment variables to store sensitive information like AWS credentials. Example:

"env": {
  "AWS_ACCESS_KEY_ID": "${AWS_ACCESS_KEY_ID}",
  "AWS_SECRET_ACCESS_KEY": "${AWS_SECRET_ACCESS_KEY}",
  ...
}

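Set these variables in your system or CI environment; never hard-code secrets in your configuration files. Whether ${VAR} placeholders are expanded depends on the MCP client, so confirm that yours supports them before relying on this form.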
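A quick check before launching the server can catch the case where S3 upload is switched on but credentials never reached the environment. The sketch below is illustrative only: missing_s3_settings is a hypothetical helper, and the assumption that these three variables are required whenever S3_ENABLED is "true" comes from the configuration example, not from the server's source.

```python
import os
from typing import List, Mapping, Optional

# Assumption: these are needed whenever S3 upload is enabled.
REQUIRED_WHEN_S3_ENABLED = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
)


def missing_s3_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return names of S3 variables that are unset or empty.

    Returns an empty list when S3_ENABLED is not "true", because the
    credentials are then not needed.
    """
    env = os.environ if env is None else env
    if env.get("S3_ENABLED", "").lower() != "true":
        return []
    return [name for name in REQUIRED_WHEN_S3_ENABLED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_s3_settings()
    if missing:
        raise SystemExit("Missing S3 settings: " + ", ".join(missing))
    print("S3 settings look complete (or S3 is disabled).")
```

The check only reports variable names and never prints their values, so it is safe to run in CI logs.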

How to use this MCP inside flows

Using MCP in FlowHunt

To integrate MCP servers into your FlowHunt workflow, start by adding the MCP component to your flow and connecting it to your AI agent.

Click on the MCP component to open the configuration panel. In the system MCP configuration section, insert your MCP server details using this JSON format:

{
  "kokoro-tts-mcp": {
    "transport": "streamable_http",
    "url": "https://yourmcpserver.example/pathtothemcp/url"
  }
}

Once configured, the AI agent can use this MCP as a tool, with access to all its functions and capabilities. Remember to change “kokoro-tts-mcp” to the actual name of your MCP server and to replace the URL with your own MCP server URL.
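Since the server name and URL both have to be replaced, it can help to generate the JSON rather than edit it by hand. The following sketch assumes only the two fields shown above; flowhunt_mcp_config is a hypothetical helper, and the URL check is a basic sanity test rather than anything FlowHunt itself requires.

```python
import json
from urllib.parse import urlparse


def flowhunt_mcp_config(server_name: str, url: str) -> str:
    """Build the MCP configuration JSON for one streamable HTTP server.

    Hypothetical helper: rejects values that are not absolute http(s) URLs
    so an unreplaced or mistyped placeholder is caught early.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    config = {server_name: {"transport": "streamable_http", "url": url}}
    return json.dumps(config, indent=2)
```

The returned string can be pasted directly into the system MCP configuration section.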


Overview

Section | Availability | Details/Notes
Overview | ✅ | Text-to-speech server for AI workflows
List of Prompts | ⛔ | No prompt templates found
List of Resources | ⛔ | No explicit MCP resources documented
List of Tools | ✅ | TTS, S3 upload, local file management
Securing API Keys | ✅ | Documented use of env vars for AWS and config
Sampling Support (less important in evaluation) | ⛔ | No mention of LLM sampling feature

Our opinion

Kokoro TTS MCP Server is focused and practical, offering a specialized tool for text-to-speech tasks with cloud integration. It lacks prompt and resource primitives, but it is open source (Apache-2.0), configurable through environment variables, and supports passing AWS credentials through the environment rather than hard-coding them. Sampling and Roots support are not mentioned, which limits advanced agentic capabilities. For TTS use cases it is useful, though not as feature-rich as more generalized MCP servers.

MCP Score

Has a LICENSE | ✅ (Apache-2.0)
Has at least one tool | ✅
Number of Forks | 7
Number of Stars | 39

Frequently asked questions

What is the Kokoro TTS MCP Server?

Kokoro TTS MCP Server is a Model Context Protocol server that enables AI agents and clients to convert text input into high-quality speech audio, with options for voice, speed, language, and cloud storage. It’s ideal for adding text-to-speech to chatbots, accessibility tools, and automation workflows.

What are the main features of Kokoro TTS MCP?

It supports customizable voices, speeds, and languages using models obtained through Hugging Face and ONNX weights. Audio can be stored locally or uploaded to Amazon S3. It can be integrated into development environments, chatbots, and automation pipelines.

How do I secure my AWS credentials for S3 upload?

Never hard-code credentials in configuration files. Use environment variables to securely pass sensitive information like AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY to the Kokoro TTS MCP Server.

What are the typical use cases?

Use cases include accessibility solutions (speech for visually impaired users), voice notifications, content creation (voiceovers for media), conversational AI, and audio archiving for compliance.

Can I use Kokoro TTS with FlowHunt?

Yes, you can add Kokoro TTS as an MCP component in your FlowHunt workflow, enabling your agents to generate audio responses and use all supported tools and configurations.

Does Kokoro TTS support advanced LLM sampling or prompt templates?

No, Kokoro TTS is focused on high-quality text-to-speech and does not provide prompt primitives or LLM sampling features.

Integrate Kokoro TTS into Your AI Workflow

Add natural, high-quality speech synthesis to your chatbots and automation with Kokoro TTS MCP Server. Try it in FlowHunt or connect with your own infrastructure.
