DataHub MCP Server Integration

Integrate FlowHunt AI agents with your organization’s DataHub using the MCP Server, unlocking powerful metadata search, lineage exploration, and automated SQL auditing directly within your AI workflows.

What does the “DataHub” MCP Server do?

The DataHub MCP (Model Context Protocol) Server acts as a bridge between AI assistants and your DataHub data ecosystem. By exposing DataHub’s powerful metadata and context APIs via the MCP standard, this server enables AI agents to search across all entity types, fetch detailed metadata, traverse data lineage, and list associated SQL queries. This dramatically improves development workflows by allowing AI models to access up-to-date data context, perform complex queries, and automate metadata exploration directly from your preferred AI interface. DataHub MCP Server supports both DataHub Core and DataHub Cloud, making it a versatile solution for organizations seeking to integrate their metadata platform with AI-driven tools and assistants.
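
For illustration, the sketch below shows how an MCP client could launch and query this server over stdio using the official MCP Python SDK. This is a minimal example under stated assumptions, not the server's documented API: the URL and token are placeholders, and the actual tool set should be confirmed from the list_tools() output.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Launch mcp-server-datahub via uvx, mirroring the client configurations shown later.
server_params = StdioServerParameters(
    command="uvx",
    args=["mcp-server-datahub"],
    env={
        "DATAHUB_GMS_URL": "<your-datahub-url>",      # placeholder
        "DATAHUB_GMS_TOKEN": "<your-datahub-token>",  # placeholder
    },
)

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover the search, metadata, lineage, and SQL-query tools the server exposes.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())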

List of Prompts

No prompt templates are detailed or mentioned in the repository or README.

List of Resources

No explicit MCP resource primitives are described in the repository or README.

List of Tools

  • Search across all entity types with arbitrary filters
    Enables clients to query DataHub entities (datasets, dashboards, pipelines, etc.) using custom filters.
  • Fetch metadata for any entity
    Retrieves comprehensive metadata about a specific DataHub entity.
  • Traverse the lineage graph (upstream and downstream)
    Allows exploration of data lineage, both upstream (sources) and downstream (consumers) for a given entity.
  • List SQL queries associated with a dataset
    Surfaces SQL queries linked to a particular dataset for auditing and understanding data usage.
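
The sketch below shows the call pattern for invoking these tools from an MCP client session. The tool names and argument keys used here ("search", "get_lineage", "query", "urn", "direction") are assumptions for illustration only; verify the real names and schemas against the server's list_tools() response for your version.

import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="uvx",
    args=["mcp-server-datahub"],
    env={
        "DATAHUB_GMS_URL": "<your-datahub-url>",
        "DATAHUB_GMS_TOKEN": "<your-datahub-token>",
    },
)

async def main() -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Hypothetical tool name and arguments -- confirm via list_tools().
            search_result = await session.call_tool(
                "search", arguments={"query": "orders", "entity_type": "dataset"}
            )
            print(search_result.content)

            # Hypothetical lineage call: downstream consumers of a dataset URN.
            lineage_result = await session.call_tool(
                "get_lineage",
                arguments={"urn": "<your-dataset-urn>", "direction": "downstream"},
            )
            print(lineage_result.content)

asyncio.run(main())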

Use Cases of this MCP Server

  • Comprehensive Data Discovery
    Developers and data scientists can search and filter across all DataHub entities, accelerating data discovery and reducing manual effort.
  • Automated Metadata Fetching
    AI agents can programmatically retrieve detailed entity metadata, supporting automated documentation, quality checks, or onboarding workflows.
  • Lineage Analysis for Impact Assessment
    By traversing upstream and downstream lineage, teams can instantly assess the impact of changes and improve data governance.
  • SQL Query Auditing
    Easily list and analyze SQL queries associated with datasets, aiding in compliance monitoring, performance tuning, and data access optimization.
  • Integration With AI-Powered Agents
    Seamlessly connect DataHub with modern AI assistants to automate repetitive data management and exploration tasks directly from chat or code environments.

How to set it up

Windsurf

No Windsurf-specific instructions found in the repository.

Claude

  1. Install uv.

  2. Locate the full path to the uvx command using which uvx (e.g. /Users/hsheth/.local/bin/uvx).

  3. Obtain your DataHub URL and personal access token.

  4. Edit your claude_desktop_config.json file:

    {
      "mcpServers": {
        "datahub": {
          "command": "<full-path-to-uvx>",  // e.g. /Users/hsheth/.local/bin/uvx
          "args": ["mcp-server-datahub"],
          "env": {
            "DATAHUB_GMS_URL": "<your-datahub-url>",
            "DATAHUB_GMS_TOKEN": "<your-datahub-token>"
          }
        }
      }
    }
    
  5. Save and (re)start Claude Desktop. Verify connection in the agent interface.

Cursor

  1. Install uv.

  2. Obtain your DataHub URL and personal access token.

  3. Edit .cursor/mcp.json:

    {
      "mcpServers": {
        "datahub": {
          "command": "uvx",
          "args": ["mcp-server-datahub"],
          "env": {
            "DATAHUB_GMS_URL": "<your-datahub-url>",
            "DATAHUB_GMS_TOKEN": "<your-datahub-token>"
          }
        }
      }
    }
    
  4. Save the file and restart Cursor. Check the MCP status panel.

Cline

No Cline-specific instructions found in the repository.

Generic/Other MCP Clients

  1. Install uv.

  2. Prepare your DataHub URL and personal access token.

  3. Use this configuration:

    command: uvx
    args:
      - mcp-server-datahub
    env:
      DATAHUB_GMS_URL: <your-datahub-url>
      DATAHUB_GMS_TOKEN: <your-datahub-token>
    
  4. Integrate this command in your MCP client configuration.

Securing API Keys

Always store sensitive credentials like DATAHUB_GMS_TOKEN in environment variables, not in plaintext files. In your configuration, use the env field as shown above to inject secrets securely.
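
As a sketch of that pattern, a script that launches the server can read the credentials from its own environment (or a secret manager) rather than embedding them in a file; the variable names here match the configuration above, and the MCP Python SDK usage is illustrative.

import os

from mcp import StdioServerParameters

# Pull the credentials from the parent environment so they never land in a
# version-controlled config file.
server_params = StdioServerParameters(
    command="uvx",
    args=["mcp-server-datahub"],
    env={
        "DATAHUB_GMS_URL": os.environ["DATAHUB_GMS_URL"],
        "DATAHUB_GMS_TOKEN": os.environ["DATAHUB_GMS_TOKEN"],
    },
)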

How to use this MCP inside flows

Using MCP in FlowHunt

To integrate MCP servers into your FlowHunt workflow, start by adding the MCP component to your flow and connecting it to your AI agent:

FlowHunt MCP flow

Click on the MCP component to open the configuration panel. In the system MCP configuration section, insert your MCP server details using this JSON format:

{
  "datahub": {
    "transport": "streamable_http",
    "url": "https://yourmcpserver.example/pathtothemcp/url"
  }
}

Once configured, the AI agent can use this MCP as a tool with access to all of its functions and capabilities. Remember to replace “datahub” with the actual name of your MCP server and the URL with the URL of your own MCP server.


Overview

| Section | Availability | Details/Notes |
|---------|--------------|---------------|
| Overview | ✅ | Present in README and repo description |
| List of Prompts | ⛔ | No prompt templates found |
| List of Resources | ⛔ | No explicit MCP resource primitives described |
| List of Tools | ✅ | Tools described in README features section |
| Securing API Keys | ✅ | Environment variables in setup instructions |
| Sampling Support (less important in evaluation) | ⛔ | No mention of sampling in README or code |

I would rate this MCP server at about 6/10. It has a clear open-source license, multiple real tools, and basic secure setup instructions, but lacks documented prompt templates, explicit resource primitives, and advanced MCP features like sampling or roots.


MCP Score

| Criterion | Value |
|-----------|-------|
| Has a LICENSE | ✅ (Apache-2.0) |
| Has at least one tool | ✅ |
| Number of Forks | 13 |
| Number of Stars | 37 |

Frequently asked questions

What does the DataHub MCP Server do?

It exposes DataHub's metadata and context APIs via the MCP standard, enabling AI agents to search, retrieve metadata, traverse lineage, and list SQL queries on your organizational data, directly from FlowHunt or other AI tools.

Which DataHub platforms are supported?

Both DataHub Core and DataHub Cloud are supported, so you can connect regardless of your deployment.

What are the main use cases?

Common use cases include comprehensive data discovery, automated metadata fetching, lineage analysis for impact assessment, SQL query auditing, and integration with AI-powered agents for workflow automation.

How do I securely provide credentials?

Always use environment variables for sensitive credentials like DATAHUB_GMS_TOKEN. Inject them using the 'env' field in your configuration files to keep secrets safe.

Are prompt templates or resource primitives included?

No explicit prompt templates or MCP resource primitives are included with this server.

What tools does this MCP server offer?

It provides searching across all entity types, fetching metadata, lineage traversal, and listing SQL queries associated with datasets.

How do I connect DataHub MCP to FlowHunt?

Add an MCP component in your FlowHunt flow, configure it with your DataHub MCP server JSON as shown in the documentation, and connect it to your AI agent for immediate access to DataHub capabilities.

Connect FlowHunt with DataHub via MCP

Empower your AI workflows with real-time access to organizational metadata, lineage, and data discovery tools using the DataHub MCP Server. Automate data management and governance directly from FlowHunt.

Learn more