Video Extraction Server
Description
MCP Video & Audio Text Extraction Server is a Model Context Protocol (MCP) server that enables text extraction from various video platforms and audio files, allowing compatible host applications (such as Claude Desktop and Cursor) to access video content and perform transcription.

What is it?
The server downloads videos from various platforms, extracts the audio, and converts it to text, using OpenAI's Whisper model for high-quality speech recognition.

How to use it?
- Clone the repository and install its dependencies
- Ensure FFmpeg is installed
- Run the server
- Configure your MCP host application (such as Claude Desktop) to use the server

Key Features
- Video downloads from multiple platforms, including YouTube, Bilibili, and TikTok
- Audio extraction from videos
- High-quality speech recognition using the Whisper model
- Multi-language text recognition
- Asynchronous processing for large files
- Standardized MCP tools interface

Use Cases
- Provide text transcription capabilities for applications that process video content
- Batch-process video content and extract text information
- Build custom applications that need audio/video text extraction
- Enable AI assistants to understand video content

FAQ
What are the system requirements to run the server?
> Python 3.9+, FFmpeg, and at least 8 GB of RAM; GPU acceleration is recommended.

What should I know about the first run?
> The server automatically downloads the Whisper model file (approximately 1 GB), which may take anywhere from several minutes to tens of minutes.

What audio formats are supported?
> Common audio formats including mp3, wav, and m4a.
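The last setup step, pointing an MCP host at the server, can be sketched as a Claude Desktop entry. The server name, command, and script path below are hypothetical placeholders, not the project's actual values:

```json
{
  "mcpServers": {
    "video-extraction": {
      "command": "python",
      "args": ["/path/to/server.py"]
    }
  }
}
```

This entry goes in Claude Desktop's `claude_desktop_config.json`; restart the app after editing so the server is picked up.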
Capabilities
- Supports video downloads from multiple platforms including YouTube, Bilibili, and TikTok.
- Extracts audio content from videos and converts it to text.
- High-quality speech recognition using the Whisper model.
- Multi-language text recognition support.
- Asynchronous processing for handling large files efficiently.
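The audio-extraction step above relies on FFmpeg. As an illustration (not the server's actual code), a minimal Python sketch of building the FFmpeg argument list might look like this; resampling to 16 kHz mono WAV matches the input format Whisper expects:

```python
from pathlib import Path

def build_ffmpeg_audio_cmd(video_path: str, audio_path: str) -> list[str]:
    """Build an ffmpeg argv that drops the video stream and
    resamples the audio to 16 kHz mono WAV for Whisper."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file without prompting
        "-i", video_path,  # input video
        "-vn",             # discard the video stream
        "-ar", "16000",    # 16 kHz sample rate
        "-ac", "1",        # mono
        str(Path(audio_path).with_suffix(".wav")),
    ]

cmd = build_ffmpeg_audio_cmd("clip.mp4", "clip.wav")
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list (rather than a shell string) avoids quoting issues with downloaded filenames.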
Links & Contact
Recommended Clients
Temporal MCP is a bridge connecting AI assistants like Claude with the powerful Temporal workflow engine. By implementing the Model Context Protocol (MCP), it allows AI assistants to discover, execute, and monitor complex workflow orchestrations—all through natural language conversations.
Tome is a macOS app (Windows and Linux support coming soon) designed for working with local LLMs and MCP servers, built by the team at Runebook. Tome manages your MCP servers so there's no fiddling with uv/npm or JSON files: connect it to Ollama, copy/paste some MCP servers, and chat with an MCP-powered model in seconds. This is our very first Technical Preview, so bear in mind things will be rough around the edges. Since the world of MCP servers and local models is ever-shifting (read: very janky), we recommend joining us on Discord to share tips, tricks, and issues you run into. Also make sure to star this repo on GitHub to stay on top of updates and feature releases.
Argo is a local large-model agent builder. Our goal is to lower the barrier to AI application development and enable more users to easily combine large language models, local knowledge, and function calls to build their own AI applications. Users can share these creations in our community, download AI agents from others, and contribute to an active developer ecosystem.
Lutra is an MCP compatible client that transforms conversations into actionable, automated workflows.
Vibeframe is a VSCode extension that allows developers to create rich visual interfaces for MCP servers, integrating seamlessly into the VSCode environment.
CLAP (Cognitive Layer Agents Package) is a powerful multi-agent framework built in Python that supports the development of sophisticated AI agents capable of reasoning, planning, and interacting with external tools and systems.
eechat is an open-source, lightweight, and extensible messaging platform that empowers users to connect with MCP (Model Context Protocol) servers and interact with various LLM providers — including OpenAI, Anthropic, Groq, and more — through a clean desktop interface. It combines the power of tool-augmented AI with a smooth user experience across Windows, macOS, and Linux. With eechat, developers and AI enthusiasts can effortlessly toggle between models, invoke MCP tools, and monitor usage in real-time — all within a unified, secure, and responsive environment.
AingDesk is a simple and easy-to-use AI assistant that supports knowledge bases, model APIs, sharing, internet search, and intelligent agents. It is rapidly growing and improving.
Agentica is an open-source framework designed to simplify the integration of AI agents with Large Language Models (LLMs). It focuses on structured function calls, providing a reliable and user-friendly experience for developers.