Huanshere/VideoLingo
Fork: 887 Star: 9117 (更新于 2025-01-12 21:09:59)
license: Apache-2.0
Language: Python .
Netflix-level subtitle cutting, translation, alignment, and even dubbing - one-click fully automated AI video subtitle team | Netflix级字幕切割、翻译、对齐、甚至加上配音,一键全自动视频搬运AI字幕组
最后发布版本: v2.1.1 ( 2024-12-10 17:00:37)
🌟 Overview (Try VideoLingo For Free!)
VideoLingo is an all-in-one video translation, localization, and dubbing tool aimed at generating Netflix-quality subtitles. It eliminates stiff machine translations and multi-line subtitles while adding high-quality dubbing, enabling global knowledge sharing across language barriers.
Key features:
-
🎥 YouTube video download via yt-dlp
-
🎙️ Word-level and Low-illusion subtitle recognition with WhisperX
-
📝 NLP and AI-powered subtitle segmentation
-
📚 Custom + AI-generated terminology for coherent translation
-
🔄 3-step Translate-Reflect-Adaptation for cinematic quality
-
✅ Netflix-standard, Single-line subtitles Only
-
🗣️ Dubbing with GPT-SoVITS, Azure, OpenAI, and more
-
🚀 One-click startup and processing in Streamlit
-
📝 Detailed logging with progress resumption
Difference from similar projects: Single-line subtitles only, superior translation quality, seamless dubbing experience
🎥 Demo
Russian Translationhttps://github.com/user-attachments/assets/25264b5b-6931-4d39-948c-5a1e4ce42fa7 |
GPT-SoVITS Dubbinghttps://github.com/user-attachments/assets/47d965b2-b4ab-4a0b-9d08-b49a7bf3508c |
Language Support
Input Language Support(more to come):
🇺🇸 English 🤩 | 🇷🇺 Russian 😊 | 🇫🇷 French 🤩 | 🇩🇪 German 🤩 | 🇮🇹 Italian 🤩 | 🇪🇸 Spanish 🤩 | 🇯🇵 Japanese 😐 | 🇨🇳 Chinese* 😊
*Chinese uses a separate punctuation-enhanced whisper model, for now...
Translation supports all languages, while dubbing language depends on the chosen TTS method.
Installation
Note: To use NVIDIA GPU acceleration on Windows, please complete the following steps first:
- Install CUDA Toolkit 12.6
- Install CUDNN 9.3.0
- Add
C:\Program Files\NVIDIA\CUDNN\v9.3\bin\12.6
to your system PATH- Restart your computer
Note: FFmpeg is required. Please install it via package managers:
- Windows:
choco install ffmpeg
(via Chocolatey)- macOS:
brew install ffmpeg
(via Homebrew)- Linux:
sudo apt install ffmpeg
(Debian/Ubuntu) orsudo dnf install ffmpeg
(Fedora)
- Clone the repository
git clone https://github.com/Huanshere/VideoLingo.git
cd VideoLingo
- Install dependencies(requires
python=3.10
)
conda create -n videolingo python=3.10.0 -y
conda activate videolingo
python install.py
- Start the application
streamlit run st.py
Docker
Alternatively, you can use Docker (requires CUDA 12.4 and NVIDIA Driver version >550), see Docker docs:
docker build -t videolingo .
docker run -d -p 8501:8501 --gpus all videolingo
API
VideoLingo supports OpenAI-Like API format and various dubbing interfaces:
-
claude-3-5-sonnet-20240620
,gemini-2.0-flash-exp
,gpt-4o
,deepseek-coder
, ... (sorted by performance) -
azure-tts
,openai-tts
,siliconflow-fishtts
,fish-tts
,GPT-SoVITS
,edge-tts
,*custom-tts
(ask gpt to help you define in custom_tts.py)
Note: VideoLingo is now integrated with 302.ai, one API KEY for both LLM and TTS! Also supports fully local deployment using Ollama for LLM and Edge-TTS for dubbing, no cloud API required!
For detailed installation, API configuration, and batch mode instructions, please refer to the documentation: English | 中文
Current Limitations
-
WhisperX transcription performance may be affected by video background noise, as it uses wav2vac model for alignment. For videos with loud background music, please enable Voice Separation Enhancement. Additionally, subtitles ending with numbers or special characters may be truncated early due to wav2vac's inability to map numeric characters (e.g., "1") to their spoken form ("one").
-
Using weaker models can lead to errors during intermediate processes due to strict JSON format requirements for responses. If this error occurs, please delete the
output
folder and retry with a different LLM, otherwise repeated execution will read the previous erroneous response causing the same error. -
The dubbing feature may not be 100% perfect due to differences in speech rates and intonation between languages, as well as the impact of the translation step. However, this project has implemented extensive engineering processing for speech rates to ensure the best possible dubbing results.
-
Multilingual video transcription recognition will only retain the main language. This is because whisperX uses a specialized model for a single language when forcibly aligning word-level subtitles, and will delete unrecognized languages.
-
Cannot dub multiple characters separately, as whisperX's speaker distinction capability is not sufficiently reliable.
📄 License
This project is licensed under the Apache 2.0 License. Special thanks to the following open source projects for their contributions:
whisperX, yt-dlp, json_repair, BELLE
📬 Contact Us
- Join our Discord: https://discord.gg/9F2G92CWPp
- Submit Issues or Pull Requests on GitHub
- Follow me on Twitter: @Huanshere
- Email me at: team@videolingo.io
⭐ Star History
If you find VideoLingo helpful, please give us a ⭐️!
最近版本更新:(数据更新于 2024-12-11 19:20:14)
2024-12-10 17:00:37 v2.1.1
2024-12-05 14:45:01 v2.1.0
2024-12-03 18:14:06 v2.0.4
2024-12-02 12:19:58 v2.0.3
2024-12-01 16:18:02 v2.0.2-deprecated
2024-11-27 17:20:50 v2.0.1
2024-11-17 17:02:12 v2.0
2024-11-14 00:25:19 v1.8.0
2024-11-11 15:09:38 v1.7.1
2024-10-30 18:14:08 v1.7.0
主题(topics):
ai-translation, dubbing, localization, video-translation, voice-cloning
Huanshere/VideoLingo同语言 Python最近更新仓库
2025-01-18 21:26:31 sunnypilot/sunnypilot
2025-01-17 23:34:10 Skyvern-AI/skyvern
2025-01-17 19:49:33 ultralytics/ultralytics
2025-01-17 19:12:03 XiaoMi/ha_xiaomi_home
2025-01-17 08:27:45 comfyanonymous/ComfyUI
2025-01-17 04:56:19 QuivrHQ/MegaParse