Multimodal AI
A command-line tool for vision and audio AI: image analysis, OCR, audio transcription, and text-to-speech.
Quick Start
bash
npx ai-multimodal vision ./image.png "Describe this"
What It Does
- Analyze images with GPT-4 Vision
- Extract text from images (OCR)
- Transcribe audio with Whisper
- Generate speech from text
Usage
bash
# Vision
npx ai-multimodal vision ./photo.jpg "What's in this?"

# OCR
npx ai-multimodal ocr ./screenshot.png

# Transcribe
npx ai-multimodal transcribe ./audio.mp3

# Text to speech
npx ai-multimodal tts "Hello" ./output.mp3
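For batch work, the single-file commands above can be wrapped in a small shell loop. The sketch below runs `ocr` over every PNG in a folder and saves one text file per image. It is an illustration, not part of the tool: the `ocr_dir` function and the `RUNNER` variable are hypothetical names, and the script assumes that `ocr` prints the extracted text to standard output, which the README does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper: OCR every PNG in a directory.
# ASSUMPTION: `ai-multimodal ocr <file>` writes the extracted text to stdout.
# RUNNER is the command prefix used to invoke the CLI; set RUNNER=echo for a dry run.
RUNNER="${RUNNER:-npx ai-multimodal}"

ocr_dir() {
  local dir="$1" out_dir="$2" img base
  mkdir -p "$out_dir"
  for img in "$dir"/*.png; do
    # Skip the literal pattern when the glob matches nothing.
    [ -e "$img" ] || continue
    base="$(basename "$img" .png)"
    # RUNNER is intentionally unquoted so "npx ai-multimodal" splits into two words.
    $RUNNER ocr "$img" > "$out_dir/$base.txt"
  done
}
```

Calling `ocr_dir ./screenshots ./text` would then write `./text/<name>.txt` for each `./screenshots/<name>.png`. Running with `RUNNER=echo` first shows which commands would be issued without calling the API.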
Part of the LXGIC Dev Toolkit
One of 110+ free developer tools from LXGIC Studios.
- GitHub: https://github.com/lxgicstudios
- Twitter: https://x.com/lxgicstudios
- Website: https://lxgicstudios.com
License
MIT. Free forever.