Voice Assistant Platform
Purpose
Build and operate a callable, company-facing voice assistant that answers questions about a business (location, opening hours, contact methods, reachability, services, and policies). The platform is designed for multi-tenant deployments: each company maintains its own knowledge base, while core speech and language services are shared to keep operations efficient.
What This System Does
- Accepts live or recorded audio from callers.
- Detects speech segments to avoid sending silence and noise downstream.
- Transcribes speech into text with STT.
- Retrieves company-specific knowledge with RAG.
- Generates a concise, accurate answer with the shared LLM.
- Synthesizes spoken replies with TTS.
- Returns audio and text results with timing data for monitoring and UX feedback.
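
The steps above can be sketched as a single orchestration function that calls each stage in order and records how long it took. This is a minimal illustration, not the backend's actual code: `run_pipeline`, `PipelineResult`, and the `fake_*` stage functions are hypothetical names, and the stubs stand in for the real VAD, STT, RAG, LLM, and TTS services.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PipelineResult:
    """Text and audio outputs plus per-stage timings in milliseconds."""
    transcript: str
    answer: str
    audio: bytes
    timings_ms: Dict[str, float] = field(default_factory=dict)


def run_pipeline(
    audio: bytes,
    *,
    vad: Callable[[bytes], bytes],
    stt: Callable[[bytes], str],
    rag: Callable[[str], List[str]],
    llm: Callable[[str, List[str]], str],
    tts: Callable[[str], bytes],
) -> PipelineResult:
    timings: Dict[str, float] = {}

    def timed(name, fn, *args):
        # Measure wall-clock time for one stage and store it under its name.
        start = time.perf_counter()
        out = fn(*args)
        timings[name] = (time.perf_counter() - start) * 1000.0
        return out

    speech = timed("vad", vad, audio)            # drop silence and noise
    transcript = timed("stt", stt, speech)       # speech -> text
    context = timed("rag", rag, transcript)      # tenant-specific knowledge
    answer = timed("llm", llm, transcript, context)
    reply_audio = timed("tts", tts, answer)      # text -> speech
    return PipelineResult(transcript, answer, reply_audio, timings)


# Hypothetical stand-ins so the sketch runs without any real service.
def fake_vad(audio: bytes) -> bytes:
    return audio.strip(b"\x00")  # treat zero bytes as "silence"

def fake_stt(speech: bytes) -> str:
    return speech.decode("utf-8")

def fake_rag(query: str) -> List[str]:
    return ["Open Mon-Fri 09:00-17:00."]

def fake_llm(query: str, context: List[str]) -> str:
    return context[0]

def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Passing the stages in as callables keeps the orchestration independent of how each service is reached (HTTP, WebSocket, or in-process), which is one way to let shared and per-tenant services be swapped without touching the pipeline logic.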
Multi-Tenant Model
- Each company is a tenant with its own data and retrieval index.
- The RAG service is instantiated per tenant and points at tenant data sources.
- STT, TTS, VAD, and the LLM are shared services across all tenants.
- The backend gateway routes requests to the correct tenant RAG based on deployment config.
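
One way to picture the routing rule is a deployment config that holds a single set of shared service endpoints plus a map from tenant ID to that tenant's RAG endpoint. The sketch below is an assumption about shape only: the class names, field names, tenant IDs, and URLs are all hypothetical and do not come from the platform's real config format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SharedServices:
    """Endpoints used by every tenant (hypothetical field names)."""
    stt_url: str
    tts_url: str
    vad_url: str
    llm_url: str


@dataclass(frozen=True)
class DeploymentConfig:
    shared: SharedServices
    tenant_rag_urls: Dict[str, str]  # tenant ID -> that tenant's RAG endpoint


class UnknownTenantError(KeyError):
    """Raised when a request names a tenant that is not in the config."""


def resolve_rag_url(config: DeploymentConfig, tenant_id: str) -> str:
    # Only RAG is looked up per tenant; all other services are shared.
    try:
        return config.tenant_rag_urls[tenant_id]
    except KeyError:
        raise UnknownTenantError(tenant_id) from None


# Example values are placeholders, not real hosts.
example_config = DeploymentConfig(
    shared=SharedServices(
        stt_url="http://stt:8000",
        tts_url="http://tts:8000",
        vad_url="http://vad:8000",
        llm_url="http://llm:8000",
    ),
    tenant_rag_urls={
        "acme": "http://rag-acme:8000",
        "globex": "http://rag-globex:8000",
    },
)
```

Failing loudly on an unknown tenant, rather than falling back to a default index, avoids answering a caller with another company's data.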
Core Services
- RAG: Per-tenant retrieval service. Companies can update their own information without changing core services.
- STT: Shared speech-to-text service (Whisper). Converts audio to text.
- TTS: Shared text-to-speech service (Piper). Converts responses into audio.
- VAD: Shared voice activity detection. Identifies speech segments to improve accuracy and efficiency.
- Backend: Orchestrates the pipeline and exposes HTTP + WebSocket APIs.
- Frontend: Serves the UI for testing or operational use.
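
To make the VAD step concrete, here is a generic energy-threshold detector. The platform's actual VAD method is not stated here, so treat this purely as an illustration of what "identifying speech segments" means: `detect_speech_segments`, the frame size, and the threshold are assumptions chosen for the example.

```python
from typing import List, Sequence, Tuple


def detect_speech_segments(
    samples: Sequence[float],
    frame_size: int = 160,
    threshold: float = 0.01,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of consecutive frames whose
    mean energy exceeds `threshold`. `end` is exclusive."""
    segments: List[Tuple[int, int]] = []
    start = None
    for offset in range(0, len(samples), frame_size):
        frame = samples[offset:offset + frame_size]
        energy = sum(s * s for s in frame) / len(frame)  # mean squared amplitude
        if energy > threshold:
            if start is None:
                start = offset          # a speech run begins at this frame
        elif start is not None:
            segments.append((start, offset))  # run ended at this quiet frame
            start = None
    if start is not None:
        segments.append((start, len(samples)))  # audio ended mid-speech
    return segments
```

Only the returned ranges would need to be forwarded to STT, which is how a VAD stage can cut both transcription cost and the chance of transcribing noise.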
Typical Data Sources Per Company
- Location and address details
- Opening hours and holiday schedules
- Contact and reachability information
- Services offered and pricing/availability
- FAQ and policy documents
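
As a rough sketch of how these sources might be prepared for retrieval, the record below groups one company's data by category and flattens it into small text chunks tagged with the tenant ID. The schema is hypothetical: `TenantKnowledge`, its fields, and `to_chunks` are illustrative names, and the real ingestion format may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TenantKnowledge:
    """One company's source data, grouped by category (hypothetical schema)."""
    tenant_id: str
    address: str = ""
    opening_hours: Dict[str, str] = field(default_factory=dict)  # day -> hours
    contacts: Dict[str, str] = field(default_factory=dict)       # channel -> value
    services: List[str] = field(default_factory=list)
    faq: Dict[str, str] = field(default_factory=dict)            # question -> answer


def to_chunks(knowledge: TenantKnowledge) -> List[Dict[str, str]]:
    """Flatten a tenant record into small text chunks ready for indexing."""
    chunks: List[Dict[str, str]] = []

    def add(category: str, text: str) -> None:
        # Every chunk carries the tenant ID so it cannot leak across indices.
        chunks.append(
            {"tenant_id": knowledge.tenant_id, "category": category, "text": text}
        )

    if knowledge.address:
        add("location", f"Address: {knowledge.address}")
    for day, hours in knowledge.opening_hours.items():
        add("hours", f"{day}: {hours}")
    for channel, value in knowledge.contacts.items():
        add("contact", f"{channel}: {value}")
    for service in knowledge.services:
        add("service", service)
    for question, answer in knowledge.faq.items():
        add("faq", f"Q: {question} A: {answer}")
    return chunks
```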
Operational Goals
- Consistent response quality across companies, with each answer grounded in the calling tenant's own data.
- Low-latency speech pipeline with observable per-stage timings.
- Easy onboarding: a new company is added by providing its data to a tenant RAG instance, without changing the shared services.
- Shared infrastructure for compute-heavy services (STT, TTS, VAD, LLM) to reduce cost.
Main Components
- End-to-end audio pipeline: VAD -> STT -> RAG -> LLM -> TTS.
- Tenant-specific indices and retrieval settings.
- Standardized APIs for health, config, and inference.
- Streaming support for live voice interactions.