ADK Streaming Specialist
Philosophy & Architecture
Bidi-streaming (Live mode) adds low-latency voice and video interaction to ADK agents. Users can interrupt the agent mid-response, and the agent can process text, audio, and video input in real time.
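One way to picture interruption is to cancel the agent's in-flight response as soon as new user input arrives. The sketch below shows that idea with plain `asyncio`. `Conversation`, `speak`, and `on_user_input` are hypothetical names. This illustrates barge-in in general, not how ADK or the Live API implement it internally.

```python
import asyncio
from typing import List, Optional


class Conversation:
    """Hypothetical agent-side turn manager: at most one response in flight."""

    def __init__(self) -> None:
        self.current: Optional[asyncio.Task] = None
        self.spoken: List[str] = []  # chunks actually delivered to the user

    async def speak(self, words: List[str]) -> None:
        # Deliver a response chunk by chunk; each sleep is a point where a
        # cancellation (the interruption) can take effect.
        for word in words:
            self.spoken.append(word)
            await asyncio.sleep(0.01)

    async def on_user_input(self, words: List[str]) -> None:
        # Barge-in: new user input cancels whatever the agent is still saying.
        if self.current is not None and not self.current.done():
            self.current.cancel()
            try:
                await self.current
            except asyncio.CancelledError:
                pass  # expected: the old response was cut off
        self.current = asyncio.create_task(self.speak(words))


async def demo() -> List[str]:
    conv = Conversation()
    await conv.on_user_input(["a", "long", "answer", "that", "keeps", "going"])
    await asyncio.sleep(0.025)  # the user starts talking partway through
    await conv.on_user_input(["short", "reply"])
    await conv.current  # let the new response finish
    return conv.spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Cancellation takes effect at the next `await` inside `speak`. The interrupted answer therefore stops partway through instead of finishing.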
Key Components
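- `LiveRequestQueue`: the upstream channel that carries text, audio, and video from the client to the agent.
- `run_live()`: runs the agent in Live mode and streams back events, including transcriptions and multi-agent workflow events.
- `RunConfig`: configures response modalities (e.g., text or audio) and context window compression.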
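The three components divide the work as follows: a queue carries requests upstream, an async generator drains it and yields events downstream, and a config object picks the response modality. The sketch below models that split with plain `asyncio`. `ToyRequestQueue`, `toy_run_live`, `ToyRunConfig`, and the echo behavior are hypothetical stand-ins that only show the shape of the data flow. They are not the ADK API, and the real `run_live()` streams events from a Gemini Live session rather than echoing input.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, List, Optional


@dataclass
class ToyRunConfig:
    # Hypothetical stand-in for RunConfig: the modality the agent answers in.
    response_modalities: List[str] = field(default_factory=lambda: ["TEXT"])


@dataclass
class ToyEvent:
    kind: str              # "text", "audio", or "turn_complete"
    payload: object = None


class ToyRequestQueue:
    """Hypothetical upstream channel: the client pushes, the agent loop pulls."""

    _CLOSE = object()  # sentinel marking the end of the session

    def __init__(self) -> None:
        self._q: asyncio.Queue = asyncio.Queue()

    def send(self, item: object) -> None:
        self._q.put_nowait(item)

    def close(self) -> None:
        self._q.put_nowait(self._CLOSE)

    async def get(self) -> Optional[object]:
        item = await self._q.get()
        return None if item is self._CLOSE else item


async def toy_run_live(
    queue: ToyRequestQueue, config: ToyRunConfig
) -> AsyncIterator[ToyEvent]:
    # Downstream: consume requests until the queue is closed, yielding events.
    while (request := await queue.get()) is not None:
        if "AUDIO" in config.response_modalities:
            yield ToyEvent("audio", str(request).encode())  # fake "audio" bytes
        else:
            yield ToyEvent("text", f"echo: {request}")
        yield ToyEvent("turn_complete")


async def main() -> List[ToyEvent]:
    queue = ToyRequestQueue()
    queue.send("hello")
    queue.close()
    return [event async for event in toy_run_live(queue, ToyRunConfig())]


if __name__ == "__main__":
    for event in asyncio.run(main()):
        print(event)
```

The sentinel pushed by `close()` ends the downstream loop. Without it, `toy_run_live` would wait for input forever, which is why streaming sessions generally need an explicit close.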
Implementation Workflow
- Setup: Use a FastAPI-based server for WebSocket communication with the client.
- Input: Stream multimodal data to the agent via `LiveRequestQueue`.
- Handling: Process the events yielded by `run_live()` to react in real time.
- Tools: Use streaming tools, which send intermediate results back to the agent, so the agent can react to changes as they happen (e.g., in a video stream).
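An ordinary tool returns once. A streaming tool instead reports intermediate results over time. The sketch below assumes the async-generator style ADK uses for streaming tools. It shows a hypothetical `watch_people_count` tool that yields only when a simulated per-frame detector sees the count change, plus an `agent_reacts` loop that handles each update as it arrives. Both names and the frame data are illustrative.

```python
import asyncio
from typing import AsyncGenerator, Iterable, List


async def watch_people_count(frame_counts: Iterable[int]) -> AsyncGenerator[str, None]:
    """Hypothetical streaming tool: report only when the people count changes.

    `frame_counts` stands in for a per-frame detector running on a video stream.
    """
    last = None
    for count in frame_counts:
        if count != last:
            last = count
            yield f"people in view: {count}"
        await asyncio.sleep(0)  # hand control back, as a real frame loop would


async def agent_reacts(tool: AsyncGenerator[str, None]) -> List[str]:
    # The agent side consumes intermediate results as they arrive instead of
    # waiting for a single final return value.
    reactions = []
    async for update in tool:
        reactions.append(f"noticed -> {update}")
    return reactions


if __name__ == "__main__":
    frames = [0, 0, 1, 1, 1, 2, 2, 0]
    for line in asyncio.run(agent_reacts(watch_people_count(frames))):
        print(line)
```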
Best Practices
- Use a `gemini-2.0-flash` model variant that supports the Live API for low-latency live interactions.
- Implement Voice Activity Detection (VAD) for natural turn-taking.
- Read `references/streaming.md` for the part-by-part dev guide series.
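The sketch below shows what VAD contributes to turn-taking, using the simplest possible detector. A 16-bit PCM frame counts as speech when its RMS energy exceeds a fixed threshold. A turn ends only after several consecutive silent frames (a "hangover"), so short pauses do not cut the user off. The threshold, frame size, and `hangover_frames` value are illustrative assumptions. Production VAD is usually model-based and far more robust to noise than an energy gate.

```python
import array
import math
import sys
from typing import List, Optional, Sequence, Tuple


def pcm16le(samples: Sequence[int]) -> bytes:
    """Pack signed 16-bit samples as little-endian PCM bytes."""
    a = array.array("h", samples)
    if sys.byteorder == "big":
        a.byteswap()
    return a.tobytes()


def frame_rms(frame: bytes) -> float:
    """RMS energy of one little-endian 16-bit mono PCM frame."""
    samples = array.array("h", frame)
    if sys.byteorder == "big":
        samples.byteswap()  # interpret the bytes as little-endian
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def detect_turns(
    frames: Sequence[bytes], threshold: float = 500.0, hangover_frames: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of speech segments, end exclusive.

    A segment starts at the first frame above `threshold` and ends once
    `hangover_frames` consecutive frames fall below it.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    silent = 0
    for i, frame in enumerate(frames):
        if frame_rms(frame) > threshold:
            if start is None:
                start = i  # speech begins
            silent = 0
        elif start is not None:
            silent += 1
            if silent >= hangover_frames:
                # End the turn right after the last voiced frame.
                segments.append((start, i - silent + 1))
                start, silent = None, 0
    if start is not None:  # stream ended mid-utterance
        segments.append((start, len(frames) - silent))
    return segments


if __name__ == "__main__":
    speech = pcm16le([1000] * 160)  # 10 ms at 16 kHz, loud
    quiet = pcm16le([0] * 160)      # 10 ms of silence
    frames = [speech, speech, quiet, speech, quiet, quiet, quiet, quiet, speech]
    # The one-frame pause at index 2 does not end the first turn.
    print(detect_turns(frames))  # [(0, 4), (8, 9)]
```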
Success Criteria
- Valid implementation of real-time event loops.
- Successful handling of audio and video buffers.
- Low-latency response generation with interruption support.
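The buffer and interruption criteria can be met on the client side with one common pattern. Incoming audio chunks are queued for playback, and anything still queued is discarded when the agent reports an interruption. The user therefore does not hear the rest of an answer they already cut off. `PlaybackBuffer` is a hypothetical helper. How the interruption signal reaches the client depends on the message format your server sends over the WebSocket.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackBuffer:
    """Hypothetical client-side queue of streamed audio chunks awaiting playback."""

    def __init__(self, max_chunks: int = 100) -> None:
        # Bounded: if playback falls behind, the oldest chunks are discarded
        # rather than letting memory (and latency) grow without limit.
        self._chunks: Deque[bytes] = deque(maxlen=max_chunks)

    def push(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    def next_chunk(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is queued."""
        return self._chunks.popleft() if self._chunks else None

    def interrupt(self) -> int:
        """Drop all queued audio after a barge-in; return the number dropped."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped

    def __len__(self) -> int:
        return len(self._chunks)


if __name__ == "__main__":
    buf = PlaybackBuffer()
    for chunk in (b"\x00\x01", b"\x02\x03", b"\x04\x05"):
        buf.push(chunk)
    print(buf.next_chunk())  # b'\x00\x01' is played
    print(buf.interrupt())   # 2: the rest of the old answer is discarded
    print(buf.next_chunk())  # None: no stale audio left
```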