Perception Pipeline Skill

This skill provides comprehensive knowledge for deploying and managing the Ahling Command Center perception layer: Frigate NVR, DoubleTake facial recognition, CompreFace, and the Wyoming voice pipeline (Whisper, Piper, OpenWakeWord).

Trigger Phrases

•"frigate", "nvr", "camera", "object detection"
•"doubletake", "facial recognition", "face detection"
•"whisper", "speech to text", "stt", "transcription"
•"piper", "text to speech", "tts", "voice synthesis"
•"wyoming", "voice pipeline", "wake word"
•"compreface", "face embedding", "face matching"

Perception Architecture

code

┌─────────────────────────────────────────────────────────────────────────┐
│                      PERCEPTION LAYER (Phase 5)                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                      VIDEO PIPELINE                            │    │
│   │   ┌─────────┐    ┌─────────────┐    ┌─────────────┐          │    │
│   │   │ Cameras │───▶│   Frigate   │───▶│ DoubleTake  │          │    │
│   │   │ (RTSP)  │    │   (NVR)     │    │   (Face)    │          │    │
│   │   └─────────┘    └─────────────┘    └─────────────┘          │    │
│   │                         │                  │                  │    │
│   │                         ▼                  ▼                  │    │
│   │                  ┌─────────────┐    ┌─────────────┐          │    │
│   │                  │  CompreFace │    │    MQTT     │          │    │
│   │                  │ (Embedding) │    │  (Events)   │          │    │
│   │                  └─────────────┘    └─────────────┘          │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                          │
│   ┌───────────────────────────────────────────────────────────────┐    │
│   │                      VOICE PIPELINE                            │    │
│   │   ┌─────────┐    ┌─────────────┐    ┌─────────────┐          │    │
│   │   │ Willow  │───▶│  Whisper    │───▶│   Ollama    │          │    │
│   │   │(Satellite)│  │   (STT)     │    │  (Intent)   │          │    │
│   │   └─────────┘    └─────────────┘    └─────────────┘          │    │
│   │                         │                  │                  │    │
│   │                         ▼                  ▼                  │    │
│   │                  ┌─────────────┐    ┌─────────────┐          │    │
│   │                  │   Piper     │◀───│    Home     │          │    │
│   │                  │   (TTS)     │    │  Assistant  │          │    │
│   │                  └─────────────┘    └─────────────┘          │    │
│   └───────────────────────────────────────────────────────────────┘    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Frigate NVR

Hardware Acceleration (AMD ROCm)

Frigate supports AMD GPUs for object detection. Configure for RX 7900 XTX:

yaml

# frigate/config/config.yml
mqtt:
  host: mosquitto
  port: 1883
  user: frigate
  password: ${MQTT_PASSWORD}

detectors:
  ov:
    type: openvino
    device: GPU
    model:
      path: /openvino-model/ssdlite_mobilenet_v2.xml

ffmpeg:
  hwaccel_args: preset-vaapi  # AMD VA-API acceleration

cameras:
  front_door:
    ffmpeg:
      inputs:
        - path: rtsp://camera:554/stream1
          roles:
            - detect
            - record
        - path: rtsp://camera:554/stream2
          roles:
            - rtmp
    detect:
      width: 1920
      height: 1080
      fps: 5
    objects:
      track:
        - person
        - car
        - dog
        - cat
    record:
      enabled: true
      retain:
        days: 7
        mode: motion
      events:
        retain:
          default: 14
          mode: active_objects
    snapshots:
      enabled: true
      retain:
        default: 14
    zones:
      driveway:
        coordinates: 0,480,640,480,640,360,0,360
      porch:
        coordinates: 640,480,1280,480,1280,360,640,360

Docker Compose

yaml

services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    container_name: frigate
    privileged: true
    shm_size: "256mb"
    devices:
      - /dev/dri/renderD128:/dev/dri/renderD128  # AMD GPU
    volumes:
      - ./frigate/config:/config
      - frigate_media:/media/frigate
      - type: tmpfs
        target: /tmp/cache
        tmpfs:
          size: 1000000000
    ports:
      - "5000:5000"
      - "8554:8554"  # RTSP
      - "8555:8555/tcp"  # WebRTC
      - "8555:8555/udp"
    environment:
      FRIGATE_RTSP_PASSWORD: ${RTSP_PASSWORD}
    deploy:
      resources:
        reservations:
          devices:
            - driver: amd
              count: 1
              capabilities: [gpu]

MQTT Event Integration

python

# Frigate publishes to these MQTT topics
FRIGATE_TOPICS = [
    "frigate/events",                    # All detection events
    "frigate/+/person",                  # Person detection by camera
    "frigate/+/car",                     # Car detection by camera
    "frigate/+/motion",                  # Motion events
    "frigate/+/person/snapshot",         # Person snapshot JPEG
    "frigate/reviews",                   # Review queue events
]

# Event payload structure
{
    "type": "new",  # new, update, end
    "before": {...},
    "after": {
        "id": "1234567890.123456-random",
        "camera": "front_door",
        "frame_time": 1234567890.123456,
        "snapshot_time": 1234567890.123456,
        "label": "person",
        "sub_label": null,
        "top_score": 0.95,
        "false_positive": null,
        "start_time": 1234567890.123456,
        "end_time": null,
        "score": 0.95,
        "box": [100, 200, 300, 400],
        "area": 40000,
        "ratio": 1.5,
        "region": [0, 0, 640, 480],
        "current_zones": ["driveway"],
        "entered_zones": ["driveway"],
        "thumbnail": null,
        "has_snapshot": true,
        "has_clip": true,
        "stationary": false
    }
}

DoubleTake Facial Recognition

Docker Compose

yaml

services:
  doubletake:
    image: jakowenko/double-take:latest
    container_name: doubletake
    volumes:
      - doubletake_data:/.storage
    ports:
      - "3000:3000"
    environment:
      DETECTOR: compreface
      COMPREFACE_URL: http://compreface:8000
      COMPREFACE_API_KEY: ${COMPREFACE_API_KEY}
      FRIGATE_URL: http://frigate:5000
      MQTT_HOST: mosquitto
      MQTT_USERNAME: doubletake
      MQTT_PASSWORD: ${MQTT_PASSWORD}

Configuration

yaml

# doubletake/config.yml
mqtt:
  host: mosquitto
  username: doubletake
  password: ${MQTT_PASSWORD}
  topics:
    frigate: frigate/events
    matches: double-take/matches/+
    cameras: double-take/cameras/+

frigate:
  url: http://frigate:5000
  update_sub_labels: true

detectors:
  compreface:
    url: http://compreface:8000
    key: ${COMPREFACE_API_KEY}
    detect:
      min_face_size: 50
    recognize:
      threshold: 0.6

notify:
  gotify:
    url: http://gotify:80
    token: ${GOTIFY_TOKEN}

time:
  timezone: America/Chicago

CompreFace

Docker Compose

yaml

services:
  compreface-postgres:
    image: postgres:14
    container_name: compreface-db
    environment:
      POSTGRES_USER: compreface
      POSTGRES_PASSWORD: ${COMPREFACE_DB_PASSWORD}
      POSTGRES_DB: frs
    volumes:
      - compreface_db:/var/lib/postgresql/data

  compreface-admin:
    image: exadel/compreface-admin:latest
    container_name: compreface-admin
    environment:
      POSTGRES_USER: compreface
      POSTGRES_PASSWORD: ${COMPREFACE_DB_PASSWORD}
      POSTGRES_URL: jdbc:postgresql://compreface-postgres:5432/frs
      SPRING_PROFILES_ACTIVE: dev
    ports:
      - "8000:8000"
    depends_on:
      - compreface-postgres

  compreface-api:
    image: exadel/compreface-api:latest
    container_name: compreface-api
    environment:
      POSTGRES_USER: compreface
      POSTGRES_PASSWORD: ${COMPREFACE_DB_PASSWORD}
      POSTGRES_URL: jdbc:postgresql://compreface-postgres:5432/frs
      SPRING_PROFILES_ACTIVE: dev
    depends_on:
      - compreface-postgres

  compreface-core:
    image: exadel/compreface-core:latest
    container_name: compreface-core
    environment:
      ML_PORT: 3000

Wyoming Voice Pipeline

Whisper (Speech-to-Text)

yaml

services:
  wyoming-whisper:
    image: rhasspy/wyoming-whisper:latest
    container_name: wyoming-whisper
    command: >
      --model base.en
      --language en
      --device cuda
      --compute-type float16
    runtime: nvidia  # Use ROCm for AMD
    ports:
      - "10300:10300"
    volumes:
      - whisper_data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia  # Change to amd for ROCm
              count: 1
              capabilities: [gpu]
        limits:
          memory: 4G

Piper (Text-to-Speech)

yaml

services:
  wyoming-piper:
    image: rhasspy/wyoming-piper:latest
    container_name: wyoming-piper
    command: >
      --voice en_US-lessac-medium
      --speaker 0
      --length-scale 1.0
      --noise-scale 0.667
      --noise-w 0.8
    ports:
      - "10200:10200"
    volumes:
      - piper_data:/data
    deploy:
      resources:
        limits:
          memory: 1G

OpenWakeWord

yaml

services:
  wyoming-openwakeword:
    image: rhasspy/wyoming-openwakeword:latest
    container_name: wyoming-openwakeword
    command: >
      --preload-model 'hey_jarvis'
      --preload-model 'ok_nabu'
      --threshold 0.5
      --trigger-level 1
    ports:
      - "10400:10400"
    volumes:
      - openwakeword_data:/data

Willow Voice Satellite

For ESP32-S3 based voice satellites:

yaml

# Home Assistant configuration for Willow
wyoming:
  - name: "Living Room Satellite"
    host: 192.168.1.50
    port: 10500

assist_pipeline:
  - name: ahling_voice
    language: en
    conversation_engine: conversation.ollama
    stt_engine: stt.wyoming_whisper
    tts_engine: tts.wyoming_piper
    wake_word_entity: wake_word.openwakeword

Voice Pipeline Integration

Full Pipeline Docker Compose

yaml

version: "3.8"

services:
  wyoming-whisper:
    image: rhasspy/wyoming-whisper:latest
    container_name: wyoming-whisper
    command: --model base.en --language en
    ports:
      - "10300:10300"
    volumes:
      - whisper_data:/data
    deploy:
      resources:
        limits:
          memory: 4G

  wyoming-piper:
    image: rhasspy/wyoming-piper:latest
    container_name: wyoming-piper
    command: --voice en_US-lessac-medium
    ports:
      - "10200:10200"
    volumes:
      - piper_data:/data

  wyoming-openwakeword:
    image: rhasspy/wyoming-openwakeword:latest
    container_name: wyoming-openwakeword
    command: --preload-model 'hey_jarvis' --threshold 0.5
    ports:
      - "10400:10400"

volumes:
  whisper_data:
  piper_data:

Home Assistant Voice Configuration

yaml

# Home Assistant configuration.yaml
wyoming:

# Add Wyoming integrations via UI or configuration:
# - Whisper: host=wyoming-whisper, port=10300
# - Piper: host=wyoming-piper, port=10200
# - OpenWakeWord: host=wyoming-openwakeword, port=10400

assist_pipeline:
  - name: ahling_voice_pipeline
    language: en
    conversation_engine: conversation.ollama_ahling
    stt_engine: stt.faster_whisper
    tts_engine: tts.piper
    wake_word_entity: wake_word.openwakeword

# Ollama conversation agent
conversation:
  - platform: ollama
    name: ollama_ahling
    url: http://ollama:11434
    model: ahling-home
    max_history: 10
    template: |
      You are the Ahling Command Center voice assistant.
      Control smart home devices and answer questions.
      Be concise - responses will be spoken aloud.

Resource Allocation

VRAM Budget (RX 7900 XTX - 24GB)

Component	VRAM	Priority
Frigate (OpenVINO)	2GB	High
Whisper (base.en)	1GB	High
Piper	0.5GB	Medium
CompreFace	1GB	Low
Total Perception	4.5GB	-

Memory Budget (61GB RAM)

Component	RAM	Notes
Frigate	4GB	+ 256MB shm
DoubleTake	1GB
CompreFace	2GB
Whisper	4GB
Piper	1GB
Total	12GB

Event Flow

code

Camera (RTSP) ──▶ Frigate ──▶ MQTT ──▶ Home Assistant
                    │                        │
                    ▼                        ▼
              DoubleTake ──▶ Notification  Automation
                    │
                    ▼
              CompreFace ──▶ Face Match ──▶ Presence

Microphone ──▶ OpenWakeWord ──▶ Whisper ──▶ Ollama ──▶ HA
                                              │
                                              ▼
                                           Piper ──▶ Speaker

Best Practices

•Frigate: Use sub-streams for detection, main stream for recording
•Whisper: Use base.en model for best speed/accuracy balance
•Piper: Pre-cache commonly used phrases
•CompreFace: Train with multiple angles per person
•MQTT: Use QoS 1 for important events
•Memory: Monitor GPU memory with rocm-smi

Troubleshooting

Frigate GPU Detection

bash

# Check VA-API support
vainfo

# Check Frigate logs
docker logs frigate 2>&1 | grep -i gpu

Whisper Performance

bash

# Test Whisper directly
echo "test" | nc -v wyoming-whisper 10300

# Check Wyoming status
curl http://homeassistant:8123/api/wyoming/info

Related Skills

•[[home-assistant-brain]] - HA integration
•[[ollama-mastery]] - Voice intent processing
•[[microsoft-agents]] - Multi-agent voice workflows

Perception Pipeline Skill

Trigger Phrases

Perception Architecture

Frigate NVR

Hardware Acceleration (AMD ROCm)

Docker Compose

MQTT Event Integration

DoubleTake Facial Recognition

Docker Compose

Configuration

CompreFace

Docker Compose

Wyoming Voice Pipeline

Whisper (Speech-to-Text)

Piper (Text-to-Speech)

OpenWakeWord

Willow Voice Satellite

Voice Pipeline Integration

Full Pipeline Docker Compose

Home Assistant Voice Configuration

Resource Allocation

VRAM Budget (RX 7900 XTX - 24GB)

Memory Budget (61GB RAM)

Event Flow

Best Practices

Troubleshooting

Frigate GPU Detection

Whisper Performance

Related Skills

References