AI & MCP Integration Guidelines

Name: ai-mcp-integration
Rating: 78
Author: veerababumanyam

Overview

RawDrive's AI Service provides:

• Face Recognition: Detect faces, generate embeddings, cluster people
• Photo Curation: Quality scoring, auto-selection
• Semantic Search: Natural-language search using CLIP embeddings
• Photo Analysis: AI-generated captions, tags, scene detection

The service reads its LLM provider, API key, and model from environment variables (`AI_PROVIDER`, `AI_API_KEY`, `AI_MODEL`). Never hardcode provider names or API keys.

Key Files

| Purpose | Location |
|---|---|
| FastAPI entry | `services/ai-service/src/main.py` |
| MCP server | `services/ai-service/src/mcp/server.py` |
| Face detection | `backend/src/app/services/face_detection_service.py` |
| AI providers | `backend/src/app/services/ai_providers/` |
| Content detection | `backend/src/app/services/content_detection_service.py` |

MCP Tool Pattern

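```python
# services/ai-service/src/mcp/server.py
import time
from typing import Annotated

from fastmcp import FastMCP

# FaceRecognitionService is RawDrive's own class; its import is omitted here.

mcp_server = FastMCP(name="rawdrive-ai", version="1.0.0")


@mcp_server.tool()
async def detect_faces(
    photo_id: Annotated[str, "Photo ID in database"],
    workspace_id: Annotated[str, "Workspace ID for isolation"],
    detect_attributes: Annotated[bool, "Detect age/gender"] = True,
) -> dict:
    """
    Detect faces in a photo and generate embeddings.
    Returns bounding boxes, 512-dim embeddings, and attributes.
    """
    started = time.perf_counter()
    service = FaceRecognitionService()
    # Forward the flag so the tool parameter actually takes effect
    # (the detect_attributes keyword on the service method is assumed).
    result = await service.detect_faces(
        photo_id, workspace_id, detect_attributes=detect_attributes
    )
    return {
        "photo_id": result.photo_id,
        "face_count": len(result.faces),
        "faces": [f.to_dict() for f in result.faces],
        # Reported for monitoring
        "processing_time_ms": round((time.perf_counter() - started) * 1000),
    }
```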
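Every MCP tool receives `workspace_id`, so a tool should confirm that the requested photo belongs to that workspace before doing any work. The sketch below shows one way to do that. `PhotoRecord`, `WorkspaceMismatchError`, `require_photo_in_workspace`, and the in-memory `photos` mapping are all hypothetical stand-ins for the real database lookup; none of them are part of RawDrive's code.

```python
from dataclasses import dataclass
from uuid import UUID


@dataclass(frozen=True)
class PhotoRecord:
    # Hypothetical row shape; the real schema is not shown in this guide
    photo_id: UUID
    workspace_id: UUID


class WorkspaceMismatchError(PermissionError):
    """Raised when a photo is requested from a workspace that does not own it."""


def require_photo_in_workspace(
    photos: dict[UUID, PhotoRecord], photo_id: str, workspace_id: str
) -> PhotoRecord:
    # Parse IDs first so malformed input fails (ValueError) before any lookup
    pid, wid = UUID(photo_id), UUID(workspace_id)
    record = photos.get(pid)
    # Treat "missing" and "owned by another workspace" identically so callers
    # cannot probe which photo IDs exist elsewhere
    if record is None or record.workspace_id != wid:
        raise WorkspaceMismatchError(f"photo {pid} not found in workspace {wid}")
    return record
```

A tool would call a check like this right after receiving its arguments and before invoking any face or analysis service.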

LLM Integration

```python
# backend/src/app/services/ai_providers/base.py
import os


class AIProvider:
    """Base class for AI providers, configured via the environment."""

    def __init__(self) -> None:
        # Load from the environment; NEVER hardcode these values
        self.provider = os.environ.get("AI_PROVIDER")
        self.api_key = os.environ.get("AI_API_KEY")
        self.model = os.environ.get("AI_MODEL")

    async def analyze_photo(self, image: bytes, asset_id: str) -> dict:
        """Analyze a photo using the configured provider; subclasses implement this."""
        raise NotImplementedError
```
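`AI_PROVIDER` decides which implementation runs, so one option is a small registry that maps provider names to subclasses. That way no call site names a vendor. The sketch below is built on that assumption. `register_provider`, `get_provider`, and `EchoProvider` are hypothetical helpers, and `AIProvider` is a trimmed-down copy of the base class above so the example runs on its own.

```python
import os
from typing import Callable


class AIProvider:
    """Trimmed-down copy of the base class above, so this sketch runs alone."""

    def __init__(self) -> None:
        self.model = os.environ.get("AI_MODEL")

    async def analyze_photo(self, image: bytes, asset_id: str) -> dict:
        raise NotImplementedError


# Hypothetical registry mapping AI_PROVIDER values to provider classes
_PROVIDERS: dict[str, type[AIProvider]] = {}


def register_provider(name: str) -> Callable[[type[AIProvider]], type[AIProvider]]:
    def decorator(cls: type[AIProvider]) -> type[AIProvider]:
        _PROVIDERS[name.lower()] = cls
        return cls

    return decorator


def get_provider() -> AIProvider:
    # The provider choice comes only from the environment, never from call sites
    name = os.environ.get("AI_PROVIDER", "").lower()
    try:
        cls = _PROVIDERS[name]
    except KeyError:
        known = ", ".join(sorted(_PROVIDERS)) or "none"
        raise RuntimeError(f"Unknown AI_PROVIDER {name!r}; registered: {known}") from None
    return cls()


@register_provider("echo")
class EchoProvider(AIProvider):
    """Toy provider for tests; returns metadata instead of calling a model."""

    async def analyze_photo(self, image: bytes, asset_id: str) -> dict:
        return {"asset_id": asset_id, "size": len(image), "model": self.model}
```

An unknown or missing `AI_PROVIDER` fails at startup with the list of registered names, rather than failing later on the first request.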

Face Recognition

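```python
# backend/src/app/services/face_detection_service.py
import logging
from uuid import UUID

# DetectedFace, ProviderError, and AllProvidersFailedError are project types;
# their imports are omitted here.

logger = logging.getLogger(__name__)


class FaceDetectionService:
    """Multi-provider face detection with failover."""

    def __init__(self, providers: list) -> None:
        # Ordered chain (constructor shape assumed): Cloud Vision, Gemini, DeepFace
        self._providers = providers

    async def detect_faces(
        self,
        workspace_id: UUID,
        asset_id: UUID,
    ) -> list[DetectedFace]:
        """
        Detect faces using provider chain:
        1. Cloud Vision (primary)
        2. Gemini (fallback)
        3. Local DeepFace (last resort)
        """
        for provider in self._providers:
            try:
                return await provider.detect(workspace_id, asset_id)
            except ProviderError as exc:
                # Record the failure, then fall through to the next provider
                logger.warning("Face provider %s failed: %s", type(provider).__name__, exc)
        raise AllProvidersFailedError()
```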
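The failover loop can be checked in isolation with stub providers. In the sketch below, `ProviderError`, `AllProvidersFailedError`, `FailingProvider`, `StaticProvider`, and `detect_with_failover` are hypothetical stand-ins, and faces are plain dicts rather than `DetectedFace` objects. It also chains the last provider error onto the final exception so the root cause is preserved.

```python
from uuid import UUID


class ProviderError(Exception):
    """A single provider failed; the chain should try the next one."""


class AllProvidersFailedError(Exception):
    """Every provider in the chain failed."""


class FailingProvider:
    async def detect(self, workspace_id: UUID, asset_id: UUID) -> list[dict]:
        raise ProviderError("quota exceeded")


class StaticProvider:
    def __init__(self, name: str) -> None:
        self.name = name

    async def detect(self, workspace_id: UUID, asset_id: UUID) -> list[dict]:
        return [{"provider": self.name, "asset_id": str(asset_id)}]


async def detect_with_failover(providers: list, workspace_id: UUID, asset_id: UUID) -> list[dict]:
    last_error = None
    for provider in providers:
        try:
            # The first provider that succeeds wins; later ones are never called
            return await provider.detect(workspace_id, asset_id)
        except ProviderError as exc:
            last_error = exc  # remember why this provider failed, then try the next
    raise AllProvidersFailedError("no face provider succeeded") from last_error
```

With `[FailingProvider(), StaticProvider("gemini")]`, the first stub fails and the result comes from the second.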

Semantic Search

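```python
# Uses CLIP embeddings stored in pgvector.
# Assumes module-level `clip_model` (a sentence-transformers CLIP model),
# `db` (an asyncpg pool with pgvector's asyncpg codec registered), and a
# `SearchResult` type with asset_id and score fields.
import asyncio
from uuid import UUID


async def search(query: str, workspace_id: UUID, limit: int = 20) -> list[SearchResult]:
    # Encode in a worker thread so model inference does not block the event loop
    embedding = await asyncio.to_thread(clip_model.encode, query)

    # <=> is pgvector's cosine distance, so 1 - distance is cosine similarity.
    # The workspace_id filter keeps results inside the caller's workspace.
    rows = await db.fetch(
        """
        SELECT asset_id, 1 - (embedding <=> $1::vector) AS score
        FROM asset_embeddings
        WHERE workspace_id = $2
        ORDER BY embedding <=> $1::vector
        LIMIT $3
        """,
        embedding,
        workspace_id,
        limit,
    )
    return [SearchResult(asset_id=row["asset_id"], score=row["score"]) for row in rows]
```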
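pgvector's `<=>` operator returns cosine distance. As a result, `1 - (embedding <=> query)` is cosine similarity, and ordering by distance ascending puts the closest embeddings first. The pure-Python sketch below reproduces that ranking on toy 3-dimensional vectors (CLIP ViT-B/32 embeddings, for comparison, have 512 dimensions) to make the score concrete. It is for illustration only, not a replacement for the indexed query. `cosine_distance` and `rank` are hypothetical names.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    # Same quantity pgvector's <=> returns: 1 - cos(angle between a and b)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def rank(
    query: list[float], rows: list[tuple[str, list[float]]], limit: int = 20
) -> list[tuple[str, float]]:
    # Mirrors: SELECT asset_id, 1 - (embedding <=> q) AS score
    #          ORDER BY embedding <=> q LIMIT limit
    ordered = sorted(rows, key=lambda row: cosine_distance(row[1], query))
    return [(asset_id, 1.0 - cosine_distance(emb, query)) for asset_id, emb in ordered[:limit]]
```

For the query `[1, 0, 0]`, a vector pointing the same way scores 1.0, one at 45° scores about 0.707, and an orthogonal one scores 0.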

Frontend Integration

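```typescript
// frontend/src/services/aiService.ts
// Import paths are assumed; point them at the app's HTTP client and AI types.
// `api.post` is assumed to resolve to the parsed response body.
import { api } from './api';
import type { FaceResult, PhotoAnalysis, SearchResult } from '../types/ai';

export class AIService {
  async analyzePhoto(assetId: string): Promise<PhotoAnalysis> {
    // Proxy through the backend; never expose API keys to the frontend
    return api.post('/api/v1/ai/analyze', { assetId });
  }

  async searchPhotos(query: string, galleryId?: string): Promise<SearchResult[]> {
    return api.post('/api/v1/ai/search', { query, galleryId });
  }

  async detectFaces(assetId: string): Promise<FaceResult[]> {
    return api.post('/api/v1/ai/faces/detect', { assetId });
  }
}
```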

Cost Optimization

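```python
import hashlib
import json

CACHE_TTL_SECONDS = 24 * 60 * 60  # one day

# Cache by image hash to avoid reprocessing identical bytes.
# Assumes module-level `redis` (a redis.asyncio.Redis client) and
# `ai_provider` (an AIProvider subclass instance).
async def analyze_with_cache(image: bytes, asset_id: str) -> dict:
    image_hash = hashlib.sha256(image).hexdigest()
    # Include the model in the key so changing AI_MODEL does not serve stale results
    cache_key = f"ai:analysis:{ai_provider.model}:{image_hash}"
    if cached := await redis.get(cache_key):
        return json.loads(cached)

    result = await ai_provider.analyze_photo(image, asset_id)
    await redis.setex(cache_key, CACHE_TTL_SECONDS, json.dumps(result))
    return result

# Use local models for embeddings (cheaper)
# Use the LLM for complex understanding (captions, quality)
```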
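The closing comments in the cost-optimization code describe a routing rule: cheap, repeatable work goes to local models, and open-ended understanding goes to the configured LLM. A minimal sketch of that rule follows. The task names, the `Backend` enum, and `route` are hypothetical, chosen to match the features listed in the overview.

```python
from enum import Enum


class Backend(Enum):
    LOCAL = "local"  # e.g. CLIP or face-embedding models running in-process
    LLM = "llm"      # the provider configured via AI_PROVIDER / AI_MODEL


# Hypothetical task names mapped to the cheapest backend that can handle them
_ROUTES: dict[str, Backend] = {
    "clip_embedding": Backend.LOCAL,
    "face_embedding": Backend.LOCAL,
    "caption": Backend.LLM,
    "quality_score": Backend.LLM,
}


def route(task: str) -> Backend:
    try:
        return _ROUTES[task]
    except KeyError:
        # Fail loudly instead of silently defaulting to the expensive backend
        raise ValueError(f"No backend configured for task {task!r}") from None
```

Keeping the mapping in one table makes the cost decision explicit and easy to review when a new feature is added.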

Best Practices

Do's

• Cache AI results by image hash
• Include workspace_id in all MCP tools
• Use local models for embeddings
• Batch-process images when possible
• Return processing_time_ms for monitoring
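Batching and reporting `processing_time_ms` combine naturally. The sketch below processes a batch of assets concurrently, capping the number of in-flight calls so a large batch does not exceed provider rate limits, and records per-item timing. `process_batch` is a hypothetical helper, `analyze` stands in for a provider call, and the default limit of 4 is an arbitrary example value.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def process_batch(
    asset_ids: list[str],
    analyze: Callable[[str], Awaitable[dict]],
    max_concurrency: int = 4,
) -> list[dict]:
    # The semaphore caps how many provider calls run at once
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(asset_id: str) -> dict:
        async with semaphore:
            # Timing starts after acquiring the semaphore, so queueing is excluded
            started = time.perf_counter()
            result = await analyze(asset_id)
        elapsed_ms = round((time.perf_counter() - started) * 1000)
        return {**result, "asset_id": asset_id, "processing_time_ms": elapsed_ms}

    # gather returns results in the same order as the inputs
    return list(await asyncio.gather(*(run_one(a) for a in asset_ids)))
```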

Don'ts

• Don't expose raw AI responses to frontend
• Don't hardcode API keys or provider names
• Don't skip workspace isolation in MCP tools
• Don't process images synchronously in API handlers
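One way to avoid exposing raw AI responses to the frontend is to copy provider output onto an explicit allowlist of fields before returning it. The field names below (`caption`, `tags`, `quality_score`) and the helper `to_client_payload` are hypothetical, and the sketch assumes the provider returns a dict.

```python
from typing import Any

# Only these keys ever leave the backend; everything else in the provider
# payload (prompts, raw model output, token usage, request IDs) is dropped.
ALLOWED_FIELDS = ("caption", "tags", "quality_score")


def to_client_payload(raw: dict[str, Any], asset_id: str) -> dict[str, Any]:
    payload: dict[str, Any] = {"asset_id": asset_id}
    for field in ALLOWED_FIELDS:
        if field in raw:
            payload[field] = raw[field]
    return payload
```

An allowlist is safer than a blocklist here. A new field that a provider starts returning stays hidden until someone deliberately exposes it.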