AgentSkillsCN

ai-mcp-integration

RawDrive的AI与MCP(模型上下文协议)集成模式。适用于与ai-service合作、实现AI功能,或构建MCP工具时使用。

SKILL.md
--- frontmatter
name: ai-mcp-integration
aliases: [ai, mcp, llm, agents, face-detection, smart-tagging, embeddings, gemini]
description: AI and MCP (Model Context Protocol) integration patterns for RawDrive. Use when working with the ai-service, implementing AI features, or building MCP tools.

AI & MCP Integration Guidelines

Overview

RawDrive's AI Service provides:

  • Face Recognition: Detect faces, generate embeddings, cluster people
  • Photo Curation: Quality scoring, auto-selection
  • Semantic Search: Natural language search using CLIP
  • Photo Analysis: AI-generated captions, tags, scene detection

The service uses configurable LLM providers via environment variables. Never hardcode provider names or API keys.

Key Files

PurposeLocation
FastAPI entryservices/ai-service/src/main.py
MCP serverservices/ai-service/src/mcp/server.py
Face detectionbackend/src/app/services/face_detection_service.py
AI providersbackend/src/app/services/ai_providers/
Content detectionbackend/src/app/services/content_detection_service.py

MCP Tool Pattern

python
# services/ai-service/src/mcp/server.py
from typing import Annotated
from fastmcp import FastMCP

mcp_server = FastMCP(name="rawdrive-ai", version="1.0.0")

@mcp_server.tool()
async def detect_faces(
    photo_id: Annotated[str, "Photo ID in database"],
    workspace_id: Annotated[str, "Workspace ID for isolation"],
    detect_attributes: Annotated[bool, "Detect age/gender"] = True,
) -> dict:
    """
    Detect faces in a photo and generate embeddings.
    Returns bounding boxes, 512-dim embeddings, and attributes.
    """
    service = FaceRecognitionService()
    result = await service.detect_faces(photo_id, workspace_id)
    return {
        "photo_id": result.photo_id,
        "face_count": len(result.faces),
        "faces": [f.to_dict() for f in result.faces],
    }

LLM Integration

python
# backend/src/app/services/ai_providers/base.py

class AIProvider:
    """Base class for AI providers - configured via environment."""

    def __init__(self):
        # Load from env - NEVER hardcode
        self.provider = os.environ.get("AI_PROVIDER")
        self.api_key = os.environ.get("AI_API_KEY")
        self.model = os.environ.get("AI_MODEL")

    async def analyze_photo(self, image: bytes, asset_id: str) -> dict:
        """Analyze photo using configured provider."""
        raise NotImplementedError

Face Recognition

python
# backend/src/app/services/face_detection_service.py

class FaceDetectionService:
    """Multi-provider face detection with failover."""

    async def detect_faces(
        self,
        workspace_id: UUID,
        asset_id: UUID,
    ) -> list[DetectedFace]:
        """
        Detect faces using provider chain:
        1. Cloud Vision (primary)
        2. Gemini (fallback)
        3. Local DeepFace (last resort)
        """
        for provider in self._providers:
            try:
                return await provider.detect(workspace_id, asset_id)
            except ProviderError:
                continue
        raise AllProvidersFailedError()

Semantic Search

python
# Uses CLIP embeddings stored in pgvector
async def search(query: str, workspace_id: UUID) -> list[SearchResult]:
    # Encode query
    embedding = clip_model.encode(query)

    # Search pgvector
    results = await db.execute("""
        SELECT asset_id, 1 - (embedding <=> $1::vector) as score
        FROM asset_embeddings
        WHERE workspace_id = $2
        ORDER BY embedding <=> $1::vector
        LIMIT 20
    """, [embedding, workspace_id])

    return results

Frontend Integration

typescript
// frontend/src/services/aiService.ts

class AIService {
  async analyzePhoto(assetId: string): Promise<PhotoAnalysis> {
    // Proxy through backend - never expose API keys to frontend
    return api.post('/api/v1/ai/analyze', { assetId });
  }

  async searchPhotos(query: string, galleryId?: string): Promise<SearchResult[]> {
    return api.post('/api/v1/ai/search', { query, galleryId });
  }

  async detectFaces(assetId: string): Promise<FaceResult[]> {
    return api.post('/api/v1/ai/faces/detect', { assetId });
  }
}

Cost Optimization

python
# Cache by image hash to avoid reprocessing
async def analyze_with_cache(image_hash: str, image: bytes):
    cache_key = f"ai:analysis:{image_hash}"
    if cached := await redis.get(cache_key):
        return json.loads(cached)

    result = await ai_provider.analyze(image)
    await redis.setex(cache_key, 86400, json.dumps(result))
    return result

# Use local models for embeddings (cheaper)
# Use LLM for complex understanding (captions, quality)

Best Practices

Do's

  • Cache AI results by image hash
  • Include workspace_id in all MCP tools
  • Use local models for embeddings
  • Batch process when possible
  • Return processing_time_ms for monitoring

Don'ts

  • Don't expose raw AI responses to frontend
  • Don't hardcode API keys or provider names
  • Don't skip workspace isolation in MCP tools
  • Don't process images synchronously in API handlers