Media Processing Skill

Slice: slices/media/
Type: Audio/Video Processing

Purpose

This skill covers video and audio processing workflows. Use it when:

  • YouTube videos need transcription
  • Audio files need speech-to-text
  • Media needs embedding for search
  • Content summarization is required

Quick Start

python
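from slices.media import MediaService, MediaTask, MediaType

# `client` is an already-configured HTTP client instance.
service = MediaService(http_client=client)

# Describe what to process and which operations to run.
task = MediaTask(
    source_url="https://youtube.com/watch?v=example",
    media_type=MediaType.VIDEO,
    operations=["transcribe", "summarize"],
)

# execute() is awaited, so run this inside an async function or an async REPL.
result = await service.execute(task)
print(result.transcript)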

API Reference

MediaService

| Method | Description | Returns |
|--------|-------------|---------|
| execute(task) | Execute media processing | MediaResult |

Operations

| Operation | Description | Output |
|-----------|-------------|--------|
| transcribe | Speech-to-text via Whisper | transcript text |
| summarize | LLM-based summarization | summary text |
| embed | Generate vector embeddings | float array |
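
The operation names above are plain strings, so a typo would only surface when the task runs. The following is a minimal sketch of a pre-flight check. `normalize_operations` and `KNOWN_OPERATIONS` are hypothetical names, not part of `slices.media`, and the ordering is an assumption taken from the Processing Pipeline section of this document.

```python
from typing import Iterable, List

# Operation names from the table; the order is an assumption based on the
# documented pipeline (transcription, then summarization, then embedding).
KNOWN_OPERATIONS = ("transcribe", "summarize", "embed")


def normalize_operations(requested: Iterable[str]) -> List[str]:
    """Reject unknown names, drop duplicates, and return pipeline order."""
    requested_set = set(requested)
    unknown = requested_set - set(KNOWN_OPERATIONS)
    if unknown:
        raise ValueError(f"Unknown operations: {sorted(unknown)}")
    return [op for op in KNOWN_OPERATIONS if op in requested_set]
```

The returned list could then be passed as `operations=` when building a `MediaTask`. Whether the service itself validates or reorders operations is not stated here.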

Media Types

  • VIDEO: YouTube, MP4, WebM
  • AUDIO: MP3, WAV, M4A
  • IMAGE: Screenshots, thumbnails
  • DOCUMENT: PDFs with embedded media
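
When the media type is not known up front, it can often be guessed from the source URL. The sketch below is illustrative only: `guess_media_type` is a hypothetical helper, the local `MediaType` enum is a stand-in for the one in `slices.media` (its values are assumed), and the YouTube host list is an assumption. Image extensions are omitted because the list above does not name any.

```python
from enum import Enum
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse


class MediaType(Enum):
    """Local stand-in for slices.media.MediaType; values are assumed."""
    VIDEO = "video"
    AUDIO = "audio"
    IMAGE = "image"
    DOCUMENT = "document"


# Extensions taken from the Media Types list above.
EXTENSION_TO_TYPE = {
    ".mp4": MediaType.VIDEO,
    ".webm": MediaType.VIDEO,
    ".mp3": MediaType.AUDIO,
    ".wav": MediaType.AUDIO,
    ".m4a": MediaType.AUDIO,
    ".pdf": MediaType.DOCUMENT,
}

# Assumed set of hostnames that should be treated as YouTube.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "youtu.be"}


def guess_media_type(source_url: str) -> Optional[MediaType]:
    """Return a best-effort media type, or None when nothing matches."""
    parsed = urlparse(source_url)
    if parsed.hostname in YOUTUBE_HOSTS:
        return MediaType.VIDEO
    suffix = PurePosixPath(parsed.path).suffix.lower()
    return EXTENSION_TO_TYPE.get(suffix)
```

Returning `None` instead of raising leaves the decision to the caller, who can fall back to an explicit `media_type`.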

Processing Pipeline

code
Source URL
    ↓
[Ingestion] → Download/extract media
    ↓
[Transcription] → Whisper STT
    ↓
[Analysis] → LLM summarization
    ↓
[Embedding] → TensorZero vectors
    ↓
[Storage] → MinIO artifacts
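
The diagram describes a strictly sequential flow in which each stage consumes what the previous one produced. A minimal sketch of that shape follows. The stage functions are placeholders that only record their names and fill in dummy values; they do not call Whisper, an LLM, TensorZero, or MinIO, and `PipelineState` and `run_pipeline` are hypothetical names rather than part of `slices.media`.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, List, Optional


@dataclass
class PipelineState:
    """Hypothetical carrier for data handed from stage to stage."""
    source_url: str
    transcript: Optional[str] = None
    summary: Optional[str] = None
    embedding: Optional[List[float]] = None
    completed: List[str] = field(default_factory=list)


Stage = Callable[[PipelineState], Awaitable[None]]


# Placeholder stages: each stands in for a call to the real backing service.
async def ingest(state: PipelineState) -> None:
    state.completed.append("ingestion")


async def transcribe(state: PipelineState) -> None:
    state.transcript = f"(transcript of {state.source_url})"
    state.completed.append("transcription")


async def analyze(state: PipelineState) -> None:
    state.summary = f"(summary of {len(state.transcript or '')} characters)"
    state.completed.append("analysis")


async def embed(state: PipelineState) -> None:
    state.embedding = [0.0, 0.0, 0.0]  # dummy vector
    state.completed.append("embedding")


async def store(state: PipelineState) -> None:
    state.completed.append("storage")


PIPELINE: List[Stage] = [ingest, transcribe, analyze, embed, store]


async def run_pipeline(source_url: str) -> PipelineState:
    """Run every stage in order, passing one shared state object along."""
    state = PipelineState(source_url)
    for stage in PIPELINE:
        await stage(state)
    return state
```

Calling `asyncio.run(run_pipeline("https://example.com/talk.mp3"))` returns the final state. Keeping the stages in a list makes it straightforward to skip stages for tasks that request only some operations.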

Integration Points

| Service | URL | Purpose |
|---------|-----|---------|
| PMOVES.YT | localhost:8089 | YouTube ingestion |
| FFmpeg-Whisper | localhost:9000 | Transcription |
| TensorZero | localhost:3030 | Embeddings |
| MinIO | localhost:9001 | Artifact storage |
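
Before submitting a task, it can help to confirm that the backing services are listening. The sketch below mirrors the table as a small registry and adds a plain TCP reachability check. `Endpoint`, `ENDPOINTS`, and `is_reachable` are hypothetical names, the `http` scheme is an assumption, and an open port does not prove the service behind it is healthy.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    host: str
    port: int
    purpose: str

    def base_url(self, scheme: str = "http") -> str:
        # The scheme is an assumption; the table lists only host and port.
        return f"{scheme}://{self.host}:{self.port}"


# Hosts and ports copied from the Integration Points table.
ENDPOINTS = {
    "PMOVES.YT": Endpoint("localhost", 8089, "YouTube ingestion"),
    "FFmpeg-Whisper": Endpoint("localhost", 9000, "Transcription"),
    "TensorZero": Endpoint("localhost", 3030, "Embeddings"),
    "MinIO": Endpoint("localhost", 9001, "Artifact storage"),
}


def is_reachable(endpoint: Endpoint, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection((endpoint.host, endpoint.port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `[name for name, ep in ENDPOINTS.items() if not is_reachable(ep)]` lists the services that are currently not accepting connections.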

Example: YouTube Processing

python
# Full YouTube video processing
task = MediaTask(
    source_url="https://youtube.com/watch?v=dQw4w9WgXcQ",
    media_type=MediaType.VIDEO,
    operations=["transcribe", "summarize", "embed"],
    language="en",
)
result = await service.execute(task)

# Access results
print(f"Transcript: {result.transcript[:200]}...")
print(f"Summary: {result.summary}")
print(f"Artifacts: {[a.type for a in result.artifacts]}")