speech-to-text

Transcribe audio files to text using Groq or OpenAI Whisper.

SKILL.md
--- frontmatter
name: speech-to-text
description: Transcribe audio files to text using Groq or OpenAI Whisper.
metadata:
  cyberagent:
    tool: speech-to-text
    subcommand: run
    timeout_class: long
input_schema:
  type: object
  properties:
    file:
      type: string
    provider:
      type: string
    model:
      type: string
    language:
      type: string
    response_format:
      type: string
    fallback_provider:
      type: string
    fallback_model:
      type: string
output_schema:
  type: object
  properties:
    text:
      type: string
    segments:
      type: array
    provider:
      type: string
    model:
      type: string
    language:
      type: string

Use this skill to transcribe an audio file into text. Pass a local file path in file; provider, model, and language are optional. If the primary provider fails and fallback_provider is set, the fallback provider (with fallback_model, if given) can be used instead.
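The fallback behavior can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the skill's actual implementation: `transcribe_with_fallback`, the `Transcriber` callable type, and the stub providers are hypothetical names, and real provider clients would replace the stubs. The default of "groq" follows the examples on this page.

```python
from typing import Callable, Dict, Optional

# Hypothetical interface: a transcriber takes (file, model) and returns text.
Transcriber = Callable[[str, Optional[str]], str]


def transcribe_with_fallback(
    params: dict,
    transcribers: Dict[str, Transcriber],
    default_provider: str = "groq",
) -> dict:
    """Try the primary provider; on failure, use the fallback if configured."""
    provider = params.get("provider", default_provider)
    model = params.get("model")
    try:
        text = transcribers[provider](params["file"], model)
    except Exception:
        fallback = params.get("fallback_provider")
        if fallback is None:
            raise  # no fallback configured: surface the original error
        provider, model = fallback, params.get("fallback_model")
        text = transcribers[provider](params["file"], model)
    # Report which provider and model actually produced the text.
    return {"text": text, "provider": provider, "model": model}


# Stub providers for demonstration only (not real API clients).
def failing_groq(file: str, model: Optional[str]) -> str:
    raise RuntimeError("simulated provider outage")


def stub_openai(file: str, model: Optional[str]) -> str:
    return f"transcript of {file}"


result = transcribe_with_fallback(
    {
        "file": "/tmp/voice.wav",
        "fallback_provider": "openai",
        "fallback_model": "whisper-1",
    },
    {"groq": failing_groq, "openai": stub_openai},
)
```

Returning provider and model alongside text mirrors the output schema, so a caller can tell whether the fallback was the one that answered.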

Examples:

  • Transcribe with Groq (default): {"file": "/tmp/voice.wav"}
  • Transcribe with OpenAI: {"file": "/tmp/voice.wav", "provider": "openai", "model": "whisper-1"}
  • Include verbose segments: {"file": "/tmp/voice.wav", "response_format": "verbose_json"}
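Before invoking the skill, the example payloads above can be checked against the input schema with a small Python sketch. The helper name `check_input` is hypothetical. Treating file as mandatory and rejecting unknown keys are assumptions made here for illustration, since the schema itself declares neither.

```python
import json

# Property names copied from input_schema; each is declared as a string.
INPUT_PROPERTIES = (
    "file", "provider", "model", "language",
    "response_format", "fallback_provider", "fallback_model",
)


def check_input(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    problems = []
    # Assumption: "file" is mandatory (the schema lists no required keys).
    if "file" not in payload:
        problems.append("missing 'file'")
    for key, value in payload.items():
        if key not in INPUT_PROPERTIES:
            # Assumption: keys outside the schema are treated as errors.
            problems.append(f"unknown property {key!r}")
        elif not isinstance(value, str):
            problems.append(f"{key!r} must be a string")
    return problems


# The three example payloads from the list above.
examples = [
    '{"file": "/tmp/voice.wav"}',
    '{"file": "/tmp/voice.wav", "provider": "openai", "model": "whisper-1"}',
    '{"file": "/tmp/voice.wav", "response_format": "verbose_json"}',
]
results = [check_input(json.loads(e)) for e in examples]
```

All three examples pass this check; a payload such as {"provider": 1} would be reported as missing file and as having a non-string provider.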