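Baidu Speech Recognition - Audio to Text

Overview

This skill converts voice messages sent by users (ogg/opus format) into text. It uses the Baidu speech recognition API and is tuned for servers in mainland China that run behind a proxy.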

Script	Path	Purpose
`ogg_to_text.sh`	`/root/.openclaw/workspace/scripts/ogg_to_text.sh`	Recommended: simple to use
`speech_to_text.sh`	`/root/.openclaw/workspace/scripts/speech_to_text.sh`	Full-featured version
`baidu_speech_to_text.py`	`/root/.openclaw/workspace/scripts/baidu_speech_to_text.py`	Main Python script

bash

/root/.openclaw/workspace/scripts/ogg_to_text.sh <audio-file-path>

bash

# Convert a user's voice message
/root/.openclaw/workspace/scripts/ogg_to_text.sh /root/.openclaw/media/inbound/xxxxx.ogg

bash

/root/.openclaw/workspace/scripts/speech_to_text.sh <audio-file> --pro

Voice messages that users send through WhatsApp/Discord are saved in:

code

/root/.openclaw/media/inbound/

Files are usually .ogg (Opus-encoded); the script converts them to PCM automatically.
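
The conversion step can be pictured roughly as follows. This is only a sketch, not the bundled scripts' actual code: it assumes `ffmpeg` is installed and that the target is 16 kHz, mono, signed 16-bit little-endian raw PCM, a format that Baidu's short-speech API is understood to accept. The function name `ogg_to_pcm` and the `FFMPEG` override variable are hypothetical.

```bash
# Hypothetical helper: convert an Ogg/Opus file to raw PCM with ffmpeg.
# Assumption: 16 kHz, mono, signed 16-bit little-endian output.
ogg_to_pcm() {
  # $1 = input .ogg path, $2 = output .pcm path
  # FFMPEG may be overridden (e.g. FFMPEG=echo) to print the arguments
  # instead of running a real conversion.
  "${FFMPEG:-ffmpeg}" -y -loglevel error -i "$1" -ac 1 -ar 16000 -f s16le "$2"
}
```

Calling it as `FFMPEG=echo ogg_to_pcm in.ogg out.pcm` prints the arguments without touching any file, which is convenient for a dry run.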

When a user sends a voice message and asks for it to be transcribed:

• Run the conversion:

bash

/root/.openclaw/workspace/scripts/ogg_to_text.sh /root/.openclaw/media/inbound/<filename>.ogg
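
When the file name is not known in advance, one possible approach is to pick the newest file in the inbound directory. The sketch below assumes that the message to transcribe is the most recently modified .ogg file (the source does not state this) and that file names contain no newlines. The helper name `latest_ogg` is hypothetical.

```bash
# Hypothetical helper: print the most recently modified .ogg file in a directory.
latest_ogg() {
  # $1 = directory to search; defaults to the inbound media directory.
  # ls -t lists newest first; errors (such as no matching file) are silenced,
  # so the output is empty when the directory holds no .ogg files.
  ls -t "${1:-/root/.openclaw/media/inbound}"/*.ogg 2>/dev/null | head -n 1
}

# Possible usage: transcribe the newest inbound voice message.
# /root/.openclaw/workspace/scripts/ogg_to_text.sh "$(latest_ogg)"
```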

Provide the Baidu API credentials through environment variables (do not commit them to the repository):

code

export BAIDU_APP_ID="your_app_id"
export BAIDU_API_KEY="your_api_key"
export BAIDU_SECRET_KEY="your_secret_key"
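
Before running a conversion, it may help to confirm that all three variables are actually set. The following is a minimal sketch of such a check; the helper name `missing_baidu_vars` is hypothetical and not part of the bundled scripts.

```bash
# Hypothetical helper: list which credential variables are unset or empty.
missing_baidu_vars() {
  missing=""
  [ -n "${BAIDU_APP_ID:-}" ]     || missing="$missing BAIDU_APP_ID"
  [ -n "${BAIDU_API_KEY:-}" ]    || missing="$missing BAIDU_API_KEY"
  [ -n "${BAIDU_SECRET_KEY:-}" ] || missing="$missing BAIDU_SECRET_KEY"
  # Strip the leading space; the printed line is empty when all three are set.
  printf '%s\n' "${missing# }"
}
```

A caller could stop early whenever the output is non-empty and report the missing names rather than sending an unauthenticated request.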

Endpoint:

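Error	Cause	Solution
SSL error	Proxy interference	Use the wrapper script (.sh); do not call the Python script directly
Empty recognition result	Silence or no speech	Tell the user the audio may contain no speech
3301 error	Poor audio quality	Ask the user to record again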
3308 error	Audio too long	Split the audio into segments
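
The error handling above could be wrapped in a small lookup, sketched here. The helper name `advise_on_error` and the `empty` keyword are hypothetical. The numeric codes follow Baidu's published error list as understood here, in which 3301 denotes poor audio quality and 3308 denotes over-long audio; treat them as assumptions to verify against the current API documentation.

```bash
# Hypothetical helper: map an error code (or "empty" for an empty result)
# to the suggested follow-up action.
advise_on_error() {
  case "$1" in
    3301)  echo "Poor audio quality: ask the user to record again." ;;
    3308)  echo "Audio too long: split it into segments." ;;
    empty) echo "Empty result: tell the user the audio may contain no speech." ;;
    *)     echo "No advice recorded for code $1." ;;
  esac
}
```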