AgentSkillsCN

groq

【何】通过 Groq API(聊天)+ Ollama(嵌入)实现快速的大语言模型推理。 【如何】Groq 用于聊天补全(llama-3.3-70b-versatile),Ollama nomic-embed-text 用于嵌入。 【何时】当您需要快速推理、为 RAG 嵌入文本,或进行聊天补全时,此技能将大显身手。 【为何】Groq 提供最快的大语言模型推理速度;Ollama 则负责本地嵌入(Groq 本身并无嵌入 API)。 触发条件: “groq embed”、“groq chat”、“groq complete”、“用 Groq 嵌入”、“快速大语言模型”。

SKILL.md
--- frontmatter
name: groq
description: |
  [WHAT] Fast LLM inference via Groq API (chat) + Ollama (embeddings)
  [HOW] Groq for chat completions (llama-3.3-70b-versatile), Ollama nomic-embed-text for embeddings
  [WHEN] Need fast inference, embedding text for RAG, chat completions
  [WHY] Groq provides fastest LLM inference; Ollama handles local embeddings (Groq has no embedding API)

  Triggers: "groq embed", "groq chat", "groq complete", "embed with groq", "fast llm"

groq

Fast LLM inference via Groq API for chat, Ollama for embeddings.

Setup

Environment:

  • GROQ_API_KEY - Required for chat completions
  • Ollama running locally for embeddings (ollama serve)

Install dependencies:

bash
cd ~/.claude/skills/groq
pip install groq requests

Pull embedding model (first time):

bash
ollama pull nomic-embed-text

Usage

Chat Completion

bash
# Simple chat
./scripts/chat.py "Explain quantum computing in 2 sentences"

# With system prompt
./scripts/chat.py "Write a haiku" --system "You are a poet"

# Different model
./scripts/chat.py "Hello" --model llama-3.1-8b-instant

# JSON output
./scripts/chat.py "List 3 colors as JSON array" --json

Embeddings

bash
# Embed text (returns JSON array of floats)
./scripts/embed.sh "Hello world"

# Embed from stdin
echo "Some text to embed" | ./scripts/embed.sh

# Python direct
./scripts/embed.py "Hello world"

Models

Chat Models (Groq)

ModelContextSpeedUse Case
llama-3.3-70b-versatile128kFastDefault, general purpose
llama-3.1-8b-instant128kFastestSimple tasks
llama3-70b-81928kFastLegacy
gemma2-9b-it8kFastInstruction following

Embedding Model (Ollama)

ModelDimensionsNotes
nomic-embed-text768Local, fast, good quality

Output Format

Chat

Plain text response to stdout. Errors to stderr.

Embed

JSON array of floats:

json
[0.123, -0.456, 0.789, ...]

When to Use

ScenarioCommand
Quick question./scripts/chat.py "What is X?"
Code generation./scripts/chat.py "Write Python for Y"
Embed for RAG./scripts/embed.sh "document text"
Batch embedcat docs.txt | while read line; do ./scripts/embed.sh "$line"; done

Error Handling

  • Missing GROQ_API_KEY: Chat fails with clear error
  • Ollama not running: Embed falls back to error message
  • Rate limits: Groq has generous limits but will return 429 if exceeded

Related Skills

SkillUse When
oracleNeed GPT-5, Claude, multi-model comparison
lev-findUnified search with embeddings already indexed
brave-searchWeb search, not embeddings