Ollama Skill
Comprehensive assistance with Ollama development - the local AI model runtime for running and interacting with large language models programmatically.
When to Use This Skill
This skill should be triggered when:
- Running local AI models with Ollama
- Building applications that interact with Ollama's API
- Implementing chat completions, embeddings, or streaming responses
- Setting up Ollama authentication or cloud models
- Configuring Ollama server (environment variables, ports, proxies)
- Using Ollama with OpenAI-compatible libraries
- Troubleshooting Ollama installations or GPU compatibility
- Implementing tool calling, structured outputs, or vision capabilities
- Working with Ollama in Docker or behind proxies
- Creating, copying, pushing, or managing Ollama models
Quick Reference
In this setup Ollama runs as a Docker container, so every ollama CLI call has to go through docker exec. For example, to list all models, run:
docker exec -it ollama ollama list
Use the following shell aliases to simplify interactions with the Ollama container:
# Alias to start the container from the first local image whose name matches "ollama" (it does not pull a new image)
alias ollama-start='docker image ls --format "{{.Repository}}:{{.Tag}}" | grep ollama | head -n 1 | xargs -I {} docker run -d -v ~/.ollama:/root/.ollama -p 11434:11434 --name ollama {}'
# Alias to stop the Ollama container
alias ollama-stop='docker stop ollama && docker rm ollama > /dev/null'
# Alias to exec into the running Ollama container
alias ollama-shell='docker exec -it ollama /bin/bash'
# Alias that makes ollama feel like a native command (the shell appends arguments after an alias, so "$@" is not needed)
alias ollama='docker exec -it ollama ollama'
1. Basic Chat Completion (cURL)
Generate a simple chat response:
curl http://localhost:11434/api/chat -d '{
"model": "gemma3",
"messages": [
{
"role": "user",
"content": "Why is the sky blue?"
}
]
}'
2. Simple Text Generation (cURL)
Generate a text response from a prompt:
curl http://localhost:11434/api/generate -d '{
"model": "gemma3",
"prompt": "Why is the sky blue?"
}'
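The same chat call can be made from Python's standard library. The following is a minimal sketch, assuming a local server at the default address; the helper names `build_chat_request` and `chat_once` are hypothetical and not part of any Ollama library.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def build_chat_request(model, prompt, base_url=OLLAMA_URL):
    """Build (without sending) a non-streaming POST to /api/chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one complete JSON object
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_once(model, prompt):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, prompt)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

With a server running and the model pulled, `chat_once("gemma3", "Why is the sky blue?")` should return the reply as a string.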
3. Python Chat with OpenAI Library
Use Ollama with the OpenAI Python library:
from openai import OpenAI
client = OpenAI(
base_url='http://localhost:11434/v1/',
api_key='ollama', # required but ignored
)
chat_completion = client.chat.completions.create(
messages=[
{
'role': 'user',
'content': 'Say this is a test',
}
],
model='llama3.2',
)
4. Vision Model (Image Analysis)
Ask questions about images:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")
response = client.chat.completions.create(
model="llava",
messages=[
{
"role": "user",
"content": [
{"type": "text", "text": "What's in this image?"},
{
"type": "image_url",
"image_url": "data:image/png;base64,iVBORw0KG...",
},
],
}
],
max_tokens=300,
)
5. Generate Embeddings
Create vector embeddings for text:
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
embeddings = client.embeddings.create(
model="all-minilm",
input=["why is the sky blue?", "why is the grass green?"],
)
6. Structured Outputs (JSON Schema)
Get structured JSON responses:
from pydantic import BaseModel
from openai import OpenAI
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")
class FriendInfo(BaseModel):
name: str
age: int
is_available: bool
class FriendList(BaseModel):
friends: list[FriendInfo]
completion = client.beta.chat.completions.parse(
temperature=0,
model="llama3.1:8b",
messages=[
{"role": "user", "content": "Return a list of friends in JSON format"}
],
response_format=FriendList,
)
friends_response = completion.choices[0].message
if friends_response.parsed:
print(friends_response.parsed)
7. JavaScript/TypeScript Chat
Use Ollama with the OpenAI JavaScript library:
import OpenAI from "openai";
const openai = new OpenAI({
baseURL: "http://localhost:11434/v1/",
apiKey: "ollama", // required but ignored
});
const chatCompletion = await openai.chat.completions.create({
messages: [{ role: "user", content: "Say this is a test" }],
model: "llama3.2",
});
8. Authentication for Cloud Models
Sign in to use cloud models:
# Sign in from CLI
ollama signin
# Then use cloud models
ollama run gpt-oss:120b-cloud
Or use API keys for direct cloud access:
export OLLAMA_API_KEY=your_api_key
curl https://ollama.com/api/generate \
-H "Authorization: Bearer $OLLAMA_API_KEY" \
-d '{
"model": "gpt-oss:120b",
"prompt": "Why is the sky blue?",
"stream": false
}'
9. Configure Ollama Server
Set environment variables for server configuration:
macOS:
# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"
# Restart Ollama application
Linux (systemd):
# Edit service
systemctl edit ollama.service
# Add under [Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"
# Reload and restart
systemctl daemon-reload
systemctl restart ollama
Windows:
1. Quit Ollama from the task bar
2. Search "environment variables" in Settings
3. Edit or create the OLLAMA_HOST variable
4. Set value: 0.0.0.0:11434
5. Restart Ollama from the Start menu
10. Check Model GPU Loading
Verify if your model is using GPU:
ollama ps
Output shows:
- 100% GPU: Fully loaded on GPU
- 100% CPU: Fully loaded in system memory
- 48%/52% CPU/GPU: Split between both
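For scripted checks, the three forms listed above can be turned into numbers. This is a minimal sketch that assumes the processor cell looks exactly like those examples; `parse_processor` is a hypothetical helper, and other output formats are not handled.

```python
import re


def parse_processor(value):
    """Return (cpu_percent, gpu_percent) for a processor cell from `ollama ps`."""
    value = value.strip()

    # Single-device form, e.g. "100% GPU" or "100% CPU"
    single = re.fullmatch(r"(\d+)%\s+(CPU|GPU)", value)
    if single:
        pct = int(single.group(1))
        return (pct, 0) if single.group(2) == "CPU" else (0, pct)

    # Split form, e.g. "48%/52% CPU/GPU" (CPU share first, GPU share second)
    split = re.fullmatch(r"(\d+)%/(\d+)%\s+CPU/GPU", value)
    if split:
        return int(split.group(1)), int(split.group(2))

    raise ValueError(f"unrecognized processor value: {value!r}")
```

A GPU share below 100 suggests the model did not fit entirely in VRAM.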
Key Concepts
Base URLs
- Local API (default): http://localhost:11434/api
- Cloud API: https://ollama.com/api
- OpenAI Compatible: /v1/ endpoints for OpenAI libraries
Authentication
- Local: No authentication required for http://localhost:11434
- Cloud Models: Requires signing in (ollama signin) or an API key
- API Keys: For programmatic access to https://ollama.com/api
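The local/cloud distinction above can be captured in one small function. The sketch below assumes the API key lives in the `OLLAMA_API_KEY` environment variable, as in the cURL example earlier; `api_target` is a hypothetical helper name.

```python
import os

LOCAL_API = "http://localhost:11434/api"
CLOUD_API = "https://ollama.com/api"


def api_target(use_cloud, env=None):
    """Return (base_url, headers) for either the local or the cloud API."""
    env = os.environ if env is None else env
    if not use_cloud:
        # The local server needs no authentication
        return LOCAL_API, {}
    key = env.get("OLLAMA_API_KEY")
    if not key:
        raise RuntimeError("OLLAMA_API_KEY is not set")
    return CLOUD_API, {"Authorization": f"Bearer {key}"}
```

Passing `env` explicitly keeps the function easy to test without touching the real environment.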
Models
- Local Models: Run on your machine (e.g., gemma3, llama3.2, qwen3)
- Cloud Models: Suffix -cloud (e.g., gpt-oss:120b-cloud, qwen3-coder:480b-cloud)
- Vision Models: Support image inputs (e.g., llava)
Common Environment Variables
- OLLAMA_HOST: Change bind address (default: 127.0.0.1:11434)
- OLLAMA_CONTEXT_LENGTH: Context window size (default: 2048 tokens)
- OLLAMA_MODELS: Model storage directory
- OLLAMA_ORIGINS: Allow additional web origins for CORS
- HTTPS_PROXY: Proxy server for model downloads
Error Handling
Status Codes:
- 200: Success
- 400: Bad Request (invalid parameters)
- 404: Not Found (model doesn't exist)
- 429: Too Many Requests (rate limit)
- 500: Internal Server Error
- 502: Bad Gateway (cloud model unreachable)
Error Format:
{
"error": "the model failed to generate a response"
}
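A client can combine the status code with the `error` field to produce a readable message. This is a minimal sketch; `describe_error` and the `STATUS_MEANINGS` table are hypothetical names that simply mirror the list above.

```python
import json

# Meanings copied from the status code list above
STATUS_MEANINGS = {
    400: "Bad Request",
    404: "Not Found",
    429: "Too Many Requests",
    500: "Internal Server Error",
    502: "Bad Gateway",
}


def describe_error(status, body):
    """Turn a failed response (status code plus raw body) into one line of text."""
    try:
        detail = json.loads(body).get("error", "")
    except (ValueError, TypeError, AttributeError):
        # Body was missing, not JSON, or not a JSON object
        detail = ""
    label = STATUS_MEANINGS.get(status, "Unexpected status")
    return f"{status} {label}: {detail}" if detail else f"{status} {label}"
```

The fallback branch matters because a proxy in front of Ollama may return a non-JSON error page.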
Streaming vs Non-Streaming
- Streaming (default): Returns response chunks as newline-delimited JSON objects (NDJSON)
- Non-Streaming: Set "stream": false to get the complete response in one object
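In streaming mode each line of the response body is its own JSON object, so the client has to parse line by line and join the pieces. The sketch below assumes `/api/chat` chunks that carry text under `message.content` and a final chunk with `"done": true`; `collect_stream` is a hypothetical helper.

```python
import json


def collect_stream(lines):
    """Join the text of streamed /api/chat chunks into one string."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        if "error" in chunk:
            # An error can arrive mid-stream as its own JSON object
            raise RuntimeError(chunk["error"])
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)
```

Any iterable of lines works as input, including the response object returned by `urllib.request.urlopen`, which yields the body one line at a time.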
Reference Files
This skill includes comprehensive documentation in references/:
- llms-txt.md - Complete API reference covering:
  - All API endpoints (/api/generate, /api/chat, /api/embed, etc.)
  - Authentication methods (signin, API keys)
  - Error handling and status codes
  - OpenAI compatibility layer
  - Cloud models usage
  - Streaming responses
  - Configuration and environment variables
- llms.md - Documentation index listing all available topics:
  - API reference (version, model details, chat, generate, embeddings)
  - Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
  - CLI reference
  - Cloud integration
  - Platform-specific guides (Linux, macOS, Windows, Docker)
  - IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)
Use the reference files when you need:
- Detailed API parameter specifications
- Complete endpoint documentation
- Advanced configuration options
- Platform-specific setup instructions
- Integration guides for specific tools
Working with This Skill
For Beginners
Start with these common patterns:
- Simple generation: Use the /api/generate endpoint with a prompt
- Chat interface: Use /api/chat with a messages array
- OpenAI compatibility: Use OpenAI libraries with base_url='http://localhost:11434/v1/'
- Check GPU usage: Run ollama ps to verify model loading
Read the "Introduction" and "Quickstart" sections of llms-txt.md for foundational concepts.
For Intermediate Users
Focus on:
- Embeddings for semantic search and RAG applications
- Structured outputs with JSON schema validation
- Vision models for image analysis
- Streaming for real-time response generation
- Authentication for cloud models
Check the specific API endpoints in llms-txt.md for detailed parameter options.
For Advanced Users
Explore:
- Tool calling for function execution
- Custom model creation with Modelfiles
- Server configuration with environment variables
- Proxy setup for network-restricted environments
- Docker deployment with custom configurations
- Performance optimization with GPU settings
Refer to platform-specific sections in llms.md and configuration details in llms-txt.md.
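Tool calling, the first item in the advanced list, roughly works like this: the request carries a `tools` list, the model answers with `tool_calls`, and the application runs the named functions and sends the results back. The sketch below shows only the local dispatch step. `get_temperature` is a made-up tool, and the message shapes (including the `tool_name` field) are assumptions to verify against llms-txt.md.

```python
def get_temperature(city):
    """Hypothetical tool; a real one would query a weather service."""
    return f"22 C in {city}"


# Maps tool names the model may request to local Python callables
TOOLS = {"get_temperature": get_temperature}

# Tool description sent in the "tools" field of a chat request (assumed shape)
TOOL_SPECS = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get the current temperature for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]


def run_tool_calls(message):
    """Execute each tool call in an assistant message; return tool-role messages."""
    results = []
    for call in message.get("tool_calls", []):
        name = call["function"]["name"]
        handler = TOOLS.get(name)
        if handler is None:
            content = f"unknown tool: {name}"
        else:
            content = str(handler(**call["function"]["arguments"]))
        results.append({"role": "tool", "content": content, "tool_name": name})
    return results
```

The returned messages would be appended to the conversation history before the next `/api/chat` request so the model can use the results.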
Common Use Cases
Building a chatbot:
- Use the /api/chat endpoint
- Maintain message history in your application
- Stream responses for better UX
- Handle errors gracefully
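Because the server keeps no conversation state between requests, the application has to resend the full message list on every turn. The following is a minimal sketch of that bookkeeping; the `ChatHistory` class and its method names are hypothetical.

```python
class ChatHistory:
    """Keeps the running message list for one conversation."""

    def __init__(self, model, system=None):
        self.model = model
        self.messages = []
        if system:
            self.messages.append({"role": "system", "content": system})

    def user_payload(self, text, stream=True):
        """Record a user turn and return the JSON body for /api/chat."""
        self.messages.append({"role": "user", "content": text})
        return {
            "model": self.model,
            "messages": list(self.messages),  # copy so later turns do not mutate it
            "stream": stream,
        }

    def record_reply(self, text):
        """Record the assistant's reply so the next request includes it."""
        self.messages.append({"role": "assistant", "content": text})
```

Each turn then becomes: build the payload with `user_payload`, send it, and pass the reply text to `record_reply`.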
Creating embeddings for search:
- Use the /api/embed endpoint
- Store embeddings in a vector database
- Perform similarity search
- Implement RAG (Retrieval Augmented Generation)
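The similarity-search step can be illustrated without a vector database. The sketch below ranks stored vectors against a query vector by cosine similarity. It is plain Python for small collections, and the function names are hypothetical; a real deployment would likely delegate this to the vector store.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, key=lambda s: s[0], reverse=True)]
```

In a RAG pipeline, the top-ranked documents would be inserted into the prompt sent to the chat endpoint.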
Running behind a firewall:
- Set the HTTPS_PROXY environment variable
- Configure the proxy in Docker if containerized
- Ensure certificates are trusted
Using cloud models:
- Run ollama signin once
- Pull cloud models with the -cloud suffix
- Use the same API endpoints as local models
Troubleshooting
Model Not Loading on GPU
Check:
ollama ps
Solutions:
- Verify GPU compatibility in the documentation
- Check the CUDA/ROCm installation
- Review available VRAM
- Try smaller model variants
Cannot Access Ollama Remotely
Problem: Ollama only accessible from localhost
Solution:
# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"
See "How do I configure Ollama server?" in llms-txt.md for platform-specific instructions.
Proxy Issues
Problem: Cannot download models behind proxy
Solution:
# Set proxy (HTTPS only, not HTTP)
export HTTPS_PROXY=https://proxy.example.com
# Restart Ollama
See "How do I use Ollama behind a proxy?" in llms-txt.md.
CORS Errors in Browser
Problem: Browser extension or web app cannot access Ollama
Solution:
# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"
See "How can I allow additional web origins?" in llms-txt.md.
Resources
Official Documentation
- Main docs: https://docs.ollama.com
- API Reference: https://docs.ollama.com/api
- Model Library: https://ollama.com/models
Official Libraries
- Python: https://github.com/ollama/ollama-python
- JavaScript: https://github.com/ollama/ollama-js
Community
- GitHub: https://github.com/ollama/ollama
- Community Libraries: See GitHub README for full list
Notes
- This skill was generated from official Ollama documentation
- All examples are tested and working with Ollama's API
- Code samples include proper language detection for syntax highlighting
- Reference files preserve structure from official docs with working links
- OpenAI compatibility means most OpenAI code works with minimal changes
Quick Command Reference
# CLI Commands
ollama signin        # Sign in to ollama.com
ollama run gemma3    # Run a model interactively
ollama pull gemma3   # Download a model
ollama ps            # List running models
ollama list          # List installed models
# Check API Status
curl http://localhost:11434/api/version
# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"