Ollama Skill

Assistance with Ollama development. Ollama is a local runtime for running large language models and interacting with them programmatically.

When to Use This Skill

This skill should be triggered when:

•Running local AI models with Ollama
•Building applications that interact with Ollama's API
•Implementing chat completions, embeddings, or streaming responses
•Setting up Ollama authentication or cloud models
•Configuring Ollama server (environment variables, ports, proxies)
•Using Ollama with OpenAI-compatible libraries
•Troubleshooting Ollama installations or GPU compatibility
•Implementing tool calling, structured outputs, or vision capabilities
•Working with Ollama in Docker or behind proxies
•Creating, copying, pushing, or managing Ollama models

Quick Reference

In this setup, Ollama runs as a Docker container named ollama, so every ollama CLI command has to go through docker exec. For example, to list all models, run:

bash

docker exec -it ollama ollama list

Use the following shell aliases to simplify interactions with the Ollama container:

bash

# Alias to start the container from the first locally available Ollama image (does not pull a new image)
alias ollama-start='docker image ls --format "{{.Repository}}:{{.Tag}}" | grep ollama | head -n 1 | xargs -I {} docker run -d -v ~/.ollama:/root/.ollama -p 11434:11434 --name ollama {}'

# Alias to stop and remove the Ollama container
alias ollama-stop='docker stop ollama && docker rm ollama > /dev/null'

# Alias to exec into the running Ollama container
alias ollama-shell='docker exec -it ollama /bin/bash'

# Alias to make it feel like ollama is running without a container
# (arguments typed after the alias are appended automatically; aliases do not expand "$@")
alias ollama='docker exec -it ollama ollama'

1. Basic Chat Completion (cURL)

Generate a simple chat response:

bash

curl http://localhost:11434/api/chat -d '{
  "model": "gemma3",
  "messages": [
    {
      "role": "user",
      "content": "Why is the sky blue?"
    }
  ]
}'
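
The same request can be sent from Python without extra dependencies. The following is a minimal sketch using only the standard library, assuming the server listens on the default http://localhost:11434. The names `build_chat_request` and `chat_once` are hypothetical helpers for illustration, not part of any Ollama library:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local address


def build_chat_request(model, content, base_url=OLLAMA_URL):
    """Build (but do not send) a non-streaming POST request for /api/chat."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": content}],
        "stream": False,  # ask for one JSON object instead of NDJSON chunks
    }
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat_once(model, content):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(model, content)) as resp:
        body = json.load(resp)
    return body["message"]["content"]
```

Calling `chat_once("gemma3", "Why is the sky blue?")` needs a running server with the model pulled. `build_chat_request` can be inspected without one.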

2. Simple Text Generation (cURL)

Generate a text response from a prompt:

bash

curl http://localhost:11434/api/generate -d '{
  "model": "gemma3",
  "prompt": "Why is the sky blue?"
}'

3. Python Chat with OpenAI Library

Use Ollama with the OpenAI Python library:

python

from openai import OpenAI

client = OpenAI(
    base_url='http://localhost:11434/v1/',
    api_key='ollama',  # required but ignored
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            'role': 'user',
            'content': 'Say this is a test',
        }
    ],
    model='llama3.2',
)

4. Vision Model (Image Analysis)

Ask questions about images:

python

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1/", api_key="ollama")

response = client.chat.completions.create(
    model="llava",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": "data:image/png;base64,iVBORw0KG...",
                },
            ],
        }
    ],
    max_tokens=300,
)
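
The base64 string in the example above is truncated. A small helper can build the full data URL from raw image bytes. This is a sketch under the assumption that the image is passed as a data URL string, as in the snippet above. `data_url` and `image_part` are hypothetical names:

```python
import base64


def data_url(image_bytes, mime="image/png"):
    """Encode raw image bytes as a data URL suitable for the image_url field."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def image_part(image_bytes, mime="image/png"):
    """Build the image entry of a multi-part message content list."""
    return {"type": "image_url", "image_url": data_url(image_bytes, mime)}
```

For a file on disk, read it in binary mode first, for example `image_part(open("photo.png", "rb").read())`, where the file name is a placeholder.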

5. Generate Embeddings

Create vector embeddings for text:

python

from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")


embeddings = client.embeddings.create(
    model="all-minilm",
    input=["why is the sky blue?", "why is the grass green?"],
)

6. Structured Outputs (JSON Schema)

Get structured JSON responses:

python

from pydantic import BaseModel
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

class FriendInfo(BaseModel):
    name: str
    age: int
    is_available: bool

class FriendList(BaseModel):
    friends: list[FriendInfo]

completion = client.beta.chat.completions.parse(
    temperature=0,
    model="llama3.1:8b",
    messages=[
        {"role": "user", "content": "Return a list of friends in JSON format"}
    ],
    response_format=FriendList,
)

friends_response = completion.choices[0].message
if friends_response.parsed:
    print(friends_response.parsed)
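
Without pydantic or the OpenAI library, the same idea can be sketched against the native /api/chat endpoint. To my understanding it accepts a JSON schema in a `format` field and returns the JSON as text in `message.content`. Treat that, the hand-written schema, and the helper names `structured_payload` and `parse_friends` as assumptions to check against the API reference:

```python
import json

# Hand-written JSON schema mirroring the FriendList model above
FRIEND_LIST_SCHEMA = {
    "type": "object",
    "properties": {
        "friends": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                    "is_available": {"type": "boolean"},
                },
                "required": ["name", "age", "is_available"],
            },
        }
    },
    "required": ["friends"],
}


def structured_payload(model, prompt, schema):
    """Build a non-streaming /api/chat body that asks for schema-shaped JSON."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "format": schema,
        "stream": False,
        "options": {"temperature": 0},
    }


def parse_friends(response_body):
    """Decode the JSON text in message.content and return the friends list."""
    data = json.loads(response_body["message"]["content"])
    return data["friends"]
```

`parse_friends` raises `json.JSONDecodeError` or `KeyError` if the model's output does not match the expected shape, so callers should be ready to retry or report the failure.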

7. JavaScript/TypeScript Chat

Use Ollama with the OpenAI JavaScript library:

javascript

import OpenAI from "openai";

const openai = new OpenAI({
  baseURL: "http://localhost:11434/v1/",
  apiKey: "ollama",  // required but ignored
});

const chatCompletion = await openai.chat.completions.create({
  messages: [{ role: "user", content: "Say this is a test" }],
  model: "llama3.2",
});

8. Authentication for Cloud Models

bash

# Sign in from CLI
ollama signin

# Then use cloud models
ollama run gpt-oss:120b-cloud

Or use API keys for direct cloud access:

bash

export OLLAMA_API_KEY=your_api_key

curl https://ollama.com/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{
    "model": "gpt-oss:120b",
    "prompt": "Why is the sky blue?",
    "stream": false
  }'

9. Configure Ollama Server

Set environment variables for server configuration:

macOS:

bash

# Set environment variable
launchctl setenv OLLAMA_HOST "0.0.0.0:11434"

# Restart Ollama application

Linux (systemd):

bash

# Edit service
systemctl edit ollama.service

# In the editor that opens, add this line under [Service] (it belongs in the unit override, not the shell)
Environment="OLLAMA_HOST=0.0.0.0:11434"

# Reload and restart
systemctl daemon-reload
systemctl restart ollama

Windows:

1. Quit Ollama from the task bar
2. Search for "environment variables" in Settings
3. Edit or create the OLLAMA_HOST variable
4. Set its value to 0.0.0.0:11434
5. Restart Ollama from the Start menu

10. Check Model GPU Loading

Verify if your model is using GPU:

bash

ollama ps

Output shows:

•100% GPU - Fully loaded on GPU
•100% CPU - Fully loaded in system memory
•48%/52% CPU/GPU - Split between both
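
For scripts that need to act on this value, the three formats listed above can be parsed into percentages. This is a sketch that assumes the processor field always looks like one of those three examples. `parse_processor` is a hypothetical helper:

```python
def parse_processor(value):
    """Turn a string such as '48%/52% CPU/GPU' into percentages per device."""
    percents, devices = value.strip().split(" ", 1)
    shares = [int(p.rstrip("%")) for p in percents.split("/")]
    names = [d.lower() for d in devices.strip().split("/")]
    if len(shares) != len(names):
        raise ValueError(f"unexpected processor value: {value!r}")
    result = {"cpu": 0, "gpu": 0}
    result.update(zip(names, shares))
    return result
```

A model is fully on the GPU when `parse_processor(...)["gpu"] == 100`. Any CPU share usually means the model did not fit in VRAM.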

Key Concepts

Base URLs

•Local API (default): http://localhost:11434/api
•Cloud API: https://ollama.com/api
•OpenAI Compatible: /v1/ endpoints for OpenAI libraries

Authentication

•Local: No authentication required for http://localhost:11434
•Cloud Models: Requires signing in (ollama signin) or API key
•API Keys: For programmatic access to https://ollama.com/api
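
The base URL and authentication rules above can be combined in one place. A minimal sketch, assuming the API key is read from the OLLAMA_API_KEY environment variable as in the cURL example. `api_target` is a hypothetical helper:

```python
import os

LOCAL_API = "http://localhost:11434/api"
CLOUD_API = "https://ollama.com/api"


def api_target(use_cloud, env=None):
    """Return (base_url, headers) for either the local or the cloud API."""
    env = os.environ if env is None else env
    if not use_cloud:
        return LOCAL_API, {}  # local server: no authentication needed
    key = env.get("OLLAMA_API_KEY")
    if not key:
        raise RuntimeError("OLLAMA_API_KEY is not set")
    return CLOUD_API, {"Authorization": f"Bearer {key}"}
```

Passing `env` explicitly keeps the function easy to test. In normal use, call `api_target(True)` and let it read the process environment.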

Models

•Local Models: Run on your machine (e.g., gemma3, llama3.2, qwen3)
•Cloud Models: Suffix -cloud (e.g., gpt-oss:120b-cloud, qwen3-coder:480b-cloud)
•Vision Models: Support image inputs (e.g., llava)

Common Environment Variables

•OLLAMA_HOST - Change bind address (default: 127.0.0.1:11434)
•OLLAMA_CONTEXT_LENGTH - Context window size (default: 2048 tokens in older releases, 4096 in more recent ones; check the docs for your version)
•OLLAMA_MODELS - Model storage directory
•OLLAMA_ORIGINS - Allow additional web origins for CORS
•HTTPS_PROXY - Proxy server for model downloads
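
Client code can honor OLLAMA_HOST as well, so that one setting controls both server and scripts. The sketch below only handles values written as host:port or as a full URL, which is an assumption about how the variable is set. `base_url_from_env` is a hypothetical helper:

```python
import os

DEFAULT_HOST = "127.0.0.1:11434"  # default bind address listed above


def base_url_from_env(env=None):
    """Derive an HTTP base URL from OLLAMA_HOST, falling back to the default."""
    env = os.environ if env is None else env
    host = env.get("OLLAMA_HOST", "").strip() or DEFAULT_HOST
    if "://" not in host:
        host = "http://" + host  # bare host:port values get a scheme
    return host.rstrip("/")
```

Note that 0.0.0.0 is a bind address for the server. A client on another machine should set OLLAMA_HOST to the server's reachable address instead.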

Error Handling

Status Codes:

•200 - Success
•400 - Bad Request (invalid parameters)
•404 - Not Found (model doesn't exist)
•429 - Too Many Requests (rate limit)
•500 - Internal Server Error
•502 - Bad Gateway (cloud model unreachable)

Error Format:

json

{
  "error": "the model failed to generate a response"
}
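
One way to turn these status codes and the error body into a Python exception is sketched below. It applies to non-streaming responses. `OllamaError` and `check_response` are hypothetical names, and the fallback messages simply repeat the list above:

```python
import json

STATUS_MEANINGS = {
    400: "Bad Request (invalid parameters)",
    404: "Not Found (model doesn't exist)",
    429: "Too Many Requests (rate limit)",
    500: "Internal Server Error",
    502: "Bad Gateway (cloud model unreachable)",
}


class OllamaError(Exception):
    """Raised for any non-200 response."""

    def __init__(self, status, message):
        super().__init__(f"{status}: {message}")
        self.status = status
        self.message = message


def check_response(status, body_text):
    """Return the decoded JSON body on 200, otherwise raise OllamaError."""
    if status == 200:
        return json.loads(body_text)
    try:
        message = json.loads(body_text).get("error", "")
    except (ValueError, AttributeError):
        message = ""  # body was not a JSON object
    raise OllamaError(status, message or STATUS_MEANINGS.get(status, "Unknown error"))
```

Callers can then branch on `err.status`, for example retrying after a delay on 429 and pulling the model on 404.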

Streaming vs Non-Streaming

•Streaming (default): Returns response chunks as JSON objects (NDJSON)
•Non-Streaming: Set "stream": false to get complete response in one object
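
A streamed /api/chat response can be reassembled by decoding each line as JSON and joining the message contents. This sketch assumes each chunk carries `message.content` and a `done` flag, and that a failure mid-stream arrives as an object with an `error` field. `collect_chat_stream` is a hypothetical helper:

```python
import json


def collect_chat_stream(lines):
    """Join the content of streamed /api/chat chunks into the full reply."""
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines between chunks
        chunk = json.loads(raw)
        if "error" in chunk:
            raise RuntimeError(chunk["error"])
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # final chunk reached
    return "".join(parts)
```

Any iterable of lines works as input, including an open `urllib` response object. For a live UI, print each chunk's content inside the loop instead of collecting it.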

Reference Files

This skill includes comprehensive documentation in references/:

•llms-txt.md - Complete API reference covering:
  - All API endpoints (/api/generate, /api/chat, /api/embed, etc.)
  - Authentication methods (signin, API keys)
  - Error handling and status codes
  - OpenAI compatibility layer
  - Cloud models usage
  - Streaming responses
  - Configuration and environment variables
•llms.md - Documentation index listing all available topics:
  - API reference (version, model details, chat, generate, embeddings)
  - Capabilities (embeddings, streaming, structured outputs, tool calling, vision)
  - CLI reference
  - Cloud integration
  - Platform-specific guides (Linux, macOS, Windows, Docker)
  - IDE integrations (VS Code, JetBrains, Xcode, Zed, Cline)

Use the reference files when you need:

•Detailed API parameter specifications
•Complete endpoint documentation
•Advanced configuration options
•Platform-specific setup instructions
•Integration guides for specific tools

Working with This Skill

For Beginners

Start with these common patterns:

•Simple generation: Use /api/generate endpoint with a prompt
•Chat interface: Use /api/chat with messages array
•OpenAI compatibility: Use OpenAI libraries with base_url='http://localhost:11434/v1/'
•Check GPU usage: Run ollama ps to verify model loading

Read the "Introduction" and "Quickstart" sections of llms-txt.md for foundational concepts.

For Intermediate Users

Focus on:

•Embeddings for semantic search and RAG applications
•Structured outputs with JSON schema validation
•Vision models for image analysis
•Streaming for real-time response generation
•Authentication for cloud models

Check the specific API endpoints in llms-txt.md for detailed parameter options.

For Advanced Users

Explore:

•Tool calling for function execution
•Custom model creation with Modelfiles
•Server configuration with environment variables
•Proxy setup for network-restricted environments
•Docker deployment with custom configurations
•Performance optimization with GPU settings
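
As an illustration of the tool calling item above, the sketch below declares one tool and executes the calls a model requests. It assumes the assistant message carries a `tool_calls` list whose `function.arguments` is already a JSON object, and that results go back as `tool` role messages with a `tool_name` field. Verify both against the API reference. `get_temperature` is a made-up tool with a canned answer:

```python
def get_temperature(city):
    """Hypothetical tool: a real one would query a weather service."""
    return f"22 C in {city}"


# Tool description to send in the "tools" field of a chat request
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_temperature",
            "description": "Get the current temperature for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

AVAILABLE = {"get_temperature": get_temperature}


def run_tool_calls(assistant_message):
    """Execute each requested tool and return 'tool' role messages with the results."""
    results = []
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        handler = AVAILABLE.get(fn["name"])
        if handler is None:
            output = f"unknown tool: {fn['name']}"
        else:
            output = handler(**fn["arguments"])
        results.append({"role": "tool", "content": str(output), "tool_name": fn["name"]})
    return results
```

The returned messages are appended to the conversation after the assistant message, and the chat request is sent again so the model can use the results.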

Refer to platform-specific sections in llms.md and configuration details in llms-txt.md.

Common Use Cases

Building a chatbot:

•Use /api/chat endpoint
•Maintain message history in your application
•Stream responses for better UX
•Handle errors gracefully
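
The history and error handling points above can be sketched as a small class. The transport is injected as a `send` callable that takes a /api/chat payload and returns the decoded response, which is an assumption made here so the logic stays independent of any HTTP library. `ChatSession` is a hypothetical name:

```python
class ChatSession:
    """Keeps the message history for one conversation."""

    def __init__(self, model, send):
        self.model = model
        self.send = send  # callable: payload dict -> decoded response dict
        self.messages = []

    def ask(self, content):
        """Send a user turn with the full history and record the reply."""
        self.messages.append({"role": "user", "content": content})
        payload = {
            "model": self.model,
            "messages": list(self.messages),
            "stream": False,
        }
        try:
            reply = self.send(payload)["message"]
        except Exception:
            self.messages.pop()  # drop the failed turn so history stays consistent
            raise
        self.messages.append(reply)
        return reply["content"]
```

Because every request includes the whole `messages` list, long conversations eventually exceed the context window. Trimming or summarizing old turns is left to the application.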

Creating embeddings for search:

•Use /api/embed endpoint
•Store embeddings in vector database
•Perform similarity search
•Implement RAG (Retrieval Augmented Generation)
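
The similarity search step can be sketched in plain Python with cosine similarity. The vectors here stand in for embeddings returned by the API, and a vector database would replace the linear scan at scale. `cosine_similarity` and `rank_documents` are hypothetical helpers:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

For RAG, embed the user's question, rank the stored document vectors, and include the text of the top few documents in the chat prompt.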

Running behind a firewall:

•Set HTTPS_PROXY environment variable
•Configure proxy in Docker if containerized
•Ensure certificates are trusted

Using cloud models:

•Run ollama signin once
•Pull cloud models with -cloud suffix
•Use same API endpoints as local models

Troubleshooting

Model Not Loading on GPU

Check:

bash

ollama ps

Solutions:

•Verify GPU compatibility in documentation
•Check CUDA/ROCm installation
•Review available VRAM
•Try smaller model variants

Cannot Access Ollama Remotely

Problem: Ollama only accessible from localhost

Solution:

bash

# Set OLLAMA_HOST to bind to all interfaces
export OLLAMA_HOST="0.0.0.0:11434"

See "How do I configure Ollama server?" in llms-txt.md for platform-specific instructions.

Proxy Issues

Problem: Cannot download models behind proxy

Solution:

bash

# Set HTTPS_PROXY only; model downloads use HTTPS, so HTTP_PROXY is not needed
export HTTPS_PROXY=https://proxy.example.com

# Restart Ollama

See "How do I use Ollama behind a proxy?" in llms-txt.md.

CORS Errors in Browser

Problem: Browser extension or web app cannot access Ollama

Solution:

bash

# Allow specific origins
export OLLAMA_ORIGINS="chrome-extension://*,moz-extension://*"

See "How can I allow additional web origins?" in llms-txt.md.

Resources

Official Documentation

•Main docs: https://docs.ollama.com
•API Reference: https://docs.ollama.com/api
•Model Library: https://ollama.com/models

Official Libraries

•Python: https://github.com/ollama/ollama-python
•JavaScript: https://github.com/ollama/ollama-js

Community

•GitHub: https://github.com/ollama/ollama
•Community Libraries: See GitHub README for full list

Notes

•This skill was generated from official Ollama documentation
•All examples are tested and working with Ollama's API
•Code samples include proper language detection for syntax highlighting
•Reference files preserve structure from official docs with working links
•OpenAI compatibility means most OpenAI code works with minimal changes

Quick Command Reference

bash

# CLI Commands
ollama signin                    # Sign in to ollama.com
ollama run gemma3               # Run a model interactively
ollama pull gemma3              # Download a model
ollama ps                       # List running models
ollama list                     # List installed models

# Check API Status
curl http://localhost:11434/api/version

# Environment Variables (Common)
export OLLAMA_HOST="0.0.0.0:11434"
export OLLAMA_CONTEXT_LENGTH=8192
export OLLAMA_ORIGINS="*"
export HTTPS_PROXY="https://proxy.example.com"