Ollama Client Skill
Overview
This skill provides a Python wrapper for interacting with Ollama's REST API to generate text completions using the Phi-4 model (14B parameters, 16K context window). It handles timeouts, retries, and structured logging for all LLM operations.
When to Use
Use this skill when you need to:
- Generate text completions from Phi-4
- Run prompts for clinical analysis tasks
- Generate JSON-structured outputs from the LLM
- Handle LLM inference with timeout protection
Installation
IMPORTANT: This skill has its own isolated virtual environment (.venv) managed by uv. Do NOT use system Python.
Initialize the skill's environment:
# From the skill directory
cd .agent/skills/ollama-client
uv sync  # Creates .venv and installs dependencies from pyproject.toml
Dependencies are in pyproject.toml:
- requests: HTTP client for the Ollama API
Usage
CRITICAL: Always use uv run to execute code with this skill's .venv, NOT system Python.
Basic Text Generation
# From .agent/skills/ollama-client/ directory
# Run with: uv run python -c "..."
from ollama_client import OllamaClient
# Initialize client
client = OllamaClient(
    host="http://localhost:11434",  # Default from OLLAMA_HOST env var
    model="phi4:14b",               # Default from OLLAMA_MODEL env var
    timeout=300                     # 5 minutes default
)
# Generate completion
result = client.generate(
    prompt="Summarize the following clinical note: ...",
    temperature=0.1,         # Low temperature for deterministic outputs
    max_tokens=1000,         # Optional token limit
    stop_sequences=["END"]   # Optional stop sequences
)
print(result["response"])
print(f"Execution time: {result['execution_time_ms']}ms")
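JSON-structured output is listed under When to Use, but the text in result["response"] still has to be parsed by the caller. The sketch below shows one way this might be done. The helper name parse_json_response is hypothetical and not part of this skill, and the fence-stripping step assumes the model sometimes wraps its JSON in a Markdown code fence.

```python
import json
import re

# Build the fence marker programmatically so this snippet stays easy to embed.
FENCE = "`" * 3
FENCED_JSON = re.compile(rf"^{FENCE}(?:json)?\s*(.*?)\s*{FENCE}$", flags=re.DOTALL)


def parse_json_response(text: str) -> dict:
    """Parse a JSON object out of raw model output (hypothetical helper)."""
    cleaned = text.strip()
    # Assumption: the model may wrap its JSON in a Markdown code fence.
    match = FENCED_JSON.match(cleaned)
    if match:
        cleaned = match.group(1)
    parsed = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object at the top level")
    return parsed
```

In use, this would be called as parse_json_response(result["response"]). Since json.JSONDecodeError subclasses ValueError, catching ValueError covers both failure modes.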
With Environment Variables
import os

from ollama_client import OllamaClient

# Normally set in .env or docker-compose.yml
os.environ['OLLAMA_HOST'] = 'http://localhost:11434'
os.environ['OLLAMA_MODEL'] = 'phi4:14b'

# Client uses env vars automatically
client = OllamaClient()
Using from Another Module
When importing this skill from agents or other code:
import sys
from pathlib import Path

# Add skill to path (use relative path from your location)
skill_path = Path(__file__).parent.parent.parent / ".agent/skills/ollama-client"
sys.path.insert(0, str(skill_path))

from ollama_client import OllamaClient

client = OllamaClient()
Health Check
# Check if Ollama server is accessible
if client.is_available():
    print("Ollama server is healthy")
else:
    print("Ollama server unavailable")
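For readers curious what such a check might involve, here is a minimal standalone sketch. It is not the skill's is_available() implementation, which may differ. It assumes that a successful GET to Ollama's /api/tags endpoint (the model listing) is a reasonable liveness signal, and it uses the standard library's urllib rather than requests so that it runs without extra dependencies. The function name is_ollama_available is hypothetical.

```python
import urllib.error
import urllib.request


def is_ollama_available(host: str = "http://localhost:11434", timeout: float = 5.0) -> bool:
    """Return True if the server answers GET /api/tags with HTTP 200.

    Hypothetical helper; the skill's own is_available() may work differently.
    """
    url = host.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts, and HTTP error codes.
        return False
```

A short timeout keeps the check cheap, so it can be run before each critical operation.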
Configuration
Environment Variables:
- OLLAMA_HOST: Server URL (default: http://localhost:11434)
- OLLAMA_MODEL: Model name (default: phi4:14b)
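The fallback behaviour described above can be sketched as follows. The precedence shown (explicit argument, then environment variable, then built-in default) is an assumption about how the client resolves its settings, and resolve_settings is a hypothetical name used only for illustration.

```python
import os

DEFAULT_HOST = "http://localhost:11434"
DEFAULT_MODEL = "phi4:14b"


def resolve_settings(host=None, model=None, env=None):
    """Resolve host and model (hypothetical helper).

    Assumed precedence: explicit argument > environment variable > default.
    """
    env = os.environ if env is None else env
    return {
        "host": host or env.get("OLLAMA_HOST", DEFAULT_HOST),
        "model": model or env.get("OLLAMA_MODEL", DEFAULT_MODEL),
    }
```

Passing env as a parameter makes the resolution easy to test without mutating the real process environment.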
Parameters:
- temperature: Sampling temperature (0.0-1.0, default: 0.1 for near-deterministic outputs)
- max_tokens: Maximum tokens to generate (optional)
- stop_sequences: List of strings that stop generation (optional)
- timeout: Request timeout in seconds (default: 300)
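To show how these parameters might reach the server, the sketch below builds a request body for Ollama's POST /api/generate endpoint. The option keys temperature, num_predict, and stop belong to Ollama's REST API. The mapping from max_tokens and stop_sequences onto them is an assumption about this wrapper, and build_generate_payload is a hypothetical name.

```python
def build_generate_payload(model, prompt, temperature=0.1, max_tokens=None, stop_sequences=None):
    """Build a JSON-serializable body for POST /api/generate (hypothetical helper)."""
    options = {"temperature": temperature}
    if max_tokens is not None:
        # Assumed mapping: Ollama calls the generation limit "num_predict".
        options["num_predict"] = max_tokens
    if stop_sequences:
        options["stop"] = list(stop_sequences)
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # request a single JSON response instead of a stream
        "options": options,
    }
```

The timeout parameter is absent from the payload because it applies to the HTTP request itself rather than to the model.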
Error Handling
The skill raises exceptions for:
- Timeout: If the request exceeds the timeout duration
- Connection Error: If the Ollama server is unreachable
- API Error: If Ollama returns an error response
All errors include execution time for debugging.
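One way to attach execution time to every failure is sketched below. The exception class OllamaCallError and the wrapper call_with_timing are hypothetical names, not the skill's actual API. The sketch maps Python's built-in TimeoutError and ConnectionError to the first two categories and RuntimeError to the third. A requests-based implementation would catch requests.exceptions.Timeout and requests.exceptions.ConnectionError instead.

```python
import time


class OllamaCallError(Exception):
    """Hypothetical error type that records how long the failed call ran."""

    def __init__(self, kind, message, execution_time_ms):
        super().__init__(f"{kind}: {message} (after {execution_time_ms} ms)")
        self.kind = kind
        self.execution_time_ms = execution_time_ms


def call_with_timing(fn):
    """Run fn() and re-raise known failures with the elapsed time attached."""
    start = time.perf_counter()
    try:
        return fn()
    except (TimeoutError, ConnectionError, RuntimeError) as exc:
        elapsed_ms = int((time.perf_counter() - start) * 1000)
        if isinstance(exc, TimeoutError):
            kind = "timeout"
        elif isinstance(exc, ConnectionError):
            kind = "connection"
        else:
            kind = "api"  # assumption: API-level failures surface as RuntimeError
        raise OllamaCallError(kind, str(exc), elapsed_ms) from exc
```

Chaining with "from exc" preserves the original traceback, so the log shows both the category and the underlying cause.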
Best Practices
- Low Temperature: Use temperature=0.1 for clinical tasks requiring consistency
- Timeouts: Set appropriate timeouts based on prompt complexity (simple: 60s, complex: 300s)
- Health Checks: Verify server availability before critical operations
- Error Logging: Always log errors with execution time for troubleshooting
Integration with Agents
Agents use this skill for all LLM operations:
- ToC Subagent: Section topic segmentation
- Summary Subagent: Clinical entity extraction
- Recommendation Subagent: Treatment plan generation
- Evaluator Agent: Quality validation reasoning
Implementation
See ollama_client.py for the full Python implementation.