Ollama Integration
Integrate Ollama for local LLM inference in TypeScript applications. Ollama provides a simple API for running language models locally.
When to Apply
Use this skill when:
- •Running LLMs locally without cloud APIs
- •Generating text or embeddings with Ollama
- •Building AI features that need to work offline
- •Implementing RAG pipelines with local models
- •Testing AI applications without API costs
Prerequisites
Install Ollama
bash
# macOS brew install ollama # Linux curl -fsSL https://ollama.com/install.sh | sh # Start the server ollama serve
Pull Required Models
bash
# Embedding model (768 dimensions) ollama pull nomic-embed-text # Chat/generation model ollama pull mistral # Alternative models ollama pull llama2 ollama pull codellama
OllamaClient Implementation
Complete TypeScript client for Ollama API:
typescript
// src/utils/ollama.ts
import fetch from 'cross-fetch';
interface OllamaResponse {
model: string;
created_at: string;
response: string;
done: boolean;
}
interface OllamaEmbeddingResponse {
embedding: number[];
}
interface OllamaChatMessage {
role: 'system' | 'user' | 'assistant';
content: string;
}
interface OllamaChatResponse {
model: string;
created_at: string;
message: OllamaChatMessage;
done: boolean;
}
export class OllamaClient {
private baseUrl: string;
constructor(baseUrl?: string) {
this.baseUrl = baseUrl || process.env.OLLAMA_HOST || 'http://localhost:11434';
}
// Generate embeddings for text
async generateEmbedding(
text: string,
model: string = 'nomic-embed-text'
): Promise<number[]> {
const response = await fetch(`${this.baseUrl}/api/embeddings`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ model, prompt: text }),
});
if (!response.ok) {
throw new Error(`Failed to generate embedding: ${response.statusText}`);
}
const data: OllamaEmbeddingResponse = await response.json();
return data.embedding;
}
// Generate text response (non-streaming)
async generateResponse(
prompt: string,
context?: string,
model: string = 'mistral'
): Promise<string> {
const fullPrompt = context
? `Context: ${context}\n\nQuestion: ${prompt}\n\nAnswer:`
: prompt;
const response = await fetch(`${this.baseUrl}/api/generate`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model,
prompt: fullPrompt,
stream: false,
}),
});
if (!response.ok) {
throw new Error(`Failed to generate response: ${response.statusText}`);
}
const data: OllamaResponse = await response.json();
return data.response;
}
// Generate text response (streaming)
async generateStreamingResponse(
prompt: string,
onChunk: (chunk: string) => void,
context?: string,
model: string = 'mistral'
): Promise<void> {
const fullPrompt = context
? `Context: ${context}\n\nQuestion: ${prompt}\n\nAnswer:`
: prompt;
const response = await fetch(`${this.baseUrl}/api/generate`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model,
prompt: fullPrompt,
stream: true,
}),
});
if (!response.ok) {
throw new Error(`Failed to generate streaming response: ${response.statusText}`);
}
const reader = response.body?.getReader();
if (!reader) {
throw new Error('Failed to get response reader');
}
const decoder = new TextDecoder();
while (true) {
const { done, value } = await reader.read();
if (done) break;
const chunk = decoder.decode(value);
const lines = chunk.split('\n').filter(Boolean);
for (const line of lines) {
try {
const data: OllamaResponse = JSON.parse(line);
if (data.response) {
onChunk(data.response);
}
} catch {
// Skip malformed JSON lines
}
}
}
}
// Chat completion API
async chat(
messages: OllamaChatMessage[],
model: string = 'mistral'
): Promise<string> {
const response = await fetch(`${this.baseUrl}/api/chat`, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
model,
messages,
stream: false,
}),
});
if (!response.ok) {
throw new Error(`Failed to chat: ${response.statusText}`);
}
const data: OllamaChatResponse = await response.json();
return data.message.content;
}
// List available models
async listModels(): Promise<string[]> {
const response = await fetch(`${this.baseUrl}/api/tags`);
if (!response.ok) {
throw new Error(`Failed to list models: ${response.statusText}`);
}
const data = await response.json();
return data.models?.map((m: { name: string }) => m.name) || [];
}
// Check if Ollama is running
async isHealthy(): Promise<boolean> {
try {
const response = await fetch(`${this.baseUrl}/api/tags`);
return response.ok;
} catch {
return false;
}
}
}
API Endpoints
| Endpoint | Method | Purpose |
|---|---|---|
/api/embeddings | POST | Generate embeddings |
/api/generate | POST | Generate text completion |
/api/chat | POST | Chat completion |
/api/tags | GET | List available models |
/api/pull | POST | Pull a model |
Usage Examples
Basic Text Generation
typescript
const ollama = new OllamaClient(); const response = await ollama.generateResponse( 'Explain machine learning in simple terms' ); console.log(response);
With Context (RAG)
typescript
const context = 'Our company was founded in 2020 and has 50 employees.'; const question = 'When was the company founded?'; const response = await ollama.generateResponse(question, context); // "Based on the context, the company was founded in 2020."
Streaming Response
typescript
await ollama.generateStreamingResponse( 'Write a short poem about coding', (chunk) => process.stdout.write(chunk) );
Chat Conversation
typescript
const messages = [
{ role: 'system', content: 'You are a helpful coding assistant.' },
{ role: 'user', content: 'How do I reverse a string in JavaScript?' },
];
const response = await ollama.chat(messages);
console.log(response);
Generate Embeddings
typescript
const text = 'Machine learning is a subset of artificial intelligence.';
const embedding = await ollama.generateEmbedding(text);
console.log(`Embedding dimensions: ${embedding.length}`); // 768 for nomic-embed-text
Model Selection
Embedding Models
| Model | Dimensions | Speed | Quality |
|---|---|---|---|
nomic-embed-text | 768 | Fast | Good |
mxbai-embed-large | 1024 | Medium | Better |
all-minilm | 384 | Very Fast | Acceptable |
Generation Models
| Model | Size | Speed | Use Case |
|---|---|---|---|
mistral | 7B | Fast | General purpose |
llama2 | 7B | Fast | General purpose |
codellama | 7B | Fast | Code generation |
mixtral | 8x7B | Slow | High quality |
Environment Configuration
bash
# Default Ollama host export OLLAMA_HOST=http://localhost:11434 # For Docker/CI environments export OLLAMA_HOST=http://ollama:11434
Testing with Ollama
typescript
import { OllamaClient } from '../src/utils/ollama';
let ollama: OllamaClient;
beforeAll(() => {
ollama = new OllamaClient();
});
test('should generate embedding', async () => {
const embedding = await ollama.generateEmbedding('test text');
expect(embedding).toHaveLength(768);
expect(embedding.every(n => typeof n === 'number')).toBe(true);
});
test('should generate response', async () => {
const response = await ollama.generateResponse('Say hello');
expect(response).toBeTruthy();
expect(typeof response).toBe('string');
});
CI/CD Integration
In GitHub Actions, use the Ollama service container:
yaml
services:
ollama:
image: ollama/ollama:latest
ports:
- 11434:11434
env:
OLLAMA_HOST: http://ollama:11434
steps:
- name: Pull models
run: |
wget -q -O - --post-data='{"name": "nomic-embed-text"}' \
--header='Content-Type: application/json' \
http://ollama:11434/api/pull
wget -q -O - --post-data='{"name": "mistral"}' \
--header='Content-Type: application/json' \
http://ollama:11434/api/pull
Error Handling
typescript
async function safeGenerate(prompt: string): Promise<string | null> {
const ollama = new OllamaClient();
// Check if Ollama is running
if (!await ollama.isHealthy()) {
console.error('Ollama is not running');
return null;
}
try {
return await ollama.generateResponse(prompt);
} catch (error) {
console.error('Generation failed:', error);
return null;
}
}
Troubleshooting
| Issue | Solution |
|---|---|
| "Connection refused" | Start Ollama: ollama serve |
| "Model not found" | Pull model: ollama pull <model> |
| Slow responses | Use smaller model or reduce prompt length |
| Out of memory | Use quantized model or smaller context |
| Timeout errors | Increase timeout or use streaming |
Package Dependencies
json
{
"dependencies": {
"cross-fetch": "^4.1.0"
}
}
References
- •Related skill:
rag-pipelinefor complete RAG implementation - •Related skill:
pgvector-embeddingsfor storing embeddings - •Related skill:
github-workflows-ollamafor CI/CD setup - •Ollama documentation
- •Ollama API reference