Microsoft.Extensions.AI
Overview
Microsoft.Extensions.AI provides provider-agnostic abstractions for AI services in .NET, centered on IChatClient for chat completions and IEmbeddingGenerator<,> for vector embeddings. It follows the same pattern as ILogger and HttpClient: services program against interfaces, providers are configured at the composition root, and middleware is composed into pipelines for caching, telemetry, rate limiting, and function calling.
NuGet Packages
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.AI.OpenAI            # OpenAI / Azure OpenAI provider
dotnet add package Microsoft.Extensions.AI.Ollama            # Ollama local models
dotnet add package Microsoft.Extensions.AI.AzureAIInference  # Azure AI model catalog
Basic Chat Completion
using Microsoft.Extensions.AI;
IChatClient chatClient = new OllamaChatClient(new Uri("http://localhost:11434"), "llama3.1");
var response = await chatClient.GetResponseAsync("What is dependency injection?");
Console.WriteLine(response.Messages.Last().Text);
Streaming Chat Completion
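using Microsoft.Extensions.AI;
using OpenAI;

// Assumes the API key is supplied through the OPENAI_API_KEY environment variable.
var apiKey = Environment.GetEnvironmentVariable("OPENAI_API_KEY")
    ?? throw new InvalidOperationException("OPENAI_API_KEY is not set.");

// Wrap the OpenAI SDK chat client as an IChatClient. The adapter method name has
// changed across preview releases; AsIChatClient is assumed here.
IChatClient chatClient = new OpenAIClient(apiKey).GetChatClient("gpt-4o").AsIChatClient();

await foreach (var update in chatClient.GetStreamingResponseAsync("Explain SOLID principles"))
{
    Console.Write(update.Text);
}
Console.WriteLine();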
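When streaming, callers often want to display chunks as they arrive and also keep the complete text afterward, for example to append it to the conversation history. The sketch below shows one way to do that using only the base class library. `CollectAsync` and `FakeStream` are hypothetical names, and `FakeStream` merely stands in for a provider stream; with a real client you would presumably feed it the `update.Text` values from `GetStreamingResponseAsync`.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;

string full = await CollectAsync(FakeStream(), Console.Write);
Console.WriteLine();
Console.WriteLine($"Collected {full.Length} characters.");

// Forwards each chunk to onChunk as it arrives and returns the concatenated text.
static async Task<string> CollectAsync(IAsyncEnumerable<string> chunks, Action<string> onChunk)
{
    var buffer = new StringBuilder();
    await foreach (var chunk in chunks)
    {
        onChunk(chunk);
        buffer.Append(chunk);
    }
    return buffer.ToString();
}

// Hypothetical stand-in for a provider stream; yields three text fragments.
static async IAsyncEnumerable<string> FakeStream()
{
    foreach (var part in new[] { "Single ", "responsibility ", "principle" })
    {
        await Task.Yield();
        yield return part;
    }
}
```

Keeping the accumulation in a helper leaves the display concern (the `onChunk` callback) separate from the bookkeeping, so the same loop can serve a console app or a web response.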
Multi-Turn Conversations
var messages = new List<ChatMessage>
{
    new(ChatRole.System, "You are a C# tutor. Give concise explanations with code examples."),
    new(ChatRole.User, "What are records in C#?")
};

var response = await chatClient.GetResponseAsync(messages);
Console.WriteLine(response.Messages.Last().Text);

// Continue the conversation
messages.AddRange(response.Messages);
messages.Add(new ChatMessage(ChatRole.User, "How do they differ from classes?"));

var followUp = await chatClient.GetResponseAsync(messages);
Console.WriteLine(followUp.Messages.Last().Text);
Chat Options
var options = new ChatOptions
{
    Temperature = 0.3f,
    MaxOutputTokens = 2048,
    TopP = 0.9f,
    StopSequences = new[] { "```" },
    ResponseFormat = ChatResponseFormat.Json
};

var response = await chatClient.GetResponseAsync(
    "Return a JSON object with fields: name, age, city", options);
Console.WriteLine(response.Messages.Last().Text);
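Even with a JSON response format requested, it is probably worth parsing the reply defensively, since a model can still return malformed JSON or leave out a field. The sketch below uses only System.Text.Json. The helper name `TryParsePerson` and the sample reply string are hypothetical stand-ins for the response text, not part of Microsoft.Extensions.AI.

```csharp
using System;
using System.Text.Json;

// Hypothetical reply text; in real code this would come from the chat response.
string reply = "{\"name\": \"Ada\", \"age\": 36, \"city\": \"London\"}";

if (TryParsePerson(reply, out var person))
    Console.WriteLine($"{person.Name}, {person.Age}, {person.City}");
else
    Console.WriteLine("Reply was not the expected JSON object.");

// Returns false instead of throwing when the text is not valid JSON
// or when any of the three expected fields is missing or has the wrong type.
static bool TryParsePerson(string text, out (string Name, int Age, string City) person)
{
    person = default;
    try
    {
        using var doc = JsonDocument.Parse(text);
        var root = doc.RootElement;
        if (root.ValueKind != JsonValueKind.Object) return false;

        if (!root.TryGetProperty("name", out var name) || name.ValueKind != JsonValueKind.String)
            return false;
        if (!root.TryGetProperty("age", out var age) || age.ValueKind != JsonValueKind.Number
            || !age.TryGetInt32(out var ageValue))
            return false;
        if (!root.TryGetProperty("city", out var city) || city.ValueKind != JsonValueKind.String)
            return false;

        person = (name.GetString()!, ageValue, city.GetString()!);
        return true;
    }
    catch (JsonException)
    {
        return false;
    }
}
```

A Try-style helper keeps the failure path explicit, so the caller can decide whether to retry the request, fall back to a default, or surface an error.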
Embedding Generation
using Microsoft.Extensions.AI;

IEmbeddingGenerator<string, Embedding<float>> embeddingGenerator =
    new OllamaEmbeddingGenerator(new Uri("http://localhost:11434"), "all-minilm");

var embeddings = await embeddingGenerator.GenerateAsync(new[]
{
    "Dependency injection in .NET",
    "Entity Framework Core migrations",
    "ASP.NET Core middleware pipeline"
});

foreach (var embedding in embeddings)
{
    Console.WriteLine($"Dimensions: {embedding.Vector.Length}");
    Console.WriteLine($"First 5: [{string.Join(", ", embedding.Vector.ToArray().Take(5))}]");
}

// Cosine similarity between two embeddings
float CosineSimilarity(ReadOnlyMemory<float> a, ReadOnlyMemory<float> b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Embeddings must have the same number of dimensions.");

    var aSpan = a.Span;
    var bSpan = b.Span;
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < aSpan.Length; i++)
    {
        dot += aSpan[i] * bSpan[i];
        normA += aSpan[i] * aSpan[i];
        normB += bSpan[i] * bSpan[i];
    }

    // Guard against division by zero when either vector is all zeros.
    float denominator = MathF.Sqrt(normA) * MathF.Sqrt(normB);
    return denominator == 0 ? 0 : dot / denominator;
}

float similarity = CosineSimilarity(embeddings[0].Vector, embeddings[1].Vector);
Console.WriteLine($"Similarity: {similarity:F4}");
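A common use of cosine similarity is ranking stored texts against a query embedding. The following is a minimal sketch of that idea using only the base class library. The three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions), and `RankBySimilarity` and `Cosine` are hypothetical helper names.

```csharp
using System;
using System.Linq;

// Hypothetical 3-dimensional vectors standing in for real embeddings.
var documents = new (string Title, float[] Vector)[]
{
    ("Dependency injection in .NET", new[] { 0.9f, 0.1f, 0.0f }),
    ("Entity Framework Core migrations", new[] { 0.1f, 0.9f, 0.1f }),
    ("ASP.NET Core middleware pipeline", new[] { 0.7f, 0.2f, 0.3f }),
};
float[] query = { 1.0f, 0.0f, 0.1f };

foreach (var (title, score) in RankBySimilarity(query, documents, top: 2))
    Console.WriteLine($"{score:F4}  {title}");

// Scores every document against the query and returns the best matches first.
static (string Title, float Score)[] RankBySimilarity(
    float[] queryVector, (string Title, float[] Vector)[] docs, int top) =>
    docs.Select(d => (d.Title, Score: Cosine(queryVector, d.Vector)))
        .OrderByDescending(r => r.Score)
        .Take(top)
        .ToArray();

static float Cosine(float[] a, float[] b)
{
    if (a.Length != b.Length)
        throw new ArgumentException("Vectors must have the same length.");

    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    float denominator = MathF.Sqrt(normA) * MathF.Sqrt(normB);
    return denominator == 0 ? 0 : dot / denominator;
}
```

A linear scan like this is reasonable for a few thousand vectors; larger collections would normally call for a vector index or database, which is outside the scope of this sketch.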
Tool Calling (Function Calling)
using Microsoft.Extensions.AI;
using System.ComponentModel;

[Description("Gets the current weather for a city")]
static string GetWeather([Description("City name")] string city) =>
    city.ToLowerInvariant() switch
    {
        "seattle" => "62F, cloudy",
        "new york" => "75F, sunny",
        _ => "Unknown"
    };

[Description("Calculates the tip amount")]
static decimal CalculateTip(
    [Description("Bill amount")] decimal amount,
    [Description("Tip percentage")] decimal percentage) =>
    amount * percentage / 100;

var chatOptions = new ChatOptions
{
    Tools = new AITool[]
    {
        AIFunctionFactory.Create(GetWeather),
        AIFunctionFactory.Create(CalculateTip)
    },
    ToolMode = ChatToolMode.Auto
};

var messages = new List<ChatMessage>
{
    new(ChatRole.User, "What's the weather in Seattle? Also, what's a 20% tip on $85?")
};

// Tools are only invoked automatically when the function-invocation middleware wraps
// the client; without it the response carries the tool-call requests, not the final answer.
IChatClient toolClient = new ChatClientBuilder(chatClient)
    .UseFunctionInvocation()
    .Build();

var response = await toolClient.GetResponseAsync(messages, chatOptions);
Console.WriteLine(response.Messages.Last().Text);
Middleware Pipelines
Compose cross-cutting concerns as middleware layers around any IChatClient.
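using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using Microsoft.Extensions.Caching.Memory;
using Microsoft.Extensions.Options;
using OpenAI;

// apiKey is assumed to hold your OpenAI API key, loaded from configuration or the environment.
// AsIChatClient is the assumed adapter name; it has changed across preview releases.
IChatClient pipeline = new ChatClientBuilder(
        new OpenAIClient(apiKey).GetChatClient("gpt-4o").AsIChatClient())
    .UseDistributedCache(new MemoryDistributedCache(
        Options.Create(new MemoryDistributedCacheOptions())))
    .UseOpenTelemetry()
    .UseFunctionInvocation()
    .Build();

// All calls through this pipeline get caching, telemetry, and auto function invocation
var response = await pipeline.GetResponseAsync("What is LINQ?");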
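The layering idea behind such a pipeline can be illustrated without any library types. The sketch below is a simplified analogy, not how ChatClientBuilder is implemented: each "middleware" is a function that wraps the next handler, and the hypothetical `provider` delegate stands in for a real model call. `WithCache` and `WithTiming` are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

int innerCalls = 0;

// Hypothetical stand-in for a provider call.
Func<string, Task<string>> provider = async prompt =>
{
    innerCalls++;
    await Task.Delay(10);
    return $"echo: {prompt}";
};

// The cache is the outermost layer, so a cache hit skips timing and the provider.
var layered = WithCache(WithTiming(provider));

Console.WriteLine(await layered("What is LINQ?"));
Console.WriteLine(await layered("What is LINQ?")); // served from the cache
Console.WriteLine($"Provider calls: {innerCalls}");

// Caching layer: returns a stored reply when the same prompt is seen again.
// Not thread-safe; a real cache would need synchronization and expiry.
static Func<string, Task<string>> WithCache(Func<string, Task<string>> next)
{
    var cache = new Dictionary<string, string>();
    return async prompt =>
    {
        if (cache.TryGetValue(prompt, out var hit)) return hit;
        var result = await next(prompt);
        cache[prompt] = result;
        return result;
    };
}

// Timing layer: reports how long the wrapped call took on standard error.
static Func<string, Task<string>> WithTiming(Func<string, Task<string>> next) =>
    async prompt =>
    {
        var sw = Stopwatch.StartNew();
        try { return await next(prompt); }
        finally { Console.Error.WriteLine($"call took {sw.ElapsedMilliseconds} ms"); }
    };
```

The order of the layers matters: with the cache outermost, repeated prompts never reach the layers beneath it, which is usually the intent but also means those calls produce no timing output.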
Provider Comparison
| Provider Package | Backend | Use Case |
|---|---|---|
| Microsoft.Extensions.AI.OpenAI | OpenAI API | Cloud-hosted GPT models |
| Microsoft.Extensions.AI.AzureAIInference | Azure AI | Enterprise Azure models |
| Microsoft.Extensions.AI.Ollama | Ollama | Local/self-hosted models |
Dependency Injection
using Microsoft.Extensions.AI;
using Microsoft.Extensions.Caching.Distributed;
using OpenAI;

var builder = WebApplication.CreateBuilder(args);

// "OpenAI:ApiKey" is an assumed configuration key; use whatever your app defines.
var apiKey = builder.Configuration["OpenAI:ApiKey"]
    ?? throw new InvalidOperationException("OpenAI:ApiKey is not configured.");

// Register the cache before the chat client that resolves it.
builder.Services.AddDistributedMemoryCache();

// Register IChatClient; swap the provider here without changing consuming code.
// AsIChatClient is the assumed adapter name; it has changed across preview releases.
builder.Services.AddChatClient(sp =>
    new ChatClientBuilder(new OpenAIClient(apiKey).GetChatClient("gpt-4o").AsIChatClient())
        .UseDistributedCache(sp.GetRequiredService<IDistributedCache>())
        .UseOpenTelemetry()
        .UseFunctionInvocation()
        .Build(sp));

builder.Services.AddEmbeddingGenerator<string, Embedding<float>>(sp =>
    new OllamaEmbeddingGenerator(new Uri("http://localhost:11434"), "all-minilm"));

var app = builder.Build();

app.MapPost("/chat", async (ChatRequest request, IChatClient chatClient) =>
{
    var messages = new List<ChatMessage>
    {
        new(ChatRole.User, request.Message)
    };
    var response = await chatClient.GetResponseAsync(messages);
    return Results.Ok(new { reply = response.Messages.Last().Text });
});

app.MapPost("/embed", async (EmbedRequest request,
    IEmbeddingGenerator<string, Embedding<float>> generator) =>
{
    var embeddings = await generator.GenerateAsync(new[] { request.Text });
    return Results.Ok(new { vector = embeddings[0].Vector.ToArray() });
});

app.Run();

record ChatRequest(string Message);
record EmbedRequest(string Text);
Service Pattern with IChatClient
public sealed class SummaryService(IChatClient chatClient)
{
    public async Task<string> SummarizeAsync(string text, CancellationToken ct = default)
    {
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System, "Summarize the following text in 3 bullet points."),
            new(ChatRole.User, text)
        };
        var options = new ChatOptions
        {
            Temperature = 0.2f,
            MaxOutputTokens = 512
        };
        var response = await chatClient.GetResponseAsync(messages, options, ct);
        return response.Messages.Last().Text ?? "";
    }
}

public sealed class ClassificationService(IChatClient chatClient)
{
    public async Task<string> ClassifyAsync(string input, string[] categories, CancellationToken ct = default)
    {
        var categoryList = string.Join(", ", categories);
        var messages = new List<ChatMessage>
        {
            new(ChatRole.System,
                $"Classify the input into one of these categories: {categoryList}. Return only the category name."),
            new(ChatRole.User, input)
        };
        var options = new ChatOptions
        {
            Temperature = 0.0f,
            MaxOutputTokens = 50
        };
        var response = await chatClient.GetResponseAsync(messages, options, ct);
        return response.Messages.Last().Text?.Trim() ?? "";
    }
}
Best Practices
- Program against IChatClient and IEmbeddingGenerator<,> interfaces in all service code; configure the concrete provider (OpenAIChatClient, OllamaChatClient) only at the composition root to enable swapping providers without code changes.
- Use ChatClientBuilder to compose middleware (caching, telemetry, function invocation) in a pipeline rather than implementing cross-cutting concerns in every service method.
- Register IChatClient via AddChatClient in DI and inject it with constructor injection; avoid resolving clients from IServiceProvider manually.
- Set Temperature = 0.0f for deterministic tasks (classification, extraction, structured output) and 0.7f-1.0f for creative tasks (writing, brainstorming).
- Always pass CancellationToken through to GetResponseAsync and GetStreamingResponseAsync so that HTTP request cancellations propagate to the underlying provider.
- Use ChatResponseFormat.Json with structured output prompts when you need machine-parseable responses rather than free-form text.
- Cache embedding results with UseDistributedCache when the same texts are embedded repeatedly (e.g., static document chunks) to avoid redundant API calls.
- Create focused service classes (e.g., SummaryService, ClassificationService) that encapsulate system prompts and chat options rather than spreading prompt engineering across controllers.
- Use AIFunctionFactory.Create to wrap strongly-typed C# methods as tools with automatic schema generation from [Description] attributes, rather than manually defining JSON schemas.
- Monitor token usage and latency by adding UseOpenTelemetry() to the pipeline and exporting traces to your observability backend (Jaeger, Application Insights, Aspire Dashboard).
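
The cancellation advice above can be combined with a timeout, so that a slow model call is abandoned even when the caller never cancels. The sketch below uses only the base class library. `CallWithTimeoutAsync` is a hypothetical helper, and `SlowModelCallAsync` is a stand-in for a call such as GetResponseAsync(messages, options, ct).

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

Console.WriteLine(await CallWithTimeoutAsync(SlowModelCallAsync, TimeSpan.FromMilliseconds(50)));

// Links the caller's token with a timeout so that either one cancels the call.
// A timeout yields a fallback string; a caller-initiated cancellation still throws.
static async Task<string> CallWithTimeoutAsync(
    Func<CancellationToken, Task<string>> call,
    TimeSpan timeout,
    CancellationToken callerToken = default)
{
    using var cts = CancellationTokenSource.CreateLinkedTokenSource(callerToken);
    cts.CancelAfter(timeout);
    try
    {
        return await call(cts.Token);
    }
    catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
    {
        return "(timed out)";
    }
}

// Hypothetical stand-in for a model call that takes five seconds unless cancelled.
static async Task<string> SlowModelCallAsync(CancellationToken ct)
{
    await Task.Delay(TimeSpan.FromSeconds(5), ct);
    return "done";
}
```

Returning a fallback value on timeout is only one option; depending on the endpoint, rethrowing as a TimeoutException or returning an HTTP 504 may be more appropriate.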