Fine-Tuning & Customization
Customize LLMs for specific domains using parameter-efficient fine-tuning and alignment techniques.
Unsloth 2026: 7x longer context RL, FP8 RL on consumer GPUs, rsLoRA support. TRL: OpenEnv integration, vLLM server mode, transformers 5.0.0+ compatible.
Decision Framework: Fine-Tune or Not?
| Approach | When to Try | When It Works |
|---|---|---|
| Prompt Engineering | Always | Simple tasks, clear instructions |
| RAG | External knowledge needed | Knowledge-intensive tasks |
| Fine-Tuning | Last resort | Deep specialization, format control |
Fine-tune ONLY when:
- Prompt engineering tried and insufficient
- RAG doesn't capture domain nuances
- Specific output format consistently required
- Persona/style must be deeply embedded
- You have ~1000+ high-quality examples
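The checklist above can be sketched as a small gate function. This is only an illustration: the `ProjectStatus` fields and the `should_fine_tune` name are hypothetical, and treating every criterion as required (with output format and persona folded into one flag) is an assumption about how the list is meant to be read.

```python
from dataclasses import dataclass


@dataclass
class ProjectStatus:
    """Hypothetical checklist mirroring the criteria listed above."""
    prompt_engineering_insufficient: bool
    rag_insufficient: bool
    needs_format_or_style_control: bool
    num_quality_examples: int


def should_fine_tune(status: ProjectStatus, min_examples: int = 1000) -> bool:
    """Return True only when cheaper options are exhausted and data is adequate."""
    alternatives_exhausted = (
        status.prompt_engineering_insufficient and status.rag_insufficient
    )
    has_enough_data = status.num_quality_examples >= min_examples
    return (
        alternatives_exhausted
        and status.needs_format_or_style_control
        and has_enough_data
    )


# RAG has not been tried yet, so the gate says "not yet"
print(should_fine_tune(ProjectStatus(True, False, True, 5000)))  # False
```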
LoRA vs QLoRA (Unsloth 2026)
| Criteria | LoRA | QLoRA |
|---|---|---|
| Model fits in VRAM | Use LoRA | |
| Memory constrained | | Use QLoRA |
| Training speed | 39% faster | |
| Memory savings | | 75%+ (dynamic 4-bit quants) |
| Quality | Baseline | ~Same (Unsloth recovered accuracy loss) |
| 70B LLaMA | | <48GB VRAM with QLoRA |
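The memory figures in the table follow from simple arithmetic on weight storage. The sketch below is a back-of-the-envelope estimate that counts only the base model weights; it ignores quantization constants, activations, optimizer state, and the adapters themselves, so real usage will be higher. The function name `weight_memory_gb` is illustrative.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


params_70b = 70e9
fp16_gb = weight_memory_gb(params_70b, 16)  # 140.0 GB in 16-bit
int4_gb = weight_memory_gb(params_70b, 4)   # 35.0 GB in 4-bit
savings = 1 - int4_gb / fp16_gb             # 0.75, i.e. 75% smaller

print(fp16_gb, int4_gb, savings)
```

Going from 16-bit to 4-bit weights accounts for the 75% saving, and 35 GB of weights leaves some headroom under a 48GB budget for everything the estimate leaves out.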
Quick Reference: LoRA Training
```python
from unsloth import FastLanguageModel  # import unsloth before trl so its patches apply
from trl import SFTTrainer

# Load with 4-bit quantization (QLoRA)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,               # Rank (16-64 typical)
    lora_alpha=32,      # Scaling (2x r)
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # Attention
        "gate_proj", "up_proj", "down_proj",      # MLP (QLoRA paper)
    ],
)

# Train. `dataset` is assumed to be an already-prepared training dataset.
# Argument names vary by TRL version: newer releases may expect
# processing_class=tokenizer and take the sequence length from SFTConfig.
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    max_seq_length=2048,
)
trainer.train()
```
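To see what the `r` and `lora_alpha` settings above control, the following sketch computes how many trainable parameters one adapter adds and what scaling factor is applied to its update. The 4096 x 4096 layer shape is a hypothetical example, and the helper names are illustrative. The `rank_stabilized` flag uses the rsLoRA scaling of alpha divided by the square root of r rather than alpha divided by r.

```python
import math


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one adapter: A is (r x d_in), B is (d_out x r)."""
    return r * (d_in + d_out)


def lora_scale(alpha: float, r: int, rank_stabilized: bool = False) -> float:
    """Multiplier applied to the low-rank update B @ A."""
    return alpha / math.sqrt(r) if rank_stabilized else alpha / r


# Hypothetical 4096 x 4096 projection with r=16, alpha=32
added = lora_param_count(4096, 4096, r=16)   # 131072
full = 4096 * 4096                           # 16777216
fraction = added / full                      # 0.0078125 (under 1% of the layer)

print(added, fraction)
print(lora_scale(32, 16))                        # 2.0
print(lora_scale(32, 16, rank_stabilized=True))  # 8.0
```

With standard scaling the multiplier shrinks as the rank grows unless alpha is raised with it, which is one reason the "2x r" rule of thumb keeps the multiplier fixed at 2.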
DPO Alignment
```python
from trl import DPOTrainer, DPOConfig

config = DPOConfig(
    learning_rate=5e-6,               # Lower for alignment
    beta=0.1,                         # KL penalty coefficient
    per_device_train_batch_size=4,
    num_train_epochs=1,
)

# Preference dataset: {prompt, chosen, rejected}
# `model`, `ref_model`, `tokenizer`, and `preference_dataset` are assumed to be defined already.
# Newer TRL releases may expect processing_class=tokenizer instead of tokenizer=tokenizer.
trainer = DPOTrainer(
    model=model,
    ref_model=ref_model,  # Frozen reference
    args=config,
    train_dataset=preference_dataset,
    tokenizer=tokenizer,
)
trainer.train()
```
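To make the role of `beta` and the frozen reference concrete, here is a minimal sketch of the sigmoid DPO loss for a single preference pair, written in plain Python. It is not TRL's implementation. The inputs are assumed to be summed log-probabilities of each completion under the policy and the reference model, and the function name `dpo_loss` is illustrative.

```python
import math


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
) -> float:
    """Sigmoid DPO loss, -log(sigmoid(margin)), for one (chosen, rejected) pair."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # Numerically stable form of log(1 + exp(-margin))
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, loss is log(2)
print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 4))  # 0.6931

# Policy now favors the chosen answer more than the reference does: loss drops
print(dpo_loss(-8.0, -12.0, -10.0, -10.0) < math.log(2))  # True
```

A larger `beta` amplifies the same log-ratio gap, so the policy is penalized more sharply for drifting from the reference in the wrong direction.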
Synthetic Data Generation
```python
import json

from openai import AsyncOpenAI

# Assumption: the OpenAI Python SDK, with OPENAI_API_KEY set in the environment
client = AsyncOpenAI()


async def generate_synthetic(topic: str, n: int = 100) -> list[dict]:
    """Generate training examples using a teacher model."""
    examples = []
    for _ in range(n):
        response = await client.chat.completions.create(
            model="gpt-5.2",  # Teacher
            messages=[{
                "role": "system",
                "content": f"Generate a training example about {topic}. "
                           "Return a JSON object with instruction and response fields.",
            }],
            # JSON mode expects the prompt itself to mention JSON
            response_format={"type": "json_object"},
        )
        examples.append(json.loads(response.choices[0].message.content))
    return examples
```
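Teacher output usually needs filtering before it is used for training. The sketch below is one possible cleaning pass, assuming each generated record is a dict with `instruction` and `response` string keys (the key names, the `clean_examples` helper, and the length threshold are assumptions, not part of the original pipeline). It drops malformed records, very short responses, and repeated instructions.

```python
def clean_examples(examples: list[dict], min_response_chars: int = 20) -> list[dict]:
    """Keep well-formed, non-duplicate examples."""
    seen: set[str] = set()
    cleaned = []
    for ex in examples:
        instruction = ex.get("instruction")
        response = ex.get("response")
        if not isinstance(instruction, str) or not isinstance(response, str):
            continue  # malformed record
        if len(response.strip()) < min_response_chars:
            continue  # response too short to be useful
        # Normalize case and whitespace so near-identical instructions collide
        key = " ".join(instruction.lower().split())
        if key in seen:
            continue  # duplicate instruction
        seen.add(key)
        cleaned.append({"instruction": instruction.strip(), "response": response.strip()})
    return cleaned


raw = [
    {"instruction": "Explain LoRA.", "response": "LoRA trains small low-rank matrices."},
    {"instruction": "explain  lora.", "response": "A duplicate with different spacing."},
    {"instruction": "Define QLoRA.", "response": "Short."},
    {"instruction": "Missing response"},
]
print(len(clean_examples(raw)))  # 1
```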
Key Hyperparameters
| Parameter | Recommended | Notes |
|---|---|---|
| Learning rate | 2e-4 | LoRA/QLoRA standard |
| Epochs | 1-3 | More epochs risk overfitting |
| LoRA r | 16-64 | Higher = more capacity |
| LoRA alpha | 2x r | Scaling factor |
| Batch size | 4-8 | Per device |
| Warmup | 3% | Ratio of steps |
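The batch size, epoch, and warmup rows interact: together they determine how many optimizer steps a run takes and how many of those are warmup. The sketch below is an approximation under stated assumptions (the `training_schedule` helper is hypothetical, and a real trainer may count the last partial batch differently); gradient accumulation and device count are extra inputs not listed in the table.

```python
import math


def training_schedule(
    num_examples: int,
    per_device_batch_size: int = 4,
    gradient_accumulation_steps: int = 1,
    num_devices: int = 1,
    epochs: int = 3,
    warmup_ratio: float = 0.03,
) -> dict:
    """Estimate optimizer steps and warmup steps for a fine-tuning run."""
    effective_batch = per_device_batch_size * gradient_accumulation_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return {
        "effective_batch_size": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


plan = training_schedule(1000, per_device_batch_size=4, gradient_accumulation_steps=4)
print(plan)  # {'effective_batch_size': 16, 'total_steps': 189, 'warmup_steps': 6}
```

With only about 1,000 examples, a 3% warmup is a handful of steps, so small changes to the ratio matter less than the choice of learning rate and epoch count.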
Anti-Patterns (FORBIDDEN)
```python
# NEVER fine-tune without trying alternatives first
model.fine_tune(data)  # Try prompt engineering & RAG first!

# NEVER use low-quality training data
data = scrape_random_web()  # Garbage in, garbage out

# NEVER skip evaluation
trainer.train()
deploy(model)  # Always evaluate before deploy!

# ALWAYS use separate eval set
train, eval = split(data, test_size=0.1)
trainer = SFTTrainer(..., eval_dataset=eval)
```
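The `split` call in the last anti-pattern is pseudocode. One way to hold out an evaluation set with only the standard library is sketched below; the `split_dataset` name and the fixed seed are illustrative choices, and a dataset library's own split method would serve equally well.

```python
import random


def split_dataset(examples: list, eval_fraction: float = 0.1, seed: int = 42):
    """Shuffle a copy of the data and hold out eval_fraction of it for evaluation."""
    if not 0 < eval_fraction < 1:
        raise ValueError("eval_fraction must be between 0 and 1")
    shuffled = list(examples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)   # seeded for reproducible splits
    n_eval = max(1, round(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


train_set, eval_set = split_dataset(list(range(100)), eval_fraction=0.1)
print(len(train_set), len(eval_set))  # 90 10
```

Splitting before any synthetic augmentation or deduplication against the training set helps keep evaluation examples from leaking into training.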
Detailed Documentation
| Resource | Description |
|---|---|
| references/lora-qlora.md | Parameter-efficient fine-tuning |
| references/dpo-alignment.md | Direct Preference Optimization |
| references/synthetic-data.md | Training data generation |
| references/when-to-finetune.md | Decision framework |
Related Skills
- llm-evaluation - Evaluate fine-tuned models
- embeddings - When to use embeddings instead
- rag-retrieval - When RAG is better than fine-tuning
- langfuse-observability - Track training experiments
Capability Details
lora-qlora
Keywords: LoRA, QLoRA, PEFT, parameter efficient, adapter, low-rank
Solves:
- Fine-tune large models on consumer hardware
- Configure LoRA hyperparameters
- Choose target modules for adapters
dpo-alignment
Keywords: DPO, RLHF, preference, alignment, human feedback, preference data
Solves:
- Align models to human preferences
- Create preference datasets
- Configure DPO training
synthetic-data
Keywords: synthetic data, data generation, teacher model, distillation
Solves:
- Generate training data with LLMs
- Implement teacher-student training
- Scale training data quality
when-to-finetune
Keywords: should I fine-tune, fine-tune decision, customize model
Solves:
- Decide when fine-tuning is appropriate
- Evaluate alternatives to fine-tuning
- Assess data requirements