Local GPU Training Manager
Run Unsloth training on your local GPU.
Prerequisites Check
1. Verify CUDA
python
import torch
print(f"CUDA available: {torch.cuda.is_available()}")
print(f"GPU: {torch.cuda.get_device_name(0)}")
print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
If CUDA not available:
- •Check NVIDIA drivers:
nvidia-smi - •Check CUDA:
nvcc --version - •Reinstall PyTorch:
pip install torch --index-url https://download.pytorch.org/whl/cu121
2. Check VRAM
See references/HARDWARE_GUIDE.md for requirements:
| VRAM | Recommended Setup |
|---|---|
| 8GB | 7B, 4-bit, batch=1, LoRA r=8 |
| 12GB | 7B, 4-bit, batch=2, LoRA r=16 |
| 16GB | 7-13B, 4-bit, batch=2, LoRA r=16-32 |
| 24GB | 7-14B, 4-bit, batch=4, LoRA r=32 |
3. Check Dependencies
bash
pip install unsloth torch transformers trl peft datasets accelerate bitsandbytes
Docker Option
Use the official Unsloth Docker image for a pre-configured environment (supports all GPUs including Blackwell/50-series):
bash
docker run -d \ -e JUPYTER_PASSWORD="unsloth" \ -p 8888:8888 \ -v $(pwd)/work:/workspace/work \ --gpus all \ unsloth/unsloth
Access Jupyter at http://localhost:8888. Example notebooks are in /workspace/unsloth-notebooks/.
Environment variables:
- •
JUPYTER_PASSWORD- Jupyter auth (default:unsloth) - •
JUPYTER_PORT- Port (default:8888) - •
USER_PASSWORD- User/sudo password (default:unsloth)
Run Training
Option 1: Notebook
bash
jupyter notebook notebooks/sft_template.ipynb
Option 2: Script
bash
# Edit configuration in script, then run python scripts/train_sft.py
GPU Selection (Multi-GPU)
python
import os os.environ["CUDA_VISIBLE_DEVICES"] = "0" # Use first GPU
Monitor Training
Terminal
bash
# Watch GPU usage watch -n 1 nvidia-smi # Or use nvitop (more detailed) pip install nvitop && nvitop
WandB (Optional)
bash
export WANDB_API_KEY="your-key" # Add report_to="wandb" in TrainingArguments
Troubleshooting
OOM Error
Try in order:
- •Reduce batch_size (to 1)
- •Increase gradient_accumulation
- •Reduce max_seq_length
- •Reduce LoRA rank
- •
torch.cuda.empty_cache()
Loss Not Decreasing
- •Check learning rate (try higher or lower)
- •Verify chat template matches model
- •Inspect data format
Training Too Slow
- •Enable bf16 if supported
- •Use
packing=Truefor short sequences - •Reduce logging_steps
See references/TROUBLESHOOTING.md for more solutions.
Resume from Checkpoint
python
TrainingArguments(
resume_from_checkpoint=True, # Auto-find latest
# Or: resume_from_checkpoint="outputs/checkpoint-500"
)
Save Model
Training script automatically saves:
- •
outputs/lora_adapter/- LoRA weights - •
outputs/merged_16bit/- Merged model (optional)
Test Inference
python
from unsloth import FastLanguageModel
model, tokenizer = FastLanguageModel.from_pretrained("outputs/lora_adapter")
FastLanguageModel.for_inference(model)
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to("cuda")
outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0]))
Handoff
Offer funsloth-upload for Hub upload with model card.
Tips
- •Close other GPU apps before training
- •Monitor temps - keep under 85C
- •Use UPS for long runs
- •Save frequently with
save_steps
Bundled Resources
- •notebooks/sft_template.ipynb - Notebook template
- •scripts/train_sft.py - Script template
- •references/HARDWARE_GUIDE.md - VRAM requirements
- •references/TROUBLESHOOTING.md - Common issues