ML Systems
Building production-ready machine learning systems.
Overview
This skill category covers the complete ML system lifecycle:
- Foundations - Core concepts, architectures, paradigms
- Data Engineering - Data collection, quality, feature engineering
- Model Development - Training, evaluation, frameworks
- Performance - Optimization, acceleration, efficiency
- Deployment - Serving, edge deployment, scaling
- Operations - MLOps, monitoring, reliability
Categories
Foundations
- ml-systems-fundamentals - Core ML systems concepts
- deep-learning-primer - Deep learning foundations
- dnn-architectures - Neural network architectures
- deployment-paradigms - Deployment patterns
Data Engineering
- data-engineering - Data pipelines and quality
- training-data - Training data management
- feature-engineering - Feature creation and stores
Model Development
- ml-workflow - ML development workflow
- model-development - Model training and selection
- ml-frameworks - Framework best practices
Performance
- efficient-ai - Efficiency techniques
- model-optimization - Quantization, pruning, distillation
- ai-accelerators - Hardware acceleration
Deployment
- model-deployment - Production deployment
- inference-optimization - Inference optimization
- edge-deployment - Edge and mobile deployment
Operations
- mlops - ML operations and lifecycle
- robust-ai - Reliability and robustness
Key Principles
- Data-Centric AI - Prioritize data quality over model complexity
- Iterative Development - Start with a simple baseline and iterate based on metrics
- Production-First - Design for deployment from the start
- Monitoring - Continuously monitor production models and improve them based on what you observe
- Reproducibility - Version everything (data, code, models)
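The Reproducibility principle can be made concrete by recording a content hash for each artifact a training run depends on or produces. The block below is a minimal sketch, assuming the data, training code, and model are single local files; `sha256_of` and `build_manifest` are hypothetical helpers, and the manifest layout is illustrative rather than the format of any particular MLOps tool.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data: Path, code: Path, model: Path, seed: int) -> dict:
    """Hypothetical helper: pin one training run to exact file contents.

    Any change to the data, code, or model bytes changes the matching hash.
    Library versions, hardware, and other environment details are NOT
    captured here (an assumption of this sketch).
    """
    return {
        "seed": seed,
        "data_sha256": sha256_of(data),
        "code_sha256": sha256_of(code),
        "model_sha256": sha256_of(model),
    }


if __name__ == "__main__":
    # Demo with throwaway files standing in for a real dataset, script, and model.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "data.csv").write_text("x,y\n1,2\n")
        (root / "train.py").write_text("print('training')\n")
        (root / "model.bin").write_bytes(b"\x00\x01\x02")
        manifest = build_manifest(
            root / "data.csv", root / "train.py", root / "model.bin", seed=42
        )
        # Store the manifest next to the model so the run can be audited later.
        (root / "manifest.json").write_text(json.dumps(manifest, indent=2))
        print(json.dumps(manifest, indent=2))
```

Keeping a manifest like this next to each model makes it cheap to check whether two runs used the same data and code. Content hashes were chosen over timestamps or file names because they change if and only if the bytes change, which is what reproducibility actually depends on.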