ML Best Practices
Model Selection Guidelines
Problem Type Classification
- •Supervised Learning: Labeled data for training
  - •Regression: Predict continuous values (Linear Regression, Random Forest, Gradient Boosting)
  - •Classification: Predict discrete labels (Logistic Regression, SVM, Decision Trees, Neural Networks)
- •Unsupervised Learning: Unlabeled data exploration
  - •Clustering: Group similar data points (K-Means, DBSCAN, Hierarchical)
  - •Dimensionality Reduction: Reduce feature space (PCA, t-SNE, UMAP)
  - •Anomaly Detection: Identify outliers (Isolation Forest, One-Class SVM)
- •Reinforcement Learning: Learn through interaction with an environment
  - •Policy-based: Learn the policy directly (REINFORCE, PPO)
  - •Value-based: Learn a value function (DQN, SARSA)
Algorithm Selection Criteria
- •Data Size: Small vs. large datasets
- •Feature Types: Numerical, categorical, text, image
- •Interpretability: Need for model explanations
- •Training Time: Constraints on model training
- •Inference Latency: Real-time vs. batch predictions
- •Accuracy Requirements: Trade-offs with complexity
Common ML Frameworks
- •scikit-learn: Traditional ML algorithms, easy to use
- •TensorFlow/Keras: Deep learning, production-ready
- •PyTorch: Research-friendly, dynamic computation graphs
- •XGBoost/LightGBM: Gradient boosting for tabular data
- •Hugging Face Transformers: Pre-trained NLP models
Feature Engineering Techniques
Numerical Features
- •Scaling: Standardization (z-score) or Min-Max scaling
- •Binning: Convert continuous to categorical
- •Polynomial Features: Create powers and interaction terms of existing features
- •Log Transformations: Handle skewed distributions
- •Normalization: Scale each sample (row) to unit norm
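The numerical transforms above can be sketched in a few lines of plain Python. This is a minimal illustration, not a production implementation: the function names are made up for this example, and in practice a library such as scikit-learn (for instance its StandardScaler, MinMaxScaler and Normalizer classes) would normally be used and fitted on training data only.

```python
import math

def standardize(values):
    """Z-score scaling: subtract the mean, divide by the population standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mean) / std for v in values]

def min_max_scale(values):
    """Rescale a column linearly so its minimum maps to 0 and its maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(values):
    """log(1 + x) for non-negative x; compresses a long right tail."""
    return [math.log1p(v) for v in values]

def unit_norm(row):
    """Scale one sample (a row of features) to Euclidean length 1."""
    norm = math.sqrt(sum(v * v for v in row))
    return [v / norm for v in row] if norm else list(row)

# Hypothetical skewed column: one very large value dominates the range.
incomes = [20_000, 35_000, 50_000, 1_000_000]
scaled = min_max_scale(incomes)
logged = log_transform(incomes)
```

Note that scaling works column by column, while unit-norm normalization works row by row.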
Categorical Features
- •One-Hot Encoding: Binary columns for each category
- •Label Encoding: Map categories to arbitrary integers (implies an order, so best suited to target labels or tree-based models)
- •Ordinal Encoding: Preserve order for ordinal categories
- •Target Encoding: Replace each category with its mean target value, smoothed toward the global mean and fitted on training data only to avoid leakage
- •Embedding: Learn dense representations (for high cardinality)
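As a rough sketch of two of the encodings above, the following plain-Python functions build one-hot rows and a smoothed target encoding. The function names and the `smoothing` parameter are assumptions for illustration; the smoothing formula shown (blending the category mean with the global mean, weighted by category count) is one common form of regularization, not the only one.

```python
from collections import defaultdict

def one_hot(values):
    """Return (sorted categories, one binary row per input value)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

def target_encode(categories, targets, smoothing=10.0):
    """Map each category to a smoothed mean of the target.

    Rare categories are pulled toward the global mean; larger `smoothing`
    pulls harder. Fit this on training data only to avoid target leakage.
    """
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for c, t in zip(categories, targets):
        sums[c] += t
        counts[c] += 1
    return {
        c: (sums[c] + smoothing * global_mean) / (counts[c] + smoothing)
        for c in counts
    }

colors = ["red", "blue", "red"]
cats, rows = one_hot(colors)
encoding = target_encode(["a", "a", "b"], [1, 1, 0], smoothing=1.0)
```

With `smoothing=0` the encoding reduces to the raw per-category mean, which overfits categories that appear only a few times.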
Text Features
- •Bag of Words: Word frequency counts
- •TF-IDF: Term frequency-inverse document frequency
- •N-grams: Capture word sequences
- •Word Embeddings: Pre-trained (Word2Vec, GloVe) or learned
- •Transformer Embeddings: Contextual embeddings (BERT, RoBERTa)
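To make the first three text representations concrete, here is a small stdlib-only sketch of n-gram extraction, bag-of-words counts and TF-IDF weights. It assumes whitespace tokenization and the plain `log(N / df)` form of inverse document frequency; libraries often use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous runs of n tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bag_of_words(doc, n_max=1):
    """Count every 1-gram up to n_max-gram in a lowercased, whitespace-split document."""
    tokens = doc.lower().split()
    terms = []
    for n in range(1, n_max + 1):
        terms.extend(ngrams(tokens, n))
    return Counter(terms)

def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    counts = [bag_of_words(d) for d in docs]
    n_docs = len(docs)
    # Iterating a Counter yields each distinct term once, so this is document frequency.
    df = Counter(term for c in counts for term in c)
    weights = []
    for c in counts:
        total = sum(c.values())
        weights.append({t: (k / total) * math.log(n_docs / df[t]) for t, k in c.items()})
    return weights

docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
```

A term that appears in every document (here "the") gets weight zero, while a term unique to one document gets the highest weight in that document.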
Feature Selection
- •Filter Methods: Statistical tests, correlation analysis
- •Wrapper Methods: Recursive feature elimination, forward/backward selection
- •Embedded Methods: L1 regularization, tree-based feature importance
- •Dimensionality Reduction: PCA, LDA, autoencoders (strictly feature extraction: these build new features rather than select existing ones)
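A filter method can be as simple as ranking features by their absolute Pearson correlation with the target. The sketch below assumes features are stored as a dict of equal-length columns; the names `pearson` and `select_by_correlation` are illustrative only. Correlation captures linear relationships only, so this is a first pass rather than a complete selection procedure.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_by_correlation(features, target, k):
    """Keep the k feature names with the largest |correlation| to the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical toy data: x1 and x2 are linearly related to the target, noise is not.
features = {
    "x1": [1, 2, 3, 4],
    "noise": [2, 1, 2, 1],
    "x2": [4, 3, 2, 1],
}
target = [2, 4, 6, 8]
selected = select_by_correlation(features, target, k=2)
```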
Hyperparameter Tuning Strategies
Search Strategies
- •Grid Search: Exhaustive search over parameter grid
- •Random Search: Random sampling from parameter space
- •Bayesian Optimization: Use probabilistic model to guide search
- •Evolutionary Algorithms: Genetic algorithms for parameter evolution
- •Successive Halving: Evaluate many configurations on a small budget, then repeatedly discard the worst and give the survivors more resources
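The difference between grid search and random search is easiest to see side by side. In this sketch `validation_score` is a hypothetical stand-in for "train a model with these parameters and return its validation score"; the parameter names and ranges are likewise assumptions chosen for illustration.

```python
import itertools
import random

def validation_score(params):
    """Hypothetical objective: peaks at learning_rate=0.1, max_depth=6."""
    return -((params["learning_rate"] - 0.1) ** 2) - 0.001 * (params["max_depth"] - 6) ** 2

def grid_search(grid, score_fn):
    """Score every combination in the grid and return the best one."""
    names = list(grid)
    combos = itertools.product(*(grid[name] for name in names))
    candidates = [dict(zip(names, combo)) for combo in combos]
    return max(candidates, key=score_fn)

def random_search(space, score_fn, n_iter, seed=0):
    """Score n_iter randomly sampled configurations and return the best one."""
    rng = random.Random(seed)
    candidates = [
        {name: sample(rng) for name, sample in space.items()}
        for _ in range(n_iter)
    ]
    return max(candidates, key=score_fn)

grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [3, 6, 9]}
space = {
    # Log-uniform sampling suits parameters that span orders of magnitude.
    "learning_rate": lambda rng: 10 ** rng.uniform(-3, 0),
    "max_depth": lambda rng: rng.randint(2, 12),
}

best_grid = grid_search(grid, validation_score)
best_random = random_search(space, validation_score, n_iter=20)
```

Grid search cost grows multiplicatively with each added parameter, whereas random search keeps a fixed budget (`n_iter`) regardless of how many parameters are searched.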
Common Hyperparameters
- •Tree-based Models: max_depth, n_estimators, min_samples_split, learning_rate (boosted ensembles only)
- •Neural Networks: learning_rate, batch_size, number of layers, number of units
- •SVM: C, kernel, gamma
- •K-Means: n_clusters, init, n_init
Tuning Best Practices
- •Cross-Validation: Use k-fold or stratified k-fold for robust evaluation
- •Early Stopping: Stop training when validation performance has not improved for a set number of epochs
- •Learning Rate Schedules: Decay learning rate over time
- •Ensembling: Combine multiple models for better performance
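Early stopping and learning rate decay can both be expressed as small pure functions. The sketch below assumes a list of per-epoch validation losses is already available; `patience`, `min_delta` and `decay_rate` are illustrative parameter names, and exponential decay is just one of several common schedules.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch).

    Training stops at the first epoch that is `patience` epochs past the
    best validation loss seen so far; if that never happens, it runs to the end.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

def exponential_decay(initial_lr, decay_rate, epoch):
    """Learning rate after `epoch` epochs of multiplicative decay."""
    return initial_lr * decay_rate ** epoch

# Hypothetical validation curve: improves until epoch 2, then drifts upward.
losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.9]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
```

The model weights from `best_epoch`, not from `stop_epoch`, are the ones worth keeping.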
Evaluation Metrics and Validation Methods
Regression Metrics
- •Mean Squared Error (MSE): Average of squared errors
- •Root Mean Squared Error (RMSE): Square root of MSE
- •Mean Absolute Error (MAE): Average of absolute errors
- •R-squared: Proportion of target variance explained by the model
- •Mean Absolute Percentage Error (MAPE): Average absolute error as a percentage of the true value (undefined when a true value is zero)
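The regression metrics above follow directly from their definitions. This is a minimal stdlib sketch with an illustrative function name; it returns NaN for R-squared when the target is constant and for MAPE when any true value is zero, which is one reasonable convention among several.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, R-squared and MAPE for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)           # residual sum of squares
    mse = ss_res / n
    mae = sum(abs(e) for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1 - ss_res / ss_tot if ss_tot else float("nan")
    if all(t != 0 for t in y_true):
        mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    else:
        mape = float("nan")  # percentage error is undefined for zero targets
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "r2": r2, "mape": mape}

metrics = regression_metrics([2, 4, 6, 8], [3, 4, 5, 8])
```

RMSE and MAE are in the units of the target, which usually makes them easier to report than MSE.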
Classification Metrics
- •Accuracy: Fraction of predictions that are correct (can be misleading on imbalanced classes)
- •Precision: True positives / (true positives + false positives)
- •Recall: True positives / (true positives + false negatives)
- •F1-Score: Harmonic mean of precision and recall
- •ROC-AUC: Area under the ROC curve (true positive rate vs. false positive rate across all thresholds)
- •Confusion Matrix: Detailed breakdown of predictions
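For the binary case, the classification metrics above can be computed from the four confusion-matrix counts. The sketch below uses illustrative function names and treats label 1 as the positive class by default. The ROC-AUC function uses the pairwise interpretation (the probability that a random positive is scored above a random negative), which is simple but quadratic in the number of samples and assumes both classes are present.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary problem."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def roc_auc(y_true, scores, positive=1):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
results = classification_metrics(y_true, y_pred)
```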
Validation Methods
- •Train-Test Split: Simple holdout validation
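- •K-Fold Cross-Validation: Split data into k folds; train on k-1 folds and validate on the remaining one, rotating through all k
- •Stratified K-Fold: Preserve class distribution in folds
- •Time Series Split: Train on earlier observations and validate on later ones so temporal order is respected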
- •Nested Cross-Validation: Outer loop for evaluation, inner for tuning
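The splitting schemes above differ only in how they assign sample indices to training and validation sets. The generators below are a simplified sketch with made-up names: they do not shuffle, the stratified version deals each class round-robin into folds, and the time series version uses an expanding training window with equal-sized test blocks. Library implementations offer more options.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds (shuffle beforehand if data is ordered)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

def stratified_k_fold_indices(labels, k):
    """Yield (train, test) index lists whose folds roughly preserve class proportions."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)  # deal each class round-robin across folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def time_series_split_indices(n_samples, n_splits):
    """Yield (train, test) index lists where every test block follows its training data."""
    test_size = n_samples // (n_splits + 1)
    for i in range(1, n_splits + 1):
        train_end = n_samples - (n_splits - i + 1) * test_size
        yield list(range(train_end)), list(range(train_end, train_end + test_size))

splits = list(time_series_split_indices(10, 3))
```

In every scheme each sample index appears in either the training or the validation list of a split, never both.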
Bias-Variance Trade-off
- •High Bias: Underfitting, model too simple
- •High Variance: Overfitting, model too complex
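- •Sweet Spot: Model complexity that minimizes total error from bias and variance on unseen data
- •Regularization: Reduce variance by adding constraints (e.g., L1/L2 penalties), at the cost of some added bias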
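The effect of regularization can be seen in the simplest possible case: a single feature, no intercept, and an L2 (ridge) penalty. Minimizing sum((y - w*x)^2) + lam * w^2 gives the closed form w = sum(x*y) / (sum(x^2) + lam). The function name and data below are illustrative assumptions.

```python
def ridge_slope(xs, ys, lam):
    """Closed-form ridge solution for one feature without an intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1, 2, 3]
ys = [2, 4, 6]            # exactly y = 2x

unregularized = ridge_slope(xs, ys, lam=0.0)
shrunk = ridge_slope(xs, ys, lam=14.0)
```

With `lam=0` the fit recovers the true slope; increasing `lam` shrinks the slope toward zero, which makes the estimate less sensitive to noise in the data (lower variance) but systematically too small (higher bias).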
Model Interpretation
Feature Importance
- •Permutation Importance: Shuffle one feature's values and measure the resulting drop in model performance
- •SHAP Values: Game-theoretic approach to feature attribution
- •LIME: Local interpretable model-agnostic explanations
- •Partial Dependence Plots: Show relationship between feature and predictions
Model-Agnostic Methods
- •SHAP: Consistent, local feature attribution
- •LIME: Local linear approximations
- •Permutation Importance: Global feature importance
- •Partial Dependence: Global relationship visualization
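Permutation importance is simple enough to sketch without any library: shuffle one feature column, re-score the model, and record how much the error grows. The code below assumes a regression setting scored by mean squared error and a `predict` callable that takes one row; the toy model and data are hypothetical.

```python
import random

def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Return, per feature, the mean increase in MSE after shuffling that feature."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only feature j replaced by its shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical model that uses feature 0 and ignores feature 1.
def predict(row):
    return 3 * row[0]

X = [[float(i), float((i * 7) % 5)] for i in range(8)]
y = [3 * row[0] for row in X]
importances = permutation_importance(predict, X, y)
```

Because the toy model never reads feature 1, shuffling it cannot change the predictions, so its importance is zero; shuffling feature 0 breaks its link to the target and raises the error. Computing this on held-out data rather than training data gives a more honest picture.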
Model-Specific Methods
- •Linear Models: Coefficients directly show feature impact
- •Tree-based Models: Impurity-based feature importance from split criteria (can favor high-cardinality features)
- •Neural Networks: Attention weights, saliency maps