Fast MLX
Workflow
- Look for opportunities to compile functions made up mostly of elementwise operations.
- For models with fixed-shape inputs, or where the shapes rarely change, compile the entire graph.
- Replace slow implementations with MLX fast ops (`mx.fast`).
- Identify evaluation boundaries and unintended sync points (`mx.eval`, `item()`, NumPy conversions).
- Check dtype promotion and scalar usage; keep precision consistent with intent.
- Review compilation strategy; avoid unnecessary recompiles and closure captures.
- Reduce peak memory via lazy loading order and releasing temporaries before `mx.eval`.
- Suggest profiling steps if the bottleneck is unclear.
References
- Read `references/fast-mlx-guide.md` for detailed tips and examples. Use it as the source of truth.
Output expectations
- Provide concrete code changes with brief rationale.
- Call out changes that need user confirmation (e.g., enabling async eval or shapeless compile).