fast-mlx

name: fast-mlx
description: Optimize MLX code for performance and memory. Use when asked to implement or speed up MLX models or algorithms, reduce latency/throughput bottlenecks, tune lazy evaluation, type promotion, fast ops, compilation, memory use, or profiling.

Fast MLX

Workflow

•Look for opportunities to compile functions made up mostly of elementwise operations.
•For models with fixed-shape inputs, or whose input shapes rarely change, compile the entire graph.
•Replace slow hand-written implementations with MLX fast ops (the mx.fast module).
•Identify evaluation boundaries and unintended sync points (mx.eval, item(), NumPy conversions).
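•Check dtype promotion and scalar usage so that precision matches intent; for example, multiplying a float16 array by a float32 mx.array scalar silently promotes the result to float32.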
•Review compilation strategy; avoid unnecessary recompiles and closure captures.
•Reduce peak memory by ordering lazy loads carefully and releasing references to temporaries before calling mx.eval.
•Suggest profiling steps if the bottleneck is unclear.

References

•Read references/fast-mlx-guide.md for detailed tips and examples. Use it as the source of truth.

Output expectations

•Provide concrete code changes, each with a brief rationale.
•Call out changes that need user confirmation (e.g., enabling async eval or shapeless compile).