AgentSkillsCN

fast-mlx

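Optimize MLX code for both performance and memory usage. Invoke this skill when you need to implement or speed up MLX models and algorithms, reduce latency and throughput bottlenecks, tune the lazy evaluation strategy, optimize type promotion, speed up operations, improve compilation efficiency, optimize memory use, or profile performance.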

SKILL.md
--- frontmatter
name: fast-mlx
description: Optimize MLX code for performance and memory. Use when asked to implement or speed up MLX models or algorithms, reduce latency/throughput bottlenecks, tune lazy evaluation, type promotion, fast ops, compilation, memory use, or profiling.

Fast MLX

Workflow

  • Look for opportunities to compile functions made up mostly of elementwise operations.
  • For models with fixed-shape inputs, or where the shapes rarely change, compile the entire graph.
  • Replace slow implementations with MLX fast ops.
  • Identify evaluation boundaries and unintended sync points (mx.eval, item(), NumPy conversions).
  • Check dtype promotion and scalar usage; keep precision consistent with intent.
  • Review compilation strategy; avoid unnecessary recompiles and closure captures.
  • Reduce peak memory via lazy loading order and releasing temporaries before mx.eval.
  • Suggest profiling steps if the bottleneck is unclear.
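
Several of the workflow steps above can be sketched in a few lines. This is a minimal illustration, not a tuned implementation: `gelu`, `rms_norm_reference`, and `demo` are hypothetical names chosen for the example, and the guarded import is an assumption that only exists so the file loads on machines without MLX. It compiles an elementwise function with `mx.compile`, swaps a hand-written RMS norm for `mx.fast.rms_norm`, shows how a Python scalar and an `mx.array` scalar promote a float16 array differently, and uses a single `mx.eval` as the evaluation boundary.

```python
import math

try:
    import mlx.core as mx
except ImportError:
    mx = None  # assumption: let the file load where MLX is not installed


def gelu(x):
    # Several elementwise ops in a row: a typical candidate for mx.compile.
    return x * (1 + mx.erf(x / math.sqrt(2))) / 2


def rms_norm_reference(x, weight, eps=1e-5):
    # Hand-written RMS norm, kept only to compare against mx.fast.rms_norm.
    mean_sq = mx.mean(mx.square(x), axis=-1, keepdims=True)
    return weight * x * mx.rsqrt(mean_sq + eps)


def demo():
    compiled_gelu = mx.compile(gelu)  # compile once, reuse across calls

    x = mx.random.normal((4, 64))
    w = mx.ones((64,))

    y_plain = gelu(x)
    y_compiled = compiled_gelu(x)
    n_plain = rms_norm_reference(x, w)
    n_fast = mx.fast.rms_norm(x, w, 1e-5)

    # A Python scalar is weakly typed and keeps float16; an mx.array scalar
    # defaults to float32 and promotes the result.
    h = mx.ones((4,), dtype=mx.float16)
    kept = (h * 2.0).dtype
    promoted = (h * mx.array(2.0)).dtype

    # One explicit evaluation boundary for the lazily built graph.
    mx.eval(y_plain, y_compiled, n_plain, n_fast)

    # .item() is itself a sync point; it is used here only to report results.
    return {
        "gelu_matches": mx.allclose(y_plain, y_compiled, atol=1e-5).item(),
        "rms_norm_matches": mx.allclose(n_plain, n_fast, atol=1e-4).item(),
        "scalar_dtype": kept,
        "array_dtype": promoted,
    }


if __name__ == "__main__" and mx is not None:
    print(demo())
```

In real code the compiled function would be created once at module or model setup and reused, since building it inside a hot loop risks repeated recompilation.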

References

  • Read references/fast-mlx-guide.md for detailed tips and examples. Use it as the source of truth.

Output expectations

  • Provide concrete code changes with a brief rationale.
  • Call out changes that need user confirmation (e.g., enabling async eval or shapeless compile).
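
One reason shapeless compilation deserves explicit confirmation is that it changes which code is safe to write. The sketch below assumes a recent MLX release; `merge_leading_axes` and `demo_shapeless` are hypothetical names, and the guarded import is only there so the file loads without MLX. With `shapeless=True` the function is not retraced when input shapes change, so any logic that reads `x.shape` during tracing (for example `x.reshape(x.shape[0] * x.shape[1], -1)`) would reuse the shape seen on the first call. Using `flatten` avoids hard-coding that shape.

```python
try:
    import mlx.core as mx
except ImportError:
    mx = None  # assumption: let the file load where MLX is not installed


def merge_leading_axes(x):
    # flatten does not bake x.shape into the traced graph, unlike a reshape
    # whose target is computed from x.shape at trace time.
    return x.flatten(0, 1)


def demo_shapeless():
    # shapeless=True: changing input shapes does not trigger a recompile.
    fn = mx.compile(merge_leading_axes, shapeless=True)

    a = fn(mx.zeros((2, 3, 4)))
    b = fn(mx.zeros((5, 5, 3)))  # different shape, same compiled function
    mx.eval(a, b)
    return a.shape, b.shape


if __name__ == "__main__" and mx is not None:
    print(demo_shapeless())
```

A change like this is worth flagging to the user because a function that silently depends on input shapes can produce errors or wrong results only after the shapes change.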