Apache Spark Optimization
Production patterns for optimizing Apache Spark jobs, including partitioning strategies, memory management, shuffle optimization, and performance tuning.
Do not use this skill when
- The task is unrelated to Apache Spark optimization
- You need a different domain or tool outside this scope
Instructions
- Clarify goals, constraints, and required inputs.
- Apply relevant best practices and validate outcomes.
- Provide actionable steps and verification.
- If detailed examples are required, open resources/implementation-playbook.md.
Use this skill when
- Optimizing slow Spark jobs
- Tuning memory and executor configuration
- Implementing efficient partitioning strategies
- Debugging Spark performance issues
- Scaling Spark pipelines for large datasets
- Reducing shuffle and data skew
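
As a rough illustration of the partitioning and shuffle items above, the following sketch estimates a shuffle partition count from the shuffle size and the number of available cores. The helper names (`suggest_shuffle_partitions`, `suggested_conf`) and the 128 MiB target partition size are assumptions made for this example, not values taken from this skill. The configuration keys are standard Spark SQL settings. The code is plain Python and does not import PySpark, so it can be tried without a cluster.

```python
import math

# Assumed target size per shuffle partition (a common rule of thumb, not a Spark default).
DEFAULT_TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_shuffle_partitions(shuffle_bytes, total_cores,
                               target_partition_bytes=DEFAULT_TARGET_PARTITION_BYTES):
    """Hypothetical helper: estimate a shuffle partition count.

    Sizes partitions to roughly target_partition_bytes, then rounds the
    count up to a multiple of total_cores so each wave of tasks can use
    every core.
    """
    if shuffle_bytes <= 0 or total_cores <= 0 or target_partition_bytes <= 0:
        raise ValueError("all arguments must be positive")
    by_size = math.ceil(shuffle_bytes / target_partition_bytes)
    waves = max(1, math.ceil(by_size / total_cores))
    return waves * total_cores


def suggested_conf(shuffle_bytes, total_cores):
    """Hypothetical helper: build a dict of Spark SQL settings to start from."""
    partitions = suggest_shuffle_partitions(shuffle_bytes, total_cores)
    return {
        "spark.sql.shuffle.partitions": str(partitions),
        # Adaptive Query Execution can coalesce small partitions and
        # split skewed ones at runtime; treat these as starting points.
        "spark.sql.adaptive.enabled": "true",
        "spark.sql.adaptive.skewJoin.enabled": "true",
    }


if __name__ == "__main__":
    gib = 1024 ** 3
    for key, value in suggested_conf(100 * gib, total_cores=40).items():
        print(f"{key}={value}")
```

Each key and value could then be passed to `SparkSession.builder.config(key, value)` or supplied as `--conf` arguments to `spark-submit`. The estimate is only a first guess and should be checked against the stage metrics in the Spark UI.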