
spark-optimization

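Optimize Apache Spark jobs through partitioning, caching, shuffle optimization, and memory tuning. Use this skill when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.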

SKILL.md
--- frontmatter
version: 4.1.0-fractal
name: spark-optimization
description: Optimize Apache Spark jobs with partitioning, caching, shuffle optimization, and memory tuning. Use when improving Spark performance, debugging slow jobs, or scaling data processing pipelines.

Apache Spark Optimization

Production patterns for optimizing Apache Spark jobs including partitioning strategies, memory management, shuffle optimization, and performance tuning.
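As a rough illustration of the partitioning side, the sketch below estimates a partition count from input size. It is only a heuristic under stated assumptions: the 128 MiB target partition size is a common rule of thumb rather than something this skill prescribes, and `suggest_partition_count` is a hypothetical helper, not a Spark API.

```python
MIB = 1024 * 1024


def suggest_partition_count(
    input_bytes: int,
    target_partition_bytes: int = 128 * MIB,  # assumed rule-of-thumb target
    total_cores: int = 1,
) -> int:
    """Hypothetical helper: estimate a partition count for a dataset.

    Sizes partitions near the target, keeps at least one partition per
    core, and rounds up to a multiple of the core count so that the
    final wave of tasks does not leave cores idle.
    """
    if input_bytes < 0 or target_partition_bytes <= 0 or total_cores <= 0:
        raise ValueError("sizes must be non-negative and counts positive")

    # Ceiling division without floating point.
    by_size = (input_bytes + target_partition_bytes - 1) // target_partition_bytes

    # Never use fewer partitions than available cores.
    wanted = max(by_size, total_cores)

    # Round up to the next multiple of total_cores.
    waves = (wanted + total_cores - 1) // total_cores
    return waves * total_cores
```

The result could then be passed to `DataFrame.repartition(n)` or used as the value of `spark.sql.shuffle.partitions`. It is a starting point to validate against actual task sizes in the Spark UI, not a final setting.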

Do not use this skill when

  • The task is unrelated to Apache Spark optimization
  • You need a different domain or tool outside this scope

Instructions

  • Clarify goals, constraints, and required inputs.
  • Apply relevant best practices and validate outcomes.
  • Provide actionable steps and verification.
  • If detailed examples are required, open resources/implementation-playbook.md.

Use this skill when

  • Optimizing slow Spark jobs
  • Tuning memory and executor configuration
  • Implementing efficient partitioning strategies
  • Debugging Spark performance issues
  • Scaling Spark pipelines for large datasets
  • Reducing shuffle and data skew
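For the memory and executor tuning case, it can help to see roughly how an executor's heap is divided. The sketch below is an approximation, assuming Spark's unified memory model with its documented defaults (`spark.memory.fraction` = 0.6, `spark.memory.storageFraction` = 0.5, 300 MiB reserved, and an overhead of 10% of the heap with a 384 MiB minimum). `estimate_executor_memory` and `ExecutorMemory` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

RESERVED_MIB = 300  # memory Spark reserves before applying fractions


@dataclass(frozen=True)
class ExecutorMemory:
    container_mib: int      # heap plus off-heap overhead requested per executor
    unified_mib: int        # shared execution + storage region
    storage_floor_mib: int  # part of the unified region protected from eviction
    user_mib: int           # remaining heap for user data structures


def estimate_executor_memory(
    heap_mib: int,
    memory_fraction: float = 0.6,   # spark.memory.fraction default
    storage_fraction: float = 0.5,  # spark.memory.storageFraction default
    overhead_factor: float = 0.10,  # assumed default overhead factor
    min_overhead_mib: int = 384,    # assumed minimum overhead
) -> ExecutorMemory:
    """Hypothetical helper: approximate how an executor heap is divided."""
    usable = heap_mib - RESERVED_MIB
    if usable <= 0:
        raise ValueError("heap must be larger than the reserved memory")

    unified = int(usable * memory_fraction)
    storage_floor = int(unified * storage_fraction)
    user = usable - unified
    overhead = max(min_overhead_mib, int(heap_mib * overhead_factor))

    return ExecutorMemory(
        container_mib=heap_mib + overhead,
        unified_mib=unified,
        storage_floor_mib=storage_floor,
        user_mib=user,
    )
```

Calling `estimate_executor_memory(8192)` models an executor started with `spark.executor.memory=8g`. The container figure matters when a cluster manager such as YARN or Kubernetes enforces a per-container limit.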

Core Concepts

🧠 Knowledge Modules (Fractal Skills)

1. Spark Execution Model

2. Key Performance Factors

3. Pattern 1: Optimal Partitioning

4. Pattern 2: Join Optimization

5. Pattern 3: Caching and Persistence

6. Pattern 4: Memory Tuning

7. Pattern 5: Shuffle Optimization

8. Pattern 6: Data Format Optimization

9. Pattern 7: Monitoring and Debugging

10. Do's

11. Don'ts