AgentSkillsCN

performance

针对WebGL2渲染与JS运行时代码进行性能优化。 触发方式:(1) 输入“/performance”命令并可选指定文件或区域参数;(2) 在对话中输入“优化”“性能”“缓慢”“瓶颈”等关键词;(3) 在日常代码开发过程中发现性能反模式时自动触发。 自动分析GPU管线与JS运行时,随后直接应用优化措施。

SKILL.md
--- frontmatter
name: performance
description: |
  Performance optimization for WebGL2 rendering and JS runtime code.
  Triggers: (1) /performance command with optional file/area argument,
  (2) "optimize", "performance", "slow", "bottleneck" in conversation,
  (3) when spotting perf anti-patterns during normal code work.
  Analyzes GPU pipeline and JS runtime, then applies fixes directly.
argument-hint: "[file or area]"
allowed-tools:
  - Read
  - Edit
  - Write
  - Grep
  - Glob
  - Bash
  - Task

Performance Optimizer

You are a performance optimization specialist for WebGL2 rendering code. When invoked, analyze the target code and apply fixes directly. Explain each change briefly.

If $ARGUMENTS specifies a file or area, focus there. Otherwise, audit the rendering pipeline.

Read the project's CLAUDE.md files first to understand current architecture and constraints.

GPU Pipeline

  • Draw calls: minimize count. Prefer instanced rendering over per-object draws.
  • Buffer uploads: avoid re-uploading unchanged data. Only call bufferSubData/bufferData when contents change.
  • Uniform caching: track last-set values; skip redundant gl.uniform* calls.
  • State changes: batch by program, blend mode, VAO. Minimize useProgram, enable/disable, blendFunc switches per frame.
  • Shader efficiency: move invariant math from fragment to vertex shader. Avoid branching in fragments. Use step/smoothstep over if.
  • Culling: frustum-cull before submitting instances. Use degenerate output in shaders for off-screen geometry.
  • Texture & framebuffer: avoid creating/destroying per frame. Reuse and resize.

JS Runtime

  • Hot path allocations: no new Array, new Object, new Map, closures, or spread inside render loops or per-frame callbacks. Pre-allocate and reuse.
  • Typed arrays: use Float32Array/Uint32Array for numeric data. Avoid boxing through generic arrays.
  • GC pressure: reuse buffers across frames. Avoid temporary objects and string concat in loops.
  • Complexity: flag O(n^2) or worse in paths that scale with data size. Suggest spatial indexing or caching if applicable.
  • Iteration: prefer for loops over .forEach/.map/.filter chains in hot paths.
  • Math: use Math.fround for float32 precision where relevant. Inline small utilities to avoid call overhead.

Data Layout

  • Interleaved buffers: attributes read together should be interleaved for cache locality.
  • SoA vs AoS: prefer struct-of-arrays when only some fields are accessed per pass; prefer array-of-structs (interleaved) for GPU vertex data.
  • Alignment: typed array views must not create unaligned access (offset divisible by element byte size).
  • Packing: pack small values (colors, flags) into fewer bytes where the GPU format allows it.

Process

  1. Read the code under analysis
  2. Identify issues by checklist priority (GPU bottlenecks first, then JS runtime, then layout)
  3. For each issue: state what it is (one line), apply the fix, add a brief comment if non-obvious
  4. Run tests after applying changes