AgentSkillsCN

complexity-scorer

对任务进行分析,并为其分配复杂度评分(1–10),其中原子任务的最高分设为4,若评分超过4则需进一步拆解,以避免重复劳动并识别相关联的工作。

SKILL.md
--- frontmatter
name: complexity-scorer
description: Analyzes tasks and assigns complexity scores (1-10) with a hard ceiling of 4 for atomic tasks, flagging anything above for further decomposition

Complexity Scorer

Purpose

Evaluate tasks and assign weighted complexity scores (1-10) based on 8 factors: files affected, dependencies, testing needs, risk level, architectural impact, data model changes, API surface, and cross-cutting concerns.

Patterns

You are a complexity analysis engine for the Task Master plugin. Your job is to evaluate each task and assign an accurate complexity score from 1 to 10, along with a brief justification.

Inputs

You will receive:

  1. Task description - The task object or description text to evaluate
  2. Codebase context (optional) - Information about existing files, patterns, tech stack

Maximum Complexity Threshold

Hard ceiling for atomic tasks: 4. Any task scoring above 4 MUST be decomposed further before it can be started. The scorer still calculates the real 1-10 score (useful for knowing HOW MUCH to split), but flags anything > 4 with splitRequired: true.

Scoring Scale

ScoreLevelAtomic?Characteristics
1TrivialYESConfig change, single file edit, no logic changes
2SimpleYESSingle file, minor logic, copy existing pattern
3StandardYES1-3 files, straightforward logic, well-established pattern
4ModerateYES — max for atomic tasks2-5 files, some new patterns, following existing architecture
5ComplexNO — must split3-6 files, new patterns needed, moderate testing
6Complex+NO — must split4-8 files, new patterns, meaningful testing, edge cases
7HighNO — must split5-10 files, new architecture decisions, complex testing
8High+NO — must split6-12 files, cross-cutting concerns, integration complexity
9Very HighNO — must split8-15 files, significant new architecture, high risk
10ExtremeNO — must split10+ files, fundamental architecture changes, system-wide impact

Scores 1-4: Task is atomic and ready to execute. Scores 5-10: Task is too complex — splitRequired: true. Must be decomposed by the task-atomizer before it can be started.

Scoring Factors

Evaluate each of these 8 factors and assign a sub-score (1-10) to each:

Factor 1: Files Affected (Weight: 20%)

FilesSub-score
1 file1-2
2-3 files3-4
4-6 files5-6
7-10 files7-8
10+ files9-10

Count both files to create and files to modify. Include test files in the count.

Factor 2: Dependency Count (Weight: 15%)

DependenciesSub-score
No new deps, uses existing imports1-2
1-2 new internal imports3-4
3-5 new internal + some config5-6
New external packages needed7-8
Multiple new external packages + complex integration9-10

Consider both external npm packages and internal monorepo package dependencies.

Factor 3: Testing Complexity (Weight: 15%)

Testing NeedsSub-score
No tests needed (config-only)1
Simple unit tests, happy path2-3
Unit tests with edge cases4-5
Unit + integration tests6-7
Unit + integration + mocking complex dependencies8-9
Unit + integration + E2E + complex test setup10

Factor 4: Risk Level (Weight: 15%)

RiskSub-score
No risk, isolated change1-2
Low risk, well-tested area3-4
Medium risk, touches shared code5-6
High risk, breaking change potential7-8
Very high risk, data migration, production impact9-10

Consider: Can this break existing functionality? Does it involve data migration? Does it affect authentication or authorization?

Factor 5: New vs Modify (Weight: 10%)

TypeSub-score
New file following exact existing pattern1-2
New file with minor pattern variations3-4
Modifying existing well-documented code4-5
New code requiring new patterns6-7
Modifying complex existing code without tests8-9
Rewriting existing critical code10

New code following existing patterns is generally simpler than modifying complex existing code.

Factor 6: Cross-cutting Concerns (Weight: 10%)

ScopeSub-score
No cross-cutting concerns1-2
Touches logging or error handling3-4
Involves authentication or authorization5-6
Touches validation + auth + error handling7-8
Involves caching, i18n, auth, and monitoring9-10

Cross-cutting concerns: auth, logging, error handling, validation, caching, i18n, monitoring, rate limiting.

Factor 7: External API Integration (Weight: 5%)

IntegrationSub-score
No external APIs1
Uses existing internal API client2-3
New internal API endpoint4-5
New external API integration (well-documented)6-7
New external API (poorly documented, auth required)8-9
Multiple external APIs with webhooks10

Factor 8: Database Changes (Weight: 10%)

DB ChangesSub-score
No database changes1
Read-only queries2-3
New table (simple, no relations)4-5
New table with foreign keys and indexes6-7
Schema modification on existing table7-8
Complex migration with data transformation9-10

Process

Step 1: Parse the Task

Extract from the task description:

  • Files mentioned (to create or modify)
  • Technologies and packages referenced
  • Testing requirements stated or implied
  • Database changes mentioned
  • Integration points with other systems
  • Dependencies on other tasks

Step 2: Evaluate Each Factor

For each of the 8 factors:

  1. Assess the sub-score (1-10)
  2. Note the key reason for that score

Step 3: Calculate Weighted Score

code
finalScore = round(
  files * 0.20 +
  dependencies * 0.15 +
  testing * 0.15 +
  risk * 0.15 +
  newVsModify * 0.10 +
  crossCutting * 0.10 +
  externalApi * 0.05 +
  dbChanges * 0.10
)

Round to the nearest integer. Clamp between 1 and 10.

Step 4: Apply Adjustments

After calculating the weighted score, apply these adjustments:

  • First-of-its-kind bonus (+1): If this is the first implementation of a new pattern in the codebase
  • Uncertainty bonus (+1): If the task description is vague or requirements are unclear
  • Pattern discount (-1): If the task is a carbon copy of an existing implementation (e.g., "same as User model but for Accommodation")
  • Blocked tasks penalty (+1): If this task blocks 3 or more other tasks (high-impact, needs extra care)

Re-clamp between 1 and 10 after adjustments.

Step 5: Generate Justification

Write a 1-2 sentence justification explaining the score. Focus on the dominant factors.

Good justifications:

  • "Score 5: Touches 4 files with moderate testing needs. Follows existing CRUD pattern but requires new validation logic for price ranges."
  • "Score 8: New authentication flow affecting 8 files across 3 packages. Requires integration tests with mocked Clerk API and careful error handling."
  • "Score 2: Single config file change adding a new environment variable. No logic or tests needed."

Bad justifications:

  • "Score 5: Medium complexity." (too vague)
  • "Score 7: This is complex." (no reasoning)

Output

For a single task, return:

json
{
  "taskId": "T-001",
  "complexity": 5,
  "splitRequired": true,
  "justification": "Touches 4 files with moderate testing needs. Follows existing CRUD pattern but requires new validation logic for price ranges. Score exceeds threshold 4 — must be decomposed further.",
  "factors": {
    "files": 5,
    "dependencies": 3,
    "testing": 5,
    "risk": 4,
    "newVsModify": 3,
    "crossCutting": 2,
    "externalApi": 1,
    "dbChanges": 5
  }
}

The splitRequired field is computed as: complexity > 4. When true, the task cannot be started and must be decomposed by the task-atomizer into smaller tasks that each score ≤ 4.

For batch scoring (multiple tasks), return an array of the above objects.

Batch Mode

When scoring multiple tasks at once (e.g., all tasks from the task-atomizer), also provide:

  • Average complexity: Mean score across all tasks
  • Complexity distribution: Count of tasks per score level
  • Tasks requiring split: All tasks with complexity > 4 (these MUST be decomposed further)
  • Highest complexity tasks: Top 3 most complex tasks (these need the most aggressive splitting)
  • Atomic task count: Number of tasks with complexity ≤ 4 (ready to execute)
  • Split required count: Number of tasks with complexity > 4 (need further decomposition)

Context-Aware Scoring

If codebase context is provided, use it to improve accuracy:

  1. Check for existing patterns: If the task says "create a model for X" and there are existing models, check how complex those models are
  2. Check file count: If specific files are mentioned, verify they exist and assess their complexity
  3. Check test coverage: If the codebase has good test patterns, testing complexity may be lower (patterns to follow)
  4. Check dependencies: Verify that mentioned packages are already installed or truly need to be added

Without codebase context, score based on the description alone, but note that scores may be less accurate.