Code Duplication Prevention

Activation

When this skill is triggered, ALWAYS display this banner first:

code

╭─────────────────────────────────────────────────────────────╮
│  🔍 SKILL ACTIVATED: code-duplication-check                 │
├─────────────────────────────────────────────────────────────┤
│  Detected: Request to write new component                   │
│  Action: Searching codebase for existing implementations... │
╰─────────────────────────────────────────────────────────────╯

When to Use

This skill activates automatically when it detects requests to write specific components. Pattern matching on:

•"write a callback"
•"create a utility" / "add a helper"
•"implement a new loss" / "add a loss function"
•"add a metric" / "create a metric"
•"build a multi-GPU function" / "add multi-GPU support"
•"create a data augmentation" / "add augmentation"
•"implement a scheduler" / "add scheduler"
•"write a transform" / "add transform"

Key principle: Search first, code second. Only write new code after checking for existing implementations.

Search Process

Scope

Search entire src/ directory comprehensively.

Strategy

Combine two approaches:

•Keyword-based: Exact matches on function/class names, file names, key terms
•Semantic understanding: Conceptual similarity (e.g., "learning rate warmup" finds scheduler code)

What to Track

•ML-specific patterns: PyTorch Lightning callbacks, multi-GPU utilities, WandB logging, checkpoint handlers, losses, metrics, augmentations, schedulers
•General Python utilities: Reusable functions/classes (data processing, file I/O, config helpers)

Matching Strictness

Use judgment based on context:

•Strict for well-defined ML components (callbacks, losses, metrics)
•Broader for general utilities (more variation in implementations)

Result Limiting

•Show 1 match if very confident it's the right one
•Show up to 3 matches if uncertain about which is best
•Rank by relevance (name similarity, usage patterns, code structure)

Presentation Flow

When matches are found, PAUSE before writing any code and present:

code

I found similar code in the codebase:

**Match 1**: <ClassName/FunctionName> (<file_path>:<line_start>-<line_end>)
```<language>
<code snippet>

[Show up to 3 matches if applicable]

What would you like to do?

•Use existing - I'll show you how to import and use this
•Write new - I'll explain why this doesn't fit and write new code
•Adapt existing - I'll show what changes are needed

code


Each match shows:
- Code snippet (function/class definition)
- File location with line numbers
- Brief context if helpful

## Three Workflow Paths

### Path 1: Use Existing

1. Show import statement
2. Show usage example (how to add to trainer, what parameters it needs, etc.)
3. Confirm with user
4. Done

Example:

To use this callback:

•Import: from src.trainer.callbacks import MultiGPUCheckCallback
•Usage: Add to trainer callbacks list in your config or training script
•Parameters: None required (auto-detects GPU count)

code


### Path 2: Write New

1. **Justify first**: Explain why existing code doesn't meet requirements
2. Proceed to write the requested code
3. Document rationale in code comments or commit message

Example:

The existing callback only logs GPU count. Your request needs to track per-GPU memory usage and log to WandB, which isn't present. I'll write new code with these capabilities.

code


### Path 3: Adapt Existing

1. Show the existing code
2. Show proposed modifications (highlighted or explained)
3. Explain what changes are needed and why
4. Ask for confirmation before editing

Example:

I can adapt the existing MultiGPUCheckCallback by:

•Adding memory tracking using torch.cuda.memory_allocated()
•Adding WandB logging in on_train_batch_end()
•Adding configuration for memory threshold alerts

Proceed with these changes?

code


## Special Cases

### Registry-Based Components

For components using the project's registry pattern (`@register_model`, `@register_loss`):

1. **Check registry first**: Look in `src/<component_type>/__init__.py` for registered items
2. **Then check implementations**: Search individual files in `src/<component_type>/`
3. **Suggest registry pattern**: If match found in registry, show how to use it via `get_<component>()`

Example:

I found a registered loss function in the loss registry:

Available: sisdr (Scale-Invariant Signal-to-Distortion Ratio)

Usage:

python

from src.losses import get_loss_function
loss_fn = get_loss_function({"type": "sisdr"})

Is this what you need?

code


### Component-Specific Directories

Prioritize searches based on component type:
- **Callbacks**: Prioritize `src/trainer/`, check `src/utils/`
- **Losses/Metrics**: Check registry first, then `src/losses/`
- **Utilities**: Prioritize `src/utils/`, search all `src/`
- **Data augmentations**: `src/data/augmentations.py`, `src/data/`
- **Multi-GPU helpers**: `src/trainer/`, `src/config/`, `src/utils/`

### No Matches Found

If no similar code is found:
- **Proceed silently** without interruption
- Write the requested code normally
- No need to announce "no matches found"

Low-friction design: only intervene when reuse is possible.

## Implementation Tools

Use these tools for search and presentation:

1. **Grep**: Search for keywords and patterns

•Use with output_mode="files_with_matches" first to find files
•Then output_mode="content" with -A/-B context for snippets
•Try both exact keywords and related terms

code


2. **Glob**: Pattern matching for specific file types

•**/*.py for all Python files
•src/trainer/**/*.py for callbacks
•src/losses/*.py for losses

code


3. **Read**: Get code snippets with line numbers

4. **AskUserQuestion**: Present the 3-choice decision

## Search Examples

### Example 1: Callback Request

User: "Write a callback to log learning rate to WandB"

Search process:

•Grep for "callback" + "learning rate" in src/trainer/
•Grep for "lr" + "wandb" + "log" in src/
•Check if WandBLogger or similar exists
•Present matches with 3 options

code


### Example 2: Utility Request

User: "Create a utility to check if we're in DDP mode"

Search process:

•Grep for "ddp" + "distributed" in src/
•Grep for "multi.*gpu" + "trainer" in src/
•Check src/utils/ and src/config/ for existing checks
•Present matches or proceed if none found

code


### Example 3: Loss Function Request

User: "Implement a spectral convergence loss"

Search process:

•Check loss registry: grep "spectral" in src/losses/init.py
•Grep for "spectral" + "loss" in src/losses/
•Check multiscale_losses.py (likely location)
•Present matches or write new if truly novel

code


## Red Flags - Never Do This

❌ Write code immediately without searching first
❌ Show matches without file location and line numbers
❌ Proceed to "write new" without justifying why existing code won't work
❌ Search only in one directory (always comprehensive search)
❌ Announce "no matches found" (silent when nothing found)
❌ Show more than 3 matches (information overload)

## Success Criteria

✅ Auto-detects component writing requests
✅ Searches comprehensively before presenting options
✅ Shows code snippets with file locations
✅ Presents 3 clear options: use, write new, adapt
✅ Handles all three workflow paths completely
✅ Silent when no matches (low friction)
✅ Integrates with registry patterns when applicable

## Notes

- This skill reduces code duplication across the research team
- Promotes discovery of existing utilities and patterns
- Forces deliberate decision-making about new code vs. reuse
- Minimal friction: only intervenes when helpful