/implement-paper

Implements machine learning, AI, LLM, or AI agent papers from arXiv as Python projects.

Usage

code

/implement-paper <arxiv_url_or_id> [options]

Arguments

•<arxiv_url_or_id>: arXiv paper URL (e.g., https://arxiv.org/abs/2301.00001) or ID (e.g., 2301.00001)
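
Both argument forms reduce to the same bare identifier. The following is a minimal sketch of that normalization, assuming new-style arXiv IDs only (`YYMM.NNNNN` with an optional version suffix). `normalize_arxiv_id` is a hypothetical helper written for illustration; it is not the skill's own `extract_arxiv_id`, whose behavior is not shown here.

```python
import re

# New-style arXiv identifier: four digits, a dot, four or five digits,
# optionally followed by a version suffix such as "v2".
_ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?:v\d+)?(?!\d)")


def normalize_arxiv_id(url_or_id: str) -> str:
    """Return the bare arXiv ID (version stripped) from a URL or an ID string."""
    match = _ARXIV_ID.search(url_or_id.strip())
    if match is None:
        raise ValueError(f"no arXiv ID found in {url_or_id!r}")
    return match.group(1)


print(normalize_arxiv_id("https://arxiv.org/abs/2301.00001"))
print(normalize_arxiv_id("2010.11929v2"))
```

Old-style identifiers (for example `hep-th/9901001`) are deliberately out of scope for this sketch.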

Options

•--framework <name>: ML framework to use (default: pytorch). Options: pytorch, tensorflow, jax
•--minimal: Create minimal implementation without training/evaluation scripts
•--clone-ref: Clone reference implementation if found
•--fix-from-review: Apply fixes from an existing REVIEW.md file (skips implementation, goes directly to fix phase)

Workflow

When this skill is invoked, follow these phases in order:

Phase 1: Paper Acquisition and Analysis

1. **Fetch paper metadata and PDF**

bash

cd /Users/shibuiyusuke/tmp/paper2code
uv run python -c "
from src import fetch_paper
from pathlib import Path

paper, pdf_path = fetch_paper('<arxiv_url_or_id>', Path('./paper_impl'))
print(f'Title: {paper.title}')
print(f'arXiv ID: {paper.arxiv_id}')
print(f'PDF saved to: {pdf_path}')
"

2. **Search for reference implementations**

bash

uv run python -c "
from src import fetch_paper, find_reference_implementation, extract_text_from_pdf
from pathlib import Path

paper, pdf_path = fetch_paper('<arxiv_url_or_id>', Path('./paper_impl'))
pdf_text = extract_text_from_pdf(pdf_path) if pdf_path else ''
repos = find_reference_implementation(paper, pdf_text)

for repo in repos:
    official = ' [OFFICIAL]' if repo.is_official else ''
    print(f'{repo.url}{official} (stars: {repo.stars})')
"

3. **Read the PDF using the Read tool to understand the paper**
   - Read the downloaded PDF at paper_impl/{arxiv_id}/
   - Focus on: Abstract, Introduction, Method/Approach sections, Experiments, Appendix

Phase 2: Paper Understanding

Analyze the paper systematically to extract implementation requirements:

1. **Core Algorithm/Model**
   - What is the main contribution?
   - What are the key equations/formulas?
   - What is the model architecture (for neural networks)?
   - What are the algorithmic steps (for non-NN methods)?

2. **Input/Output Specifications**
   - What are the expected inputs (shapes, types, ranges)?
   - What are the outputs?
   - What preprocessing is required?

3. **Hyperparameters**
   - List all hyperparameters with their default values
   - Note which are critical vs. optional

4. **Dependencies**
   - What external libraries are needed?
   - Are there pretrained models or datasets required?

5. **Training Details (if applicable)**
   - Loss function(s)
   - Optimizer and learning rate schedule
   - Batch size and training epochs
   - Data augmentation techniques
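
The hyperparameters gathered in this phase can be recorded in one place so that later phases read them from a single source. Below is a minimal sketch of what `src/config.py` might contain, assuming a Transformer-style model. Every field name and default value here is a placeholder chosen for illustration and must be replaced with the values the paper actually reports.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Config:
    """Hypothetical hyperparameter record; all defaults are placeholders."""

    # Critical: changing these alters the model the paper describes.
    d_model: int = 512
    num_heads: int = 8
    num_layers: int = 6
    learning_rate: float = 1e-4

    # Optional: tuning knobs that usually do not change the method itself.
    dropout: float = 0.1
    batch_size: int = 32
    eps: float = 1e-6

    def __post_init__(self) -> None:
        # Fail early on combinations that cannot be implemented.
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")


config = Config()
print(asdict(config))
```

Freezing the dataclass keeps a run's configuration from being mutated halfway through training, which makes results easier to reproduce.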

Phase 3: Implementation

Create the project structure under paper_impl/{arxiv_id}/:

code

paper_impl/{arxiv_id}/
├── README.md           # Paper info, usage, citation
├── requirements.txt    # Dependencies
├── src/
│   ├── __init__.py
│   ├── model.py        # Main model/algorithm
│   ├── layers.py       # Custom layers/modules (if needed)
│   ├── utils.py        # Utility functions
│   └── config.py       # Hyperparameters and configuration
├── scripts/
│   ├── train.py        # Training script
│   ├── evaluate.py     # Evaluation script
│   └── demo.py         # Demo/inference script
└── tests/
    └── test_model.py   # Unit tests
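
The layout above can be created programmatically. This is a minimal sketch that assumes `--minimal` drops only `scripts/train.py` and `scripts/evaluate.py` and keeps `scripts/demo.py`; that reading of the option, and the helper name `scaffold_project`, are assumptions rather than the skill's documented behavior.

```python
from pathlib import Path

# Files created in every project (paths relative to paper_impl/{arxiv_id}/).
CORE_FILES = [
    "README.md",
    "requirements.txt",
    "src/__init__.py",
    "src/model.py",
    "src/layers.py",
    "src/utils.py",
    "src/config.py",
    "scripts/demo.py",
    "tests/test_model.py",
]
# Files skipped when a minimal implementation is requested (assumption).
TRAINING_FILES = ["scripts/train.py", "scripts/evaluate.py"]


def scaffold_project(root: Path, arxiv_id: str, minimal: bool = False) -> list:
    """Create empty placeholder files and return the paths that were newly created."""
    project_dir = root / arxiv_id
    wanted = CORE_FILES if minimal else CORE_FILES + TRAINING_FILES
    created = []
    for relative in wanted:
        path = project_dir / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.touch()
            created.append(path)
    return created
```

Calling `scaffold_project(Path("./paper_impl"), "2301.00001")` a second time returns an empty list, so the helper is safe to re-run on a partially built project.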

Implementation Guidelines

1. **Start with the core model/algorithm**
   - Implement the central contribution first
   - Map equations directly to code with comments
   - Use descriptive variable names matching paper notation

2. **Code quality requirements**
   - Use type hints for all functions
   - Add docstrings referencing paper sections/equations
   - Include shape comments for tensor operations

python

def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass implementing Eq. (3) from Section 3.2.

    Args:
        x: Input tensor of shape (batch_size, seq_len, d_model)

    Returns:
        Output tensor of shape (batch_size, seq_len, d_model)
    """
    # Single-head case. self.w_q, self.w_k, self.w_v are illustrative
    # nn.Linear(d_model, d_model) projections defined in __init__.
    q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)  # each (B, L, D)
    # Eq. (3): y = softmax(QK^T / sqrt(d_k)) V
    attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)  # (B, L, L)
    return attn @ v  # (B, L, D)

3. **Numerical stability**
   - Use log_softmax instead of softmax + log when possible
   - Add small epsilon to denominators to prevent division by zero
   - Use torch.clamp for values that should be bounded

4. **Match paper exactly**
   - Use the same initialization schemes
   - Implement the same normalization (LayerNorm, BatchNorm, etc.)
   - Follow the exact order of operations
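
The numerical stability points can be illustrated without any framework. The sketch below uses only the standard library to show why a fused log-softmax is preferred over `log(softmax(x))`, and what the epsilon and clamp guards look like. The helper names are illustrative; in a PyTorch project the corresponding calls would be `torch.log_softmax` and `torch.clamp`.

```python
import math


def log_softmax(logits: list) -> list:
    """Stable log-softmax: subtract the max before exponentiating (log-sum-exp trick)."""
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return [v - log_sum for v in logits]


def naive_log_softmax(logits: list) -> list:
    """softmax followed by log; math.exp overflows once a logit is large."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]


def safe_divide(numerator: float, denominator: float, eps: float = 1e-8) -> float:
    """Guard a non-negative denominator (e.g. a norm or variance) against zero."""
    return numerator / (denominator + eps)


def clamp(value: float, low: float, high: float) -> float:
    """Bound a value to [low, high], as torch.clamp does element-wise."""
    return max(low, min(high, value))


print(log_softmax([1000.0, 1000.0]))  # finite values
try:
    naive_log_softmax([1000.0, 1000.0])
except OverflowError as error:
    print("naive version failed:", error)
```

The epsilon guard shown here assumes the denominator is non-negative; for signed denominators a different guard is needed.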

Phase 4: Verification

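1. **Shape tests**

python

import torch
from src.config import Config
from src.model import Model

def test_model_shapes():
    config = Config()  # default hyperparameters
    model = Model(config)
    x = torch.randn(2, 10, 512)  # (batch, seq, dim)
    y = model(x)
    assert y.shape == (2, 10, 512)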

2. **Gradient flow test**

python

def test_gradients():
    config = Config()  # default hyperparameters
    model = Model(config)
    x = torch.randn(2, 10, 512, requires_grad=True)
    y = model(x)
    loss = y.sum()
    loss.backward()
    assert x.grad is not None
    assert not torch.isnan(x.grad).any()

3. **Compare with reference (if available)**
   - Load reference implementation
   - Compare outputs for same inputs
   - Document any differences
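
A comparison against a reference is only meaningful with a stated tolerance. Below is a framework-agnostic sketch that compares two flattened output vectors and reports what to document. The tolerance values are placeholders, and `compare_outputs` is a hypothetical helper; with PyTorch tensors, `torch.testing.assert_close` serves the same purpose.

```python
import math
from typing import Sequence


def compare_outputs(
    ours: Sequence[float],
    reference: Sequence[float],
    rtol: float = 1e-5,
    atol: float = 1e-6,
) -> dict:
    """Summarize element-wise differences between two flattened outputs."""
    if len(ours) != len(reference):
        raise ValueError(f"length mismatch: {len(ours)} vs {len(reference)}")
    pairs = list(zip(ours, reference))
    # Indices whose values differ by more than the combined tolerance.
    mismatches = [
        index
        for index, (a, b) in enumerate(pairs)
        if not math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
    ]
    return {
        "max_abs_diff": max((abs(a - b) for a, b in pairs), default=0.0),
        "num_mismatches": len(mismatches),
        "first_mismatch": mismatches[0] if mismatches else None,
    }


print(compare_outputs([1.0, 2.0, 3.0], [1.0, 2.0, 3.5]))
```

For the comparison to be fair, both implementations need identical inputs and identical weights; otherwise differences in initialization will dominate the report.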

Phase 5: Documentation

Create README.md with:

markdown

# {Paper Title}

Implementation of [{Paper Title}]({arxiv_url}) in {framework}.

## Paper Information

- **Title**: {title}
- **Authors**: {authors}
- **arXiv**: [{arxiv_id}]({arxiv_url})
- **Published**: {date}

## Abstract

{abstract}

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Quick Start

python

from src.model import Model
from src.config import Config

config = Config()
model = Model(config)
output = model(input_data)

### Training

bash

python scripts/train.py --config config.yaml

### Evaluation

bash

python scripts/evaluate.py --checkpoint path/to/model.pt

## Implementation Notes

- {Note any deviations from paper}
- {Note any ambiguities resolved}
- {Note any assumptions made}

## Citation

bibtex

{bibtex_citation}

## License

This implementation is provided for research purposes.

### Phase 6: Apply Fixes from Review (--fix-from-review)

When the `--fix-from-review` option is specified, skip Phases 1-5 and directly apply fixes from an existing review:

1. **Locate the review file**
   ```bash
   cd /Users/shibuiyusuke/tmp/paper2code
   uv run python -c "
   from src import extract_arxiv_id, fetch_paper_metadata
   from pathlib import Path

   paper = fetch_paper_metadata(extract_arxiv_id('<arxiv_url_or_id>'))
   impl_dir = Path('./paper_impl') / paper.clean_id
   review_path = impl_dir / 'REVIEW.md'

   if not review_path.exists():
       print(f'ERROR: Review not found at {review_path}')
       print('Run /review-implementation first to generate a review.')
       exit(1)

   print(f'Implementation directory: {impl_dir}')
   print(f'Review file: {review_path}')
   print(f'Paper: {paper.title}')
   "
   ```

2. **Read and parse the review**
   - Use the Read tool to read paper_impl/{arxiv_id}/REVIEW.md
   - Identify all sections marked with "Proposed Fix" or "Fix for:"
   - Extract the following from each fix proposal:
     - File path: The file to modify
     - Current code: The existing problematic code
     - Proposed code: The corrected code
     - Paper reference: The equation/section being fixed

3. **Read the paper PDF (for context)**
   - Read the PDF at paper_impl/{arxiv_id}/{arxiv_id}.pdf or paper_impl/{arxiv_id}.pdf
   - Focus on the sections referenced in the fix proposals
   - Understand the mathematical formulations being implemented

4. **Apply each fix systematically**

   For each fix proposal in the review:

   a. Read the target file using the Read tool

   b. Locate the code to fix
      - Find the exact location matching the "Current Code" section
      - Verify the code context matches

   c. Apply the fix using the Edit tool
      - Replace the problematic code with the proposed fix
      - Ensure indentation and formatting match the file style
      - Add/update comments referencing the paper equation

   d. Verify the fix
      - Check that the edit was applied correctly
      - Ensure no syntax errors were introduced

5. **Run tests after all fixes**

bash

cd paper_impl/{arxiv_id}
python -m pytest tests/ -v

6. **Update the review file**

   After applying fixes, update REVIEW.md:
   - Change status of fixed components from "Incorrect" to "Correct" or "Fixed"
   - Add a "Fix History" section noting when fixes were applied
   - Example addition:

   ```markdown
   ## Fix History

   | Date | Issue | Status |
   |------|-------|--------|
   | {current_date} | Importance Recalibration (Eq. 5) | Fixed |
   | {current_date} | Clustering Algorithm | Fixed |
   ```
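
The locate, replace, and verify steps for a single fix can be sketched in plain Python. This is an illustration of the logic only, not the Edit tool itself: `apply_fix` is a hypothetical helper that works on strings, insists on exactly one match of the "Current Code" text, and uses `ast.parse` as the syntax check.

```python
import ast


def apply_fix(source: str, current_code: str, proposed_code: str) -> str:
    """Replace one exact occurrence of current_code and check the result parses."""
    if not current_code:
        raise ValueError("current_code must not be empty")
    occurrences = source.count(current_code)
    if occurrences != 1:
        # Zero matches: the review is stale. Several matches: the edit is ambiguous.
        raise ValueError(f"expected exactly one match, found {occurrences}")
    patched = source.replace(current_code, proposed_code, 1)
    ast.parse(patched)  # raises SyntaxError if the fix broke the file
    return patched


original = "def links(p):\n    return len(p.strands)\n"
fixed = apply_fix(
    original,
    "return len(p.strands)",
    "return sum(1 for s in p.strands if s.active)",
)
print(fixed)
```

Refusing to proceed on zero or multiple matches is the conservative choice: it surfaces a mismatch between the review and the file instead of silently patching the wrong location.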

Fix Application Guidelines

1. **Order of operations**
   - Apply fixes in order of dependency (if A depends on B, fix B first)
   - Start with core algorithm fixes, then move to peripheral components

2. **When fixes conflict**
   - If two fixes affect the same code region, apply them carefully
   - Consider combining related fixes into a single edit

3. **If a fix is ambiguous**
   - Read the referenced paper section for clarity
   - Make the most conservative interpretation
   - Document any assumptions in code comments

4. **Validation after each fix**
   - Run relevant unit tests if they exist
   - Check that the module can be imported without errors

   ```bash
   python -c "from src.{module} import *"
   ```

5. **If tests fail after fixes**
   - Check if tests need to be updated for the corrected behavior
   - The fix may reveal that tests were testing incorrect behavior
   - Update tests to match the paper's specification
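
The dependency rule under "Order of operations" is a topological sort. Here is a minimal sketch using the standard library's `graphlib` (Python 3.9+). The fix names `A`, `B`, `C` and their dependencies are made up for illustration, and `order_fixes` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def order_fixes(depends_on: dict) -> list:
    """Return fix names so that every fix comes after the fixes it depends on.

    depends_on maps each fix to the set of fixes that must be applied first.
    """
    return list(TopologicalSorter(depends_on).static_order())


# A depends on B, and C depends on A, so B must be applied first.
print(order_fixes({"A": {"B"}, "B": set(), "C": {"A"}}))

try:
    order_fixes({"A": {"B"}, "B": {"A"}})
except CycleError:
    print("circular dependency: these fixes need to be combined into one edit")
```

A cycle means two fixes each assume the other has already been applied, which is the "combine related fixes into a single edit" case from the conflict guideline.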

Example Fix Application

Given a review with this fix proposal:

markdown

#### Fix for: Importance Recalibration (Eq. 5)

**File**: `src/nexus_weaver.py`
**Lines**: 445-493

**Current Code**:
```python
def _recalibrate_importance(self, particle: InsightParticle) -> float:
    # Count strands (incorrect interpretation)
    ia_links = len(particle.relational_strands)
```

**Proposed Code**:
```python
def _recalibrate_importance(self, particle: InsightParticle) -> float:
    # Count IAs that reference this particle (correct per paper)
    ia_links = sum(
        1 for ia in self.strg.get_all_aggregates()
        if particle.particle_id in ia.derived_from_ids
    )
```

Apply using:
1. Read `src/nexus_weaver.py`
2. Find the `_recalibrate_importance` method
3. Use Edit tool to replace the incorrect IA counting logic
4. Run tests: `pytest tests/test_model.py -v -k "importance"`
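
The fix proposal format shown in this example can be parsed mechanically. The sketch below assumes every proposal starts with a `#### Fix for:` heading and carries bold `**File**`, `**Current Code**`, and `**Proposed Code**` markers, each code marker followed by a fenced block. That layout is inferred from the example above, not from a documented REVIEW.md schema, and `FixProposal` and `parse_fix_proposals` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block


@dataclass
class FixProposal:
    title: str
    file_path: str
    current_code: str
    proposed_code: str


def _code_after(label: str, chunk: str) -> Optional[str]:
    """Return the first fenced block that follows a bold '**label**:' marker."""
    pattern = rf"\*\*{re.escape(label)}\*\*:\s*{FENCE}[A-Za-z]*\n(.*?){FENCE}"
    match = re.search(pattern, chunk, flags=re.DOTALL)
    return match.group(1) if match else None


def parse_fix_proposals(review_text: str) -> List[FixProposal]:
    """Extract every complete fix proposal from the text of a REVIEW.md file."""
    proposals = []
    # Everything before the first heading is preamble, hence the [1:].
    chunks = re.split(r"^#### Fix for:[ \t]*", review_text, flags=re.MULTILINE)[1:]
    for chunk in chunks:
        lines = chunk.splitlines()
        file_match = re.search(r"\*\*File\*\*:\s*`([^`]+)`", chunk)
        current = _code_after("Current Code", chunk)
        proposed = _code_after("Proposed Code", chunk)
        if not lines or file_match is None or current is None or proposed is None:
            continue  # skip incomplete proposals rather than guessing
        proposals.append(
            FixProposal(lines[0].strip(), file_match.group(1), current, proposed)
        )
    return proposals
```

Skipping incomplete proposals keeps the parser conservative; a stricter variant could raise instead so that a malformed review is noticed immediately.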

## Important Guidelines

### When Reading the Paper

- **Pay attention to subscripts and notation**: Papers often define custom notation
- **Check the Appendix**: Implementation details are frequently in appendices
- **Read figure captions**: They often contain architecture details
- **Look for "Algorithm" boxes**: These provide step-by-step pseudocode

### When Implementing

- **Don't overcomplicate**: Start simple, add complexity as needed
- **Test incrementally**: Verify each component before combining
- **Use reference implementations**: When available, compare outputs
- **Document assumptions**: Note where the paper was ambiguous

### Error Handling

If you encounter issues:

1. **Paper is too vague**: Note the ambiguity, make a reasonable assumption, document it
2. **No reference implementation**: Implement based on paper, add extra tests
3. **Results don't match**: Document the difference, check for common issues (initialization, normalization, hyperparameters)

## Example Sessions

### Standard Implementation

User: /implement-paper https://arxiv.org/abs/2010.11929

Claude: I'll implement "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (ViT).

[Phase 1: Fetching paper...]
[Phase 2: Analyzing architecture...]
[Phase 3: Implementing...]
[Phase 4: Testing...]
[Phase 5: Documenting...]

Implementation complete at paper_impl/2010_11929/

### Fix from Review

User: /implement-paper https://arxiv.org/abs/2506.08098 --fix-from-review

Claude: I'll apply fixes from the existing review for "Cognitive Weave: Synthesizing Abstracted Knowledge with a Spatio-Temporal Resonance Graph".

[Phase 6: Applying fixes from review...]

Found REVIEW.md with 3 fix proposals:

•Importance Recalibration (Eq. 5) - src/nexus_weaver.py
•Clustering Algorithm - src/nexus_weaver.py
•Relational Strand Strength - src/strg.py

Applying fix 1/3: Importance Recalibration...

•Reading src/nexus_weaver.py
•Applying fix to _recalibrate_importance method
•Fix applied successfully

Applying fix 2/3: Clustering Algorithm...

•Reading src/nexus_weaver.py
•Applying fix to _identify_clusters method
•Fix applied successfully

Applying fix 3/3: Relational Strand Strength...

•Reading src/strg.py
•Applying fix to _calculate_strand_strength method
•Fix applied successfully

Running tests...
tests/test_model.py::test_insight_particle_creation PASSED
tests/test_model.py::test_importance_recalibration PASSED
tests/test_model.py::test_clustering PASSED
...
All 12 tests passed!

Updated REVIEW.md with fix history.

All fixes applied successfully!
