# /implement-paper

Implements machine learning, AI, LLM, or AI agent papers from arXiv as Python projects.

## Usage

```
/implement-paper <arxiv_url_or_id> [options]
```
## Arguments

- `<arxiv_url_or_id>`: arXiv paper URL (e.g., `https://arxiv.org/abs/2301.00001`) or ID (e.g., `2301.00001`)
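Because the argument may be either a URL or a bare ID, it has to be normalized first. The project's own `extract_arxiv_id` (used in Phase 6) presumably does this; the sketch below is an independent, hypothetical version. It handles only new-style IDs such as `2301.00001`. The underscore directory naming (`2010.11929` -> `2010_11929`) is inferred from the example session's output path `paper_impl/2010_11929/` and is an assumption.

```python
import re

# New-style arXiv IDs: YYMM.NNNN or YYMM.NNNNN, optionally followed by a version (v2).
# Old-style IDs such as hep-th/9901001 are deliberately not handled in this sketch.
_ARXIV_ID = re.compile(r"(?<!\d)(\d{4}\.\d{4,5})(?!\d)")


def parse_arxiv_id(url_or_id: str) -> str:
    """Return the bare arXiv ID (version suffix dropped) from a URL or ID string."""
    match = _ARXIV_ID.search(url_or_id)
    if match is None:
        raise ValueError(f"Could not find an arXiv ID in {url_or_id!r}")
    return match.group(1)


def to_dir_name(arxiv_id: str) -> str:
    """Hypothetical directory naming: '2010.11929' -> '2010_11929'."""
    return arxiv_id.replace(".", "_")
```

For example, `parse_arxiv_id("https://arxiv.org/abs/2301.00001")` and `parse_arxiv_id("2301.00001")` both yield `"2301.00001"`.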
## Options

- `--framework <name>`: ML framework to use (default: `pytorch`). Options: `pytorch`, `tensorflow`, `jax`
- `--minimal`: Create a minimal implementation without training/evaluation scripts
- `--clone-ref`: Clone the reference implementation if found
- `--fix-from-review`: Apply fixes from an existing `REVIEW.md` file (skips implementation and goes directly to the fix phase)
## Workflow

When this skill is invoked, follow these phases in order:

### Phase 1: Paper Acquisition and Analysis
1. **Fetch paper metadata and PDF**
```bash
cd /Users/shibuiyusuke/tmp/paper2code
uv run python -c "
from src import fetch_paper, extract_text_from_pdf
from pathlib import Path
paper, pdf_path = fetch_paper('<arxiv_url_or_id>', Path('./paper_impl'))
print(f'Title: {paper.title}')
print(f'arXiv ID: {paper.arxiv_id}')
print(f'PDF saved to: {pdf_path}')
"
```
2. **Search for reference implementations**
```bash
uv run python -c "
from src import fetch_paper, find_reference_implementation, extract_text_from_pdf
from pathlib import Path
paper, pdf_path = fetch_paper('<arxiv_url_or_id>', Path('./paper_impl'))
pdf_text = extract_text_from_pdf(pdf_path) if pdf_path else ''
repos = find_reference_implementation(paper, pdf_text)
for repo in repos:
    official = ' [OFFICIAL]' if repo.is_official else ''
    print(f'{repo.url}{official} (stars: {repo.stars})')
"
```
3. **Read the PDF using the Read tool to understand the paper**
   - Read the downloaded PDF at `paper_impl/{arxiv_id}/`
   - Focus on: Abstract, Introduction, Method/Approach sections, Experiments, Appendix
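The internals of `find_reference_implementation` are not described here. One simple heuristic such a search might use (an assumption, not necessarily what the project does) is to scan the extracted PDF text for GitHub repository links. The name `find_github_links` is hypothetical. Note that PDF text extraction often breaks long URLs across lines, so misses are expected.

```python
import re

# github.com/<owner>/<repo>; owners use letters, digits and '-', repos also allow '_' and '.'
_GITHUB_REPO = re.compile(r"github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9_.-]+)")


def find_github_links(pdf_text: str) -> list[str]:
    """Return unique GitHub repo URLs found in the text, in order of first appearance."""
    seen: dict[str, None] = {}  # dict preserves insertion order, acting as an ordered set
    for owner, repo in _GITHUB_REPO.findall(pdf_text):
        repo = repo.rstrip(".")  # drop sentence-ending punctuation captured by the pattern
        if repo.endswith(".git"):
            repo = repo[: -len(".git")]
        seen.setdefault(f"https://github.com/{owner}/{repo}", None)
    return list(seen)
```

Links found this way still need checking (for example, whether the repository belongs to the paper's authors) before being treated as official.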
### Phase 2: Paper Understanding

Analyze the paper systematically to extract implementation requirements:

1. **Core Algorithm/Model**
   - What is the main contribution?
   - What are the key equations/formulas?
   - What is the model architecture (for neural networks)?
   - What are the algorithmic steps (for non-NN methods)?
2. **Input/Output Specifications**
   - What are the expected inputs (shapes, types, ranges)?
   - What are the outputs?
   - What preprocessing is required?
3. **Hyperparameters**
   - List all hyperparameters with their default values
   - Note which are critical vs. optional
4. **Dependencies**
   - What external libraries are needed?
   - Are there pretrained models or datasets required?
5. **Training Details (if applicable)**
   - Loss function(s)
   - Optimizer and learning rate schedule
   - Batch size and training epochs
   - Data augmentation techniques
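The hyperparameters gathered here end up in `src/config.py`. A minimal sketch of one way to record them is a frozen dataclass with each default traced to the paper. Every field name and value below is a placeholder chosen for illustration, not taken from any particular paper.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Config:
    """Hyperparameters extracted in Phase 2 (all values are placeholders).

    Replace each default with the paper's value and cite its source,
    e.g. "Table 3" or "Appendix B".
    """

    # Architecture (critical: changing these changes the model itself)
    d_model: int = 512  # placeholder; cite paper section
    n_heads: int = 8  # placeholder
    n_layers: int = 6  # placeholder
    dropout: float = 0.1  # placeholder

    # Training (often found in the Experiments section or Appendix)
    learning_rate: float = 1e-4  # placeholder
    batch_size: int = 32  # placeholder
    epochs: int = 10  # placeholder

    def __post_init__(self) -> None:
        # Fail fast on inconsistent settings instead of deep inside forward()
        if self.d_model % self.n_heads != 0:
            raise ValueError("d_model must be divisible by n_heads")

    @property
    def d_k(self) -> int:
        """Per-head dimension, d_model / n_heads."""
        return self.d_model // self.n_heads
```

Freezing the dataclass means overrides happen at construction (`Config(d_model=768, n_heads=12)`), and `asdict(config)` gives a plain dict for logging alongside results.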
### Phase 3: Implementation

Create the project structure under `paper_impl/{arxiv_id}/`:

```
paper_impl/{arxiv_id}/
├── README.md              # Paper info, usage, citation
├── requirements.txt       # Dependencies
├── src/
│   ├── __init__.py
│   ├── model.py           # Main model/algorithm
│   ├── layers.py          # Custom layers/modules (if needed)
│   ├── utils.py           # Utility functions
│   └── config.py          # Hyperparameters and configuration
├── scripts/
│   ├── train.py           # Training script
│   ├── evaluate.py        # Evaluation script
│   └── demo.py            # Demo/inference script
└── tests/
    └── test_model.py      # Unit tests
```
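The layout above can be created in one step before any code is written. A minimal sketch, assuming a hypothetical `scaffold_project` helper: it creates empty files, never overwrites existing ones, and treats `--minimal` as dropping only `train.py` and `evaluate.py` (keeping `demo.py` is an assumption).

```python
from pathlib import Path

# Files from the project layout; they start empty and are filled in during Phase 3.
PROJECT_FILES = [
    "README.md",
    "requirements.txt",
    "src/__init__.py",
    "src/model.py",
    "src/layers.py",
    "src/utils.py",
    "src/config.py",
    "scripts/train.py",
    "scripts/evaluate.py",
    "scripts/demo.py",
    "tests/test_model.py",
]
MINIMAL_SKIP = {"scripts/train.py", "scripts/evaluate.py"}


def scaffold_project(root: Path, minimal: bool = False) -> list[Path]:
    """Create the empty project skeleton under `root`; return the files newly created."""
    created = []
    for rel in PROJECT_FILES:
        if minimal and rel in MINIMAL_SKIP:
            continue
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # keep any file that already has content
            path.touch()
            created.append(path)
    return created
```

Because existing files are skipped, re-running the helper on a partially implemented project is harmless.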
#### Implementation Guidelines

1. **Start with the core model/algorithm**
   - Implement the central contribution first
   - Map equations directly to code with comments
   - Use descriptive variable names matching paper notation
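2. **Code quality requirements**
   - Use type hints for all functions
   - Add docstrings referencing paper sections/equations
   - Include shape comments for tensor operations

```python
import math

import torch


def forward(self, x: torch.Tensor) -> torch.Tensor:
    """Forward pass implementing Eq. (3) from Section 3.2.

    Args:
        x: Input tensor of shape (batch_size, seq_len, d_model)

    Returns:
        Output tensor of shape (batch_size, seq_len, d_model)
    """
    B, L, _ = x.shape
    # Project and split heads (assumes nn.Linear w_q/w_k/w_v and d_k = d_model // n_heads)
    q, k, v = (w(x).view(B, L, self.n_heads, self.d_k).transpose(1, 2)  # (B, H, L, d_k)
               for w in (self.w_q, self.w_k, self.w_v))
    # Eq. (3): y = softmax(QK^T / sqrt(d_k)) V
    attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(self.d_k), dim=-1)  # (B, H, L, L)
    y = attn @ v  # (B, H, L, d_k)
    return y.transpose(1, 2).reshape(B, L, -1)  # merge heads -> (B, L, d_model)
```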
3. **Numerical stability**
   - Use `log_softmax` instead of `softmax` + `log` when possible
   - Add a small epsilon to denominators to prevent division by zero
   - Use `torch.clamp` for values that should be bounded
4. **Match paper exactly**
   - Use the same initialization schemes
   - Implement the same normalization (LayerNorm, BatchNorm, etc.)
   - Follow the exact order of operations
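The numerical-stability points can be illustrated without a framework. The sketch below uses plain Python floats (a simplification so it runs without PyTorch); in PyTorch the corresponding tools are `torch.log_softmax`, an explicit `+ eps`, and `torch.clamp`. The helper names are hypothetical.

```python
import math


def log_softmax(logits: list[float]) -> list[float]:
    """Log-softmax via the log-sum-exp trick.

    Subtracting the max logit keeps every exp() argument <= 0, so nothing
    overflows. Computing log(softmax(x)) naively would call exp(1000.0) for
    a logit of 1000, which overflows.
    """
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def safe_divide(num: float, den: float, eps: float = 1e-8) -> float:
    """Epsilon in the denominator; assumes den >= 0 (e.g. a norm or a count)."""
    return num / (den + eps)


def clamp(x: float, lo: float, hi: float) -> float:
    """Scalar analogue of torch.clamp."""
    return max(lo, min(x, hi))


# Clamp a probability away from 0 before taking its log:
# math.log(0.0) raises a math domain error, the clamped version is finite.
log_p = math.log(clamp(0.0, 1e-12, 1.0))
```

With two equal logits of 1000.0, `log_softmax` returns `-log(2)` for both entries, whereas the naive route fails before reaching the log.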
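### Phase 4: Verification

1. **Shape tests**
```python
import torch
from src.config import Config
from src.model import Model


def test_model_shapes():
    model = Model(Config())
    x = torch.randn(2, 10, 512)  # (batch, seq, dim); dim must match the model's d_model
    y = model(x)
    assert y.shape == (2, 10, 512)
```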
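2. **Gradient flow test**
```python
import torch
from src.config import Config
from src.model import Model


def test_gradients():
    model = Model(Config())
    x = torch.randn(2, 10, 512, requires_grad=True)
    y = model(x)
    loss = y.sum()
    loss.backward()
    assert x.grad is not None  # gradients reached the input
    assert not torch.isnan(x.grad).any()
```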
3. **Compare with reference (if available)**
   - Load reference implementation
   - Compare outputs for same inputs
   - Document any differences
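Documenting differences is easier when the comparison reports how large they are, not just pass/fail. A framework-free sketch (hypothetical `compare_outputs`, operating on flattened lists such as those from `tensor.flatten().tolist()`) using the same closeness criterion as `torch.allclose`, `|a - b| <= atol + rtol * |b|`, with the same default tolerances:

```python
import math
from dataclasses import dataclass


@dataclass
class ComparisonReport:
    max_abs_diff: float  # largest finite |ours - reference|
    n_mismatched: int
    n_total: int

    @property
    def within_tolerance(self) -> bool:
        return self.n_mismatched == 0


def compare_outputs(
    ours: list[float], reference: list[float], rtol: float = 1e-5, atol: float = 1e-8
) -> ComparisonReport:
    """Element-wise comparison of our outputs against a reference implementation."""
    if len(ours) != len(reference):
        raise ValueError(f"length mismatch: {len(ours)} vs {len(reference)}")
    max_abs, mismatched = 0.0, 0
    for a, b in zip(ours, reference):
        diff = abs(a - b)
        # `not (x <= y)` rather than `x > y` so that a NaN counts as a mismatch
        if not diff <= atol + rtol * abs(b):
            mismatched += 1
        if math.isfinite(diff):
            max_abs = max(max_abs, diff)
    return ComparisonReport(max_abs, mismatched, len(ours))
```

Recording `max_abs_diff` and `n_mismatched / n_total` in the README's Implementation Notes gives a concrete measure of any deviation.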
### Phase 5: Documentation

Create `README.md` with:

````markdown
# {Paper Title}

Implementation of [{Paper Title}]({arxiv_url}) in {framework}.

## Paper Information

- **Title**: {title}
- **Authors**: {authors}
- **arXiv**: [{arxiv_id}]({arxiv_url})
- **Published**: {date}

## Abstract

{abstract}

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Quick Start

```python
from src.model import Model
from src.config import Config

config = Config()
model = Model(config)
output = model(input_data)
```

### Training

```bash
python scripts/train.py --config config.yaml
```

### Evaluation

```bash
python scripts/evaluate.py --checkpoint path/to/model.pt
```

## Implementation Notes

- {Note any deviations from paper}
- {Note any ambiguities resolved}
- {Note any assumptions made}

## Citation

```bibtex
{bibtex_citation}
```

## License

This implementation is provided for research purposes.
````
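The `{bibtex_citation}` placeholder can be filled from the fetched metadata. A sketch, assuming a hypothetical `arxiv_bibtex` helper that follows one common convention for arXiv preprints (an `@misc` entry with `eprint`, `archivePrefix`, and `primaryClass` fields). The citation-key scheme (first author's surname plus year) is an arbitrary choice, and LaTeX special characters in titles are not escaped.

```python
def arxiv_bibtex(
    arxiv_id: str, title: str, authors: list[str], year: int, primary_class: str
) -> str:
    """Build a BibTeX @misc entry for an arXiv preprint."""
    surname = authors[0].split()[-1].lower()  # naive: assumes "Given Surname" order
    key = f"{surname}{year}"
    fields = {
        "title": title,
        "author": " and ".join(authors),  # BibTeX separates authors with "and"
        "year": str(year),
        "eprint": arxiv_id,
        "archivePrefix": "arXiv",
        "primaryClass": primary_class,
        "url": f"https://arxiv.org/abs/{arxiv_id}",
    }
    body = ",\n".join(f"  {name}={{{value}}}" for name, value in fields.items())
    return f"@misc{{{key},\n{body}\n}}"
```

If the paper was later published at a venue, the venue's own BibTeX entry is usually the better citation.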
### Phase 6: Apply Fixes from Review (--fix-from-review)
When the `--fix-from-review` option is specified, skip Phases 1-5 and directly apply fixes from an existing review:
1. **Locate the review file**
```bash
cd /Users/shibuiyusuke/tmp/paper2code
uv run python -c "
from src import extract_arxiv_id, fetch_paper_metadata
from pathlib import Path
paper = fetch_paper_metadata(extract_arxiv_id('<arxiv_url_or_id>'))
impl_dir = Path('./paper_impl') / paper.clean_id
review_path = impl_dir / 'REVIEW.md'
if not review_path.exists():
    print(f'ERROR: Review not found at {review_path}')
    print('Run /review-implementation first to generate a review.')
    raise SystemExit(1)
print(f'Implementation directory: {impl_dir}')
print(f'Review file: {review_path}')
print(f'Paper: {paper.title}')
"
```
2. **Read and parse the review**
   - Use the Read tool to read `paper_impl/{arxiv_id}/REVIEW.md`
   - Identify all sections marked with "Proposed Fix" or "Fix for:"
   - Extract the following from each fix proposal:
     - **File path**: The file to modify
     - **Current code**: The existing problematic code
     - **Proposed code**: The corrected code
     - **Paper reference**: The equation/section being fixed
3. **Read the paper PDF (for context)**
   - Read the PDF at `paper_impl/{arxiv_id}/{arxiv_id}.pdf` or `paper_impl/{arxiv_id}.pdf`
   - Focus on the sections referenced in the fix proposals
   - Understand the mathematical formulations being implemented
4. **Apply each fix systematically**

   For each fix proposal in the review:
   1. Read the target file using the Read tool
   2. Locate the code to fix
      - Find the exact location matching the "Current Code" section
      - Verify the code context matches
   3. Apply the fix using the Edit tool
      - Replace the problematic code with the proposed fix
      - Ensure indentation and formatting match the file style
      - Add/update comments referencing the paper equation
   4. Verify the fix
      - Check that the edit was applied correctly
      - Ensure no syntax errors were introduced
5. **Run tests after all fixes**
```bash
cd paper_impl/{arxiv_id}
python -m pytest tests/ -v
```
6. **Update the review file**

   After applying fixes, update `REVIEW.md`:
   - Change the status of fixed components from "Incorrect" to "Correct" or "Fixed"
   - Add a "Fix History" section recording when fixes were applied
   - Example addition:

```markdown
## Fix History

| Date | Issue | Status |
|------|-------|--------|
| {current_date} | Importance Recalibration (Eq. 5) | Fixed |
| {current_date} | Clustering Algorithm | Fixed |
```
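Extracting the proposals can be partly mechanized. The sketch below assumes `REVIEW.md` follows the layout of the proposal shown under "Example Fix Application" (a `Fix for: <name>` heading, a `**File**:` line, then `**Current Code**:` and `**Proposed Code**:` each followed by a fenced block). It does not handle sections marked "Proposed Fix", and real reviews may deviate, so treat it as a starting point. `FixProposal` and `parse_fix_proposals` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class FixProposal:
    """One "Fix for:" section of REVIEW.md (hypothetical structure)."""

    title: str
    file_path: Optional[str]
    current_code: Optional[str]
    proposed_code: Optional[str]


def _fenced_block_after(label: str, text: str) -> Optional[str]:
    """Return the body of the first fenced code block that follows `label:`."""
    pattern = r"\*{0,2}" + re.escape(label) + r"\*{0,2}:\s*`{3}[\w-]*\n(.*?)\n`{3}"
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


def parse_fix_proposals(review_md: str) -> list[FixProposal]:
    """Split REVIEW.md on 'Fix for:' headings and pull out the file and code blocks."""
    headings = list(re.finditer(r"^#+\s*Fix for:\s*(.+)$", review_md, re.MULTILINE))
    proposals = []
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(review_md)
        section = review_md[heading.end():end]
        file_match = re.search(r"\*{0,2}File\*{0,2}:\s*`?([^`\n]+)`?", section)
        proposals.append(
            FixProposal(
                title=heading.group(1).strip(),
                file_path=file_match.group(1).strip() if file_match else None,
                current_code=_fenced_block_after("Current Code", section),
                proposed_code=_fenced_block_after("Proposed Code", section),
            )
        )
    return proposals
```

Fields that come back as `None` indicate a proposal that does not match the assumed layout and should be read manually.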
#### Fix Application Guidelines

1. **Order of operations**
   - Apply fixes in order of dependency (if A depends on B, fix B first)
   - Start with core algorithm fixes, then move to peripheral components
2. **When fixes conflict**
   - If two fixes affect the same code region, apply them one at a time and re-read the file before each edit so the later fix matches the already-updated code
   - Consider combining related fixes into a single edit
3. **If a fix is ambiguous**
   - Read the referenced paper section for clarity
   - Make the most conservative interpretation
   - Document any assumptions in code comments
4. **Validation after each fix**
   - Run relevant unit tests if they exist
   - Check that the module can be imported without errors:

```bash
python -c "from src.{module} import *"
```

5. **If tests fail after fixes**
   - Check whether tests need to be updated for the corrected behavior
   - The fix may reveal that tests were testing incorrect behavior
   - Update tests to match the paper's specification
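The locate/apply/verify sub-steps and the "verify the code context matches" rule can be expressed as a small script-level check. This is an illustration of the checks, not a replacement for the Edit tool; `apply_fix` is a hypothetical name, and the `compile()` syntax check only makes sense for Python files.

```python
from pathlib import Path


def apply_fix(path: Path, current_code: str, proposed_code: str) -> None:
    """Replace `current_code` with `proposed_code` in `path`, refusing ambiguous edits.

    The current code must occur exactly once, and the result must still parse
    as Python; otherwise the file is left untouched.
    """
    source = path.read_text()
    count = source.count(current_code)
    if count != 1:
        raise ValueError(f"expected exactly one match in {path}, found {count}")
    updated = source.replace(current_code, proposed_code, 1)
    compile(updated, str(path), "exec")  # raises SyntaxError before anything is written
    path.write_text(updated)
```

Requiring exactly one match is what makes conflicting or stale fixes fail loudly: if an earlier fix already rewrote the region, the "Current Code" no longer matches and the proposal needs to be re-read against the updated file.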
#### Example Fix Application

Given a review with this fix proposal:

````markdown
#### Fix for: Importance Recalibration (Eq. 5)

**File**: `src/nexus_weaver.py`
**Lines**: 445-493

**Current Code**:
```python
def _recalibrate_importance(self, particle: InsightParticle) -> float:
    # Count strands (incorrect interpretation)
    ia_links = len(particle.relational_strands)
```

**Proposed Code**:
```python
def _recalibrate_importance(self, particle: InsightParticle) -> float:
    # Count IAs that reference this particle (correct per paper)
    ia_links = sum(
        1 for ia in self.strg.get_all_aggregates()
        if particle.particle_id in ia.derived_from_ids
    )
```
````
Apply using:
1. Read `src/nexus_weaver.py`
2. Find the `_recalibrate_importance` method
3. Use the Edit tool to replace the incorrect IA counting logic
4. Run tests: `pytest tests/test_model.py -v -k "importance"`

## Important Guidelines

### When Reading the Paper

- **Pay attention to subscripts and notation**: Papers often define custom notation
- **Check the Appendix**: Implementation details are frequently in appendices
- **Read figure captions**: They often contain architecture details
- **Look for "Algorithm" boxes**: These provide step-by-step pseudocode

### When Implementing

- **Don't overcomplicate**: Start simple, add complexity as needed
- **Test incrementally**: Verify each component before combining
- **Use reference implementations**: When available, compare outputs
- **Document assumptions**: Note where the paper was ambiguous

### Error Handling

If you encounter issues:
1. **Paper is too vague**: Note the ambiguity, make a reasonable assumption, and document it
2. **No reference implementation**: Implement based on the paper and add extra tests
3. **Results don't match**: Document the difference and check for common issues (initialization, normalization, hyperparameters)

## Example Sessions

### Standard Implementation
```
User: /implement-paper https://arxiv.org/abs/2010.11929

Claude: I'll implement "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (ViT).

[Phase 1: Fetching paper...]
[Phase 2: Analyzing architecture...]
[Phase 3: Implementing...]
[Phase 4: Testing...]
[Phase 5: Documenting...]

Implementation complete at paper_impl/2010_11929/
```
### Fix from Review
```
User: /implement-paper https://arxiv.org/abs/2506.08098 --fix-from-review

Claude: I'll apply fixes from the existing review for "Cognitive Weave: Synthesizing Abstracted Knowledge with a Spatio-Temporal Resonance Graph".

[Phase 6: Applying fixes from review...]

Found REVIEW.md with 3 fix proposals:
- Importance Recalibration (Eq. 5) - src/nexus_weaver.py
- Clustering Algorithm - src/nexus_weaver.py
- Relational Strand Strength - src/strg.py

Applying fix 1/3: Importance Recalibration...
  - Reading src/nexus_weaver.py
  - Applying fix to _recalibrate_importance method
  - Fix applied successfully

Applying fix 2/3: Clustering Algorithm...
  - Reading src/nexus_weaver.py
  - Applying fix to _identify_clusters method
  - Fix applied successfully

Applying fix 3/3: Relational Strand Strength...
  - Reading src/strg.py
  - Applying fix to _calculate_strand_strength method
  - Fix applied successfully

Running tests...
tests/test_model.py::test_insight_particle_creation PASSED
tests/test_model.py::test_importance_recalibration PASSED
tests/test_model.py::test_clustering PASSED
...
All 12 tests passed!

Updated REVIEW.md with fix history.
All fixes applied successfully!
```