Implement Paper

/implement-paper

Implements machine learning, AI, LLM, or AI agent papers from arXiv as Python projects.

Usage

code
/implement-paper <arxiv_url_or_id> [options]

Arguments

  • <arxiv_url_or_id>: arXiv paper URL (e.g., https://arxiv.org/abs/2301.00001) or ID (e.g., 2301.00001)
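
Since the argument may be either a full URL or a bare ID, it helps to normalize it before doing anything else. The following is a minimal sketch, assuming new-style arXiv IDs only (for example 2301.00001, optionally with a version suffix). `normalize_arxiv_id` is a hypothetical helper written for illustration and is not the project's own `extract_arxiv_id`.

```python
import re

# Matches new-style arXiv IDs such as 2301.00001 or 2301.00001v2.
_ARXIV_ID = re.compile(r"(\d{4}\.\d{4,5})(v\d+)?")


def normalize_arxiv_id(url_or_id: str) -> str:
    """Return the bare arXiv ID (version suffix dropped) from a URL or ID.

    Hypothetical helper for illustration; old-style IDs are not handled.
    """
    match = _ARXIV_ID.search(url_or_id.strip())
    if match is None:
        raise ValueError(f"Not a recognizable arXiv URL or ID: {url_or_id!r}")
    return match.group(1)


print(normalize_arxiv_id("https://arxiv.org/abs/2301.00001"))  # 2301.00001
print(normalize_arxiv_id("2301.00001v2"))                      # 2301.00001
```

Rejecting unrecognized input early keeps a typo from turning into a confusing download failure later in Phase 1.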

Options

  • --framework <name>: ML framework to use (default: pytorch). Options: pytorch, tensorflow, jax
  • --minimal: Create minimal implementation without training/evaluation scripts
  • --clone-ref: Clone reference implementation if found
  • --fix-from-review: Apply fixes from an existing REVIEW.md file (skips implementation, goes directly to fix phase)

Workflow

When this skill is invoked, follow these phases in order:

Phase 1: Paper Acquisition and Analysis

  1. Fetch paper metadata and PDF

    bash
    cd /Users/shibuiyusuke/tmp/paper2code
    uv run python -c "
    from src import fetch_paper, extract_text_from_pdf
    from pathlib import Path
    
    paper, pdf_path = fetch_paper('<arxiv_url_or_id>', Path('./paper_impl'))
    print(f'Title: {paper.title}')
    print(f'arXiv ID: {paper.arxiv_id}')
    print(f'PDF saved to: {pdf_path}')
    "
    
  2. Search for reference implementations

    bash
    uv run python -c "
    from src import fetch_paper, find_reference_implementation, extract_text_from_pdf
    from pathlib import Path
    
    paper, pdf_path = fetch_paper('<arxiv_url_or_id>', Path('./paper_impl'))
    pdf_text = extract_text_from_pdf(pdf_path) if pdf_path else ''
    repos = find_reference_implementation(paper, pdf_text)
    
    for repo in repos:
        official = ' [OFFICIAL]' if repo.is_official else ''
        print(f'{repo.url}{official} (stars: {repo.stars})')
    "
    
  3. Read the PDF using the Read tool to understand the paper:

    • Read the downloaded PDF at paper_impl/{arxiv_id}/
    • Focus on: Abstract, Introduction, Method/Approach sections, Experiments, Appendix

Phase 2: Paper Understanding

Analyze the paper systematically to extract implementation requirements:

  1. Core Algorithm/Model

    • What is the main contribution?
    • What are the key equations/formulas?
    • What is the model architecture (for neural networks)?
    • What are the algorithmic steps (for non-NN methods)?
  2. Input/Output Specifications

    • What are the expected inputs (shapes, types, ranges)?
    • What are the outputs?
    • What preprocessing is required?
  3. Hyperparameters

    • List all hyperparameters with their default values
    • Note which are critical vs. optional
  4. Dependencies

    • What external libraries are needed?
    • Are there pretrained models or datasets required?
  5. Training Details (if applicable)

    • Loss function(s)
    • Optimizer and learning rate schedule
    • Batch size and training epochs
    • Data augmentation techniques
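
One way to record the outcome of this analysis is a small configuration dataclass, which later becomes `src/config.py`. The sketch below is an assumption about how that file might look: every field name and default value is a placeholder chosen for illustration, not taken from any paper.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Config:
    """Hyperparameters gathered in Phase 2 (all values are placeholders)."""

    # Critical: changing these changes the model being reproduced.
    d_model: int = 512            # hidden size (placeholder)
    num_layers: int = 6           # depth (placeholder)
    # Training details
    learning_rate: float = 1e-4   # placeholder
    batch_size: int = 32          # placeholder
    epochs: int = 10              # placeholder
    # Optional: numerical-stability constant
    eps: float = 1e-8

    def __post_init__(self) -> None:
        # Fail fast on values that cannot describe a valid model or run.
        if self.d_model <= 0 or self.num_layers <= 0:
            raise ValueError("d_model and num_layers must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


config = Config(batch_size=64)
print(asdict(config)["batch_size"])  # 64
```

Grouping fields by "critical" and "optional" in comments mirrors the distinction the analysis asks for, and a frozen dataclass prevents a script from silently changing a value mid-run.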

Phase 3: Implementation

Create the project structure under paper_impl/{arxiv_id}/:

code
paper_impl/{arxiv_id}/
├── README.md           # Paper info, usage, citation
├── requirements.txt    # Dependencies
├── src/
│   ├── __init__.py
│   ├── model.py        # Main model/algorithm
│   ├── layers.py       # Custom layers/modules (if needed)
│   ├── utils.py        # Utility functions
│   └── config.py       # Hyperparameters and configuration
├── scripts/
│   ├── train.py        # Training script
│   ├── evaluate.py     # Evaluation script
│   └── demo.py         # Demo/inference script
└── tests/
    └── test_model.py   # Unit tests
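
The layout above can be created with a few lines of `pathlib`. This is a minimal sketch: `scaffold` is a hypothetical helper, it creates `layers.py` unconditionally for simplicity, and it assumes that `--minimal` drops `train.py` and `evaluate.py` while keeping `demo.py`.

```python
import tempfile
from pathlib import Path

CORE_FILES = [
    "README.md",
    "requirements.txt",
    "src/__init__.py",
    "src/model.py",
    "src/layers.py",
    "src/utils.py",
    "src/config.py",
    "scripts/demo.py",
    "tests/test_model.py",
]
# Skipped when minimal=True (assumed meaning of --minimal).
TRAINING_FILES = ["scripts/train.py", "scripts/evaluate.py"]


def scaffold(project_dir: Path, minimal: bool = False) -> list[Path]:
    """Create empty placeholder files for the project layout."""
    files = CORE_FILES if minimal else CORE_FILES + TRAINING_FILES
    created = []
    for rel in files:
        path = project_dir / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "2301.00001", minimal=True)
    print(len(created))  # 9
```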

Implementation Guidelines

  1. Start with the core model/algorithm

    • Implement the central contribution first
    • Map equations directly to code with comments
    • Use descriptive variable names matching paper notation
  2. Code quality requirements

    • Use type hints for all functions
    • Add docstrings referencing paper sections/equations
    • Include shape comments for tensor operations
    python
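    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass implementing Eq. (3) from Section 3.2.
    
        Args:
            x: Input tensor of shape (batch_size, seq_len, d_model)
    
        Returns:
            Output tensor of shape (batch_size, seq_len, d_model)
        """
        # Requires `import math` and `import torch`; assumes single-head
        # projections self.w_q, self.w_k, self.w_v of type nn.Linear(d_model, d_model).
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)  # each (B, L, D)
        d_k = q.size(-1)
        # Eq. (3): y = softmax(QK^T / sqrt(d_k)) V
        attn = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(d_k), dim=-1)  # (B, L, L)
        return attn @ v  # (B, L, D)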
    
  3. Numerical stability

    • Use log_softmax instead of softmax + log when possible
    • Add a small epsilon to denominators to prevent division by zero
    • Use torch.clamp for values that should be bounded
  4. Match paper exactly

    • Use the same initialization schemes
    • Implement the same normalization (LayerNorm, BatchNorm, etc.)
    • Follow the exact order of operations
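
The numerical stability advice above is easier to trust once the failure mode is visible. The sketch below uses only the standard library to show why a fused log-softmax (the log-sum-exp form, which is what `torch.log_softmax` is generally understood to compute) survives inputs that break softmax followed by log. All function names here are hypothetical and exist only for this illustration.

```python
import math


def naive_log_softmax(logits: list[float]) -> list[float]:
    """softmax followed by log: fails when exp() overflows or underflows."""
    exps = [math.exp(v) for v in logits]
    total = sum(exps)
    return [math.log(e / total) for e in exps]


def stable_log_softmax(logits: list[float]) -> list[float]:
    """Log-sum-exp form: subtract the max before exponentiating."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_sum for v in logits]


def safe_divide(num: float, den: float, eps: float = 1e-8) -> float:
    """Add eps to the denominator; assumes den is non-negative."""
    return num / (den + eps)


def clamp(value: float, low: float, high: float) -> float:
    """Scalar analogue of torch.clamp."""
    return max(low, min(high, value))


print(stable_log_softmax([-1000.0, 0.0]))  # [-1000.0, 0.0]
try:
    naive_log_softmax([-1000.0, 0.0])
except ValueError as err:
    # exp(-1000) underflows to 0.0, so log(0.0) is undefined.
    print("naive version failed:", err)
```

In a PyTorch project the same ideas map to `torch.log_softmax`, an `eps` term in denominators, and `torch.clamp`.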

Phase 4: Verification

  1. Shape tests

    python
    import torch
    from src.config import Config
    from src.model import Model
    
    
    def test_model_shapes():
        model = Model(Config())
        x = torch.randn(2, 10, 512)  # (batch, seq, dim)
        y = model(x)
        assert y.shape == (2, 10, 512)
    
  2. Gradient flow test

    python
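    import torch
    from src.config import Config
    from src.model import Model
    
    
    def test_gradients():
        model = Model(Config())
        x = torch.randn(2, 10, 512, requires_grad=True)
        y = model(x)
        loss = y.sum()
        loss.backward()  # gradients must reach the input
        assert x.grad is not None
        assert not torch.isnan(x.grad).any()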
    
  3. Compare with reference (if available)

    • Load reference implementation
    • Compare outputs for same inputs
    • Document any differences
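
Comparing against a reference usually means an element-wise check with explicit tolerances, plus a record of where the outputs diverge. The following is a framework-free sketch on flat sequences of floats; `compare_outputs` and the default tolerances are assumptions for illustration. With tensors, `torch.allclose` or `torch.testing.assert_close` would typically take its place.

```python
import math
from typing import Sequence


def compare_outputs(
    ours: Sequence[float],
    reference: Sequence[float],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> dict:
    """Report the largest deviation and the indices outside tolerance."""
    if len(ours) != len(reference):
        raise ValueError(f"Length mismatch: {len(ours)} vs {len(reference)}")
    diffs = [abs(a - b) for a, b in zip(ours, reference)]
    mismatched = [
        i
        for i, (a, b) in enumerate(zip(ours, reference))
        if not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol)
    ]
    return {
        "max_abs_diff": max(diffs, default=0.0),
        "mismatched_indices": mismatched,
        "match": not mismatched,
    }


report = compare_outputs([0.1, 0.2, 0.3], [0.1, 0.2, 0.31])
print(report["match"], report["mismatched_indices"])  # False [2]
```

Keeping the mismatched indices, rather than a single pass/fail flag, gives the README's Implementation Notes something concrete to cite when a difference has to be documented.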

Phase 5: Documentation

Create README.md with:

markdown
# {Paper Title}

Implementation of [{Paper Title}]({arxiv_url}) in {framework}.

## Paper Information

- **Title**: {title}
- **Authors**: {authors}
- **arXiv**: [{arxiv_id}]({arxiv_url})
- **Published**: {date}

## Abstract

{abstract}

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Quick Start

```python
from src.model import Model
from src.config import Config

config = Config()
model = Model(config)
output = model(input_data)
```

### Training

```bash
python scripts/train.py --config config.yaml
```

### Evaluation

```bash
python scripts/evaluate.py --checkpoint path/to/model.pt
```

## Implementation Notes

- {Note any deviations from paper}
- {Note any ambiguities resolved}
- {Note any assumptions made}

## Citation

```bibtex
{bibtex_citation}
```

## License

This implementation is provided for research purposes.

Phase 6: Apply Fixes from Review (--fix-from-review)

When the --fix-from-review option is specified, skip Phases 1-5 and directly apply fixes from an existing review:

  1. Locate the review file

    bash
    cd /Users/shibuiyusuke/tmp/paper2code
    uv run python -c "
    from src import extract_arxiv_id, fetch_paper_metadata
    from pathlib import Path
    
    paper = fetch_paper_metadata(extract_arxiv_id('<arxiv_url_or_id>'))
    impl_dir = Path('./paper_impl') / paper.clean_id
    review_path = impl_dir / 'REVIEW.md'
    
    if not review_path.exists():
        print(f'ERROR: Review not found at {review_path}')
        print('Run /review-implementation first to generate a review.')
        exit(1)
    
    print(f'Implementation directory: {impl_dir}')
    print(f'Review file: {review_path}')
    print(f'Paper: {paper.title}')
    "
    
  2. Read and parse the review

    • Use the Read tool to read paper_impl/{arxiv_id}/REVIEW.md
    • Identify all sections marked with "Proposed Fix" or "Fix for:"
    • Extract the following from each fix proposal:
      • File path: The file to modify
      • Current code: The existing problematic code
      • Proposed code: The corrected code
      • Paper reference: The equation/section being fixed
  3. Read the paper PDF (for context)

    • Read the PDF at paper_impl/{arxiv_id}/{arxiv_id}.pdf or paper_impl/{arxiv_id}.pdf
    • Focus on the sections referenced in the fix proposals
    • Understand the mathematical formulations being implemented
  4. Apply each fix systematically

    For each fix proposal in the review:

    a. Read the target file using the Read tool

    b. Locate the code to fix

    • Find the exact location matching the "Current Code" section
    • Verify the code context matches

    c. Apply the fix using the Edit tool

    • Replace the problematic code with the proposed fix
    • Ensure indentation and formatting match the file style
    • Add/update comments referencing the paper equation

    d. Verify the fix

    • Check that the edit was applied correctly
    • Ensure no syntax errors were introduced
  5. Run tests after all fixes

    bash
    cd paper_impl/{arxiv_id}
    python -m pytest tests/ -v
    
  6. Update the review file

    After applying fixes, update REVIEW.md:

    • Change status of fixed components from "Incorrect" to "Correct" or "Fixed"
    • Add a "Fix History" section noting when fixes were applied
    • Example addition:
    markdown
    ## Fix History
    
    | Date | Issue | Status |
    |------|-------|--------|
    | {current_date} | Importance Recalibration (Eq. 5) | Fixed |
    | {current_date} | Clustering Algorithm | Fixed |
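
The "Read and parse the review" step can be sketched with regular expressions. This assumes each proposal uses a "Fix for:" heading followed by **File**, **Current Code** and **Proposed Code** fields with fenced code, as in the example fix proposal in this document. `FixProposal`, `parse_fix_proposals` and `_code_after` are hypothetical names, and real review files may need a more forgiving parser.

```python
import re
from dataclasses import dataclass

FENCE = "`" * 3  # built at runtime so the literal fence never appears in this block


@dataclass
class FixProposal:
    title: str
    file_path: str
    current_code: str
    proposed_code: str


def _code_after(label: str, section: str) -> str:
    """Return the fenced code that follows a bold label such as 'Current Code'."""
    pattern = (
        r"\*\*" + re.escape(label) + r"\*\*:\s*\n"
        + re.escape(FENCE) + r"[a-zA-Z]*\n(.*?)\n" + re.escape(FENCE)
    )
    match = re.search(pattern, section, flags=re.DOTALL)
    return match.group(1) if match else ""


def parse_fix_proposals(review_text: str) -> list[FixProposal]:
    proposals = []
    # Split on headings like "#### Fix for: <title>"; drop the preamble chunk.
    chunks = re.split(r"^#+\s*Fix for:\s*", review_text, flags=re.MULTILINE)[1:]
    for chunk in chunks:
        title, _, body = chunk.partition("\n")
        file_match = re.search(r"\*\*File\*\*:\s*`([^`]+)`", body)
        proposals.append(
            FixProposal(
                title=title.strip(),
                file_path=file_match.group(1) if file_match else "",
                current_code=_code_after("Current Code", body),
                proposed_code=_code_after("Proposed Code", body),
            )
        )
    return proposals


sample = "\n".join([
    "#### Fix for: Importance Recalibration (Eq. 5)",
    "",
    "**File**: `src/nexus_weaver.py`",
    "",
    "**Current Code**:",
    FENCE + "python",
    "ia_links = len(particle.relational_strands)",
    FENCE,
    "",
    "**Proposed Code**:",
    FENCE + "python",
    "ia_links = 0  # placeholder replacement for the demo",
    FENCE,
])
fix = parse_fix_proposals(sample)[0]
print(fix.title, "->", fix.file_path)
```

A proposal whose `file_path` or `current_code` comes back empty is a signal to read that part of the review by hand rather than apply it blindly.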
    

Fix Application Guidelines

  1. Order of operations

    • Apply fixes in order of dependency (if A depends on B, fix B first)
    • Start with core algorithm fixes, then move to peripheral components
  2. When fixes conflict

    • If two fixes affect the same code region, apply them one at a time and re-read the file before the second edit
    • Consider combining related fixes into a single edit
  3. If a fix is ambiguous

    • Read the referenced paper section for clarity
    • Choose the most conservative interpretation
    • Document any assumptions in code comments
  4. Validation after each fix

    • Run relevant unit tests if they exist
    • Check that the module can be imported without errors
    bash
    python -c "from src.{module} import *"
    
  5. If tests fail after fixes

    • Check if tests need to be updated for the corrected behavior
    • The fix may reveal that tests were testing incorrect behavior
    • Update tests to match the paper's specification
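
The dependency-ordering rule ("if A depends on B, fix B first") is a topological sort. Below is a minimal sketch using the standard library's `graphlib` (Python 3.9+). The fix names and the dependencies between them are invented for illustration, and `order_fixes` is a hypothetical helper.

```python
from graphlib import CycleError, TopologicalSorter


def order_fixes(depends_on: dict[str, set[str]]) -> list[str]:
    """Order fixes so that every fix comes after the fixes it depends on.

    depends_on maps a fix name to the set of fixes that must be applied first.
    """
    try:
        return list(TopologicalSorter(depends_on).static_order())
    except CycleError as err:
        raise ValueError(f"Fixes depend on each other in a cycle: {err.args[1]}") from err


# Invented example: A depends on B, and C depends on A.
deps = {"A": {"B"}, "B": set(), "C": {"A"}}
print(order_fixes(deps))  # ['B', 'A', 'C']
```

A cycle means two proposals each assume the other has already been applied; in that case combining them into a single edit, as suggested under "When fixes conflict", is usually the way out.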

Example Fix Application

Given a review with this fix proposal:

markdown
#### Fix for: Importance Recalibration (Eq. 5)

**File**: `src/nexus_weaver.py`
**Lines**: 445-493

**Current Code**:
```python
def _recalibrate_importance(self, particle: InsightParticle) -> float:
    # Count strands (incorrect interpretation)
    ia_links = len(particle.relational_strands)
```

**Proposed Code**:
```python
def _recalibrate_importance(self, particle: InsightParticle) -> float:
    # Count IAs that reference this particle (correct per paper)
    ia_links = sum(
        1 for ia in self.strg.get_all_aggregates()
        if particle.particle_id in ia.derived_from_ids
    )
```

Apply using:
1. Read `src/nexus_weaver.py`
2. Find the `_recalibrate_importance` method
3. Use Edit tool to replace the incorrect IA counting logic
4. Run tests: `pytest tests/test_model.py -v -k "importance"`
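
The replace-and-verify part of these steps can be expressed as a small function: require the "Current Code" to match exactly once, substitute the proposed code, then confirm the result still parses. This is a sketch of the semantics only, not of the Edit tool itself. `apply_fix` and the replacement call `count_referencing_aggregates` are hypothetical names.

```python
import ast


def apply_fix(source: str, current_code: str, proposed_code: str) -> str:
    """Replace current_code with proposed_code, insisting on a unique match."""
    if not current_code:
        raise ValueError("current_code must not be empty")
    occurrences = source.count(current_code)
    if occurrences != 1:
        raise ValueError(
            f"Expected exactly one match for the current code, found {occurrences}"
        )
    patched = source.replace(current_code, proposed_code, 1)
    ast.parse(patched)  # raises SyntaxError if the edit broke the file
    return patched


source = (
    "def recalibrate(particle):\n"
    "    ia_links = len(particle.relational_strands)\n"
    "    return ia_links\n"
)
patched = apply_fix(
    source,
    "len(particle.relational_strands)",
    "count_referencing_aggregates(particle)",  # hypothetical replacement
)
print(patched)
```

Refusing to edit when the snippet matches zero or several times guards against applying a fix to the wrong location after the file has drifted from the version the review was written against.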

## Important Guidelines

### When Reading the Paper

- **Pay attention to subscripts and notation**: Papers often define custom notation
- **Check the Appendix**: Implementation details are frequently in appendices
- **Read figure captions**: They often contain architecture details
- **Look for "Algorithm" boxes**: These provide step-by-step pseudocode

### When Implementing

- **Don't overcomplicate**: Start simple, add complexity as needed
- **Test incrementally**: Verify each component before combining
- **Use reference implementations**: When available, compare outputs
- **Document assumptions**: Note where the paper was ambiguous

### Error Handling

If you encounter issues:

1. **Paper is too vague**: Note the ambiguity, make a reasonable assumption, document it
2. **No reference implementation**: Implement based on paper, add extra tests
3. **Results don't match**: Document the difference, check for common issues (initialization, normalization, hyperparameters)

## Example Sessions

### Standard Implementation

User: /implement-paper https://arxiv.org/abs/2010.11929

Claude: I'll implement "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (ViT).

[Phase 1: Fetching paper...]
[Phase 2: Analyzing architecture...]
[Phase 3: Implementing...]
[Phase 4: Testing...]
[Phase 5: Documenting...]

Implementation complete at paper_impl/2010_11929/


### Fix from Review

User: /implement-paper https://arxiv.org/abs/2506.08098 --fix-from-review

Claude: I'll apply fixes from the existing review for "Cognitive Weave: Synthesizing Abstracted Knowledge with a Spatio-Temporal Resonance Graph".

[Phase 6: Applying fixes from review...]

Found REVIEW.md with 3 fix proposals:

  1. Importance Recalibration (Eq. 5) - src/nexus_weaver.py
  2. Clustering Algorithm - src/nexus_weaver.py
  3. Relational Strand Strength - src/strg.py

Applying fix 1/3: Importance Recalibration...

  • Reading src/nexus_weaver.py
  • Applying fix to _recalibrate_importance method
  • Fix applied successfully

Applying fix 2/3: Clustering Algorithm...

  • Reading src/nexus_weaver.py
  • Applying fix to _identify_clusters method
  • Fix applied successfully

Applying fix 3/3: Relational Strand Strength...

  • Reading src/strg.py
  • Applying fix to _calculate_strand_strength method
  • Fix applied successfully

Running tests...
tests/test_model.py::test_insight_particle_creation PASSED
tests/test_model.py::test_importance_recalibration PASSED
tests/test_model.py::test_clustering PASSED
...
All 12 tests passed!

Updated REVIEW.md with fix history.

All fixes applied successfully!
