Deep Learning Scientific Paper Reviewer

Scope

Use this skill to produce actionable, evidence-based reviews of ML/DL papers (typically LaTeX). Optimize for:

•Technical correctness and soundness
•Clear writing and well-scoped claims
•Experimental rigor and fair comparisons
•Reproducibility, limitations, and ethics

Quick start (default workflow)

•Identify the venue style (if unspecified, assume NeurIPS/ICLR/ICML norms).
•Extract claims:
  •Main contributions (1–3 bullets).
  •Key empirical/theoretical claims (what exactly is asserted).
•Check technical soundness:
  •Definitions and notation introduced before use.
  •Assumptions stated; edge cases addressed.
  •Objective, training setup, and evaluation metrics consistent with claims.
•Check experimental quality:
  •Baselines: strong and relevant; properly tuned.
  •Ablations: isolate each contribution.
  •Reporting: mean/variance, seeds, compute budget, failure cases.
  •Fairness: same data, model size, prompt format, training steps, etc.
•Check writing/structure:
  •Abstract matches actual results and scope.
  •Figures/tables readable; captions self-contained.
  •Related work accurately positioned (no straw-manning).
•Make recommendations:
  •List the smallest set of changes that would most improve the paper.
  •Separate major issues vs minor issues vs nits.

If the user provides only a section (e.g., abstract), review only that scope and explicitly state what you did not evaluate.
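
When the authors share run configurations, the "Ablations: isolate each contribution" check in the workflow above can be partly mechanized. The following is a minimal sketch, assuming each run is described by a flat dictionary of settings; the names `changed_factors`, `audit_ablations`, and the example settings are hypothetical and not taken from any particular paper or framework.

```python
from typing import Any, Dict, List

Config = Dict[str, Any]


def changed_factors(full: Config, variant: Config) -> List[str]:
    """Return the sorted names of settings whose values differ between two configs."""
    keys = set(full) | set(variant)
    return sorted(k for k in keys if full.get(k) != variant.get(k))


def audit_ablations(full: Config, variants: Dict[str, Config]) -> Dict[str, List[str]]:
    """Map each variant that does not change exactly one factor to its changed factors."""
    report: Dict[str, List[str]] = {}
    for name, cfg in variants.items():
        diff = changed_factors(full, cfg)
        if len(diff) != 1:
            report[name] = diff
    return report


# Hypothetical configs: the second ablation also changes the learning rate,
# so its result cannot be attributed to removing augmentation alone.
full = {"aux_loss": True, "augment": True, "lr": 3e-4}
variants = {
    "no_aux_loss": {"aux_loss": False, "augment": True, "lr": 3e-4},
    "no_augment": {"aux_loss": True, "augment": False, "lr": 1e-3},
}
flagged = audit_ablations(full, variants)
print(flagged)  # {'no_augment': ['augment', 'lr']}
```

A flagged variant is not automatically wrong (retuning the learning rate after removing a component can be legitimate), but the review should ask the authors to say so explicitly.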

Output format (copy/paste template)

Provide the review in this structure unless the user requests a different one:

## Summary
- [1–3 bullets: what the paper does and why it matters]

## Contributions
- [C1]
- [C2]
- [C3 if needed]

## Strengths
- [S1]
- [S2]

## Weaknesses / Concerns
- **Major**: [must-fix issues; tie each to a claim, result, or missing control]
- **Minor**: [nice-to-fix issues]
- **Nits**: [typos, phrasing, formatting]

## Questions for the authors
- [Q1]
- [Q2]

## Suggested experiments / analyses
- [E1: concrete setup, baseline, metric]
- [E2]

## Reproducibility checklist (quick)
- **Compute**: [hardware/time reported?]
- **Seeds/variance**: [number of runs and spread reported?]
- **Evaluation**: [metrics + protocol unambiguous?]

## Limitations & ethics
- [limits, failure modes, misuse risks]

## Overall assessment
- **Novelty**: [low/medium/high + why]
- **Soundness**: [low/medium/high + why]
- **Clarity**: [low/medium/high + why]
- **Confidence**: [low/medium/high + why]
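
Before sending a review, it can help to confirm that the draft still follows the structure above. The following is a rough sketch, assuming the review is a markdown string that uses the same `##` headings and `**Field**:` labels as the template; the function name `lint_review` and the sample draft are hypothetical.

```python
import re
from typing import List

REQUIRED_SECTIONS = [
    "Summary", "Contributions", "Strengths", "Weaknesses / Concerns",
    "Questions for the authors", "Suggested experiments / analyses",
    "Reproducibility checklist (quick)", "Limitations & ethics",
    "Overall assessment",
]
RATED_FIELDS = ["Novelty", "Soundness", "Clarity", "Confidence"]
ALLOWED_RATINGS = {"low", "medium", "high"}


def lint_review(review: str) -> List[str]:
    """Return a list of structural problems found in a draft review."""
    problems: List[str] = []
    # Collect every level-2 heading present in the draft.
    headings = {h.strip() for h in re.findall(r"^##\s+(.+)$", review, flags=re.MULTILINE)}
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # Each rated field must start with low, medium, or high.
    for field in RATED_FIELDS:
        match = re.search(rf"\*\*{field}\*\*:\s*(\w+)", review)
        if match is None:
            problems.append(f"missing rating: {field}")
        elif match.group(1).lower() not in ALLOWED_RATINGS:
            problems.append(f"unrecognized rating for {field}: {match.group(1)}")
    return problems


# Hypothetical, deliberately incomplete draft.
draft = (
    "## Summary\n- Proposes a new objective.\n"
    "## Overall assessment\n"
    "- **Novelty**: high, the objective is new\n"
    "- **Soundness**: maybe\n"
)
problems = lint_review(draft)
for problem in problems:
    print(problem)
```

An unfilled placeholder such as `[low/medium/high + why]` is reported as a missing rating, which is the intended behavior for a template that has not been completed.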

Review heuristics (what to look for)

Claims vs evidence

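•Flag over-claims (e.g., “solves”, “guarantees”, “significantly”) that the reported evidence does not support; “significantly” should be backed by a statistical test or confidence interval.
•Ensure the paper distinguishes correlational findings from causal interpretations.
•Check whether improvements are robust across datasets, seeds, and hyperparameters.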
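
If per-seed scores are available (from the paper, its appendix, or released logs), the robustness check above can be made concrete. The following is a minimal sketch using only the Python standard library: it reports mean and sample standard deviation per method and a paired percentile-bootstrap confidence interval for the difference. It assumes both methods were run with the same seeds in the same order; the scores shown are made-up placeholders, and the bootstrap is one reasonable choice among several, not a prescribed test.

```python
import random
import statistics
from typing import List, Tuple


def summarize(scores: List[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_bootstrap_ci(
    a: List[float],
    b: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    rng_seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for mean(a - b), pairing runs by seed."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(rng_seed)
    # Resample the per-seed differences with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Placeholder accuracies for five shared seeds (not real results).
method = [0.812, 0.805, 0.820, 0.799, 0.815]
baseline = [0.801, 0.803, 0.809, 0.800, 0.806]

print("method   mean/std:", summarize(method))
print("baseline mean/std:", summarize(baseline))
lo, hi = paired_bootstrap_ci(method, baseline)
print(f"95% CI for the mean difference: [{lo:.4f}, {hi:.4f}]")
```

An interval that includes zero suggests the claimed improvement may not be robust to seed choice. With only a handful of seeds the interval is itself rough, which is worth stating in the review.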

Common experimental pitfalls in DL papers

•Missing strong baselines or missing tuning for baselines.
•Inadequate ablations (multiple changes at once).
•Leakage (test set peeking; prompt/data contamination not discussed when relevant).
•Cherry-picked metrics/slices; no full distribution or failure cases.
•Comparisons across different compute budgets or model sizes without normalization.
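
For the leakage item above, a reviewer with access to the data can run a crude overlap check. The following is a sketch, assuming text datasets held as lists of strings; it flags test examples that share at least one word n-gram with the training set after simple normalization. The function names and sample sentences are hypothetical, and the n-gram length is a free parameter that the reviewer has to choose.

```python
import re
from typing import Iterable, List, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Set of word n-grams in the normalized text (empty if fewer than n words)."""
    tokens = normalize(text).split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlapping_test_indices(train: Iterable[str], test: List[str], n: int = 8) -> List[int]:
    """Indices of test examples that share at least one n-gram with the training set."""
    seen: Set[Tuple[str, ...]] = set()
    for doc in train:
        seen |= ngrams(doc, n)
    return [i for i, doc in enumerate(test) if ngrams(doc, n) & seen]


# Toy data with a short n so the overlap is visible.
train = [
    "The quick brown fox jumps over the lazy dog.",
    "Gradient descent converges under convexity.",
]
test = [
    "A quick brown fox jumps high!",
    "Attention is all you need.",
]
hits = overlapping_test_indices(train, test, n=4)
print(hits)  # [0]
```

This only detects near-verbatim overlap; paraphrased or translated contamination needs other methods. Test examples shorter than `n` words are never flagged.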

LaTeX / presentation hygiene (when reviewing source)

•Undefined references (\ref{}, \cite{}), inconsistent capitalization, missing captions.
•Notation drift (same symbol used for different things).
•Figures: unreadable fonts, axes unlabeled, missing units.
•Tables: unclear bolding; missing variance; missing dataset details.
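
When the LaTeX source is available, undefined references and citations can be found without compiling. The following is a minimal regex-based sketch, assuming a single `.tex` string and an optional BibTeX string; it covers common commands (`\ref`, `\eqref`, `\autoref`, `\cref`, `\Cref`, and `\cite` variants) but does not follow `\input` files or custom macros, and it treats every `@type{key,` in the BibTeX string as an entry. The citation keys and labels in the example are made up.

```python
import re
from typing import Dict, Iterable, List, Set

LABEL = re.compile(r"\\label\{([^}]*)\}")
REF = re.compile(r"\\(?:eq|auto|c|C)?ref\{([^}]*)\}")
CITE = re.compile(r"\\cite[a-zA-Z]*\*?(?:\[[^\]]*\])*\{([^}]*)\}")
BIBKEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")
COMMENT = re.compile(r"(?<!\\)%.*")  # an unescaped % starts a comment


def split_keys(groups: Iterable[str]) -> Set[str]:
    """Split comma-separated keys such as those in \\cite{a,b} into a set."""
    return {key.strip() for group in groups for key in group.split(",") if key.strip()}


def audit_latex(tex: str, bib: str = "") -> Dict[str, List[str]]:
    """Report references without labels, labels never referenced, and citations without entries."""
    tex = COMMENT.sub("", tex)
    labels = set(LABEL.findall(tex))
    refs = split_keys(REF.findall(tex))
    cites = split_keys(CITE.findall(tex))
    bib_keys = set(BIBKEY.findall(bib))
    return {
        "undefined_refs": sorted(refs - labels),
        "unused_labels": sorted(labels - refs),
        "undefined_cites": sorted(cites - bib_keys),
    }


tex = r"""
\section{Method}\label{sec:method}
As shown in Figure~\ref{fig:arch} and Section~\ref{sec:method},
we follow \citep{smith2020,doe2021}.
% \ref{fig:old} sits in a comment and is ignored
"""
bib = "@inproceedings{smith2020,\n  title={Placeholder}\n}"

report = audit_latex(tex, bib)
print(report)
# {'undefined_refs': ['fig:arch'], 'unused_labels': [], 'undefined_cites': ['doe2021']}
```

The compiler log remains the authoritative source for undefined references; a script like this is mainly useful when the paper cannot be built locally.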

When asked to propose edits

If the user wants wording changes, propose:

•A revised paragraph that keeps the meaning and improves precision and flow.
•A “before/after” snippet when helpful.

Do not change technical content unless explicitly asked.

Tone and severity

•Prefer direct, constructive language.
•For each Major concern, include:
  •what is wrong
  •why it matters
  •how to fix / what evidence would resolve it