✅ Validate Paper Links Skill

Validate all [[Paper Title]] links in a paper note against the actual paper's references and citation patterns.

Goal: prevent the following:

•Links to papers NOT referenced in the current paper (hallucination)
•Links to papers with incorrect titles or formatting
•Links to low-influence papers (cited once, not foundational)
•Broken file names (: not replaced with -)

CRITICAL RULES

A link should exist only if:

•✅ The paper IS referenced in the current paper's bibliography/references
•✅ The paper is INFLUENTIAL (cited ≥2 times OR explicitly foundational/cited in intro/abstract)
•✅ Title is in exact Title Case matching the reference
•✅ File name replaces : with - (e.g., My Paper: A Study becomes My Paper - A Study)

A link should be removed if:

•❌ Paper is NOT in the current paper's references
•❌ Paper is cited only once in methods/results (low influence)
•❌ Title formatting is incorrect
•❌ File name violates naming convention

WORKFLOW

1. Get paper and note details

Read the paper note and extract:

•Paper URL: From frontmatter (arXiv HTML URL preferred)
•Paper title: From frontmatter or file name
•Authors: From frontmatter (if available)
•Year: From frontmatter (if available)
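
The extraction in step 1 can be sketched as below. This is a minimal sketch, assuming the note opens with a `---` delimited frontmatter block of flat `key: value` lines; the field names (`title`, `url`, `year`) and the arXiv ID are placeholders, not names taken from any real note.

```python
def read_frontmatter(note_text: str) -> dict:
    """Return top-level `key: value` pairs from a leading `---` block."""
    lines = note_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no frontmatter: caller falls back to the file name
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the frontmatter block
        key, sep, value = line.partition(":")
        # Only flat pairs are handled; indented or list lines are skipped.
        if sep and key.strip() and not line.startswith((" ", "-")):
            fields[key.strip().lower()] = value.strip().strip("\"'")
    return fields


# Hypothetical note; the URL uses a placeholder arXiv ID.
note = """---
title: "My Paper: A Study"
url: https://arxiv.org/html/0000.00000v1
year: 2024
---
Body text.
"""
meta = read_frontmatter(note)
```

Splitting on the first colon only keeps titles and URLs that contain colons intact. A full YAML parser would be the safer choice for nested frontmatter.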

2. Fetch references section from arXiv HTML

Access paper:

•Use paper URL from frontmatter (typically https://arxiv.org/html/[paper-id]v[version])
•Use WebFetch with prompt focused only on references section
•Skip all paper content (abstract, introduction, methods, results, etc.)
•Extract only the bibliography/references section

Extract citation data:

•Parse reference list to get: titles, authors, years
•Build structured reference list from bibliography section
•Reference section is clearly marked on arXiv HTML pages

WebFetch optimization:

•Prompt: "Extract ONLY the bibliography/references section. Ignore all other paper content (abstract, introduction, methods, results). Return reference titles, authors, and years."
•This tells WebFetch to skip roughly 90% of the paper content and focus only on the references
•Saves tokens by not parsing paper text

Why arXiv HTML:

•✅ Direct access to paper bibliography
•✅ Pre-formatted references (easy to extract)
•✅ No API rate limiting or authentication
•✅ Reliable and consistent structure across papers
•✅ No JavaScript rendering issues
•✅ Can optimize WebFetch to skip content (token savings)

3. Read paper note

•Open paper note from Reading/[filename].md
•Extract all [[Paper Title]] links from "🔗 Connections to Other Work" section
•Build link list with current formatting
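
A possible way to collect the links for step 3 is shown below. It is a sketch that assumes the section is introduced by a markdown heading line (starting with `#`) and that links may carry an Obsidian-style `|alias` or `#anchor` suffix; both are assumptions about the note format. The regex is used only on the note, not on the bibliography.

```python
import re

SECTION_HEADING = "🔗 Connections to Other Work"
# Captures the link target; an optional #anchor or |alias is discarded.
WIKILINK = re.compile(r"\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")


def extract_section_links(note_text: str) -> list:
    """Collect [[...]] targets found under the connections heading."""
    links = []
    in_section = False
    for line in note_text.splitlines():
        if line.lstrip().startswith("#"):
            # Any heading either opens the target section or closes it.
            in_section = SECTION_HEADING in line
            continue
        if in_section:
            links.extend(m.group(1).strip() for m in WIKILINK.finditer(line))
    return links


# Hypothetical note body.
note = """# Summary
See [[Ignored Link]].
## 🔗 Connections to Other Work
- [[Attention Is All You Need]] - transformer baseline
- [[My Paper - A Study|short name]]
## Notes
[[Also Ignored]]
"""
links = extract_section_links(note)
```

Because any heading closes the section, a sub-heading inside "🔗 Connections to Other Work" would end the scan early; tracking heading depth would address that.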

4. Validate each link

For each [[Paper Title]] in the note:

Check 1: Reference exists?

•Does this paper appear in the current paper's bibliography?
•If NO → FLAG: "Not referenced in paper" (candidate for removal)
•If YES → PASS
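
Check 1 could be implemented roughly as follows. The sketch assumes the reference titles have already been extracted into a list of strings, and it compares titles after folding case and treating `:` and ` -` as equivalent; the helper names are illustrative.

```python
from typing import Iterable, Optional


def normalize_title(title: str) -> str:
    """Fold case, map ':' to ' -', collapse whitespace, drop a final period."""
    text = title.casefold().replace(":", " -")
    return " ".join(text.split()).rstrip(".")


def find_reference(link_title: str, reference_titles: Iterable[str]) -> Optional[str]:
    """Return the matching bibliography title, or None if not referenced."""
    wanted = normalize_title(link_title)
    for reference in reference_titles:
        if normalize_title(reference) == wanted:
            return reference
    return None


# Hypothetical bibliography titles.
references = ["Attention Is All You Need", "My Paper: A Study."]
hit = find_reference("My Paper - A Study", references)
miss = find_reference("Paper Not In References", references)
```

A `None` result corresponds to the "Not referenced in paper" flag. Matching is deliberately loose here so that case and colon problems surface later as formatting issues instead of as missing references.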

Check 2: Influence level

•Count how many times the paper is cited in the current paper's text
•A referenced paper with ≥2 citations in this paper is influential ✓
•A referenced paper cited once, but in a foundational position (abstract/intro), is influential ✓
•A referenced paper cited only once in methods/results (low-impact sections) has LOW influence (candidate for removal)
•If influence is borderline → ask user for judgment
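
The influence rules above can be written as a small decision function. This is a sketch under assumptions: the citation count and the set of citing sections are supplied by the caller, the section names are lower-cased labels chosen for illustration, and any case the rules do not cover is routed to the user.

```python
from dataclasses import dataclass, field

FOUNDATIONAL_SECTIONS = {"abstract", "introduction"}
LOW_IMPACT_SECTIONS = {"methods", "results"}


@dataclass
class CitationUse:
    count: int                                  # in-text citations of the reference
    sections: set = field(default_factory=set)  # lower-cased citing sections


def classify_influence(use: CitationUse) -> str:
    """Return 'influential', 'low', or 'ask_user' per the Check 2 rules."""
    if use.count >= 2:
        return "influential"
    if use.count == 1 and use.sections & FOUNDATIONAL_SECTIONS:
        return "influential"  # cited once, but in abstract/intro
    if use.count == 1 and use.sections and use.sections <= LOW_IMPACT_SECTIONS:
        return "low"          # cited once, only in methods/results
    return "ask_user"         # anything else is treated as borderline


verdict = classify_influence(CitationUse(count=1, sections={"results"}))
```

Sending uncovered cases (for example a single citation in related work) to `ask_user` is a design choice of this sketch, in keeping with the rule that borderline influence goes to the user.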

Check 3: Title formatting

•Compare link title against reference list title
•Is it Title Case?
•Where the reference title contains a colon (:), does the link use - in its place?
•Are special characters consistent?
•If mismatch → FLAG: "Title mismatch" or "Formatting error"
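
One way to express Check 3 is to derive the expected link text from the reference title and compare. The sketch below does not judge Title Case on its own; it assumes the bibliography title is the source of truth for capitalization. The function names and message formats are illustrative.

```python
from typing import Optional


def expected_link_text(reference_title: str) -> str:
    """Reference title with ':' replaced by ' -' and whitespace collapsed."""
    return " ".join(reference_title.replace(":", " -").split()).rstrip(".")


def check_title(link_title: str, reference_title: str) -> Optional[str]:
    """Return a flag message, or None when the link text is correct."""
    expected = expected_link_text(reference_title)
    if link_title == expected:
        return None
    if ":" in link_title:
        return f"Formatting error → [[{expected}]]"
    if link_title.casefold() == expected.casefold():
        return f"Title case error → [[{expected}]]"
    return "Title mismatch"


flag = check_title("attention is all you need", "Attention Is All You Need")
```

The colon test runs before the case test so that a link with both problems is reported once, with the fully corrected text as the suggested fix.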

Check 4: File name validity

•Expected file name: exact title with : → -
•Is current link pointing to correct file name?
•Example: [[My Paper - A Study]] should match file My Paper - A Study.md
•If mismatch → FLAG: "File name doesn't match link"
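
Check 4 can be sketched with `pathlib`. The example assumes linked notes live as `.md` files in a single directory (a temporary directory stands in for Reading/ here) and that a link resolves to `<link text>.md`; the function names are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def expected_file_name(reference_title: str) -> str:
    """Exact title with ':' replaced by ' -', plus the .md extension."""
    return " ".join(reference_title.replace(":", " -").split()) + ".md"


def check_link_file(link_title: str, notes_dir: Path) -> Optional[str]:
    """Return a flag message, or None when the link resolves to a note file."""
    if ":" in link_title:
        return "File name formatting (: not converted to -)"
    if not (notes_dir / f"{link_title}.md").is_file():
        return "File name doesn't match link"
    return None


# Usage against a throwaway directory standing in for Reading/.
with tempfile.TemporaryDirectory() as tmp:
    reading = Path(tmp)
    (reading / expected_file_name("My Paper: A Study")).touch()
    ok = check_link_file("My Paper - A Study", reading)
    missing = check_link_file("My Paper A Study", reading)
    colon = check_link_file("My Paper: A Study", reading)
```

On case-insensitive file systems `is_file()` will not catch capitalization differences, so the title comparison in Check 3 remains necessary.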

5. Generate validation report

Group findings by severity:

🔴 CRITICAL (remove immediately):

•Papers not in references
•Papers cited only once in low-importance sections

🟡 NEEDS FIXING:

•Title case errors
•File name formatting errors (: not converted to -)

🟢 OK:

•Papers properly referenced
•Properly formatted
•Appropriately influential

❓ NEEDS USER JUDGMENT:

•Papers cited ~2 times (borderline influence)
•Foundational papers cited rarely but in key sections
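
The grouping in step 5 could be rendered as below. This is a sketch: the `Finding` record and the severity keys are invented for illustration, and the line format follows the example given under OUTPUT RULES.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    severity: str  # "critical", "fix", "ok", or "judgment"
    link: str
    issue: str = ""
    fix: str = ""


# Dict order sets the order of the groups in the report.
HEADERS = {
    "critical": "🔴 CRITICAL (remove):",
    "fix": "🟡 NEEDS FIXING:",
    "ok": "🟢 OK (keep):",
    "judgment": "❓ NEEDS USER JUDGMENT:",
}


def render_report(findings: list) -> str:
    """Group findings by severity and format them as bullet lines."""
    blocks = []
    for severity, header in HEADERS.items():
        rows = [item for item in findings if item.severity == severity]
        if not rows:
            continue  # omit empty groups
        lines = [header]
        for item in rows:
            if severity == "ok":
                lines.append(f"- [[{item.link}]] ✓")
            else:
                lines.append(f"- [[{item.link}]] - {item.issue} → {item.fix}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


report = render_report([
    Finding("ok", "Attention Is All You Need"),
    Finding("critical", "Paper Not In References", "Not in bibliography", "REMOVE"),
])
```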

6. Apply fixes

With user approval:

•Remove links: Delete [[Paper Title]] lines from "🔗 Connections to Other Work"
•Fix titles: Correct Title Case and formatting to match references
•Fix file names: Ensure : → - conversion
•Update note: Save corrected paper note
•Verify: Re-scan to confirm all links now valid
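
Once the user has approved the changes, the edits to the note might look like the following sketch. It assumes the section starts at a markdown heading line and that each link sits on its own bullet line; `remove` and `rename` are hypothetical inputs holding the approved removals and title corrections.

```python
SECTION_HEADING = "🔗 Connections to Other Work"


def apply_fixes(note_text: str, remove: set, rename: dict) -> str:
    """Drop rejected link lines and rewrite corrected links in the section."""
    kept = []
    in_section = False
    for line in note_text.splitlines():
        if line.lstrip().startswith("#"):
            in_section = SECTION_HEADING in line
        elif in_section:
            if any(f"[[{title}]]" in line for title in remove):
                continue  # delete the whole line for a rejected link
            for old, new in rename.items():
                line = line.replace(f"[[{old}]]", f"[[{new}]]")
        kept.append(line)
    return "\n".join(kept) + "\n"


# Hypothetical note and approved changes.
note = (
    "## 🔗 Connections to Other Work\n"
    "- [[Paper Not In References]] - unrelated\n"
    "- [[attention is all you need]] - baseline\n"
    "## Notes\n"
    "[[Paper Not In References]] stays here\n"
)
fixed = apply_fixes(
    note,
    remove={"Paper Not In References"},
    rename={"attention is all you need": "Attention Is All You Need"},
)
```

Lines outside the connections section are left untouched, so a removed link mentioned elsewhere in the note survives. Re-running the link extraction on the result serves as the final verification pass.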

TOKEN EFFICIENCY

Optimizations:

•arXiv HTML for direct bibliography access (single WebFetch call)
•References-only extraction via targeted WebFetch prompt
•Skip all paper content (abstract, intro, methods, results)
•Bibliography section clearly marked and easy to extract
•No API calls, no rate limiting, no authentication needed
•Direct title + author list from bibliography
•Build simple title→reference mapping

Why this is efficient:

•1 WebFetch call to arXiv HTML (direct paper access via frontmatter URL)
•Targeted prompt skips roughly 90% of the paper content, which accounts for most of the token savings
•Bibliography section is well-formatted and clearly delimited
•No API dependencies or rate limiting concerns
•Minimal parsing needed (just extract reference titles)
•No ambiguity in reference data

Workflow:

•Read paper note frontmatter for arXiv HTML URL
•WebFetch the arXiv HTML page with prompt: "Extract ONLY bibliography/references section. Skip all other content."
•Extract bibliography/references section
•Build reference list from bibliography
•Validate links against reference list

Avoid:

•Semantic Scholar (JavaScript rendering, rate limiting issues)
•PDF parsing
•Regex patterns for extraction (bibliography is already structured)
•Processing full paper text (use targeted prompt instead)

Expected cost: ~100-150 tokens per paper (1 optimized WebFetch call + validation)

OUTPUT RULES

•Group findings by severity (CRITICAL → NEEDS FIXING → OK → NEEDS USER JUDGMENT)
•Each entry lists: file name / link text / issue / suggested fix
•No prose; bullets only
•Example:

🔴 CRITICAL (remove):
- [[Paper Not In References]] - Not in bibliography → REMOVE

🟡 NEEDS FIXING:
- [[attention is all you need]] - Title case error → Fix to [[Attention Is All You Need]]
- [[My Paper: A Study]] - File name formatting → Should be [[My Paper - A Study]]

🟢 OK (keep):
- [[Transformers Are All You Need]] ✓

SELF-CHECK

✅ Did I fetch the arXiv HTML page from the frontmatter URL?
✅ Did I extract all references from the bibliography section?
✅ Did I extract reference metadata (title, authors, year)?
✅ Did I check each link against the bibliography references?
✅ Did I assess influence (citation count/context in paper)?
✅ Did I validate Title Case formatting against bibliography titles?
✅ Did I check file name convention (: → -)?
✅ Did I group findings by severity?
✅ Did I get user approval before removing links?
✅ Did I verify all remaining links are valid?