Wiki Ingestion Workflow
Use this skill when importing Wikipedia articles or converting HTML content into Markdown notes.
What wiki ingestion does
Converts Wikipedia HTML (or similar web content) into well-formed Markdown with:
- •Normalized relative links (URL-encoded with `%20` for spaces)
- •Media references extracted to `archives/Wikimedia Commons/`
- •YAML frontmatter scaffolding for new notes
- •Markdown table and list conversion
When to use
- •Importing encyclopedia articles from Wikipedia verbatim
- •Converting web pages to Markdown for knowledge base
- •Extracting and organizing media from online sources
- •Creating new notes with pre-filled structure from web content
Detailed workflow
Step 1: Scaffold new note
Command: python -m "templates.new wiki page"
- •Script prompts for Wikipedia article name (e.g., "Fourier Transform")
- •Generates YAML frontmatter template:

  ```yaml
  ---
  aliases: [Alternative name]
  tags: [flashcard/active, language/in/English]
  ---
  ```

- •Adds Wikipedia link comment: `<!-- Source: https://en.wikipedia.org/wiki/Article_Name -->`
- •Copies template to clipboard
- •Action: Paste into new file `general/Article Name.md` (or `special/` if specialized content)
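The scaffold described above can be approximated in a few lines of Python. This is a minimal sketch, not the actual `templates.new wiki page` script: the function name `scaffold_note` is hypothetical, the frontmatter values are the placeholders shown above, and the source URL assumes that spaces in the article name map to underscores (as the `Article_Name` placeholder suggests).

```python
from urllib.parse import quote


def scaffold_note(article_name: str) -> str:
    """Return a frontmatter scaffold plus a source comment (hypothetical helper)."""
    # Assumption: Wikipedia article URLs use underscores in place of spaces.
    slug = quote(article_name.replace(" ", "_"))
    return "\n".join(
        [
            "---",
            "aliases: [Alternative name]",
            "tags: [flashcard/active, language/in/English]",
            "---",
            "",
            f"<!-- Source: https://en.wikipedia.org/wiki/{slug} -->",
            "",
        ]
    )


print(scaffold_note("Fourier Transform"))
```

The real script also copies the result to the clipboard; that step is left out here because it depends on a platform-specific clipboard tool.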
Step 2: Copy Wikipedia HTML to clipboard
- •Open Wikipedia article in browser
- •Select all content (Ctrl+A or Cmd+A)
- •Copy (Ctrl+C or Cmd+C)
- •Content is now in clipboard
Step 3: Ingest HTML
Command: python -m "convert wiki"
- •Tool reads from clipboard
- •Normalizes Markdown formatting (lists, tables, code, emphasis)
- •Downloads images to `archives/Wikimedia Commons/`, using `convert wiki.py.names map.json` for filename renames
- •Normalizes links to relative paths with `%20` encoding (not `%3A` or other encodings)
- •Outputs Markdown that preserves Wikipedia structure
- •Action: Paste output below the frontmatter in your note file
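The link normalization rule can be illustrated with a small helper. This is a sketch under one stated assumption: that "`%20` encoding (not `%3A` or other encodings)" means only spaces are percent-encoded and every other character is left alone. The function name `encode_relative_link` and the example path are hypothetical; the actual logic in `convert wiki` may differ.

```python
def encode_relative_link(target: str) -> str:
    """Encode spaces as %20 in a relative link target (hypothetical helper)."""
    # External URLs are left untouched; only relative paths are normalized.
    if target.startswith(("http://", "https://")):
        return target
    # Assumption: only spaces are encoded. A general-purpose encoder such as
    # urllib.parse.quote would also turn ":" into %3A with its default settings.
    return target.replace(" ", "%20")


print(encode_relative_link("../archives/Wikimedia Commons/Fourier transform.svg"))
# prints ../archives/Wikimedia%20Commons/Fourier%20transform.svg
```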
Step 4: Generate flashcards
Command: python -m init generate <file>
- •Scans note for cloze markup: `{@{ hidden text }@}`, `::@::`, `:@:`
- •Generates spaced-repetition flashcard state
- •Inserts generated flashcard regions marked with `<!--pytextgen generate section=...-->`
- •See pytextgen skill for details
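Before running generation, it can help to confirm that a note actually contains cloze markup. The sketch below only looks for `{@{ ... }@}` spans on a single line; it is not pytextgen's parser, the helper name `find_clozes` is hypothetical, and the `::@::` and `:@:` separators are deliberately not handled because their semantics are documented in the pytextgen skill rather than here.

```python
import re

# Matches {@{ hidden text }@} non-greedily, so adjacent clozes stay separate.
CLOZE = re.compile(r"\{@\{(.*?)\}@\}")


def find_clozes(text: str) -> list[str]:
    """Return the hidden text of every {@{ ... }@} span (hypothetical helper)."""
    return [hidden.strip() for hidden in CLOZE.findall(text)]


note = "The {@{ Fourier transform }@} decomposes a signal into {@{frequencies}@}."
print(find_clozes(note))
# prints ['Fourier transform', 'frequencies']
```

An empty result suggests the note still needs manual annotation before flashcards can be generated.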
Step 5: Review and finalize
- •Review `aliases` and `tags` in YAML frontmatter
- •Ensure all media references are correct (check `archives/Wikimedia Commons/`)
- •Verify cloze markup is added to key terms
- •Test regeneration: `python -m init generate -C <file>`
- •Commit when satisfied
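The media check in this step can be partly automated. The following is a rough sketch, assuming that images appear in the note as standard Markdown image links (`![alt](path)`) with `%20`-encoded relative paths; the output format of `convert wiki` is not specified here, so treat that as an assumption. The helper name `missing_media` is hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import unquote

# Matches Markdown image targets such as ![alt](path).
# Targets that contain ")" or whitespace are not handled by this simple pattern.
IMAGE_TARGET = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def missing_media(note_text: str, note_dir: Path) -> list[str]:
    """Return image targets whose decoded path does not exist relative to note_dir."""
    missing = []
    for target in IMAGE_TARGET.findall(note_text):
        if target.startswith(("http://", "https://")):
            continue  # external images are out of scope for this check
        # Decode %20 back to spaces before touching the filesystem.
        if not (note_dir / unquote(target)).exists():
            missing.append(target)
    return missing
```

A typical call would pass the note's text and its parent directory, for example `missing_media(path.read_text(encoding="utf-8"), path.parent)` for some `path = Path("general/Article Name.md")`; an empty list means every referenced file was found.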
Best practices
- •Check media archives: Ensure all images/files are downloaded to `archives/Wikimedia Commons/` with `%20`-encoded filenames
- •Verify link normalization: Relative paths only; no external URLs unless absolutely necessary
- •YAML structure: Use markdown-notes conventions for `aliases` and `tags`
- •Keep attribution: Preserve Wikipedia source URL in frontmatter or as HTML comment
- •Review formatting: Simplify complex tables/lists if needed; respect `.markdownlint.json` settings
- •Test generation: After editing, run `python -m init generate <file>` to verify flashcard creation
- •Add cloze markup: Manually annotate key terms with `{@{ }@}`, `::@::`, or `:@:` for active recall
Common issues
- •Media download failures: Check if clipboard HTML is complete; retry `convert wiki`
- •Broken relative links: Verify `%20` encoding for spaces (not `%3A` or other encodings)
- •Complex tables: Some Wikipedia tables don't convert well; manually edit to simpler Markdown format
- •Cloze markup missing: Manually add after generation; see pytextgen skill for syntax
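When tracking down broken relative links, a quick scan for suspicious link targets can narrow the search. This is an illustrative sketch only: the helper name `suspicious_links` is hypothetical, the pattern is a simplification of Markdown link syntax, and the two checks (raw spaces and `%3A`) simply mirror the encoding rule stated above.

```python
import re

# Matches the target of Markdown links and images: [text](target) or ![alt](target).
# Targets that themselves contain ")" are not handled by this simple pattern.
LINK_TARGET = re.compile(r"\]\(([^)]*)\)")


def suspicious_links(markdown: str) -> list[tuple[str, str]]:
    """Return (target, reason) pairs for relative link targets that look mis-encoded."""
    problems = []
    for target in LINK_TARGET.findall(markdown):
        if target.startswith(("http://", "https://")):
            continue  # external URLs are not covered by this check
        if " " in target:
            problems.append((target, "raw space; expected %20"))
        if "%3A" in target.upper():
            problems.append((target, "contains %3A"))
    return problems


sample = "[a](general/Fourier Transform.md) [b](general/Fourier%20Transform.md)"
print(suspicious_links(sample))
# prints [('general/Fourier Transform.md', 'raw space; expected %20')]
```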
Integration
- •Note scaffolding: Use tools-templates to understand frontmatter conventions
- •Flashcard generation: Use pytextgen to regenerate cloze markup into flashcards
- •Edit conventions: See editing-conventions for general rules while editing imported notes
Typical command pattern
```bash
# Ingest from clipboard
python -m "convert wiki"

# Scaffold new wiki-sourced note
python -m "templates.new wiki page"
```