pyarchivist Workflow
Use this skill when archiving web content, media, or online documents into the knowledge base.
What pyarchivist does
pyarchivist/ is a git submodule that automatically archives online content to archives/ and updates index.md files with metadata (source URL, timestamp, file hash).
When to use
- Archiving articles, web pages, or media from online sources
- Storing Wikimedia Commons images alongside notes
- Creating permanent backups of time-sensitive online content
- Auto-maintaining archives/index.md files
Basic workflow
- Use pyarchivist's interface (CLI or Python API) to download and archive content
- Specify the target directory (archives/Wikimedia Commons/ for media, archives/sparse/ for documents)
- pyarchivist auto-generates metadata (timestamp, source URL, content hash)
- index.md entries are auto-created with source and timestamp information
- Filenames are generated consistently (hash-based for deduplication or descriptive for media)
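The steps above can be illustrated with a minimal Python sketch of hash-based archiving. This is not pyarchivist's code or API: the function name `archive_bytes`, the choice of SHA-256, and the index entry layout are all assumptions made for illustration, and the content is passed in as bytes so the download step is left out.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def archive_bytes(content: bytes, source_url: str, target_dir: Path, suffix: str = "") -> Path:
    """Store content under a hash-based name and record it in index.md.

    Hypothetical helper for illustration; pyarchivist's real behavior may differ.
    """
    target_dir.mkdir(parents=True, exist_ok=True)

    # The content hash doubles as the filename, so identical content
    # archived twice maps to the same file (deduplication).
    digest = hashlib.sha256(content).hexdigest()
    archived = target_dir / f"{digest}{suffix}"

    if not archived.exists():
        archived.write_bytes(content)

        # Record source URL, UTC timestamp, and hash for newly stored content.
        # The entry layout here is an assumption, not pyarchivist's format.
        timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        entry = (
            f"- {archived.name} | source: {source_url} "
            f"| archived: {timestamp} | sha256: {digest}\n"
        )
        with (target_dir / "index.md").open("a", encoding="utf-8") as index:
            index.write(entry)

    return archived
```

Calling `archive_bytes(data, url, Path("archives/sparse"), ".html")` twice with the same data returns the same path and leaves a single index entry, which is the deduplication property that hash-based filenames are meant to provide.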
Best practices
- Let pyarchivist handle file naming and index.md updates
- Use archives/Wikimedia Commons/ for images/media with descriptive names
- Use archives/sparse/ for miscellaneous content (filenames are generated from hashes automatically)
- Always preserve source URL and timestamp metadata in index.md
- Check that index.md was updated correctly after archiving
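The final check can be scripted. The sketch below is a hypothetical helper, not part of pyarchivist. Because the exact index.md entry format is not specified here, it only assumes that both the source URL and the archived filename appear somewhere in the file.

```python
from pathlib import Path


def index_mentions(index_path: Path, source_url: str, filename: str) -> bool:
    """Return True if index.md exists and contains both the URL and the filename.

    Hypothetical check: it makes no assumption about the entry layout,
    so it cannot confirm that the two values belong to the same entry.
    """
    if not index_path.is_file():
        return False
    text = index_path.read_text(encoding="utf-8")
    return source_url in text and filename in text
```

A result of `False` after an archiving run suggests that the index was not updated, or that the entry was written to a different index.md than expected.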
Typical command pattern
```shell
python -m pyarchivist [options] --target <archives/folder> <source_url>
```
(Exact interface depends on pyarchivist's implementation)
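When scripting this pattern, note that archives/Wikimedia Commons/ contains a space, so the target must be quoted in a shell. One way to sidestep quoting is to build the command as an argument list in Python. The sketch below only mirrors the pattern shown above. The helper name `build_archive_command` is hypothetical, and the `--target` flag has not been verified against pyarchivist's real interface.

```python
import sys
from typing import List, Optional, Sequence


def build_archive_command(
    source_url: str,
    target: str,
    options: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argv list following the documented pattern.

    Assumption: pyarchivist accepts `--target <folder> <source_url>`;
    confirm against its documentation before running the command.
    """
    return [
        sys.executable, "-m", "pyarchivist",
        *(options or []),
        "--target", target,
        source_url,
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the interface is confirmed. Because each argument is a separate list element, the space in the target path needs no escaping.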
When in doubt
Consult the pyarchivist documentation or ask the user for guidance on specific archiving needs.