calibre-metadata-apply
A skill for updating metadata of existing Calibre books.
Requirements
- •calibredb must be available on PATH in the runtime environment
- •subagent-spawn-command-builder installed (for spawn payload generation)
- •Reachable Calibre Content server URL
  - •http://HOST:PORT/#LIBRARY_ID
- •If authentication is enabled, pass both --username and --password-env
- •Optional auth cache: --save-auth (default file: ~/.config/calibre-metadata-apply/auth.json)
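The requirements above can be checked before a run. The following is a minimal sketch, not part of the skill itself: the helper names build_library_url and missing_tools are hypothetical, and only the URL shape and the calibredb requirement come from this document.

```python
import shutil


def build_library_url(host: str, port: int, library_id: str) -> str:
    """Return a Content server URL in the http://HOST:PORT/#LIBRARY_ID form."""
    return f"http://{host}:{port}/#{library_id}"


def missing_tools(required=("calibredb",)) -> list:
    """Return the names of required executables that are not found on PATH."""
    return [name for name in required if shutil.which(name) is None]


if __name__ == "__main__":
    print(build_library_url("127.0.0.1", 8080, "MyLibrary"))
    print("missing:", missing_tools())
```

An empty list from missing_tools means every listed executable was found; anything else is worth reporting to the user before attempting a lookup.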
Supported fields
Direct fields (set_metadata --field)
- •title
- •title_sort
- •authors (string with & or array)
- •author_sort
- •series
- •series_index
- •tags (string or array)
- •publisher
- •pubdate (YYYY-MM-DD)
- •languages
- •comments
Helper fields
- •comments_html (OC marker block upsert)
- •analysis (auto-generates analysis HTML for comments)
- •analysis_tags (adds tags)
- •tags_merge (default true)
- •tags_remove (remove specific tags after merge)
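To illustrate how authors and the tag helper fields might be normalized, here is a hedged sketch. The function names are hypothetical, and treating tags_merge=false as "replace the existing tags" is an assumption; the document only states that merging is the default and that tags_remove is applied after the merge.

```python
def normalize_authors(value):
    """Accept 'A & B' (string) or ['A', 'B'] (array) and return a clean list."""
    parts = value.split("&") if isinstance(value, str) else value
    return [p.strip() for p in parts if p.strip()]


def merge_tags(existing, new, tags_merge=True, tags_remove=()):
    """Combine tags in order without duplicates, then drop tags_remove.

    Assumption: with tags_merge=False the existing tags are discarded.
    """
    base = list(existing) if tags_merge else []
    merged = []
    for tag in base + list(new):
        if tag not in merged:
            merged.append(tag)
    removed = set(tags_remove)
    return [tag for tag in merged if tag not in removed]


if __name__ == "__main__":
    print(normalize_authors("Author One & Author Two"))
    print(merge_tags(["a", "b"], ["b", "c"], tags_remove=["a"]))
```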
Required execution flow
A. Target confirmation (mandatory)
- •Run read-only lookup to narrow candidates
- •Show id, title, authors, series, series_index
- •Get user confirmation for final target IDs
- •Build JSONL using only confirmed IDs
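The last step of target confirmation can be sketched as a filter that drops every record whose ID the user did not confirm. The record shape (a dict with an "id" key plus field names) is an assumption made for illustration; the actual JSONL schema expected by the apply script is not specified here.

```python
import json


def build_jsonl(changes, confirmed_ids):
    """Serialize only the change records whose id was confirmed by the user."""
    confirmed = set(confirmed_ids)
    lines = [
        json.dumps(change, ensure_ascii=False)
        for change in changes
        if change.get("id") in confirmed
    ]
    # One JSON object per line; empty string when nothing is confirmed.
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    proposed = [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]
    print(build_jsonl(proposed, confirmed_ids=[2]), end="")
```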
B. Proposal synthesis (when metadata is missing)
- •Collect evidence from file extraction + web sources
- •Show one merged proposal table with:
  - •candidate, source, confidence (high|medium|low)
  - •title_sort_candidate, author_sort_candidate
- •Get user decision:
  - •approve all
  - •approve only: <fields>
  - •reject: <fields>
  - •edit: <field>=<value>
- •Apply only approved/finalized fields
- •If confidence is low or sources conflict, keep fields empty
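The four decision forms can be interpreted mechanically. The sketch below is one possible reading, with several assumptions: field lists are comma-separated, reject approves every field not listed, and edit overrides one value while keeping the rest. None of these details are fixed by the document.

```python
def apply_decision(proposal, decision):
    """Return the subset of proposal (field -> value) that should be applied."""
    text = decision.strip()
    if text == "approve all":
        return dict(proposal)
    verb, _, rest = text.partition(":")
    verb, rest = verb.strip(), rest.strip()
    if verb == "approve only":
        keep = {name.strip() for name in rest.split(",")}
        return {k: v for k, v in proposal.items() if k in keep}
    if verb == "reject":
        drop = {name.strip() for name in rest.split(",")}
        return {k: v for k, v in proposal.items() if k not in drop}
    if verb == "edit":
        field, sep, value = rest.partition("=")
        if not sep:
            raise ValueError("edit expects <field>=<value>")
        return {**proposal, field.strip(): value.strip()}
    raise ValueError(f"unrecognized decision: {decision!r}")


if __name__ == "__main__":
    proposal = {"title": "T", "publisher": "P"}
    print(apply_decision(proposal, "approve only: title"))
```

Raising on an unrecognized decision, rather than guessing, keeps the flow consistent with the rule that only approved or finalized fields are applied.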
C. Apply
- •Run dry-run first (mandatory)
- •Run --apply only after explicit user approval
- •Re-read and report final values
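The dry-run-first rule can be enforced where the command line is assembled. This is a sketch under assumptions: build_argv and its user_approved parameter are hypothetical, while the script path and the --with-library, --lang, and --apply flags are taken from the Usage section of this document.

```python
import sys

SCRIPT = "skills/calibre-metadata-apply/scripts/calibredb_apply.py"


def build_argv(library_url, apply=False, user_approved=False, lang=None):
    """Build the argument list; without --apply the script runs as a dry-run."""
    if apply and not user_approved:
        raise PermissionError("--apply requires explicit user approval")
    argv = [sys.executable, SCRIPT, "--with-library", library_url]
    if lang:
        argv += ["--lang", lang]
    if apply:
        argv.append("--apply")
    return argv


if __name__ == "__main__":
    print(build_argv("http://127.0.0.1:8080/#MyLibrary", lang="ja"))
```

The resulting list could be passed to subprocess.run(argv, input=jsonl_text, text=True, check=True), which feeds the JSONL on standard input the same way the shell pipeline does.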
Analysis worker policy
- •Use subagent-spawn-command-builder to generate the sessions_spawn payload for heavy candidate generation
  - •task is required.
  - •Profile should include model/thinking/timeout/cleanup for this workflow.
- •Use lightweight subagent model for analysis (avoid main heavy model)
- •Keep final decisions + dry-run/apply in main
Long-run turn-split policy (library-wide)
For library-wide heavy processing, always use turn-split execution.
Unknown-document recovery flow (M3)
Batch sizing rule:
- •Keep each unknown-document batch small enough to show full row-by-row results in chat (no representative sampling).
- •If unresolved items remain, stop and wait for explicit user instruction to start the next batch.
User intervention checkpoints (fixed)
- •Light pass (metadata-only)
  - •Always run this stage by default (no extra user instruction required)
  - •Analyze existing metadata only (no file content read)
  - •Present a table to user:
    - •current file/title
    - •recommended title/metadata
    - •confidence/evidence summary
  - •Stop and wait for user instruction before any deeper stage
- •On user request: page-1 pass
  - •Read only the first page and refine proposals
  - •Report delta from light pass
- •If still uncertain: deep pass
  - •Read first 5 pages + last 5 pages
  - •Add web evidence search
  - •Produce finalized proposal with confidence + rationale
- •Approval gate
  - •Show detailed findings and request explicit approval before apply
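The page selection for the deep pass ("first 5 pages + last 5 pages") needs care when a document has fewer than ten pages. A minimal sketch, assuming 0-based page indexes and a hypothetical helper name:

```python
def deep_pass_pages(page_count, edge=5):
    """Return sorted 0-based indexes of the first and last `edge` pages.

    Short documents yield each page once rather than overlapping ranges.
    """
    if page_count <= 0:
        return []
    head = range(0, min(edge, page_count))
    tail = range(max(page_count - edge, 0), page_count)
    return sorted(set(head) | set(tail))


if __name__ == "__main__":
    print(deep_pass_pages(12))
    print(deep_pass_pages(7))
```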
Pending and unsupported handling
- •Use pending-review tag for unresolved/hold items.
- •If document is unresolved in current flow, do not force metadata guesses.
  - •Tag with pending-review and keep for follow-up investigation.
Diff report format (for unknown batch runs)
Return full results (not samples):
- •execution summary (target/changed/pending/skipped/error)
- •full changed list with id + key before/after fields
- •full pending list with id + reason
- •full error list with id + error summary
- •confidence must be expressed as high|medium|low
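The execution summary line can be derived from the full row list. The sketch below assumes each row is a dict with a "status" key and an optional "confidence" key; that row shape and the summarize name are illustrative, not part of the skill.

```python
from collections import Counter

STATUSES = ("changed", "pending", "skipped", "error")
CONFIDENCE = ("high", "medium", "low")


def summarize(rows):
    """Count rows per status and validate the confidence vocabulary."""
    for row in rows:
        if row["status"] not in STATUSES:
            raise ValueError(f"unknown status: {row['status']!r}")
        confidence = row.get("confidence")
        if confidence is not None and confidence not in CONFIDENCE:
            raise ValueError(f"confidence must be one of {CONFIDENCE}")
    counts = Counter(row["status"] for row in rows)
    summary = {"target": len(rows)}
    summary.update({status: counts.get(status, 0) for status in STATUSES})
    return summary


if __name__ == "__main__":
    rows = [
        {"id": 1, "status": "changed", "confidence": "high"},
        {"id": 2, "status": "pending", "confidence": "low"},
    ]
    print(summarize(rows))
```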
Runtime artifact policy
- •Keep run-state and temporary artifacts only while a run is active.
- •On successful completion, remove per-run state/artifacts.
- •On failure, keep minimal artifacts only for retry/debug, then clean up after resolution.
Internal orchestration (recommended)
- •Use lightweight subagent for all analysis stages
- •Keep apply decisions in main session
- •Persist run state for each stage in state/runs.json
Turn 1 (start)
- •Main defines scope
- •Main generates spawn payload via subagent-spawn-command-builder (profile example: calibre-meta), then calls sessions_spawn
- •Save run_id/session_key/task via scripts/run_state.py upsert
- •Immediately tell the user this is a subagent job and state the execution model used for analysis
- •Reply with "analysis started" and keep normal chat responsive
Turn 2 (completion)
- •Receive subagent completion notice
- •Save result JSON
- •Complete state handling via scripts/handle_completion.py --run-id ... --result-json ...
- •Return summarized proposal (apply only when needed)
Run state file:
- •state/runs.json
PDF extraction policy
- •Try ebook-convert first
- •If empty/failed, fallback to pdftotext
- •If both fail, switch to web-evidence-first mode
Sort reading policy
- •Use user-configured reading_script for Japanese/non-Latin sort fields
  - •katakana/hiragana/latin
- •Ask once on first use, then persist and reuse
- •Default policy is full reading (no truncation)
- •Config path: ~/.config/calibre-metadata-apply/config.json
  - •key: reading_script
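The "ask once, then persist and reuse" behavior could look like the following sketch. The get_reading_script helper and its ask callback are hypothetical; the config path, the reading_script key, and the three allowed values come from this document.

```python
import json
from pathlib import Path

ALLOWED = ("katakana", "hiragana", "latin")
DEFAULT_PATH = Path.home() / ".config" / "calibre-metadata-apply" / "config.json"


def get_reading_script(ask, path=DEFAULT_PATH):
    """Return the stored reading_script, calling ask() only when it is unset."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    value = config.get("reading_script")
    if value in ALLOWED:
        return value
    value = ask()  # e.g. prompt the user in chat
    if value not in ALLOWED:
        raise ValueError(f"reading_script must be one of {ALLOWED}")
    config["reading_script"] = value
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8")
    return value
```

Other keys already present in the config file are preserved because the whole object is read, updated, and written back.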
Usage
Dry-run:
bash
cat changes.jsonl | python3 skills/calibre-metadata-apply/scripts/calibredb_apply.py \
  --with-library "http://127.0.0.1:8080/#MyLibrary" \
  --lang ja
Apply:
bash
cat changes.jsonl | python3 skills/calibre-metadata-apply/scripts/calibredb_apply.py \
  --with-library "http://127.0.0.1:8080/#MyLibrary" \
  --apply
Do not
- •Do not run --apply directly on the basis of ambiguous title matches alone
- •Do not include unconfirmed IDs in apply payload
- •Do not auto-fill low-confidence candidates without explicit confirmation