calibre-metadata-apply

A skill for updating metadata of existing Calibre books.

Requirements

•calibredb must be available on PATH in the runtime environment
•subagent-spawn-command-builder installed (for spawn payload generation)
•Reachable Calibre Content server URL
- •http://HOST:PORT/#LIBRARY_ID
•If authentication is enabled, pass both --username and --password-env
•Optional auth cache: --save-auth (default file: ~/.config/calibre-metadata-apply/auth.json)
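
The requirements above can be checked before any lookup runs. The following is a minimal preflight sketch, not part of the skill's scripts: the function name `check_requirements` is hypothetical, and treating the `#LIBRARY_ID` fragment as mandatory is an assumption based on the URL shape shown above.

```python
import shutil
from urllib.parse import urlsplit


def check_requirements(library_url, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    `which` is injectable so the check can be exercised without calibredb.
    """
    problems = []
    if which("calibredb") is None:
        problems.append("calibredb not found on PATH")

    parts = urlsplit(library_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("library URL must look like http://HOST:PORT/#LIBRARY_ID")
    elif not parts.fragment:
        # Assumption: this workflow always names the library explicitly.
        problems.append("library URL is missing the #LIBRARY_ID fragment")
    return problems
```

Calling `check_requirements("http://127.0.0.1:8080/#MyLibrary")` on a machine with calibredb installed should return an empty list.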

Supported fields

Direct fields (`set_metadata --field`)

•title
•title_sort
•authors (string with & or array)
•author_sort
•series
•series_index
•tags (string or array)
•publisher
•pubdate (YYYY-MM-DD)
•languages
•comments
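
As an illustration of how direct fields might map onto `calibredb set_metadata --field name:value`, here is a small sketch that only builds an argument list and executes nothing. The helper name `build_set_metadata_args` is hypothetical, and the joining rules (authors with " & ", other arrays with ",") are assumptions about how the skill serializes arrays.

```python
def build_set_metadata_args(book_id, fields, library_url):
    """Build an argv list for `calibredb set_metadata`; nothing is executed."""
    args = ["calibredb", "set_metadata", "--with-library", library_url]
    for name, value in fields.items():
        if isinstance(value, list):
            # Assumption: authors are joined with " & ", other arrays with ",".
            separator = " & " if name == "authors" else ","
            value = separator.join(str(item) for item in value)
        args += ["--field", f"{name}:{value}"]
    args.append(str(book_id))  # the book id is the positional argument
    return args
```

Passing the list to `subprocess.run` only after the dry-run has been reviewed keeps this in line with the apply rules of the skill.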

Helper fields

•comments_html (OC marker block upsert)
•analysis (auto-generates analysis HTML for comments)
•analysis_tags (adds tags)
•tags_merge (default true)
•tags_remove (remove specific tags after merge)
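
The interaction between tags_merge and tags_remove can be sketched as follows. This is one plausible reading of the helper fields above, not the script's actual logic: `resolve_tags` is a hypothetical name, and the order-preserving de-duplication is an assumption.

```python
def resolve_tags(existing, new, tags_merge=True, tags_remove=()):
    """Combine tags, then drop the ones listed in tags_remove.

    tags_merge=True keeps the book's existing tags and appends new ones;
    tags_merge=False replaces the existing tags with the new ones.
    """
    base = list(existing) if tags_merge else []
    merged = []
    for tag in base + list(new):
        if tag not in merged:  # keep first occurrence, preserve order
            merged.append(tag)
    removed = set(tags_remove)
    return [tag for tag in merged if tag not in removed]
```

For example, merging `["a", "b"]` with `["b", "c"]` and removing `"a"` leaves `["b", "c"]`.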

Required execution flow

A. Target confirmation (mandatory)

•Run read-only lookup to narrow candidates
•Show id,title,authors,series,series_index
•Get user confirmation for final target IDs
•Build JSONL using only confirmed IDs
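
The last step of target confirmation can be sketched as a filter over proposed changes. The record shape (one JSON object per line with an `id` key) is an assumption, since the JSONL schema is not spelled out here, and `build_jsonl` is a hypothetical helper.

```python
import json


def build_jsonl(proposed_changes, confirmed_ids):
    """Serialize only the changes whose id the user has confirmed."""
    confirmed = set(confirmed_ids)
    lines = [
        json.dumps(change, ensure_ascii=False)
        for change in proposed_changes
        if change.get("id") in confirmed  # records without an id are dropped
    ]
    return "\n".join(lines) + ("\n" if lines else "")
```

Using `ensure_ascii=False` keeps Japanese titles readable in the generated file.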

B. Proposal synthesis (when metadata is missing)

•Collect evidence from file extraction + web sources
•Show one merged proposal table with:
- •candidate, source, confidence (high|medium|low)
- •title_sort_candidate, author_sort_candidate
•Get user decision:
- •approve all
- •approve only: <fields>
- •reject: <fields>
- •edit: <field>=<value>
•Apply only approved/finalized fields
•If confidence is low or sources conflict, keep fields empty
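
The decision grammar above could be applied to a proposal roughly as shown below. This is a sketch under several assumptions: the proposal is a dict of field name to `{"value", "confidence"}`, low-confidence candidates are always left out unless the user supplies a value through edit, and edit is treated as approving the remaining fields. The function name `apply_decision` is hypothetical.

```python
def apply_decision(proposal, decision):
    """Return the field -> value mapping that may be applied."""
    # Low-confidence candidates stay empty, per the rule above.
    usable = {
        field: candidate["value"]
        for field, candidate in proposal.items()
        if candidate["confidence"] != "low"
    }
    verb, _, rest = decision.partition(":")
    verb = verb.strip().lower()
    names = [name.strip() for name in rest.split(",") if name.strip()]

    if verb == "approve all":
        return usable
    if verb == "approve only":
        return {f: v for f, v in usable.items() if f in names}
    if verb == "reject":
        return {f: v for f, v in usable.items() if f not in names}
    if verb == "edit":
        field, separator, value = rest.partition("=")
        if not separator:
            raise ValueError("edit expects <field>=<value>")
        result = dict(usable)
        result[field.strip()] = value.strip()  # user-supplied value wins
        return result
    raise ValueError(f"unrecognized decision: {decision!r}")
```

Sources that conflict would need to be marked low (or removed) before this step, since the sketch only looks at the confidence label.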

C. Apply

•Run dry-run first (mandatory)
•Run --apply only after explicit user approval
•Re-read and report final values

Analysis worker policy

•Use subagent-spawn-command-builder to generate sessions_spawn payload for heavy candidate generation
- •task is required.
- •Profile should include model/thinking/timeout/cleanup for this workflow.
•Use lightweight subagent model for analysis (avoid main heavy model)
•Keep final decisions + dry-run/apply in main

Long-run turn-split policy (library-wide)

For library-wide heavy processing, always use turn-split execution.

Unknown-document recovery flow (M3)

Batch sizing rule:

•Keep each unknown-document batch small enough to show full row-by-row results in chat (no representative sampling).
•If unresolved items remain, stop and wait for explicit user instruction to start the next batch.

User intervention checkpoints (fixed)

•Light pass (metadata-only)
- •Always run this stage by default (no extra user instruction required)
- •Analyze existing metadata only (no file content read)
- •Present a table to user:
  - •current file/title
  - •recommended title/metadata
  - •confidence/evidence summary
- •Stop and wait for user instruction before any deeper stage
•On user request: page-1 pass
- •Read only the first page and refine proposals
- •Report delta from light pass
•If still uncertain: deep pass
- •Read first 5 pages + last 5 pages
- •Add web evidence search
- •Produce finalized proposal with confidence + rationale
•Approval gate
- •Show detailed findings and request explicit approval before apply
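
The page ranges read at each checkpoint can be written down compactly. The sketch below assumes 1-based page numbers and the stage names `light`, `page1`, and `deep`; both the names and the helper `pages_for_stage` are hypothetical.

```python
def pages_for_stage(stage, page_count, edge=5):
    """Return the 1-based page numbers a stage is allowed to read."""
    if stage == "light":
        return []  # metadata only, no file content
    if stage == "page1":
        return [1] if page_count >= 1 else []
    if stage == "deep":
        head = range(1, min(edge, page_count) + 1)
        tail = range(max(page_count - edge + 1, 1), page_count + 1)
        # Short documents overlap; the set removes duplicate pages.
        return sorted(set(head) | set(tail))
    raise ValueError(f"unknown stage: {stage!r}")
```

For a 20-page document the deep pass would read pages 1 to 5 and 16 to 20; for a 7-page document it would read every page once.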

Pending and unsupported handling

•Use pending-review tag for unresolved/hold items.
•If document is unresolved in current flow, do not force metadata guesses.
- •Tag with pending-review and keep for follow-up investigation.

Diff report format (for unknown batch runs)

Return full results (not samples):

•execution summary (target/changed/pending/skipped/error)
•full changed list with id + key before/after fields
•full pending list with id + reason
•full error list with id + error summary
•confidence must be expressed as high|medium|low
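
One way to assemble such a report from per-book results is sketched below. The row shape (`id`, `status`, plus `before`/`after`, `reason`, or `error` depending on status) is an assumption made for illustration, and `build_report` is a hypothetical name.

```python
from collections import Counter

STATUSES = ("changed", "pending", "skipped", "error")


def build_report(results):
    """Build the full (non-sampled) diff report from per-book rows."""
    counts = Counter(row["status"] for row in results)
    summary = {"target": len(results)}
    summary.update({status: counts[status] for status in STATUSES})
    return {
        "summary": summary,
        "changed": [
            {"id": r["id"], "before": r["before"], "after": r["after"]}
            for r in results
            if r["status"] == "changed"
        ],
        "pending": [
            {"id": r["id"], "reason": r["reason"]}
            for r in results
            if r["status"] == "pending"
        ],
        "errors": [
            {"id": r["id"], "error": r["error"]}
            for r in results
            if r["status"] == "error"
        ],
    }
```

Every list is returned in full, which matches the no-sampling rule for unknown batch runs.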

Runtime artifact policy

•Keep run-state and temporary artifacts only while a run is active.
•On successful completion, remove per-run state/artifacts.
•On failure, keep minimal artifacts only for retry/debug, then clean up after resolution.

Internal orchestration (recommended)

•Use lightweight subagent for all analysis stages
•Keep apply decisions in main session
•Persist run state for each stage in state/runs.json

Turn 1 (start)

•Main defines scope
•Main generates spawn payload via subagent-spawn-command-builder (profile example: calibre-meta), then calls sessions_spawn
•Save run_id/session_key/task via scripts/run_state.py upsert
•Immediately tell the user this is a subagent job and state the execution model used for analysis
•Reply with "analysis started" and keep normal chat responsive

Turn 2 (completion)

•Receive subagent completion notice
•Save result JSON
•Complete state handling via scripts/handle_completion.py --run-id ... --result-json ...
•Return summarized proposal (apply only when needed)

Run state file:

•state/runs.json
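
To make the run-state handling concrete, here is a minimal sketch of an upsert and a cleanup over a JSON file keyed by run id. It is not the code in scripts/run_state.py; the file layout (a single object mapping run_id to a record) and the names `upsert_run` and `remove_run` are assumptions.

```python
import json
from pathlib import Path


def _load(path):
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def _save(path, runs):
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(runs, ensure_ascii=False, indent=2), encoding="utf-8")


def upsert_run(state_path, run_id, **fields):
    """Create or update the record for run_id and return it."""
    path = Path(state_path)
    runs = _load(path)
    runs.setdefault(run_id, {}).update(fields)
    _save(path, runs)
    return runs[run_id]


def remove_run(state_path, run_id):
    """Drop per-run state after successful completion."""
    path = Path(state_path)
    runs = _load(path)
    runs.pop(run_id, None)
    _save(path, runs)
```

Turn 1 would call something like `upsert_run("state/runs.json", run_id, session_key=..., task=...)`, and Turn 2 would remove the entry once the result has been handled.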

PDF extraction policy

•Try ebook-convert first
•If empty/failed, fall back to pdftotext
•If both fail, switch to web-evidence-first mode
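
The fallback chain can be sketched as an ordered list of extractors. This is an illustration rather than the skill's implementation: the function names are hypothetical, and the exact command lines (`ebook-convert in.pdf out.txt`, `pdftotext in.pdf -`) are the plain invocations of those tools, assumed here without any extra options the skill may use.

```python
import subprocess
import tempfile
from pathlib import Path


def run_ebook_convert(pdf_path):
    """Convert to a temporary .txt file with ebook-convert and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "out.txt"
        subprocess.run(
            ["ebook-convert", str(pdf_path), str(out)],
            check=True,
            capture_output=True,
        )
        return out.read_text(encoding="utf-8", errors="replace")


def run_pdftotext(pdf_path):
    """Ask pdftotext to write the extracted text to stdout ("-")."""
    done = subprocess.run(
        ["pdftotext", str(pdf_path), "-"], check=True, capture_output=True
    )
    return done.stdout.decode("utf-8", errors="replace")


def extract_text(pdf_path, extractors=(run_ebook_convert, run_pdftotext)):
    """Try each extractor in order; return (text, source_name)."""
    for extractor in extractors:
        try:
            text = extractor(pdf_path)
        except (OSError, subprocess.CalledProcessError):
            continue  # tool missing or failed: try the next one
        if text.strip():
            return text, extractor.__name__
    # Both failed or produced nothing: the caller switches modes.
    return None, "web-evidence-first"
```

Keeping the extractors injectable makes the ordering easy to test without either tool installed.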

Sort reading policy

•Use user-configured reading_script for Japanese/non-Latin sort fields
- •katakana / hiragana / latin
•Ask once on first use, then persist and reuse
•Default policy is full reading (no truncation)
•Config path: ~/.config/calibre-metadata-apply/config.json
- •key: reading_script
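
The ask-once-then-reuse behavior might look like the sketch below. The config path and the `reading_script` key come from the policy above; the function name `get_reading_script`, the `ask` callback, and the assumption that the config file is a flat JSON object are illustrative only.

```python
import json
from pathlib import Path

VALID_SCRIPTS = ("katakana", "hiragana", "latin")
CONFIG_PATH = Path.home() / ".config" / "calibre-metadata-apply" / "config.json"


def get_reading_script(ask, config_path=CONFIG_PATH):
    """Return the persisted reading_script, asking the user only once.

    `ask` is a zero-argument callable that returns the user's choice.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    script = config.get("reading_script")
    if script in VALID_SCRIPTS:
        return script  # already configured: reuse without asking

    script = ask()
    if script not in VALID_SCRIPTS:
        raise ValueError(f"reading_script must be one of {VALID_SCRIPTS}")
    config["reading_script"] = script
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8")
    return script
```

Other keys already present in the config file are preserved when the choice is written back.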

Usage

Dry-run:

```bash
cat changes.jsonl | python3 skills/calibre-metadata-apply/scripts/calibredb_apply.py \
  --with-library "http://127.0.0.1:8080/#MyLibrary" \
  --lang ja
```

Apply: