Obsidian Process

Modular Python batch processing toolkit for Obsidian vault operations with dry-run support, backup/rollback capabilities, and comprehensive reporting.

System Architecture

code

┌─────────────────────────────────────────────────────────────┐
│                    BatchProcessor (Base)                     │
│  - find_markdown_files()  - read_file()  - write_file()    │
│  - VaultContext (rollback)  - ProcessingResult (output)     │
└──────────────────────────┬──────────────────────────────────┘
                           │
     ┌─────────────────────┼─────────────────────┐
     │                     │                     │
     ▼                     ▼                     ▼
┌──────────┐        ┌──────────┐         ┌──────────────┐
│ Wikilink │        │   Tag    │         │ Frontmatter  │
│Extractor │        │Normalizer│         │  Processor   │
└──────────┘        └──────────┘         └──────────────┘

Quick Start

Extract All Wikilinks

bash

python scripts/wikilink_extractor.py --vault /path/to/vault --output links.json

Normalize Tags

bash

python scripts/tag_normalizer.py --vault /path/to/vault --case lower --dry-run

Process Frontmatter

bash

python scripts/frontmatter_processor.py --vault /path/to/vault --operation add --key status --value draft

Modules

1. BatchProcessor (base class)

All processors inherit from BatchProcessor:

python

from batch_processor import BatchProcessor, ProcessingResult, VaultContext

class MyProcessor(BatchProcessor):
    def process(self) -> ProcessingResult:
        with VaultContext(self) as ctx:
            for file in self.find_markdown_files():
                content = self.read_file(file)
                # Transform content
                self.write_file(file, new_content)
        return ProcessingResult(success=True, files_processed=n, ...)

Key Features:

•--dry-run: Preview changes without modifying files
•--verbose: Detailed logging output
•Automatic backup before modifications
•Rollback on errors via VaultContext

2. WikilinkExtractor

Extracts and analyzes all wikilink formats:

Format	Example	Captured Fields
Basic	`[[Note]]`	target
Alias	`[[Note\|Display]]`	target, display_text
Header	`[[Note#Section]]`	target, header
Block	`[[Note^abc123]]`	target, block_id
Embed	`![[Image.png]]`	target, is_embedded

Output: JSON with link_index, backlink_index, statistics

bash

# Build link graph
python scripts/wikilink_extractor.py --vault ./vault --output graph.json

# Find broken links
python scripts/wikilink_extractor.py --vault ./vault --find-broken

3. TagNormalizer

Handles both inline (#tag) and frontmatter tags:

Operations:

•Case normalization: lower, upper, title
•Pattern-based rules via JSON config
•Hierarchy parsing: #project/active/urgent → 3 levels

bash

# Normalize all tags to lowercase
python scripts/tag_normalizer.py --vault ./vault --case lower

# Apply custom rules
python scripts/tag_normalizer.py --vault ./vault --rules rules.json

Rules JSON Format:

json

{
  "rules": [
    {"pattern": "^todo$", "replacement": "task", "case_sensitive": false},
    {"pattern": "^WIP$", "replacement": "in-progress"}
  ]
}

4. FrontmatterProcessor

Full YAML frontmatter CRUD with templates and validators:

Operations:

•add: Add/update field (merges with existing)
•remove: Delete field from frontmatter
•update: Same as add (explicit intent)
•validate: Check against registered validators
•template: Apply predefined templates

bash

# Add created date to all files
python scripts/frontmatter_processor.py --vault ./vault \
  --operation add --key created --value "2024-01-01"

# Apply article template
python scripts/frontmatter_processor.py --vault ./vault \
  --operation template --template article

# Validate all frontmatter
python scripts/frontmatter_processor.py --vault ./vault --operation validate

Built-in Templates:

•basic: created, tags, aliases
•article: created, modified, tags, aliases, author, published, draft
•meeting: created, tags, attendees, date, location

Integration Patterns

With obsidian-markdown Skill

Use obsidian-process for batch operations, then validate with obsidian-markdown:

code

1. obsidian-process: Normalize tags vault-wide
2. obsidian-markdown: Validate Obsidian syntax compliance
3. obsidian-process: Generate validation report

With obsidian-data-importer Skill

Process imported data after ingestion:

code

1. obsidian-data-importer: Convert CSV/JSON to markdown
2. obsidian-process: Apply frontmatter templates
3. obsidian-process: Build wikilink index
4. obsidian-markdown: Validate final output

Common Workflows

Vault Health Check

bash

# 1. Extract all wikilinks
python scripts/wikilink_extractor.py --vault ./vault --output links.json

# 2. Identify orphaned notes (no incoming links)
python scripts/wikilink_extractor.py --vault ./vault --find-orphans

# 3. Validate frontmatter
python scripts/frontmatter_processor.py --vault ./vault --operation validate

Tag Migration

bash

# 1. Preview changes with dry-run
python scripts/tag_normalizer.py --vault ./vault --case lower --dry-run

# 2. Apply normalization
python scripts/tag_normalizer.py --vault ./vault --case lower

# 3. Report statistics
python scripts/tag_normalizer.py --vault ./vault --stats-only

Frontmatter Standardization

bash

# 1. Apply template to all notes
python scripts/frontmatter_processor.py --vault ./vault \
  --operation template --template basic --dry-run

# 2. Add custom fields
python scripts/frontmatter_processor.py --vault ./vault \
  --operation add --key project --value "[[Projects/Main]]"

ProcessingResult Schema

All processors return standardized results:

python

@dataclass
class ProcessingResult:
    success: bool
    files_processed: int
    files_modified: int
    errors: List[str]
    warnings: List[str]
    metadata: Dict[str, Any]  # Operation-specific data
    timestamp: str

Safety Features

Feature	Description
`--dry-run`	Preview all changes without writing
Backup	Automatic `.bak` files before modification
Rollback	`VaultContext` reverts on exceptions
Logging	Timestamped operation logs
Validation	Pre-write content validation

Extension Guide

To add a new processor:

python

from batch_processor import BatchProcessor, ProcessingResult

class MyProcessor(BatchProcessor):
    def __init__(self, vault_path, dry_run=False, verbose=False):
        super().__init__(vault_path, dry_run, verbose)
        # Custom initialization

    def process_file(self, file_path: Path) -> bool:
        content = self.read_file(file_path)
        # Transform content
        return self.write_file(file_path, new_content)

    def process(self) -> ProcessingResult:
        for file in self.find_markdown_files():
            self.process_file(file)
        return ProcessingResult(success=True, ...)

CLI Reference

All scripts share common arguments:

code

--vault PATH      Path to Obsidian vault (required)
--dry-run         Preview changes without modifying
--verbose, -v     Enable detailed logging
--output PATH     Output file for results (JSON)

wikilink_extractor.py

code

--find-broken     Report links to non-existent notes
--find-orphans    Report notes with no incoming links
--stats-only      Output statistics without full index

tag_normalizer.py

code

--case {lower,upper,title}  Case normalization mode
--rules PATH                JSON file with transformation rules
--stats-only                Output tag statistics only

frontmatter_processor.py

code

--operation {add,remove,update,validate,template}
--key KEY         Frontmatter key for add/remove/update
--value VALUE     Value to set (supports JSON arrays/objects)
--template NAME   Template name for template operation

LSP Integration

When to use: Semantic analysis requiring graph-awareness—backlink chains, cross-vault renaming, broken link detection with context.

Setup:

bash

export ENABLE_LSP_TOOL=1
# Requires: markdown-oxide in PATH (cargo install markdown-oxide)

Core Capabilities (via scripts/lsp_integration.py):

Operation	Purpose
`findReferences`	Find all backlinks to a note
`goToDefinition`	Resolve wikilink targets
`workspaceSymbol`	Search vault for tags/headings
`diagnostics`	Detect broken links
`rename`	Cross-vault symbol renaming

Key Classes:

•LSPClient: Low-level LSP protocol wrapper
•MarkdownOxideIntegration: High-level Obsidian operations
•RecursiveLSP: Graph traversal (backlink chains, orphan detection, strongly connected components)
•VaultLSPAnalyzer: Extends BatchProcessor with LSP-powered vault analysis

CLI Quick Reference:

bash

# Analyze entire vault
python scripts/lsp_integration.py analyze-vault --vault ~/vault --output analysis.json

# Find recursive backlinks
python scripts/lsp_integration.py recursive-backlinks --vault ~/vault --file note.md --depth 3

Detailed Documentation: See references/lsp-patterns.md for:

•Complete LSP method reference
•Recursive query patterns (backlink chains, reference graphs, transitive closure)
•Batch processing strategies with caching
•Integration patterns with WikilinkExtractor, TagNormalizer, VaultAnalyzer
•Performance optimization and troubleshooting