AgentSkillsCN

processor

Process documents into a RAG database. Use when the user wants to chunk, embed, or index files into a vector database for semantic search.

SKILL.md
```yaml
---
name: processor
description: Process documents into RAG database. Use when user wants to chunk, embed, or index files into a vector database for semantic search.
---
```

Document Processing

This skill helps you process documents, codebases, and papers into a searchable RAG (Retrieval-Augmented Generation) database using LanceDB.
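
Before embedding, each document is split into chunks. The sketch below is a minimal illustration, assuming fixed-size character windows with overlap; `chunk_text`, `chunk_size`, and `overlap` are hypothetical names, and processor's actual chunking strategy (tokens, headings, or code structure) is not described in this document.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Hypothetical illustration only; processor's real chunker may split on
    tokens, headings, or code structure instead.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    doc = "abcdefghij" * 5  # 50 characters
    for chunk in chunk_text(doc, chunk_size=20, overlap=5):
        print(len(chunk), chunk)
```

The overlap means a sentence cut at one window boundary still appears whole in the neighbouring chunk, at the cost of storing some text twice.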

Quick Start

```bash
# 1. Check that services are running
uv run processor check

# 2. Process files into database
uv run processor process ./input -o ./lancedb

# 3. Verify results
uv run processor stats ./lancedb
```

Common Use Cases

Process a codebase

```bash
uv run processor process ./my-project -o ./code_db --content-type code
```

Process papers/documents

```bash
uv run processor process ./papers -o ./papers_db
```

Incremental updates (skip unchanged files)

```bash
uv run processor process ./input -o ./lancedb --incremental
```
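
Incremental mode needs a way to tell whether a file changed since the last run. How processor does this is not documented here; a common approach, sketched below under that assumption, is to store a content hash per file and re-process only files whose hash is new or different. `file_digest`, `changed_files`, and the manifest dictionary are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, read in blocks to bound memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def changed_files(root: Path, manifest: dict[str, str]) -> list[Path]:
    """Return files under root that are new or whose hash differs from the manifest.

    `manifest` maps a relative path to the digest recorded on the previous run;
    it is updated in place so it can be saved for the next run.
    """
    changed = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        key = path.relative_to(root).as_posix()
        digest = file_digest(path)
        if manifest.get(key) != digest:
            changed.append(path)
            manifest[key] = digest
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a.md").write_text("hello")
        (root / "b.md").write_text("world")
        manifest: dict[str, str] = {}
        print(len(changed_files(root, manifest)))  # 2: first run sees every file
        (root / "b.md").write_text("world, edited")
        print(len(changed_files(root, manifest)))  # 1: only b.md changed
```

This sketch does not notice deleted files; a fuller version would also drop manifest entries (and their database rows) for paths that no longer exist.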

High-quality embeddings (slower, better retrieval)

```bash
uv run processor process ./input -o ./lancedb --text-profile high --code-profile high
```
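
Semantic search ranks stored chunk vectors by their similarity to the query vector, which is why higher-quality embeddings can improve retrieval. A minimal brute-force sketch, assuming cosine similarity over plain Python lists; `cosine` and `top_k` are hypothetical helpers, not part of processor, and LanceDB supports several distance metrics and approximate indexes, so its results and speed will differ from this full scan.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], rows: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(chunk_id, cosine(query, vec)) for chunk_id, vec in rows.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings"; the real profiles produce 896 to 4096 dimensions.
    rows = {
        "intro.md#0": [1.0, 0.0, 0.0],
        "api.py#2": [0.0, 1.0, 0.0],
        "api.py#3": [0.7, 0.7, 0.0],
    }
    for chunk_id, score in top_k([1.0, 0.1, 0.0], rows, k=2):
        print(f"{chunk_id}: {score:.3f}")
```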

Embedding Profiles

| Type | Profile | Model | Dimensions | Use case |
|---|---|---|---|---|
| text | low | Qwen3-Embedding-0.6B | 1024 | Fast, good quality |
| text | medium | Qwen3-Embedding-4B | 2560 | Balanced |
| text | high | Qwen3-Embedding-8B | 4096 | Maximum quality |
| code | low | jina-code-0.5b | 896 | Fast code search |
| code | high | jina-code-1.5b | 1536 | Best code search |
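
Because each profile produces vectors of a different length, and a vector column typically has one fixed dimension, mixing profiles in the same table would fail or silently misbehave. A minimal sketch of a guard against that, assuming you want to validate vectors before inserting them; the `PROFILES` mapping copies the table above, while `expected_dim` and `check_vector` are hypothetical helpers, not part of processor.

```python
# (type, profile) -> (model, dimensions), copied from the Embedding Profiles table.
PROFILES: dict[tuple[str, str], tuple[str, int]] = {
    ("text", "low"): ("Qwen3-Embedding-0.6B", 1024),
    ("text", "medium"): ("Qwen3-Embedding-4B", 2560),
    ("text", "high"): ("Qwen3-Embedding-8B", 4096),
    ("code", "low"): ("jina-code-0.5b", 896),
    ("code", "high"): ("jina-code-1.5b", 1536),
}


def expected_dim(content_type: str, profile: str) -> int:
    """Look up the vector length produced by a given type/profile pair."""
    try:
        return PROFILES[(content_type, profile)][1]
    except KeyError:
        raise ValueError(f"unknown profile {profile!r} for {content_type!r}") from None


def check_vector(vec: list[float], content_type: str, profile: str) -> None:
    """Raise ValueError if a vector's length does not match its profile's dimension."""
    dim = expected_dim(content_type, profile)
    if len(vec) != dim:
        raise ValueError(f"expected {dim} dimensions, got {len(vec)}")


if __name__ == "__main__":
    check_vector([0.0] * 1024, "text", "low")  # passes silently
    try:
        check_vector([0.0] * 1024, "code", "high")
    except ValueError as err:
        print(err)  # expected 1536 dimensions, got 1024
```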

Key Options

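| Option | Values | Description |
|---|---|---|
| `--embedder` | `ollama`, `transformers` | Embedding backend |
| `--text-profile` | `low`, `medium`, `high` | Text embedding quality |
| `--code-profile` | `low`, `high` | Code embedding quality |
| `--table-mode` | `separate`, `unified`, `both` | Table organization |
| `--incremental` / `--full` | - | Skip unchanged files (`--incremental`) or reprocess all files (`--full`) |
| `--content-type` | `auto`, `code`, `paper`, `markdown` | Force a content type instead of detecting it (`auto`) |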
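
With `--content-type auto`, processor decides how to treat each file itself; this document does not say how. A minimal sketch of one plausible rule, assuming detection by file extension; the extension sets and the `detect_content_type` function are hypothetical, and forcing a value mirrors what `--content-type code|paper|markdown` is for.

```python
from pathlib import Path

# Hypothetical extension sets; processor's real detection rules are not documented here.
CODE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java", ".c", ".cpp", ".h"}
PAPER_EXTS = {".pdf", ".tex"}


def detect_content_type(path: str, forced: str = "auto") -> str:
    """Return 'code', 'paper', or 'markdown'; any non-auto `forced` value wins."""
    if forced != "auto":
        return forced
    suffix = Path(path).suffix.lower()
    if suffix in CODE_EXTS:
        return "code"
    if suffix in PAPER_EXTS:
        return "paper"
    # Markdown and anything unrecognized fall back to prose handling (an assumption).
    return "markdown"


if __name__ == "__main__":
    for name in ["src/main.py", "papers/attention.PDF", "README.md"]:
        print(name, "->", detect_content_type(name))
```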

MCP Server

Start the processor MCP server for programmatic access:

```bash
uv run processor-mcp
```

Configure in Claude Desktop (claude_desktop_config.json):

```json
{
  "mcpServers": {
    "processor": {
      "command": "uv",
      "args": ["run", "processor-mcp"],
      "cwd": "/path/to/processor"
    }
  }
}
```
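
If `claude_desktop_config.json` already lists other servers, the `processor` entry has to be merged into the existing `mcpServers` object rather than overwriting the file. A minimal sketch, assuming the config is plain JSON in the shape shown above; `add_processor_server` is a hypothetical helper, and locating the config file on disk (which varies by OS) is left out.

```python
import json


def add_processor_server(config_text: str, project_dir: str) -> str:
    """Return config JSON with a 'processor' MCP server entry added or replaced.

    Other entries under "mcpServers" and any other top-level keys are preserved.
    """
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["processor"] = {
        "command": "uv",
        "args": ["run", "processor-mcp"],
        "cwd": project_dir,
    }
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    existing = '{"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}'
    print(add_processor_server(existing, "/path/to/processor"))
```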

Available MCP Tools

  • process_documents - Process files into LanceDB
  • check_services - Check backend availability
  • setup_models - Download embedding models
  • get_db_stats - Database statistics
  • export_db - Export database
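
MCP clients invoke these tools with JSON-RPC 2.0 `tools/call` requests, which Claude Desktop or an MCP SDK normally builds for you. The sketch below only constructs such a message to show its shape; `tool_call` is a hypothetical helper, and the argument names used for `process_documents` (`input_path`, `output_path`) are assumptions, since the tool's parameter schema is not documented here and a client can read it from `tools/list`.

```python
import json
from itertools import count
from typing import Any, Optional

_ids = count(1)  # request ids must not be reused within a session


def tool_call(name: str, arguments: Optional[dict[str, Any]] = None) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    })


if __name__ == "__main__":
    print(tool_call("check_services"))
    # Hypothetical argument names; check the schema returned by tools/list.
    print(tool_call("process_documents", {"input_path": "./input", "output_path": "./lancedb"}))
```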

Troubleshooting

"Model not found" error

```bash
uv run processor setup  # Download required models
```

Ollama not running

```bash
ollama serve  # Start Ollama server
```
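
Ollama serves an HTTP API, by default at `http://localhost:11434`, and `GET /api/tags` lists the locally pulled models, so one request both confirms the server is up and shows which models exist. A minimal standard-library sketch; `parse_tags` and `ollama_models` are hypothetical helpers, and which model names processor expects is not specified here.

```python
import json
import urllib.request
from typing import Optional


def parse_tags(body: str) -> list[str]:
    """Extract model names from an Ollama /api/tags JSON response."""
    return [m["name"] for m in json.loads(body).get("models", [])]


def ollama_models(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> Optional[list[str]]:
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_tags(resp.read().decode("utf-8"))
    except OSError:  # URLError, connection refused, and timeouts are all OSError subclasses
        return None


if __name__ == "__main__":
    models = ollama_models()
    if models is None:
        print("Ollama is not reachable; start it with `ollama serve`.")
    else:
        print("Installed models:", ", ".join(models) or "(none)")
```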

Check available models

```bash
uv run processor check
```