mcp-vector-search Configuration Guide
Comprehensive reference for writing .mcp-vector-search/config.edn configuration files.
Overview
mcp-vector-search indexes documents using semantic embeddings and provides a search tool via the Model Context Protocol. Configuration controls which files are indexed, how they're processed, and what metadata is extracted.
Configuration File Location
Create a configuration file at one of these locations (first found is used):
- Project-specific: .mcp-vector-search/config.edn (in project root)
- Global: ~/.mcp-vector-search/config.edn (user home directory)
The server reads the configuration at startup and indexes all specified sources.
Basic Configuration Structure
{:description "Custom search tool description"  ; optional
 :watch? true                                   ; optional, enable file watching
 :sources [{:path "/docs/**/*.md"
            :name "Documentation"               ; optional
            :ingest :whole-document             ; optional, defaults to :whole-document
            :watch? true                        ; optional, overrides global :watch?
            :custom-key "custom-value"}]}       ; any additional keys become metadata
Top-level keys:
- :description - Custom description for the search tool (optional)
- :watch? - Enable automatic re-indexing when files change (optional, default: false)
- :sources - Vector of source configurations (required)
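Because :watch? defaults to false and :ingest defaults to :whole-document, the two configurations below should behave the same. This is a sketch derived from the defaults listed above; the /docs path is only a placeholder.

```clojure
;; Minimal form: only the required :sources key
{:sources [{:path "/docs/**/*.md"}]}

;; Same configuration with the documented defaults written out
{:watch? false
 :sources [{:path "/docs/**/*.md"
            :ingest :whole-document}]}
```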
Path Specifications
Filesystem vs Classpath Sources
Filesystem sources (:path key):
- Use absolute paths with a leading /
- Support file watching for automatic re-indexing
- Files are read from the filesystem
{:path "/docs/**/*.md"}
Classpath sources (:class-path key):
- Use relative paths without a leading /
- File watching not available (read-only resources)
- Resources discovered from classpath (JARs, resource directories)
- Useful when embedding mcp-vector-search as a library
{:class-path "docs/**/*.md"}
Important: Sources must specify exactly one of :path or :class-path.
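To index the same documents from both the filesystem and the classpath, the rule above implies two separate source entries rather than one entry carrying both keys. A minimal sketch with placeholder paths; how the server reports the invalid form is not stated here, so it is shown commented out:

```clojure
;; Invalid: one source carries both keys
;; {:sources [{:path "/docs/**/*.md"
;;             :class-path "docs/**/*.md"}]}

;; Valid: two sources, each with exactly one of the keys
{:sources [{:path "/docs/**/*.md"}
           {:class-path "docs/**/*.md"}]}
```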
Glob Patterns
Single-level glob (*):
- Matches any characters within a single directory level
- Does not match path separators
{:path "/docs/*.md"} ; matches /docs/README.md
                     ; does NOT match /docs/api/guide.md
Recursive glob (**):
- Matches any characters across multiple directory levels
- Includes path separators
{:path "/docs/**/*.md"} ; matches /docs/README.md
                        ; matches /docs/api/guide.md
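The single-level glob can also stand in for a directory name. The sketch below follows from the rule that * never crosses a path separator; the file names are illustrative, and the expected matches are worth checking against your own tree.

```clojure
{:path "/docs/*/*.md"} ; matches /docs/api/guide.md
                       ; does NOT match /docs/README.md (no subdirectory)
                       ; does NOT match /docs/api/v1/guide.md (two levels deep)
```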
Named Captures
Extract metadata from file paths using named regex groups:
{:path "/docs/(?<category>[^/]+)/*.md"}
Syntax: (?<name>pattern)
- name - Metadata key (converted to keyword)
- pattern - Java regular expression
For file /docs/api/functions.md:
- Captures: {:category "api"}
Multiple captures example:
{:path "/(?<project>[^/]+)/(?<version>v\\d+)/(?<file>.+\\.clj)"}
For file /myapp/v1/core.clj:
- Captures: {:project "myapp", :version "v1", :file "core.clj"}
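Two details of the multiple-captures example are easy to miss. The pattern lives inside an EDN string, so each regex backslash is written twice (\\d in the file becomes \d in the regex). A capture pattern also restricts which files match at all. The second path below is a hypothetical file added for illustration:

```clojure
{:path "/(?<project>[^/]+)/(?<version>v\\d+)/(?<file>.+\\.clj)"}
;; /myapp/v1/core.clj      -> matched, {:project "myapp", :version "v1", :file "core.clj"}
;; /myapp/latest/core.clj  -> not matched: "latest" does not fit v\d+
```

A looser pattern such as [^/]+ for the version segment would accept both paths.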
Path Specification Examples
;; All markdown files recursively
{:path "/docs/**/*.md"}
;; Single directory level
{:path "/docs/*.md"}
;; Capture directory name
{:path "/docs/(?<category>[^/]+)/*.md"}
;; Multiple captures
{:path "/(?<project>[^/]+)/(?<version>[^/]+)/**/*.clj"}
;; Literal file
{:path "/docs/README.md"}
;; Classpath resource (no leading /)
{:class-path "docs/**/*.md"}
Ingest Pipeline Strategies
Ingest strategies control how documents are processed, embedded, and stored. Set via the :ingest key.
:whole-document (default)
Embeds and stores the entire file content as a single segment.
{:sources [{:path "/docs/**/*.md"
            :ingest :whole-document}]}
Characteristics:
- One segment per file
- Both embedding and storage use full content
- Simple and straightforward for most use cases
Use when: You want to search across complete documents and return full content.
:namespace-doc
For Clojure source files - embeds only the namespace docstring but stores the full file content.
{:sources [{:path "/src/**/*.clj"
            :ingest :namespace-doc}]}
Requirements:
- File must contain a valid ns form
- Namespace must have a docstring
- Adds :namespace to metadata (e.g., {:namespace "my.app.core"})
Characteristics:
- Embedding uses only namespace docstring
- Storage includes full file content
- One segment per file
Use when: You want to search Clojure namespaces by their documentation while still returning the complete source code.
:file-path
Embeds the full content but stores only the file path.
{:sources [{:path "/docs/**/*.md"
            :ingest :file-path}]}
Characteristics:
- Embedding uses full file content
- Storage contains only the file path
- One segment per file
- Reduces memory footprint for large document sets
Use when:
- You only need to discover which files match a query
- You want to reduce memory usage
- Your client will read file content separately
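One way to apply these criteria is to mix strategies per source: :file-path for a large collection where only discovery matters, and :whole-document where results should carry content. The directory names below are hypothetical.

```clojure
{:sources [
  ;; Large archive: index for discovery, store only the file path
  {:path "/archive/**/*.md"
   :ingest :file-path}
  ;; Small, frequently read docs: return full content in results
  {:path "/docs/**/*.md"
   :ingest :whole-document}]}
```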
:code-analysis
Analyzes Clojure and Java source files using clj-kondo to extract code elements (vars, macros, namespaces, classes, methods, fields, constructors). Creates one searchable segment per code element.
{:sources [{:path "/src/**/*.clj"
            :ingest :code-analysis}]}
Configuration options:
{:sources [{:path "/src/**/*.clj"
            :ingest :code-analysis
            :visibility :public-only         ; :all (default) | :public-only
            :element-types #{:var :macro}}]} ; optional filter
:visibility - Controls which elements to include:
- :all (default) - Include all elements regardless of visibility
- :public-only - Include only public elements
  - Clojure: Excludes vars with ^:private or {:private true} metadata
  - Java: Excludes members with private or protected access modifiers
:element-types (optional) - Set of element types to include:
- Valid types: :var, :macro, :namespace, :class, :method, :field, :constructor
- If omitted: Include all element types
- If provided: Only include specified types
Characteristics:
- Multiple segments per file (one per code element)
- Embedding uses docstring if present, otherwise element name
- Content stores complete clj-kondo analysis map as EDN string
- Supports both Clojure (.clj, .cljs, .cljc) and Java (.java) files
Segment metadata:
- :element-type - Type of code element (var, macro, namespace, class, method, field, constructor)
- :element-name - Qualified name (e.g., "my.ns/my-fn" or "com.example.MyClass.myMethod")
- :language - Source language (clojure or java)
- :namespace - Containing namespace (Clojure) or package (Java)
- :visibility - Access level (public, private, or protected)
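As an illustration, the metadata for a single segment might look roughly like the map below. The function my.ns/my-fn is hypothetical, and writing the values as strings is an assumption; the guide does not say whether they are stored as strings or keywords.

```clojure
;; Hypothetical metadata for a public function my-fn in namespace my.ns.
;; Value types (strings here) are an assumption, not confirmed behavior.
{:element-type "var"
 :element-name "my.ns/my-fn"
 :language "clojure"
 :namespace "my.ns"
 :visibility "public"}
```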
Use when: You want to search code by its documentation or discover APIs, finding functions, classes, and methods by their purpose rather than by file name.
Examples:
;; Search all code elements
{:sources [{:path "/src/**/*.clj"
            :ingest :code-analysis}]}
;; Search only public API
{:sources [{:path "/src/**/*.clj"
            :ingest :code-analysis
            :visibility :public-only}]}
;; Search only vars and macros
{:sources [{:path "/src/**/*.clj"
            :ingest :code-analysis
            :element-types #{:var :macro}}]}
;; Java source code analysis
{:sources [{:path "/src/**/*.java"
            :ingest :code-analysis
            :visibility :public-only}]}
:chunked
Splits documents into smaller segments using LangChain4j's recursive text splitter. Enables better semantic search for large documents.
{:sources [{:path "/docs/**/*.md"
            :ingest :chunked
            :chunk-size 512
            :chunk-overlap 100}]}
Configuration:
- :chunk-size - Maximum characters per chunk (default: 512)
- :chunk-overlap - Characters to overlap between chunks (default: 100)
Note: LangChain4j's recursive paragraph splitter prioritizes semantic boundaries (paragraph breaks) over exact overlap amounts. Adjacent chunks may have less overlap than specified if splitting at a paragraph boundary.
Characteristics:
- Multiple segments per file
- Each chunk is embedded and stored independently
- All chunks from the same file share the same :doc-id for batch removal during updates
- Chunk metadata includes: :chunk-index (position), :chunk-count (total chunks), :chunk-offset (character offset)
Chunk sizing guidance:
- Smaller chunks (256-512 chars): Better for precise fact-based retrieval
- Larger chunks (1024+ chars): Better for broader context
- Overlap (10-20%): Recommended to preserve context at chunk boundaries
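The overlap guideline can be applied by choosing a chunk size first and deriving the overlap from it. The sketch below uses an illustrative size of 800 characters with 15% overlap (800 × 0.15 = 120). For comparison, the defaults of 512 and 100 work out to roughly 19.5%.

```clojure
{:sources [{:path "/docs/**/*.md"
            :ingest :chunked
            :chunk-size 800       ; illustrative value
            :chunk-overlap 120}]} ; 15% of 800
```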
Use when: You have large documents and need precise fact-based retrieval where specific information may be buried in lengthy content.
Examples:
;; Fine-grained retrieval for technical docs
{:sources [{:path "/docs/**/*.md"
            :ingest :chunked
            :chunk-size 384
            :chunk-overlap 75}]}
;; Broader context for narrative content
{:sources [{:path "/articles/**/*.md"
            :ingest :chunked
            :chunk-size 1024
            :chunk-overlap 200}]}
;; Compare strategies for different content types
{:sources [
  ;; Small reference docs - whole document works well
  {:path "/api-reference/**/*.md"
   :ingest :whole-document}
  ;; Large guides - chunking improves precision
  {:path "/guides/**/*.md"
   :ingest :chunked
   :chunk-size 512
   :chunk-overlap 100}]}
Metadata System
Metadata comes from two sources:
- Base metadata: Any additional keys in the source map (except :path, :class-path, :name, :ingest, :watch?)
- Captures: Values extracted from named groups in the path spec
{:sources [{:path "/docs/(?<category>[^/]+)/*.md"
            :project "my-project"
            :type "documentation"}]}
For a file /docs/api/functions.md:
- Metadata: {:project "my-project", :type "documentation", :category "api"}
The :name key, if provided, is also added to metadata.
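Putting the three user-controlled pieces together (the :name key, base metadata, and captures), a source could look like the sketch below. The values are placeholders, and the resulting map is inferred from the rules in this section rather than copied from server output.

```clojure
{:sources [{:path "/docs/(?<category>[^/]+)/*.md"
            :name "Documentation"
            :project "my-project"}]}
;; For /docs/api/functions.md, the user-supplied metadata would be:
;; {:name "Documentation", :project "my-project", :category "api"}
;; (system-added keys are omitted here)
```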
System-added metadata:
- :doc-id - File path (used for watch updates/deletes)
- :file-id - File path
- :segment-id - Unique segment identifier
Strategy-specific metadata:
- :namespace-doc adds: :namespace
- :code-analysis adds: :element-type, :element-name, :language, :namespace, :visibility
- :chunked adds: :chunk-index, :chunk-count, :chunk-offset
File Watching
File watching is optional. When enabled, the server re-indexes files automatically as they change.
Configuration:
- Global :watch? true enables watching for all sources
- Per-source :watch? true/false overrides global setting
- Only available for filesystem sources (:path), not classpath sources
{:watch? true                 ; enable globally
 :sources [
  {:path "/docs/**/*.md"}     ; watched (global setting)
  {:path "/src/**/*.clj"
   :watch? false}             ; not watched (override)
  {:path "/notes/**/*.txt"
   :watch? true}]}            ; watched (explicit)
Behavior:
- Events are debounced (500ms) to avoid excessive re-indexing
- File created → index new file
- File modified → remove old embeddings by :doc-id, re-index
- File deleted → remove embeddings by :doc-id
- Recursive watching for directories with ** glob
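Since watching applies only to filesystem sources, a global :watch? true has no effect on classpath entries. A small sketch with placeholder paths; whether the server warns about or silently skips the classpath source is not specified here.

```clojure
{:watch? true
 :sources [
  {:path "/docs/**/*.md"}            ; watched (global setting)
  {:class-path "lib-docs/**/*.md"}]} ; not watched: classpath sources cannot be watched
```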
Complete Examples
Basic Documentation Search
{:sources [{:path "/Users/me/docs/**/*.md"}]}
Multi-Source with Metadata
{:description "Project documentation and code search"
 :sources [
  {:path "/Users/me/project/docs/**/*.md"
   :name "Documentation"
   :type "docs"}
  {:path "/Users/me/project/src/**/*.clj"
   :ingest :namespace-doc
   :name "Source Code"
   :type "code"}]}
Metadata Extraction with Captures
{:sources [{:path "/docs/(?<category>[^/]+)/*.md"
            :project "myapp"
            :type "documentation"}]}
Code Analysis with Filtering
{:sources [
  {:path "/src/**/*.clj"
   :ingest :code-analysis
   :visibility :public-only
   :element-types #{:var :macro}}
  {:path "/src/**/*.java"
   :ingest :code-analysis
   :visibility :public-only}]}
Chunked Large Documents
{:sources [
  {:path "/guides/**/*.md"
   :ingest :chunked
   :chunk-size 512
   :chunk-overlap 100}]}
Mixed Filesystem and Classpath
{:sources [
  ;; Filesystem documentation
  {:path "/Users/me/docs/**/*.md"
   :source "local"}
  ;; Bundled library documentation from classpath
  {:class-path "lib-docs/**/*.md"
   :source "library"}
  ;; Clojure source from classpath
  {:class-path "my/app/**/*.clj"
   :ingest :namespace-doc
   :source "library-code"}]}
File Watching Enabled
{:watch? true
 :sources [
  {:path "/Users/me/project/docs/**/*.md"}
  {:path "/Users/me/project/src/**/*.clj"
   :ingest :namespace-doc}]}
Complex Multi-Strategy Configuration
{:description "Comprehensive project search"
 :watch? true
 :sources [
  ;; API reference - small docs, keep whole
  {:path "/docs/api/**/*.md"
   :ingest :whole-document
   :category "api-reference"}
  ;; User guides - large docs, chunk them
  {:path "/docs/guides/**/*.md"
   :ingest :chunked
   :chunk-size 512
   :chunk-overlap 100
   :category "guides"}
  ;; Public API code
  {:path "/src/(?<namespace>[^/]+)/**/*.clj"
   :ingest :code-analysis
   :visibility :public-only
   :category "code"}
  ;; README files - whole document
  {:path "/(?<project>[^/]+)/README.md"
   :ingest :whole-document
   :category "readme"}]}
Tips and Best Practices
Path specifications:
- Use absolute paths for filesystem sources (start with /)
- Use relative paths for classpath sources (no leading /)
- Named captures are powerful for extracting structured metadata
- Test path patterns with a small subset first
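One way to test a pattern on a small subset is to start with a narrow, single-level glob and widen it only after the search results look right. The paths below are placeholders.

```clojure
;; Step 1: a narrow pattern that should match only a handful of files
{:sources [{:path "/docs/api/*.md"}]}

;; Step 2: once results look right, widen to the full tree
{:sources [{:path "/docs/**/*.md"}]}
```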
Ingest strategies:
- Start with :whole-document for most use cases
- Use :namespace-doc for Clojure codebases to search by documentation
- Use :code-analysis when you need fine-grained API discovery
- Use :chunked for large documents (>1000 chars)
- Use :file-path when you need to minimize memory usage
Metadata:
- Add meaningful metadata to enable filtering during search
- Use consistent naming conventions for metadata keys
- Captures are great for hierarchical organization (project, version, category)
File watching:
- Enable globally with :watch? true for development
- Disable for production or when working with static content
- Override per-source as needed
- Only works with filesystem sources, not classpath
Performance:
- Smaller chunk sizes create more segments (more memory, more precise search)
- Larger chunk sizes create fewer segments (less memory, broader context)
- :file-path strategy significantly reduces memory usage
- Consider the trade-off between search precision and resource usage