OberaConnect Secondbrain: n8n Knowledge Base System

Overview

This skill deploys a self-hosted knowledge base system that replaces NotebookLM functionality with an automated, integrated solution. The system ingests documents from SharePoint, processes them into a vector database, and provides RAG-powered query capabilities through multiple interfaces.

Architecture

code

┌─────────────────────────────────────────────────────────────────────────────┐
│                        OberaConnect Secondbrain                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                              │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                 │
│  │  SharePoint  │────▶│    n8n       │────▶│   Qdrant     │                 │
│  │  (Source)    │     │  Workflows   │     │ Vector Store │                 │
│  └──────────────┘     └──────────────┘     └──────────────┘                 │
│         │                    │                    │                          │
│         │                    ▼                    │                          │
│         │             ┌──────────────┐            │                          │
│         │             │   Claude     │◀───────────┘                          │
│         │             │  (Anthropic) │                                       │
│         │             └──────────────┘                                       │
│         │                    │                                               │
│         ▼                    ▼                                               │
│  ┌──────────────┐     ┌──────────────┐     ┌──────────────┐                 │
│  │   Obsidian   │◀───▶│  Teams Bot   │     │  CLI/Webhook │                 │
│  │    Vault     │     │  Interface   │     │    API       │                 │
│  └──────────────┘     └──────────────┘     └──────────────┘                 │
│                                                                              │
└─────────────────────────────────────────────────────────────────────────────┘

Components

1. Document Ingestion Pipeline (`workflow-ingestion.json`)

•Trigger: Schedule (daily) or manual
•Source: SharePoint folders/document libraries
•Processing: PDF/DOCX text extraction, chunking, embedding
•Storage: Qdrant vector database with metadata

2. RAG Query System (`workflow-query.json`)

•Trigger: Chat interface, webhook, or Teams bot
•Retrieval: Semantic search against Qdrant
•Generation: Claude AI with retrieved context
•Output: Cited responses with source references

3. Obsidian Sync (`workflow-obsidian-sync.json`)

•Bidirectional: Notes to vector store, insights to vault
•Format: Markdown with YAML frontmatter preservation

Prerequisites

•Azure Linux VM (existing OberaConnect infrastructure)
•Docker and Docker Compose
•Anthropic API key
•Microsoft 365 credentials (for SharePoint access)
•GitHub access (for version control)

Deployment

Quick Start

bash

# 1. Clone to your Azure VM
cd /opt
git clone https://github.com/jerm71279/oberaconnect-ai-ops.git
cd oberaconnect-ai-ops/secondbrain-n8n

# 2. Configure environment
cp .env.example .env
nano .env  # Add your API keys

# 3. Deploy stack
docker-compose up -d

# 4. Access n8n
# http://your-vm-ip:5678 (use https:// only if TLS is terminated in front of n8n)

Environment Variables

bash

# Required
ANTHROPIC_API_KEY=sk-ant-...
N8N_ENCRYPTION_KEY=<generate-random-32-char>
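# Note: n8n 1.0 and later ignore N8N_BASIC_AUTH_*; access is controlled by
# the owner account created on first login.
N8N_BASIC_AUTH_USER=admin
N8N_BASIC_AUTH_PASSWORD=<secure-password>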

# SharePoint (Microsoft Graph)
MICROSOFT_CLIENT_ID=<app-registration-id>
MICROSOFT_CLIENT_SECRET=<app-secret>
MICROSOFT_TENANT_ID=<tenant-id>
SHAREPOINT_SITE_ID=<site-id>

# Qdrant
QDRANT_URL=http://qdrant:6333
QDRANT_COLLECTION=oberaconnect-docs

# Optional
WEBHOOK_URL=https://your-domain/webhook
TEAMS_WEBHOOK_URL=<teams-incoming-webhook>
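Before starting the stack, it can help to confirm that `.env` is complete. The following is a minimal sketch, assuming the variable names listed above; the `missing_vars` helper is hypothetical and not part of the repository, and the `N8N_BASIC_AUTH_*` pair is left out on the assumption that a current n8n release manages users in its UI.

```python
import os

# Names taken from the Required, SharePoint, and Qdrant groups above.
# Assumption: N8N_BASIC_AUTH_* is omitted (see the lead-in).
REQUIRED_VARS = [
    "ANTHROPIC_API_KEY",
    "N8N_ENCRYPTION_KEY",
    "MICROSOFT_CLIENT_ID",
    "MICROSOFT_CLIENT_SECRET",
    "MICROSOFT_TENANT_ID",
    "SHAREPOINT_SITE_ID",
    "QDRANT_URL",
    "QDRANT_COLLECTION",
]


def missing_vars(env):
    """Return the required variable names that are unset or blank in env."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars(os.environ)
    if missing:
        print("Missing required variables: " + ", ".join(missing))
    else:
        print("All required variables are set.")
```

Run it in a shell that has loaded `.env`; an empty result means every listed variable has a non-blank value.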

Workflow Details

Ingestion Pipeline

Trigger Options:

•Cron schedule: 0 2 * * * (daily at 2 AM)
•Manual trigger for immediate sync
•Webhook for event-driven updates

Processing Steps:

•List files from SharePoint folder (new/modified since last run)
•Download files to temporary storage
•Extract text using the appropriate method:
  - PDF: Extract From File node
  - DOCX: Extract From File node
  - Markdown: Direct read
•Split into chunks (500 tokens, 50 token overlap)
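•Generate embeddings via an embeddings provider (Anthropic does not offer an embeddings API; its documentation suggests Voyage AI as one option)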
•Upsert to Qdrant with metadata:
  - source_file: Original filename
  - sharepoint_path: Full path in SharePoint
  - modified_date: Last modification timestamp
  - chunk_index: Position in document
  - customer: Extracted customer name (if applicable)
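The chunking and metadata steps above can be sketched in a few lines. This is an illustration only, not the workflow's actual implementation: whitespace-separated words stand in for tokens, and the `text` payload field and both function names are assumptions.

```python
def chunk_tokens(tokens, size=500, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def build_payloads(text, source_file, sharepoint_path, modified_date, customer=None):
    """Attach the metadata fields listed above to each chunk of a document."""
    tokens = text.split()  # assumption: words approximate tokens
    payloads = []
    for index, chunk in enumerate(chunk_tokens(tokens)):
        payloads.append({
            "text": " ".join(chunk),  # assumed field holding the chunk text
            "source_file": source_file,
            "sharepoint_path": sharepoint_path,
            "modified_date": modified_date,
            "chunk_index": index,
            "customer": customer,
        })
    return payloads
```

With the defaults, each window starts 450 tokens after the previous one, so the last 50 tokens of one chunk reappear at the start of the next.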

Query System

Input Handling:

•Natural language questions
•Filters: customer name, date range, document type
•Context: Previous conversation (via Simple Memory node)

Retrieval Configuration:

•Top-K: 5 chunks
•Similarity threshold: 0.7
•Metadata filtering supported
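As an illustration of how these settings map onto a request, the sketch below builds a JSON body for Qdrant's REST search endpoint (`POST /collections/{name}/points/search`). It assumes the `customer` payload field from the ingestion metadata; the function name is hypothetical, and the n8n Qdrant node may construct its request differently.

```python
def build_search_request(query_vector, customer=None, top_k=5, score_threshold=0.7):
    """Build a Qdrant search body using the retrieval settings above."""
    body = {
        "vector": list(query_vector),
        "limit": top_k,                      # Top-K: 5 chunks
        "score_threshold": score_threshold,  # drop hits scoring below 0.7
        "with_payload": True,                # return metadata for citations
    }
    if customer:
        # Metadata filter: only chunks whose payload matches this customer.
        body["filter"] = {
            "must": [{"key": "customer", "match": {"value": customer}}]
        }
    return body
```

Lowering `score_threshold` widens recall at the cost of less relevant chunks, which is the trade-off behind the "No results returned" advice in Troubleshooting.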

Response Format:

code

[Answer based on retrieved context]

---
Sources:
- [Document Name](sharepoint-link) - Chunk 3
- [Document Name](sharepoint-link) - Chunk 7
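A small formatter shows how retrieved chunks could be turned into this layout. It is a sketch under assumptions: the `name`, `link`, and `chunk_index` keys are hypothetical names for values that the ingestion metadata would supply.

```python
def format_response(answer, sources):
    """Render an answer followed by a cited source list in the format above."""
    lines = [answer.strip(), "", "---", "Sources:"]
    for src in sources:
        lines.append(
            f"- [{src['name']}]({src['link']}) - Chunk {src['chunk_index']}"
        )
    return "\n".join(lines)
```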

Integration Points

Teams Bot

The query workflow exposes a webhook that can be connected to a Teams bot:

•Create Azure Bot Service
•Configure messaging endpoint to n8n webhook URL
•Install bot in Teams channel

CLI Integration

For your multi-AI orchestration workflow:

bash

curl -X POST https://your-n8n/webhook/secondbrain-query \
  -H "Content-Type: application/json" \
  -d '{"query": "What are the WiFi best practices for UniFi?", "customer": "optional-filter"}'
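For scripts that call the webhook from Python rather than curl, the same request can be sketched with the standard library. The helper names are hypothetical, and the response is returned as raw text because the webhook's response schema is not specified here.

```python
import json
import urllib.request


def build_query_request(base_url, query, customer=None):
    """Build the POST request that the curl example above sends."""
    payload = {"query": query}
    if customer:
        payload["customer"] = customer  # optional filter, as in the curl body
    return urllib.request.Request(
        base_url.rstrip("/") + "/webhook/secondbrain-query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url, query, customer=None, timeout=30):
    """Send the query and return the response body as text."""
    request = build_query_request(base_url, query, customer)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, `ask("https://your-n8n", "What are the WiFi best practices for UniFi?")` mirrors the curl call without the customer filter.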

Obsidian

Using the Post Webhook plugin:

•Install Post Webhook in Obsidian
•Configure webhook URL to n8n endpoint
•Send notes with post-webhook: true frontmatter
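On the receiving side, the workflow may want to confirm the flag before indexing a note. The sketch below reads flat `key: value` frontmatter only; it is not a full YAML parser, and both function names are hypothetical.

```python
def parse_frontmatter(note):
    """Return (frontmatter_dict, body); handles flat `key: value` lines only."""
    lines = note.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, note
    for end in range(1, len(lines)):
        if lines[end].strip() == "---":
            break
    else:
        return {}, note  # no closing delimiter: treat everything as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:])


def should_send(note):
    """True when the note opts in with `post-webhook: true`."""
    meta, _ = parse_frontmatter(note)
    return meta.get("post-webhook", "").lower() == "true"
```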

Maintenance

Monitoring

•n8n execution logs: docker logs n8n
•Qdrant health: curl http://localhost:6333/healthz
•Failed executions visible in n8n UI
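For unattended monitoring, the Qdrant check can be wrapped in a small probe. This is a sketch that assumes Qdrant's `/healthz` endpoint on the default port; the function name and the injectable `opener` parameter are illustrative choices, not part of the deployment.

```python
import urllib.request


def qdrant_is_healthy(base_url="http://localhost:6333",
                      opener=urllib.request.urlopen, timeout=5):
    """Return True when the health endpoint answers with HTTP 200."""
    try:
        with opener(base_url.rstrip("/") + "/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError.
        return False
```

A cron job or an n8n Execute Command node could call this and alert when it returns `False`.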

Backup

bash

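# Back up Qdrant data (creates a snapshot of the collection via the REST API)
curl -X POST http://localhost:6333/collections/oberaconnect-docs/snapshots

# Back up n8n workflows (--backup implies --all --pretty --separate)
docker exec n8n n8n export:workflow --backup --output=/data/backups/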

Scaling

•Increase Qdrant resources for larger document sets
•Add n8n workers for parallel processing
•Consider Qdrant Cloud for >10GB vector data

Troubleshooting

Common Issues

SharePoint authentication fails:

•Verify Microsoft Graph permissions: Sites.Read.All, Files.Read.All
•Check token expiration and refresh

Embedding errors:

•Verify the embedding provider's API key is valid (Anthropic's API does not serve embeddings)
•Check rate limits (consider batching)
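One way to apply the batching advice is to send chunks in fixed-size groups and back off exponentially when a call is rejected. In this sketch `embed_batch` is a hypothetical callable that wraps whichever embeddings API is in use, and `RuntimeError` stands in for that provider's rate-limit exception.

```python
import time


def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_all(chunks, embed_batch, batch_size=32, max_retries=3,
              base_delay=1.0, sleep=time.sleep):
    """Embed chunks batch by batch, retrying each batch with backoff."""
    vectors = []
    for batch in batched(chunks, batch_size):
        for attempt in range(max_retries + 1):
            try:
                vectors.extend(embed_batch(batch))
                break
            except RuntimeError:  # assumption: provider's rate-limit error
                if attempt == max_retries:
                    raise
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return vectors
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.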

Qdrant connection refused:

•Ensure Qdrant container is running
•Verify network connectivity between containers

No results returned:

•Check if documents were ingested (query Qdrant directly)
•Lower similarity threshold
•Verify chunk size isn't too large

Cost Estimation

Component	Usage	Monthly Cost
Azure VM	Existing	$0
Qdrant	Self-hosted	$0
Anthropic API	~100K tokens/day	$30-50
Total		~$30-50
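The API line can be sanity-checked with simple arithmetic. The sketch below assumes a 30-day month and derives the blended per-token rate that the table's $30-50 range implies; actual spend depends on the model chosen and the split between input and output tokens.

```python
def monthly_tokens(tokens_per_day, days=30):
    """Total tokens over a month of `days` days."""
    return tokens_per_day * days


def implied_rate_per_million(monthly_cost_usd, tokens_per_day, days=30):
    """Blended USD price per million tokens implied by a monthly cost."""
    return monthly_cost_usd / (monthly_tokens(tokens_per_day, days) / 1_000_000)


low = implied_rate_per_million(30, 100_000)
high = implied_rate_per_million(50, 100_000)
print(f"{monthly_tokens(100_000):,} tokens/month")
print(f"implied blended rate: ${low:.2f} to ${high:.2f} per million tokens")
```

At ~100K tokens/day this works out to 3,000,000 tokens per month, so the estimate corresponds to roughly $10.00 to $16.67 per million tokens.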

Version History

•v1.0.0 (2024-12): Initial release
  - SharePoint ingestion pipeline
  - RAG query system
  - Obsidian bidirectional sync
  - Teams bot integration
