OberaConnect Secondbrain: n8n Knowledge Base System
Overview
This skill deploys a self-hosted knowledge base system that replaces NotebookLM functionality with an automated, integrated solution. The system ingests documents from SharePoint, processes them into a vector database, and provides RAG-powered query capabilities through multiple interfaces.
Architecture
code
┌─────────────────────────────────────────────────────────────────────────────┐ │ OberaConnect Secondbrain │ ├─────────────────────────────────────────────────────────────────────────────┤ │ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ SharePoint │────▶│ n8n │────▶│ Qdrant │ │ │ │ (Source) │ │ Workflows │ │ Vector Store │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ │ │ │ │ │ ▼ │ │ │ │ ┌──────────────┐ │ │ │ │ │ Claude │◀───────────┘ │ │ │ │ (Anthropic) │ │ │ │ └──────────────┘ │ │ │ │ │ │ ▼ ▼ │ │ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │ │ │ Obsidian │◀───▶│ Teams Bot │ │ CLI/Webhook │ │ │ │ Vault │ │ Interface │ │ API │ │ │ └──────────────┘ └──────────────┘ └──────────────┘ │ │ │ └─────────────────────────────────────────────────────────────────────────────┘
Components
1. Document Ingestion Pipeline (workflow-ingestion.json)
- •Trigger: Schedule (daily) or manual
- •Source: SharePoint folders/document libraries
- •Processing: PDF/DOCX text extraction, chunking, embedding
- •Storage: Qdrant vector database with metadata
2. RAG Query System (workflow-query.json)
- •Trigger: Chat interface, webhook, or Teams bot
- •Retrieval: Semantic search against Qdrant
- •Generation: Claude AI with retrieved context
- •Output: Cited responses with source references
3. Obsidian Sync (workflow-obsidian-sync.json)
- •Bidirectional: Notes to vector store, insights to vault
- •Format: Markdown with YAML frontmatter preservation
Prerequisites
- •Azure Linux VM (existing OberaConnect infrastructure)
- •Docker and Docker Compose
- •Anthropic API key
- •Microsoft 365 credentials (for SharePoint access)
- •GitHub access (for version control)
Deployment
Quick Start
bash
# 1. Clone to your Azure VM cd /opt git clone https://github.com/jerm71279/oberaconnect-ai-ops.git cd oberaconnect-ai-ops/secondbrain-n8n # 2. Configure environment cp .env.example .env nano .env # Add your API keys # 3. Deploy stack docker-compose up -d # 4. Access n8n # https://your-vm-ip:5678
Environment Variables
bash
# Required ANTHROPIC_API_KEY=sk-ant-... N8N_ENCRYPTION_KEY=<generate-random-32-char> N8N_BASIC_AUTH_USER=admin N8N_BASIC_AUTH_PASSWORD=<secure-password> # SharePoint (Microsoft Graph) MICROSOFT_CLIENT_ID=<app-registration-id> MICROSOFT_CLIENT_SECRET=<app-secret> MICROSOFT_TENANT_ID=<tenant-id> SHAREPOINT_SITE_ID=<site-id> # Qdrant QDRANT_URL=http://qdrant:6333 QDRANT_COLLECTION=oberaconnect-docs # Optional WEBHOOK_URL=https://your-domain/webhook TEAMS_WEBHOOK_URL=<teams-incoming-webhook>
Workflow Details
Ingestion Pipeline
Trigger Options:
- •Cron schedule:
0 2 * * *(daily at 2 AM) - •Manual trigger for immediate sync
- •Webhook for event-driven updates
Processing Steps:
- •List files from SharePoint folder (new/modified since last run)
- •Download files to temporary storage
- •Extract text using appropriate method:
- •PDF:
Extract From Filenode - •DOCX:
Extract From Filenode - •Markdown: Direct read
- •PDF:
- •Split into chunks (500 tokens, 50 token overlap)
- •Generate embeddings via Anthropic
- •Upsert to Qdrant with metadata:
- •
source_file: Original filename - •
sharepoint_path: Full path in SharePoint - •
modified_date: Last modification timestamp - •
chunk_index: Position in document - •
customer: Extracted customer name (if applicable)
- •
Query System
Input Handling:
- •Natural language questions
- •Filters: customer name, date range, document type
- •Context: Previous conversation (via Simple Memory node)
Retrieval Configuration:
- •Top-K: 5 chunks
- •Similarity threshold: 0.7
- •Metadata filtering supported
Response Format:
code
[Answer based on retrieved context] --- Sources: - [Document Name](sharepoint-link) - Chunk 3 - [Document Name](sharepoint-link) - Chunk 7
Integration Points
Teams Bot
The query workflow exposes a webhook that can be connected to a Teams bot:
- •Create Azure Bot Service
- •Configure messaging endpoint to n8n webhook URL
- •Install bot in Teams channel
CLI Integration
For your multi-AI orchestration workflow:
bash
curl -X POST https://your-n8n/webhook/secondbrain-query \
-H "Content-Type: application/json" \
-d '{"query": "What are the WiFi best practices for UniFi?", "customer": "optional-filter"}'
Obsidian
Using the Post Webhook plugin:
- •Install Post Webhook in Obsidian
- •Configure webhook URL to n8n endpoint
- •Send notes with
post-webhook: truefrontmatter
Maintenance
Monitoring
- •n8n execution logs:
docker logs n8n - •Qdrant health:
curl http://localhost:6333/health - •Failed executions visible in n8n UI
Backup
bash
# Backup Qdrant data docker exec qdrant qdrant-backup /snapshots/backup-$(date +%Y%m%d) # Backup n8n workflows docker exec n8n n8n export:workflow --all --output=/data/backups/
Scaling
- •Increase Qdrant resources for larger document sets
- •Add n8n workers for parallel processing
- •Consider Qdrant Cloud for >10GB vector data
Troubleshooting
Common Issues
SharePoint authentication fails:
- •Verify Microsoft Graph permissions:
Sites.Read.All,Files.Read.All - •Check token expiration and refresh
Embedding errors:
- •Verify Anthropic API key is valid
- •Check rate limits (consider batching)
Qdrant connection refused:
- •Ensure Qdrant container is running
- •Verify network connectivity between containers
No results returned:
- •Check if documents were ingested (query Qdrant directly)
- •Lower similarity threshold
- •Verify chunk size isn't too large
Cost Estimation
| Component | Usage | Monthly Cost |
|---|---|---|
| Azure VM | Existing | $0 |
| Qdrant | Self-hosted | $0 |
| Anthropic API | ~100K tokens/day | $30-50 |
| Total | ~$30-50 |
Version History
- •v1.0.0 (2024-12): Initial release
- •SharePoint ingestion pipeline
- •RAG query system
- •Obsidian bidirectional sync
- •Teams bot integration