AgentSkillsCN

codebase-intelligence

利用语义向量搜索、影响分析以及直接针对Oak CI数据库的SQL查询,对你的代码库进行搜索、分析与查询。在寻找语义相关代码、在重构前分析代码变更的影响、发现组件间的关系、回忆先前会议中讨论或决定的内容、查找过往的对话或结果、查询会话历史、检查活动日志、浏览记忆、在activities.db上运行SQL,或探索grep难以捕捉的模式时使用此功能。切勿用于存储记忆——请改用oak_remember或{oak-cli-command} ci remember。

SKILL.md
--- frontmatter
name: codebase-intelligence
description: >-
  Search, analyze, and query your codebase using semantic vector search, impact
  analysis, and direct SQL queries against the Oak CI database. Use when finding
  semantically related code, analyzing code change impacts before refactoring,
  discovering component relationships, recalling what was discussed or decided
  in previous sessions, looking up past conversations or outcomes, querying
  session history, checking activity logs, browsing memories, running SQL
  against activities.db, or exploring patterns that grep would miss. Do NOT use
  for storing memories — use oak_remember or {oak-cli-command} ci remember instead.
allowed-tools: Bash, Read
user-invocable: true

Codebase Intelligence

Search, analyze, and query your codebase using semantic vector search, impact analysis, and direct SQL queries against the Oak CI database.

Quick Start

Semantic search

bash
# Find code related to a concept
{oak-cli-command} ci search "form validation logic" --type code

# Find similar patterns
{oak-cli-command} ci search "retry with exponential backoff" --type code

Impact analysis

bash
# Find all code related to what you're changing
{oak-cli-command} ci search "AuthService token validation" --type code -n 20

# Get impact context for a specific file
{oak-cli-command} ci context "impact of changes" -f src/services/auth.py

Session and memory lookup

bash
# What happened in recent sessions?
sqlite3 -readonly -header -column .oak/ci/activities.db \
  "SELECT id, agent, title, status, datetime(created_at_epoch, 'unixepoch', 'localtime') as started FROM sessions ORDER BY created_at_epoch DESC LIMIT 5;"

# Search past decisions and learnings
{oak-cli-command} ci search "authentication refactor decision" --type memory

# Browse memories by type
{oak-cli-command} ci memories --type decision

Database query

bash
# Open the database in read-only mode
sqlite3 -readonly -header -column .oak/ci/activities.db "SELECT count(*) FROM sessions;"

Commands Reference

CLI commands

CommandPurpose
{oak-cli-command} ci search "query" --type codeSemantic vector search for code
{oak-cli-command} ci search "query" --type memorySemantic search for memories
{oak-cli-command} ci search "query" -n 20Broader search with more results
{oak-cli-command} ci context "task" -f <file>Get context for current work
{oak-cli-command} ci remember "observation"Store a memory (NOT via SQL)
{oak-cli-command} ci memories --type gotchaBrowse memories by type
{oak-cli-command} ci memories --status activeBrowse memories by lifecycle status
{oak-cli-command} ci resolve <id>Mark observation as resolved
{oak-cli-command} ci resolve --session <id>Bulk-resolve all observations from a session
{oak-cli-command} ci sessionsList session summaries
{oak-cli-command} ci statusCheck daemon status

MCP tools

MCP ToolCLI EquivalentPurpose
oak_search{oak-cli-command} ci search "query"Semantic vector search
oak_remember{oak-cli-command} ci remember "observation"Store a memory
oak_context{oak-cli-command} ci context "task"Get task-relevant context
oak_resolve_memory{oak-cli-command} ci resolve <id>Mark observation resolved/superseded

Direct SQL

bash
sqlite3 -readonly -header -column .oak/ci/activities.db "YOUR QUERY HERE"

When to Use What

NeedToolExample
Find similar implementations{oak-cli-command} ci search --type code"retry with exponential backoff"
Understand component relationships{oak-cli-command} ci context"how auth middleware relates to session handling"
Assess refactoring risk{oak-cli-command} ci search --type code -n 20"PaymentProcessor error handling"
Find past decisions/gotchas{oak-cli-command} ci search --type memory"gotchas with auth changes"
Recall previous discussionssqlite3 -readonlySELECT title, summary FROM sessions WHERE ...
Find what was done before{oak-cli-command} ci memories / sqlite3"what did we decide about caching?"
Query session historysqlite3 -readonlySELECT * FROM sessions ORDER BY ...
Aggregate usage statssqlite3 -readonlySELECT agent_name, sum(cost_usd) FROM agent_runs ...
Resolve stale observations{oak-cli-command} ci resolveAfter completing work that addresses a gotcha
Find unresolved planning itemssqlite3 -readonlySELECT ... WHERE status='active' AND session_origin_type='planning'
Run automated analysis{oak-cli-command} ci agent run{oak-cli-command} ci agent run usage-report

Why Semantic Search Over Grep

GrepSemantic Search
Finds "UserService" literallyFinds code about user management regardless of naming
Misses synonyms (auth vs authentication)Understands concepts are related
Can't find "conceptually similar" codeGroups code by purpose, not text
No relevance rankingReturns most relevant first

Core Tables Overview

<!-- BEGIN GENERATED CORE TABLES -->
TablePurposeKey Columns
memory_observationsExtracted memories/learningsobservation, memory_type, status, context, tags, importance, session_origin_type
sessionsCoding sessions (launch to exit)id, agent, status, summary, title, started_at, created_at_epoch
prompt_batchesUser prompts within sessionssession_id, user_prompt, classification, response_summary
activitiesRaw tool executionssession_id, tool_name, file_path, success, error_message
agent_runsCI agent executionsagent_name, task, status, result, cost_usd, turns_used
session_link_eventsSession linking analyticssession_id, event_type, old_parent_id, new_parent_id
session_relationshipsSemantic session relationshipssession_a_id, session_b_id, relationship_type, similarity_score
agent_schedulesCron scheduling statetask_name, cron_expression, enabled, last_run_at, next_run_at
<!-- END GENERATED CORE TABLES -->

Memory Types

The memory_type column in memory_observations uses these values:

  • gotcha — Non-obvious behavior or quirk
  • bug_fix — Solution to a bug with root cause
  • decision — Architectural/design decision with rationale
  • discovery — General insight about the codebase
  • trade_off — Trade-off that was made and why
  • session_summary — LLM-generated session summary

Observation Status

The status column tracks lifecycle state:

  • active — Current and relevant (default for all new observations)
  • resolved — Issue was addressed in a later session
  • superseded — Replaced by a newer, more accurate observation

Resolving Observations

When oak_search or oak_context surfaces a gotcha, bug_fix, or discovery that you then address during your session, resolve it so future sessions don't see stale guidance:

  1. Note the observation id from the search results.
  2. After completing the fix or addressing the issue, call:
    • MCP: oak_resolve_memory(id="<observation-id>")
    • CLI: {oak-cli-command} ci resolve <observation-id>
  3. For superseded observations (replaced by a better one), use status="superseded".

When to resolve:

  • You fixed a bug that was tracked as a bug_fix observation
  • You addressed a gotcha (e.g., refactored the problematic code)
  • A discovery about a problem is no longer accurate after your changes
  • A decision was reversed or replaced by a new decision

When NOT to resolve:

  • The observation is still accurate even after your changes
  • You only partially addressed the issue
  • The observation is a permanent architectural insight (e.g., "service X uses eventual consistency")

Session Origin Types

The session_origin_type column classifies how the session that created the observation operated:

  • planning — Planning-phase session (high read:edit ratio, few modifications)
  • investigation — Exploration/debugging session (many reads, minimal edits)
  • implementation — Active coding session (significant file modifications)
  • mixed — Combined activity patterns

Planning/investigation observations are automatically capped at importance 5.

Essential Queries

Recent Sessions

sql
SELECT id, agent, title, status,
       datetime(created_at_epoch, 'unixepoch', 'localtime') as started,
       prompt_count, tool_count
FROM sessions
ORDER BY created_at_epoch DESC
LIMIT 10;

What Files Were Touched in a Session

sql
SELECT DISTINCT file_path, tool_name, count(*) as times
FROM activities
WHERE session_id = 'SESSION_ID' AND file_path IS NOT NULL
GROUP BY file_path, tool_name
ORDER BY times DESC;

Recent Memories

sql
SELECT memory_type, substr(observation, 1, 150) as observation,
       context,
       datetime(created_at_epoch, 'unixepoch', 'localtime') as created
FROM memory_observations
ORDER BY created_at_epoch DESC
LIMIT 20;

Agent Run History

sql
SELECT agent_name, task, status, turns_used,
       printf('$%.4f', cost_usd) as cost,
       datetime(created_at_epoch, 'unixepoch', 'localtime') as created
FROM agent_runs
ORDER BY created_at_epoch DESC
LIMIT 10;

Full-Text Search on Memories

sql
SELECT m.memory_type, m.observation, m.context
FROM memory_observations m
JOIN memories_fts fts ON m.rowid = fts.rowid
WHERE memories_fts MATCH 'authentication'
ORDER BY rank
LIMIT 10;

Scheduled Tasks

sql
SELECT task_name, enabled, cron_expression, description,
       datetime(last_run_at_epoch, 'unixepoch', 'localtime') as last_run,
       datetime(next_run_at_epoch, 'unixepoch', 'localtime') as next_run
FROM agent_schedules
ORDER BY next_run_at_epoch;

Observation Lifecycle Status

sql
SELECT status, count(*) as count
FROM memory_observations
GROUP BY status;

Active Observations from Planning Sessions

sql
SELECT substr(observation, 1, 120) as observation, memory_type,
       context, session_origin_type,
       datetime(created_at_epoch, 'unixepoch', 'localtime') as created
FROM memory_observations
WHERE status = 'active' AND session_origin_type = 'planning'
ORDER BY created_at_epoch DESC
LIMIT 20;

Resolution Provenance (what resolved what)

sql
SELECT m.id, substr(m.observation, 1, 100) as observation,
       m.resolved_by_session_id, s.title as resolving_session,
       m.resolved_at
FROM memory_observations m
LEFT JOIN sessions s ON m.resolved_by_session_id = s.id
WHERE m.status = 'resolved'
ORDER BY m.resolved_at DESC
LIMIT 10;

Important Notes

  • Always use -readonly flag with sqlite3 to prevent accidental writes
  • The database uses WAL mode — safe to read while the daemon is writing
  • Epoch timestamps are Unix seconds — use datetime(col, 'unixepoch', 'localtime') to format
  • FTS5 tables (activities_fts, memories_fts) use MATCH syntax, not LIKE
  • JSON columns (tool_input, files_affected, files_created) can be queried with json_extract()
  • Database location: .oak/ci/activities.db

Automated Analysis

For automated analysis that runs queries and produces reports:

bash
{oak-cli-command} ci agent run usage-report              # Cost and token usage trends
{oak-cli-command} ci agent run productivity-report       # Session quality and error rates
{oak-cli-command} ci agent run codebase-activity-report  # File hotspots and tool patterns
{oak-cli-command} ci agent run prompt-analysis           # Prompt quality and recommendations

Reports are written to oak/insights/ (git-tracked, team-shareable).

Deep Dives

For detailed guidance, consult the reference documents:

  • references/finding-related-code.md — Semantic search for code relationships and patterns
  • references/impact-analysis.md — Assessing change impact before refactoring
  • references/querying-databases.md — Full database querying guide with schema overview
  • references/schema.md — Complete CREATE TABLE statements, indexes, FTS5 tables (auto-generated)
  • references/queries.md — Advanced query cookbook with joins, aggregations, and debugging queries
  • references/analysis-playbooks.md — Structured multi-query workflows for usage, productivity, and activity analysis