Semantic Code Indexer
Parse codebases with tree-sitter, extract focal functions with nested call graphs, and perform bottom-up summarization using actual source code and callee context. Create the db in the .deeptest/ folder.
Complete Workflow
Step 1: Build Focal Function Call Graph
python scripts/build_focal.py \
  --focal "DnsQueryEx" \
  --file "querystub.c" \
  --project ~/dns_full \
  --db dns.db
Step 2: Bottom-Up Summarization and pre-/post-condition Annotation [long running task]
This step summarizes functions bottom-up, moving up the call graph from leaf functions toward the focal function, and uses each function's actual source code plus its callees' summaries as context. Follow the instructions exactly: this is a long-running task in which you call the scripts iteratively. Do not attempt to optimize by writing your own scripts to speed up the process.
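To illustrate what "bottom-up" means here, the following is a minimal sketch of a post-order traversal that lists callees before their callers. The call graph, the `ValidateRequest` name, and the `bottom_up_order` helper are hypothetical and used only for illustration; the actual ordering is decided by scripts/summarizer.py, whose internals this document does not show.

```python
from typing import Dict, List, Set


def bottom_up_order(call_graph: Dict[str, List[str]], focal: str) -> List[str]:
    """Return the functions reachable from `focal`, callees before callers."""
    order: List[str] = []
    visited: Set[str] = set()

    def visit(fn: str) -> None:
        if fn in visited:
            return  # already scheduled; this also stops recursion on cycles
        visited.add(fn)
        for callee in call_graph.get(fn, []):
            visit(callee)
        order.append(fn)  # appended only after all of its callees

    visit(focal)
    return order


# Hypothetical call graph; the edges are assumptions, not taken from dns_full.
call_graph = {
    "DnsQueryEx": ["ValidateRequest", "LogError"],
    "ValidateRequest": ["LogError"],
    "LogError": ["fnsVaLog"],
    "fnsVaLog": [],
}

order = bottom_up_order(call_graph, "DnsQueryEx")
print(order)
```

In this ordering a function is reached only after every callee has a summary, which is why each `next` response can include callee summaries as context.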
Step 2a: Get next function to summarize
Find the next function to summarize
python scripts/summarizer.py --db dns.db --project ~/dns_full next
{
  "status": "needs_summary",
  "function": "LogError",
  "file": "/home/user/dns_full/log.cpp",
  "source_code": "void LogError(const char* fmt, ...) {\n va_list args;\n va_start(args, fmt);\n fnsVaLog(LOG_ERROR, fmt, args);\n va_end(args);\n}",
  "callees": [
    {
      "function": "va_start",
      "summary": "Initializes variable argument list processing"
    },
    {
      "function": "fnsVaLog",
      "summary": "Core logging function that formats messages with va_list"
    },
    {
      "function": "va_end",
      "summary": "Cleans up variable argument processing"
    }
  ]
}
You have:
- ✅ Actual source code of LogError
- ✅ Summaries of all callees
Read the code! You can see it:
- Takes variable arguments (...)
- Initializes va_list with va_start
- Calls fnsVaLog with LOG_ERROR level
- Cleans up with va_end
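As a reading aid, here is a small sketch of how the `next` response above could be laid out as plain text before you write the summary. The `build_context` helper is hypothetical, and the response is an abbreviated copy of the one shown (function body shortened). Only the `needs_summary` status appears in this document, so the sketch assumes nothing about other status values.

```python
import json

# Abbreviated copy of the `next` response shown above (function body shortened).
response_text = r'''
{
  "status": "needs_summary",
  "function": "LogError",
  "file": "/home/user/dns_full/log.cpp",
  "source_code": "void LogError(const char* fmt, ...) { /* body shortened */ }",
  "callees": [
    {"function": "va_start", "summary": "Initializes variable argument list processing"},
    {"function": "fnsVaLog", "summary": "Core logging function that formats messages with va_list"},
    {"function": "va_end", "summary": "Cleans up variable argument processing"}
  ]
}
'''


def build_context(response: dict) -> str:
    """Hypothetical helper: render the function source and callee summaries as text."""
    lines = [
        f"Function: {response['function']} ({response['file']})",
        "Source:",
        response["source_code"],
        "Callees:",
    ]
    for callee in response.get("callees", []):
        lines.append(f"  {callee['function']}: {callee['summary']}")
    return "\n".join(lines)


response = json.loads(response_text)
if response["status"] == "needs_summary":
    context = build_context(response)
    print(context)
```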
Write a summary using this context and update the database:
Tips for Good Summaries
Summary Format: Write a concise paragraph summary covering the function's purpose, how outputs depend on inputs, any global or shared state it reads or mutates, and which callees have side effects, can fail, or contain complex branching that a test might need to exercise. Focus on these aspects:
- Function's purpose - What does it do?
- Input/output relationship - How outputs depend on inputs
- State mutations - Any global or shared state it reads or mutates
- Callee behavior - Which callees have side effects, can fail, or contain complex branching that a test might need to exercise
Best Practices:
- Actually read the source code - Don't just rely on function names
- Use callee summaries - They tell you what dependencies do and their important behaviors
- Look for control flow - Loops, conditions, error handling
- Note side effects - File I/O, global state, logging, network calls
- Be specific - "Validates X by checking Y" not "Validates input"
- Include callee context - Mention which callees do the heavy lifting or can possibly fail
python scripts/summarizer.py --db dns.db update \
  --function "LogError" \
  --summary "Logs error messages by initializing variable argument processing with va_start, passing formatted arguments to fnsVaLog at LOG_ERROR level, then cleaning up with va_end"
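Summaries often contain double quotes, apostrophes, or dollar signs, which the shell would otherwise interpret inside the --summary argument. The sketch below shows one way to check the quoting with Python's standard shlex module. The `update_command` helper and the sample summary text are hypothetical, and only the flags shown in the update command above are used. It prints a single command for inspection and is not meant to batch the workflow.

```python
import shlex
from typing import List


def update_command(db: str, function: str, summary: str) -> List[str]:
    """Hypothetical helper: argument list for one `update` call."""
    return [
        "python", "scripts/summarizer.py",
        "--db", db,
        "update",
        "--function", function,
        "--summary", summary,
    ]


# Sample summary containing characters the shell treats specially.
summary = 'Logs "error" messages; forwards the caller\'s va_list to fnsVaLog'
argv = update_command("dns.db", "LogError", summary)

# shlex.join quotes each argument so the line can be pasted into a POSIX shell.
command_line = shlex.join(argv)
print(command_line)
```

Splitting the printed line again with shlex.split gives back the original arguments, which is a quick way to confirm that the summary text reaches the script unchanged.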
Step 2b: Add Preconditions and Postconditions
Annotate precondition
# Use --type postcondition to record a postcondition instead
python scripts/summarizer.py --db dns.db annotate \
  --function "LogError" \
  --type precondition \
  --text "fmt is a valid format string"
Preconditions should cover:
- Required contracts on the input parameters (e.g., non-empty list, non-null fields)
- Required environment/state
- Assumptions about invariants (e.g., IDs are unique, timestamps are monotonic)
- Any conditions that gate deeper branches (e.g., feature flag enabled, debug mode on)
Postconditions should cover:
- Return value guarantees (type/shape, relationships between fields, sentinel values)
- State changes (files written, DB rows updated, caches mutated, globals modified)
- Error behavior (what exceptions/errors can occur and under what inputs/states)
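To make these categories concrete, here is a small sketch of a function whose preconditions and postcondition are written as assertions. The `latest_record` function and its record layout are hypothetical and unrelated to the DNS codebase; they only mirror the examples named above (non-empty list, unique IDs, monotonic timestamps, a return value guarantee).

```python
from typing import Dict, List


def latest_record(records: List[Dict[str, int]]) -> Dict[str, int]:
    """Hypothetical example: return the most recent record."""
    # Preconditions: contracts on the input and assumed invariants.
    assert records, "precondition: records is a non-empty list"
    ids = [r["id"] for r in records]
    assert len(ids) == len(set(ids)), "precondition: IDs are unique"
    stamps = [r["timestamp"] for r in records]
    assert all(a <= b for a, b in zip(stamps, stamps[1:])), \
        "precondition: timestamps are monotonic"

    result = records[-1]

    # Postcondition: return value guarantee relative to the input.
    assert all(result["timestamp"] >= r["timestamp"] for r in records), \
        "postcondition: result has the latest timestamp"
    return result


records = [{"id": 1, "timestamp": 10}, {"id": 2, "timestamp": 20}]
print(latest_record(records))
```

Each assertion message corresponds to one annotation you would record with --type precondition or --type postcondition.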
Repeat Steps 2a and 2b, calling next, update, and annotate for each function, until the focal function you passed to build_focal.py is fully summarized.
Check Progress
python scripts/summarizer.py --db dns.db status
{
  "total_functions": 89,
  "summarized": 67,
  "remaining": 22,
  "progress_percent": 75.3,
  "leaf_functions": 18,
  "call_edges": 234
}
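The fields in the status output are related by simple arithmetic, which the sketch below checks against the sample above. It assumes that remaining is total_functions minus summarized and that progress_percent is the summarized share rounded to one decimal place. Both are inferred from the sample numbers, not from the script itself.

```python
import json

# Copy of the status output shown above.
status = json.loads('''
{
  "total_functions": 89,
  "summarized": 67,
  "remaining": 22,
  "progress_percent": 75.3,
  "leaf_functions": 18,
  "call_edges": 234
}
''')

# Assumed relationships between the fields (inferred from the sample).
remaining = status["total_functions"] - status["summarized"]
percent = round(100 * status["summarized"] / status["total_functions"], 1)

print(remaining, percent)
done = status["remaining"] == 0  # the workflow is finished when nothing remains
```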