Knowledge Base Indexer
You are indexing lecture notes from a CS 395T (Continuous Algorithms) assignment into a persistent knowledge base of theorems, definitions, and lemmas.
The target assignment folder is: $ARGUMENTS
PHASE 1: Validation
- Verify that `$ARGUMENTS/notes/` exists and contains `.pdf` files. If not, stop and tell the user.
- Create the `knowledge_base/` directory at the project root if it does not exist.
- List all existing YAML files in `knowledge_base/` so you know what's already indexed.
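For illustration only, the Phase 1 checks could be sketched in Python with `pathlib`. This is a minimal sketch, not a required implementation. It assumes the project root is passed in explicitly and treats `.pdf` case-insensitively. The helper name `validate_phase1` is hypothetical.

```python
from pathlib import Path


def validate_phase1(assignment_dir: Path, project_root: Path) -> tuple[list[Path], list[Path]]:
    """Hypothetical helper mirroring the Phase 1 checks.

    Returns (pdf_files, existing_yaml_files). Raises FileNotFoundError when
    <assignment_dir>/notes/ is missing or holds no .pdf files, which is the
    "stop and tell the user" case.
    """
    notes_dir = assignment_dir / "notes"
    if not notes_dir.is_dir():
        raise FileNotFoundError(f"{notes_dir} does not exist")

    # Only top-level files in notes/ with a .pdf suffix (any letter case).
    pdfs = sorted(
        p for p in notes_dir.iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )
    if not pdfs:
        raise FileNotFoundError(f"{notes_dir} contains no .pdf files")

    # Create knowledge_base/ at the project root; no error if it already exists.
    kb_dir = project_root / "knowledge_base"
    kb_dir.mkdir(parents=True, exist_ok=True)

    # YAML files already present, so Phase 2 knows what is indexed.
    existing_yaml = sorted(kb_dir.glob("*.yaml"))
    return pdfs, existing_yaml
```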
PHASE 2: Index New Notes
For each PDF file in $ARGUMENTS/notes/:
- Derive the YAML filename: `knowledge_base/<pdf-filename-without-extension>.yaml`.
- Check whether this YAML file already exists. If it does, skip this PDF and tell the user it's already indexed.
- If it is not indexed yet, read the PDF using the Read tool. For PDFs longer than 10 pages, read in chunks using the `pages` parameter.
- Extract ALL of the following into structured YAML:
  - Definitions (with number and full statement)
  - Theorems (with number, name if any, full statement, and proof sketch if short)
  - Lemmas (with number, name if any, full statement)
  - Corollaries (with number and full statement)
  - Propositions (with number and full statement)
  - Key remarks (only if they state a useful result)
- Write the YAML file following this schema:

```yaml
source: "filename.pdf"
lecture_number: 5
title: "Lecture title extracted from PDF"
items:
  - type: theorem          # theorem | lemma | definition | corollary | proposition | remark
    number: "5.1"          # numbering as it appears in the notes
    name: "Named theorem"  # if the theorem has a name, otherwise empty string
    statement: |
      Full mathematical statement in plain text with LaTeX math notation
    context: "Brief note on when/how this result is typically used"
```
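The filename rule, the skip check, and a light check of one parsed item could look roughly like the Python sketch below. It is illustrative only and makes three assumptions that are not part of the instructions: every item carries all five keys shown in the schema, `number` is always a quoted string, and items have already been parsed into plain dicts (no YAML library is used here). The helper names `yaml_path_for`, `needs_indexing`, and `check_item` are hypothetical.

```python
from pathlib import Path

# Allowed values of `type`, copied from the comment in the schema.
ALLOWED_TYPES = {"theorem", "lemma", "definition", "corollary", "proposition", "remark"}
# Assumption: every item uses the same five keys as the schema example.
ITEM_KEYS = {"type", "number", "name", "statement", "context"}


def yaml_path_for(pdf: Path, kb_dir: Path) -> Path:
    """knowledge_base/<pdf-filename-without-extension>.yaml"""
    return kb_dir / f"{pdf.stem}.yaml"


def needs_indexing(pdf: Path, kb_dir: Path) -> bool:
    """False when the YAML file already exists, i.e. the PDF is skipped."""
    return not yaml_path_for(pdf, kb_dir).exists()


def check_item(item: dict) -> list[str]:
    """Return the problems found in one parsed item (empty list means OK)."""
    problems = []
    missing = ITEM_KEYS - set(item)
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if item.get("type") not in ALLOWED_TYPES:
        problems.append(f"unknown type: {item.get('type')!r}")
    # Assumption: numbers stay strings such as "5.1" to keep the notes' formatting.
    if not isinstance(item.get("number"), str):
        problems.append("number should be a string")
    # Unnamed results use an empty string rather than omitting the key.
    if not isinstance(item.get("name"), str):
        problems.append('name should be a string (use "" if unnamed)')
    return problems
```

Keeping `number` as a string means numbering such as "5.10" is not silently turned into the float 5.1 by a YAML parser.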
PHASE 3: Report
Tell the user:
- How many PDFs were found in `$ARGUMENTS/notes/`
- How many were newly indexed vs. already indexed
- Total number of items extracted (theorems, definitions, lemmas, etc.) from newly indexed PDFs
- Where the YAML files were saved
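As a rough illustration, the Phase 3 counts could be assembled in Python with `collections.Counter`. This sketch assumes each newly indexed file is available as a dict following the schema. The function name `summarize` and the exact report wording are hypothetical.

```python
from collections import Counter


def summarize(found: int, newly_indexed: list[dict], already_indexed: int, kb_dir: str) -> str:
    """Build a Phase 3 report (hypothetical helper).

    `newly_indexed` holds one parsed YAML document per PDF indexed in this run;
    `already_indexed` is the number of PDFs skipped because their YAML existed.
    """
    # Count items by type across all newly indexed documents only.
    counts = Counter(item["type"] for doc in newly_indexed for item in doc["items"])
    total = sum(counts.values())
    breakdown = ", ".join(f"{t}: {n}" for t, n in sorted(counts.items())) or "none"
    return (
        f"PDFs found: {found}\n"
        f"Newly indexed: {len(newly_indexed)}, already indexed: {already_indexed}\n"
        f"Items extracted from newly indexed PDFs: {total} ({breakdown})\n"
        f"YAML files saved in: {kb_dir}"
    )
```

For example, two new files containing three theorems, one definition, and one lemma would report `Items extracted from newly indexed PDFs: 5 (definition: 1, lemma: 1, theorem: 3)`.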