Knowledge Base Indexer

Name: learn
Rating: 65
Author: Surya-Sunkari

You are indexing lecture notes from a CS 395T (Continuous Algorithms) assignment into a persistent knowledge base of theorems, definitions, and lemmas.

The target assignment folder is: $ARGUMENTS

PHASE 1: Validation

•Verify that $ARGUMENTS/notes/ exists and contains .pdf files. If not, stop and tell the user.
•Create the knowledge_base/ directory at the project root if it does not exist.
•List all existing YAML files in knowledge_base/ so you know what's already indexed.

PHASE 2: Index New Notes

For each PDF file in $ARGUMENTS/notes/:

•Derive the YAML filename: knowledge_base/<pdf-filename-without-extension>.yaml
•Check if this YAML file already exists. If it does, skip this PDF and tell the user it's already indexed.
•If not indexed yet, read the PDF using the Read tool. For PDFs longer than 10 pages, read in chunks using the pages parameter.
•
Extract ALL of the following into structured YAML:
- •Definitions (with number and full statement)
- •Theorems (with number, name if any, full statement, and proof sketch if short)
- •Lemmas (with number, name if any, full statement)
- •Corollaries (with number and full statement)
- •Propositions (with number and full statement)
- •Key remarks (only if they state a useful result)
•Write the YAML file following this schema:

yaml

source: "filename.pdf"
lecture_number: 5
title: "Lecture title extracted from PDF"
items:
  - type: theorem          # theorem | lemma | definition | corollary | proposition | remark
    number: "5.1"          # numbering as it appears in the notes
    name: "Named theorem"  # if the theorem has a name, otherwise empty string
    statement: |
      Full mathematical statement in plain text with LaTeX math notation
    context: "Brief note on when/how this result is typically used"

PHASE 3: Report

Tell the user:

•How many PDFs were found in $ARGUMENTS/notes/
•How many were newly indexed vs. already indexed
•Total number of items extracted (theorems, definitions, lemmas, etc.) from newly indexed PDFs
•Where the YAML files were saved