SPL Expert (Local-First)

This skill assumes a repo layout like:

• spl-expert/kb/ (git submodule)
• spl-expert/spl_validator/ (git submodule; invoked via python3 -m spl_validator)
• spl-expert/official_conf/ (authoritative local ground truth for grammar/limits)

If those folders are not present, scripts may fall back to ../kb, ../spl_validator, and ../official_conf for local development.
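A minimal Python sketch of that lookup order follows. It assumes the `../` fallback is resolved relative to the spl-expert/ directory, and `resolve_component`/`resolve_layout` are hypothetical helpers for illustration, not functions exported by the bundled scripts.

```python
from pathlib import Path

# Component folders the skill expects inside spl-expert/.
COMPONENTS = ("kb", "spl_validator", "official_conf")


def resolve_component(skill_root: Path, name: str) -> Path:
    """Return spl-expert/<name> if it exists, else the ../<name> dev fallback.

    Hypothetical helper that mirrors the lookup order described above; it is
    not taken from the bundled scripts.
    """
    primary = skill_root / name
    if primary.is_dir():
        return primary
    fallback = skill_root.parent / name
    if fallback.is_dir():
        return fallback
    raise FileNotFoundError(f"{name}: neither {primary} nor {fallback} exists")


def resolve_layout(skill_root: Path) -> dict[str, Path]:
    """Resolve every component; raises on the first one that is missing."""
    return {name: resolve_component(skill_root, name) for name in COMPONENTS}
```

Usage from the repo root: `resolve_layout(Path("spl-expert").resolve())`. Failing loudly on a missing component is deliberate, since a silent fallback to the wrong official_conf would undermine the "ground truth" role.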

If needed, use web search features or MCP tools to search the official documentation or the community forums, scoped with site:docs.splunk.com, site:help.splunk.com, or site:community.splunk.com. Remember that the version segment in a URL identifies the release a page applies to: the 10.0 in https://help.splunk.com/en/splunk-enterprise/spl-search-reference/10.0/introduction/welcome-to-the-search-reference means that page is written for Splunk 10.0.
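To check which release a page targets, a small Python sketch can read the version segment out of the URL path. It assumes the version is the first path segment shaped like MAJOR.MINOR (optionally MAJOR.MINOR.PATCH); `doc_version` and `is_for_release` are hypothetical names, and URL schemes that encode the version differently would need another rule.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# A path segment such as "10.0" (or a three-part form such as "9.2.0").
_VERSION = re.compile(r"\d+\.\d+(?:\.\d+)?")


def doc_version(url: str) -> Optional[str]:
    """Return the first version-like path segment of a docs URL, or None."""
    for segment in urlparse(url).path.split("/"):
        if _VERSION.fullmatch(segment):
            return segment
    return None


def is_for_release(url: str, release: str = "10.0") -> bool:
    """True when the page's version segment is `release` or a patch of it."""
    version = doc_version(url)
    return version is not None and (version == release or version.startswith(release + "."))
```

For the help.splunk.com URL above, `doc_version` returns "10.0" and `is_for_release` returns True; community threads carry no version segment, so they return None and need a manual check.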

Workflow (always follow this loop)

• If unfamiliar with the repo, read the component maps:
  - spl-expert/references/kb-structure.md
  - spl-expert/references/spl-validator-structure.md
  - spl-expert/references/official-conf-structure.md
  - spl-expert/references/golden-prompts.md (sanity-check expected behavior)

0.5 Verify local layout

• Run: python3 spl-expert/scripts/check_layout.py
• If installing into Codex (symlink/dev): mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills" && ln -s "$(pwd)/spl-expert" "${CODEX_HOME:-$HOME/.codex}/skills/spl-expert"
• If installing into Codex (copy/portable; no symlinks):
  - cp only: dest="${CODEX_HOME:-$HOME/.codex}/skills/spl-expert" && mkdir -p "$dest" && cp -R -L spl-expert/. "$dest/"
  - rsync: mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills" && rsync -aL --exclude='.git' --exclude='__pycache__' spl-expert/ "${CODEX_HOME:-$HOME/.codex}/skills/spl-expert/"
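For environments without rsync, the same portable copy can be sketched with Python's standard library. `codex_skills_dir` and `install_copy` are hypothetical helpers; the sketch assumes Python 3.8+ (for `dirs_exist_ok`) and only approximates the rsync flags shown above.

```python
import os
import shutil
from pathlib import Path


def codex_skills_dir() -> Path:
    """Python equivalent of "${CODEX_HOME:-$HOME/.codex}/skills"."""
    # ":-" falls back when the variable is unset OR empty, hence `or`.
    home = os.environ.get("CODEX_HOME") or str(Path.home() / ".codex")
    return Path(home) / "skills"


def install_copy(src: Path, dest: Path) -> None:
    """Copy the skill without symlinks, skipping .git and __pycache__.

    Approximates `rsync -aL --exclude='.git' --exclude='__pycache__'`:
    symlinks are followed and their targets copied, and copy2 keeps file
    modes and timestamps (but not ownership).
    """
    shutil.copytree(
        src,
        dest,
        symlinks=False,  # follow symlinks and copy their targets, like -L
        # Matches by name, so a submodule's .git *file* is skipped as well.
        ignore=shutil.ignore_patterns(".git", "__pycache__"),
        dirs_exist_ok=True,  # merge into an existing destination
    )
```

Usage from the repo root: `install_copy(Path("spl-expert"), codex_skills_dir() / "spl-expert")`. Like the rsync variant, re-running it overwrites changed files but does not delete files removed from the source.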

• Read essential documents (small, fixed set):
  - kb/spl/_overview.md
  - spl_validator/README.md
  - official_conf/searchbnf.conf (targeted extraction only; never load the whole file)
  - official_conf/limits.conf (targeted extraction only; never load the whole file)
• Understand the user’s intent:
  - Restate the goal in one sentence.
  - Ask for missing constraints only when they block progress (index/sourcetype/time range, required fields, expected output shape).
• State assumptions explicitly:
  - Data In: which events/fields exist, and over what time window.
  - Data Out: what the final table/series should look like.
  - Constraints: performance expectations, limits, and “Splunk Enterprise 10.0 only” boundaries.
• Search for essential knowledge (minimal retrieval):
  - Prefer targeted searches over opening whole docs.
  - Use these tools:
    - KB search: python3 spl-expert/scripts/kb_search.py --query "…" --top 6
    - Command syntax from BNF: python3 spl-expert/scripts/extract_command_syntax.py --command stats
    - Exact stanza extraction (BNF/limits/etc.): python3 spl-expert/scripts/extract_stanza.py --file searchbnf.conf --stanza stats-command
  - Open only the few most relevant hits and their snippets.
• Design the data process flow (before writing SPL):
  - Write a short stage plan: filter → normalize → enrich → aggregate → present.
  - For each stage, list its inputs, its outputs, and which fields it creates or removes.
• Generate SPL:
  - Translate each stage into concrete commands/functions.
  - Keep the pipeline readable; prefer “obvious” SPL over clever SPL.
• Validate SPL (warnings still pass, but must be shown):
  - Run validation with AST + flow:
    - python3 spl-expert/scripts/validate_spl.py --spl "…"
  - Treat:
    - valid=true as PASS (even with warnings)
    - valid=false as FAIL
  - Always report warnings as feedback and, when feasible, improve the SPL to reduce them.
• Loop:
  - If validation fails (or the AST/flow output disagrees with the intended stage plan), go back to “Search for essential knowledge”, revise the flow/SPL, and re-validate.
• Output:
  - If PASS: output the final SPL and a short note of assumptions and warnings (if any).
  - If FAIL but you have strong evidence that the validator is wrong: output the SPL with a clearly labeled “validator limitation” note and the evidence you used.
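The validate, loop, and output steps can be sketched as a small Python driver. The result shape passed to `judge` ({"valid", "warnings", "errors"}) is an assumption rather than validate_spl.py's documented output, and `validate`/`revise` are hypothetical callables: in practice `validate` would wrap the validate_spl.py call, and `revise` stands for going back to knowledge search and rewriting the stage plan or SPL.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Verdict:
    status: str                                        # "PASS" or "FAIL"
    warnings: list[str] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def judge(result: dict) -> Verdict:
    """valid=true is PASS even with warnings; anything else is FAIL.

    The result shape {"valid": bool, "warnings": [...], "errors": [...]} is
    an assumption, not spl_validator's documented output.
    """
    status = "PASS" if result.get("valid") is True else "FAIL"
    return Verdict(status, list(result.get("warnings", [])), list(result.get("errors", [])))


def refine(
    spl: str,
    validate: Callable[[str], dict],
    revise: Callable[[str, Verdict], str],
    max_rounds: int = 3,
) -> tuple[str, Verdict]:
    """Validate; on FAIL, revise and re-validate, at most max_rounds times."""
    verdict = judge(validate(spl))
    for _ in range(max_rounds):
        if verdict.status == "PASS":
            break
        # Go back to knowledge search, fix the stage plan or SPL, then re-check.
        spl = revise(spl, verdict)
        verdict = judge(validate(spl))
    return spl, verdict
```

Capping the rounds keeps the loop from spinning. On PASS, `verdict.warnings` still goes into the answer; a FAIL that survives the loop is reported as-is, or as a validator limitation only under the conditions given for when spl_validator can be “absolutely wrong”.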

Output contract (make results auditable)

Always include these sections in your final answer:

• Assumptions
  - Data In (source/index/sourcetype/time range; required fields; null/late events)
  - Data Out (final columns/series + meaning)
• Process flow (stage plan)
  - Stage-by-stage pipeline intent (what changes at each step)
• SPL
  - Final SPL query (single block)
• Validation
  - spl_validator pass/fail, plus warnings as user feedback (warnings still pass)
• Evidence
  - KB files used (paths) and any official_conf stanza names extracted (e.g., stats-command)
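A hypothetical `render_answer` helper in Python shows one way to enforce the contract mechanically: it emits the five sections in order and refuses to produce an answer when any of them is missing or empty. Plain-text section labels are an assumption; any heading style would do.

```python
# Section names and order from the output contract above.
REQUIRED_SECTIONS = ("Assumptions", "Process flow", "SPL", "Validation", "Evidence")


def render_answer(sections: dict[str, str]) -> str:
    """Join the sections in contract order; refuse if any is missing or empty."""
    missing = [name for name in REQUIRED_SECTIONS if not sections.get(name, "").strip()]
    if missing:
        raise ValueError("missing sections: " + ", ".join(missing))
    return "\n\n".join(f"{name}\n{sections[name].strip()}" for name in REQUIRED_SECTIONS)
```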

Stage plan template (fill before writing SPL)

• Stage 0 (Data source): index=… sourcetype=… + time window → events
• Stage 1 (Filter): keep only relevant events; note what gets excluded
• Stage 2 (Normalize/Extract): eval/rex/spath; list new fields
• Stage 3 (Enrich): lookups/joins; note keys and expected cardinality
• Stage 4 (Aggregate): stats/timechart/tstats; define grouping + outputs
• Stage 5 (Present): table/sort/rename; final columns/order
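Filling the template can be made checkable with a short Python sketch. `Stage` and `check_and_render` are hypothetical; field sets are declared by hand rather than parsed from SPL, stats-like stages are modeled as keeping only the fields they create, and the index, sourcetype, and field names in `PLAN` are illustrative assumptions. The Enrich stage is omitted because the example needs no lookup.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    name: str
    spl: str                                          # SPL fragment for this stage
    needs: set[str] = field(default_factory=set)      # fields the fragment reads
    creates: set[str] = field(default_factory=set)    # fields it adds
    removes: set[str] = field(default_factory=set)    # fields it drops
    replaces_all: bool = False  # True for stats-like stages: only `creates` survive


def check_and_render(source_fields: set[str], stages: list[Stage]) -> str:
    """Check that each stage only reads fields that exist, then join the fragments."""
    available = set(source_fields)
    for stage in stages:
        missing = stage.needs - available
        if missing:
            raise ValueError(f"{stage.name}: needs {sorted(missing)}, not available here")
        if stage.replaces_all:
            available = set(stage.creates)
        else:
            available = (available | stage.creates) - stage.removes
    # One command per line keeps the pipeline readable.
    return "\n| ".join(stage.spl for stage in stages)


# Hypothetical plan: 5xx errors per host and path over the last 24 hours.
PLAN = [
    Stage("Data source", "index=web sourcetype=access_combined earliest=-24h"),
    Stage("Filter", "where status>=500", needs={"status"}),
    Stage("Normalize", "eval path=lower(uri_path)", needs={"uri_path"}, creates={"path"}),
    Stage("Aggregate", "stats count AS errors BY host, path",
          needs={"host", "path"}, creates={"host", "path", "errors"}, replaces_all=True),
    Stage("Present", "sort -errors", needs={"errors"}),
]
```

With source fields {"host", "status", "uri_path"}, the plan renders as a five-line pipeline. The check catches the common mistake of referencing a raw field after an aggregation has dropped it; it does not replace spl_validator's own flow analysis.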

“Don’t load big files” rules (context hygiene)

• Never dump official_conf/searchbnf.conf or official_conf/limits.conf wholesale into context.
• When you need grammar/limits details, extract only:
  - the specific command stanza (*-command)
  - the specific non-command stanza, by exact name
  - or the smallest snippet that answers the question
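A minimal Python sketch of extracting one stanza while streaming the file, so the rest never reaches context. `read_stanza` is a hypothetical helper, not the implementation of extract_stanza.py; it assumes the usual .conf layout ([name] headers at the start of a line, trailing-backslash continuation lines) and caps output at `max_lines`.

```python
from pathlib import Path


def read_stanza(path: Path, stanza: str, max_lines: int = 200) -> list[str]:
    """Return only the lines of [stanza] from a .conf file, streaming it.

    The file is read line by line, so nothing outside the stanza is kept;
    reading stops at the next stanza header or after max_lines lines.
    """
    header = f"[{stanza}]"
    captured: list[str] = []
    inside = False
    continued = False  # previous line ended with a "\" continuation
    with path.open(encoding="utf-8", errors="replace") as fh:
        for raw in fh:
            line = raw.rstrip("\r\n")
            # A "[" that starts a continuation line is value text, not a header.
            is_header = not continued and line.startswith("[")
            continued = line.endswith("\\")
            if is_header:
                if inside:
                    break  # reached the next stanza
                inside = line.strip() == header
            if inside:
                captured.append(line)
                if len(captured) >= max_lines:
                    break
    return captured
```

Tracking continuations matters for searchbnf.conf, where a multi-line syntax value can start a line with an optional clause such as "[BY ...]" that would otherwise look like a new stanza.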

Validation hygiene

• Only run --file validation on files that actually contain SPL (not Markdown/docs). If the user provides a .md file, extract the SPL snippet into a temporary string and validate it via --spl.
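Pulling SPL out of a Markdown file can be sketched in Python as follows. The sketch assumes the SPL sits in fenced blocks tagged `spl` or `splunk` (untagged fences are ignored), and `spl_snippets`/`validate_argv` are hypothetical helpers; only the script path and the --spl flag come from this skill.

```python
from typing import Optional

FENCE = "`" * 3  # built this way so the sketch can itself live inside a fence


def spl_snippets(markdown: str, tags: tuple[str, ...] = ("spl", "splunk")) -> list[str]:
    """Return the bodies of fenced code blocks whose info string is in `tags`."""
    snippets: list[str] = []
    body: Optional[list[str]] = None  # None while outside an SPL fence
    for line in markdown.splitlines():
        stripped = line.strip()
        if body is None:
            if stripped.startswith(FENCE) and stripped[3:].strip().lower() in tags:
                body = []
        elif stripped.startswith(FENCE):
            snippets.append("\n".join(body).strip())
            body = None
        else:
            body.append(line)
    return snippets


def validate_argv(spl: str) -> list[str]:
    """argv for the validator wrapper; pass to subprocess.run without a shell."""
    return ["python3", "spl-expert/scripts/validate_spl.py", "--spl", spl]
```

Passing an argument list to subprocess.run (for example `subprocess.run(validate_argv(s))`) avoids shell quoting, which matters because SPL routinely contains pipes and quotes.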

When `spl_validator` can be “absolutely wrong”

Only claim this when both are true:

• The SPL aligns with the local ground truth (BNF/KB) for Splunk Enterprise 10.0, and
• The validator error is attributable to a known validator limitation (coverage gap, unknown command/function registry, unsupported construct), not a genuine SPL issue.

In that case:

• Provide the SPL anyway.
• Clearly separate the validator output (what it reported) from your reasoning (why it is likely a limitation).
• Suggest a conservative alternative SPL if possible.
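The two conditions and the required separation can be encoded in a hypothetical Python helper that refuses to build a “validator limitation” note unless both hold and at least one piece of ground-truth evidence is cited. The section labels and the `LimitationClaim` fields are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LimitationClaim:
    spl: str
    validator_output: str        # verbatim spl_validator errors/warnings
    limitation_reason: str       # why this is a validator gap, not an SPL bug
    evidence: list[str] = field(default_factory=list)  # BNF stanza names, KB paths
    alternative: Optional[str] = None                   # conservative fallback SPL


def limitation_note(claim: LimitationClaim, matches_ground_truth: bool,
                    known_limitation: bool) -> str:
    """Build the labeled note only when both conditions hold and evidence exists."""
    if not (matches_ground_truth and known_limitation):
        raise ValueError("not justified: treat the FAIL as a genuine SPL issue")
    if not claim.evidence:
        raise ValueError("cite at least one BNF stanza or KB path")
    parts = [
        "SPL\n" + claim.spl,
        # What the validator said, kept apart from our interpretation of it.
        "Validator output (as reported)\n" + claim.validator_output,
        "Validator limitation (reasoning)\n" + claim.limitation_reason
        + "\nEvidence: " + ", ".join(claim.evidence),
    ]
    if claim.alternative:
        parts.append("Conservative alternative\n" + claim.alternative)
    return "\n\n".join(parts)
```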