log-to-llm-chunker

log-to-llm-chunker

Overview

Takes a large .log (or any text file) and writes multiple .txt chunks sized by an approximate token budget (character-count approximation).

Assumptions

•Tokens are estimated by characters; exact token counts vary by model/tokenizer.
•Input is UTF-8 text (invalid bytes are replaced).
•Output is intended for uploading to another AI, not re-ingesting into the app as structured logs.

Workflow

•Decide your target chunk size (default: 500,000 tokens).
•Convert tokens chars using a fixed ratio (default: 4 chars/token).
•Split on line boundaries to keep log readability.
•Write part-0001.txt, part-0002.txt, ... into your chosen output folder.
•(Optional) Write a small manifest.json describing the source file and produced parts.

Quick Start

•
Convert a log into chunks in backend/logs/LLMreadablelogs:
- •python .github/skills/log-to-llm-chunker/scripts/chunk_log_for_llm.py --input backend/logs/app.log --output backend/logs/LLMreadablelogs
•
Change the target size to ~300k tokens per chunk:
- •python .github/skills/log-to-llm-chunker/scripts/chunk_log_for_llm.py --input backend/logs/app.log --output backend/logs/LLMreadablelogs --target-tokens 300000

Output Notes

•The chunk boundary aims to be close to the budget but never splits a line.
•If a single line exceeds the budget, it is split by character count.
•The script prints a short summary including number of chunks created.

Resources

•Script: .github/skills/log-to-llm-chunker/scripts/chunk_log_for_llm.py