ghost
Overview
Generate a ghost-library package (spec + tests + install prompt) from an existing library repo (in any source language).
Preserve behavior, not prose:
- •`tests.yaml` is the behavior contract
- •source tests are the primary evidence
- •code/docs/examples only fill gaps (never contradict tests)
The output is language-agnostic so the library can be implemented in any target language.
Fit / limitations
This approach works best when the library’s behavior can be expressed as deterministic data:
- •pure-ish operations (input -> output or error)
- •a runnable test suite covering the public API
It gets harder (but is still possible) when the contract depends on time, randomness, IO, concurrency, global state, or platform details. In those cases, make assumptions explicit in SPEC.md + VERIFY.md, and normalize nondeterminism into explicit inputs/outputs.
Hard rules (MUST / MUST NOT)
- •MUST treat upstream tests as authoritative; if docs/examples disagree, prefer tests and record the discrepancy.
- •MUST normalize nondeterminism into explicit inputs/outputs (no implicit "now", random seeds, locale surprises, unordered iteration).
- •MUST keep the ghost repo language-agnostic: ship no implementation code, adapter runner, or build tooling.
- •MUST paraphrase upstream docs; do not copy text verbatim.
- •MUST preserve upstream license files verbatim as `LICENSE*`.
- •MUST produce a verification signal and document it in `VERIFY.md` (adapter runner preferred; sampling fallback allowed).
- •MUST document provenance and regeneration in `VERIFY.md` (upstream repo + revision, how artifacts were produced, and how to rerun verification).
Inputs
- •Source repo path (git working tree)
- •Output repo name/location (default: sibling directory `<repo-name>-ghost`)
- •Upstream identity + revision (remote URL if available; tag/commit SHA)
- •Public API surface if ambiguous (functions/classes/modules)
- •Source language/runtime + how to run upstream tests
- •Any required runtime assumptions (timezone, locale, units, encoding)
Conventions
Operation ids
tests.yaml keys are operation ids (stable identifiers for public API entries). Use a naming scheme that survives translation across languages:
- •`foo` (top-level function)
- •`module.foo` (namespaced function)
- •`Class#method` (instance method)
- •`Class.method` (static/class method)
Avoid language-specific spellings in ids (e.g., avoid snake_case vs camelCase wars). Prefer the canonical name used by the source library’s docs.
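As a rough illustration of the naming scheme above, the following Python sketch classifies an operation id by its spelling. The function name `classify_operation_id` and the identifier pattern are hypothetical choices made for this example, not part of the ghost format.

```python
import re
from typing import Optional, Tuple

# Assumption: identifier segments are ASCII letters, digits, and underscores.
_NAME = r"[A-Za-z_][A-Za-z0-9_]*"

_INSTANCE = re.compile(rf"^(?P<owner>{_NAME}(?:\.{_NAME})*)#(?P<member>{_NAME})$")
_DOTTED = re.compile(rf"^(?P<owner>{_NAME}(?:\.{_NAME})*)\.(?P<member>{_NAME})$")
_TOP_LEVEL = re.compile(rf"^(?P<member>{_NAME})$")


def classify_operation_id(op_id: str) -> Tuple[str, Optional[str], str]:
    """Return (kind, owner, member) for an operation id, or raise ValueError."""
    m = _INSTANCE.match(op_id)
    if m:
        return ("instance_method", m["owner"], m["member"])
    m = _DOTTED.match(op_id)
    if m:
        # `module.foo` and `Class.method` cannot be told apart by spelling alone.
        return ("dotted", m["owner"], m["member"])
    m = _TOP_LEVEL.match(op_id)
    if m:
        return ("top_level", None, m["member"])
    raise ValueError(f"unrecognized operation id: {op_id!r}")
```

Because a namespaced function and a static method share the same dotted spelling, the sketch reports both as `dotted`; telling them apart would need information from `SPEC.md`.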
tests.yaml version
tests.yaml MUST include a top-level version string that identifies the upstream library version used as evidence.
- •If the upstream library has a release version (SemVer/tag), use it.
- •Otherwise, use an immutable source revision identifier (e.g., `git:<short-sha>` or `git describe`).
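The version rule can be sketched as a small helper. The name `ghost_version` and its parameters are hypothetical; the sketch assumes the release tag and short SHA were already read from the upstream repo.

```python
from typing import Optional


def ghost_version(release_tag: Optional[str], short_sha: str) -> str:
    """Prefer a release version; otherwise fall back to an immutable revision."""
    if release_tag:
        return release_tag
    if not short_sha:
        raise ValueError("need a release tag or a commit SHA")
    # Matches the `git:<short-sha>` spelling suggested above.
    return f"git:{short_sha}"
```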
Workflow (tests-first)
1) Scope the source
- •Locate the test suite(s), examples, and primary docs (README, API docs, docs site).
- •Identify the public API and map each public operation to an operation id.
- •Use export/visibility cues to confirm what’s public:
  - •JS/TS: package entrypoints + exports/re-exports
  - •Python: top-level module + `__all__`
  - •Rust: `pub` items re-exported from `lib.rs`
  - •Zig: `build.zig` module graph (`root_source_file`, `addModule`, `pub usingnamespace`) is source of truth; defaults are often `src/root.zig` (library) and `src/main.zig` (exe) but repos vary; treat C ABI `export` as public only if documented
  - •C/C++: installed public headers + exported symbols; include macros/constants only if documented as API
  - •Go: exported identifiers (Capitalized)
  - •Java/C#: `public` types/members in the target package/namespace
  - •Other: use the language’s visibility/export mechanism + published package entrypoints
- •Confirm which functions/classes are in scope:
  - •public API + tests covering it
  - •exclude internal helpers unless tests prove they are part of the contract
- •Decide the output directory as a new sibling repo unless the user overrides.
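For a Python source library, the `__all__` cue could be checked with a sketch like the one below. The helper name `public_names` is hypothetical, and the underscore fallback is a common convention rather than a rule from this document.

```python
import types
from typing import List


def public_names(module: types.ModuleType) -> List[str]:
    """List a module's public names, preferring an explicit __all__."""
    declared = getattr(module, "__all__", None)
    if declared is not None:
        return sorted(declared)
    # Fallback convention: treat names without a leading underscore as public.
    # This also picks up imported modules, so review the result by hand.
    return sorted(name for name in vars(module) if not name.startswith("_"))
```

The returned names are only candidates for operation ids; tests still decide what is actually in scope.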
2) Harvest behavior evidence
- •Extract test cases and expected outputs; treat tests as authoritative.
- •When tests are silent, read code/docs to infer behavior and record the inference.
- •Note all boundary values, rounding rules, encoding rules, and error cases.
- •Normalize environment assumptions:
  - •eliminate dependency on current time (use explicit timestamps)
  - •force timezone/locale rules if relevant
  - •remove nondeterminism (random seeds, unordered iteration)
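The normalization steps above can be illustrated with two small Python functions. Both `age_in_days` and `sorted_tags` are hypothetical operations invented for this sketch; the point is that "now" becomes an explicit input and set iteration order becomes a defined output order.

```python
from datetime import datetime
from typing import List, Set


def age_in_days(created_at: datetime, now: datetime) -> int:
    """Whole days between two instants; `now` is passed in, never read from the clock."""
    if created_at.tzinfo is None or now.tzinfo is None:
        # Naive timestamps would silently depend on the machine's timezone.
        raise ValueError("timestamps must be timezone-aware")
    return (now - created_at).days


def sorted_tags(tags: Set[str]) -> List[str]:
    """Return tags in a defined order instead of set iteration order."""
    return sorted(tags)
```

With this shape, a test case can record `created_at` and `now` as explicit timestamps in `input`, and the expected `output` stays stable across machines and runs.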
3) Write SPEC.md (strict, language-agnostic)
- •Describe types abstractly (number/string/object/timestamp/bytes/etc.).
- •For bytes/buffers, define a canonical encoding (hex or base64) and use it consistently in `tests.yaml`.
- •Define normalization rules (e.g., timestamp parsing, string trimming, unicode, case folding).
- •Specify error behavior precisely (conditions), but keep the mechanism language-idiomatic.
- •Specify every public operation with inputs, outputs, rules, and edge cases.
- •Paraphrase source docs; do not copy text verbatim.
- •Use `references/templates.md` for structure.
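One way to pin down a canonical bytes encoding is sketched below, assuming hex is chosen. Requiring lowercase digits and no separators is an assumption of this example; the function names are hypothetical.

```python
_HEX_DIGITS = set("0123456789abcdef")


def encode_bytes(value: bytes) -> str:
    """Encode bytes as lowercase hex with no separators."""
    return value.hex()


def decode_bytes(text: str) -> bytes:
    """Decode canonical hex; reject uppercase, whitespace, and odd lengths."""
    if len(text) % 2 != 0 or any(ch not in _HEX_DIGITS for ch in text):
        raise ValueError(f"not canonical hex: {text!r}")
    return bytes.fromhex(text)
```

Rejecting non-canonical spellings on decode keeps every byte string to exactly one representation in `tests.yaml`, so outputs can be compared as plain strings.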
4) Generate tests.yaml (exhaustive)
- •Convert each source test into a YAML case under its operation id.
- •Include a top-level `version` string (upstream library version or revision).
- •Schema is intentionally strict and portable:
  - •each case has `name` and `input`
  - •each case has exactly one of `output` or `error: true`
  - •keep to a portable YAML subset (no anchors/tags/binary) so it is easy to parse in many languages
  - •quote ambiguous scalars (`yes`, `no`, `on`, `off`, `null`) to avoid parser disagreements
- •Normalize inputs to deterministic values (avoid "now"; use explicit timestamps).
- •Keep or improve coverage across all public operations and failure modes.
- •If the source returns floats, prefer defining stable rounding/formatting rules so `output` is exact.
- •Follow the format in `references/templates.md`.
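The schema rules in this step can be checked mechanically. The sketch below works on already-parsed data and assumes a layout in which the top-level mapping holds `version` plus one list of cases per operation id; the authoritative layout is whatever `references/templates.md` defines. `validate_suite` is a hypothetical name.

```python
from typing import Any, Dict, List


def validate_suite(suite: Dict[str, Any]) -> List[str]:
    """Return human-readable schema problems (an empty list means valid)."""
    problems: List[str] = []
    if not isinstance(suite.get("version"), str):
        problems.append("top-level 'version' must be a string")
    for op_id, cases in suite.items():
        if op_id == "version":
            continue
        if not isinstance(cases, list):
            problems.append(f"{op_id}: expected a list of cases")
            continue
        for index, case in enumerate(cases):
            where = f"{op_id}[{index}]"
            if not isinstance(case, dict):
                problems.append(f"{where}: case must be a mapping")
                continue
            for key in ("name", "input"):
                if key not in case:
                    problems.append(f"{where}: missing '{key}'")
            has_output = "output" in case
            has_error = "error" in case
            if has_output == has_error:
                problems.append(f"{where}: need exactly one of 'output' or 'error'")
            elif has_error and case["error"] is not True:
                problems.append(f"{where}: 'error' must be exactly true")
    return problems
```

Checking that `version` is a string also catches an unquoted value such as `1.2`, which many YAML parsers would load as a float.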
5) Add INSTALL.md + README.md + VERIFY.md + LICENSE*
- •`INSTALL.md`: a short prompt for implementing the library in any language, referencing `SPEC.md` and `tests.yaml`.
- •`README.md`: explain what the ghost library is, list operations, and describe the included files.
- •`VERIFY.md`: describe provenance + how the ghost artifacts were produced and verified against the source library (adapter-first; sampling fallback).
  - •include upstream repo identity + exact revision (tag or commit)
  - •include the exact commands used to produce each artifact (or a single deterministic regeneration recipe)
  - •include the exact commands used to run verification and the resulting pass/skip counts
  - •include any environment normalization assumptions
- •`LICENSE*`: preserve the upstream repo’s license files verbatim.
  - •copy common files like `LICENSE`, `LICENSE.md`, `COPYING*`
  - •if no license file exists upstream, include a `LICENSE` file stating that no upstream license was found
6) Verify fidelity (must do)
- •Ensure `tests.yaml` parses and case counts match or exceed the source tests covering the public API.
- •Preferred: create a temporary adapter runner in the source language to run `tests.yaml` against the existing library.
  - •if the source language has weak YAML tooling, parse YAML externally and dispatch into the library via a tiny CLI/FFI shim
  - •assert outputs/errors match exactly
  - •delete the adapter afterward; do not ship it in the ghost repo
  - •summarize how to run it (and results) in `VERIFY.md`
- •If a full adapter is infeasible:
  - •run a representative sample across all operation ids (typical + boundary + error)
  - •document the limitation clearly in `VERIFY.md`
- •Use `references/verification.md` for a checklist and `VERIFY.md` template.
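A temporary adapter runner for a Python source library might look roughly like the sketch below. It assumes the suite is already parsed into a mapping of operation id to cases, that each `input` is a mapping of keyword arguments, and that a hand-written `dispatch` table maps operation ids to callables. All of these names are hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple


def run_suite(
    suite: Dict[str, Any],
    dispatch: Dict[str, Callable[..., Any]],
) -> Tuple[int, int, List[str]]:
    """Run every case; return (passed, skipped, failure messages)."""
    passed = 0
    skipped = 0
    failures: List[str] = []
    for op_id, cases in suite.items():
        if op_id == "version":
            continue
        func = dispatch.get(op_id)
        for case in cases:
            label = f"{op_id}: {case['name']}"
            if func is None:
                skipped += 1  # no adapter entry for this operation id
                continue
            expects_error = case.get("error") is True
            try:
                actual = func(**case["input"])
            except Exception:
                # Any raised exception counts as "an error occurred".
                if expects_error:
                    passed += 1
                else:
                    failures.append(f"{label}: unexpected error")
                continue
            if expects_error:
                failures.append(f"{label}: expected an error")
            elif actual == case["output"]:
                passed += 1
            else:
                failures.append(f"{label}: got {actual!r}, want {case['output']!r}")
    return passed, skipped, failures
```

The pass and skip counts are the numbers to record in `VERIFY.md`. Python's `==` treats `1`, `1.0`, and `True` as equal, so a stricter comparison may be needed where the spec distinguishes those types.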
Reproducibility and regen policy
- •The ghost repo must be reproducible: a future developer should be able to point at the upstream revision and rerun the extraction + verification.
- •Do not add regeneration scripts as tracked files unless the user explicitly asks; put the recipe in `VERIFY.md` instead.
Output
Produce only these artifacts in the ghost repo:
- •`README.md`
- •`SPEC.md`
- •`tests.yaml`
- •`INSTALL.md`
- •`VERIFY.md`
- •`LICENSE*` (copied from upstream)
- •`.gitignore` (optional, minimal)
Notes
- •Prefer precision over verbosity; rules should be unambiguous and testable.
- •Keep the ghost repo free of implementation code and packaging scaffolding.
Zig notes
- •Running upstream tests: prefer `zig build test` (if `build.zig` defines tests); otherwise `zig test path/to/file.zig` for the library root and any test entrypoints.
- •Operation ids for methods: treat a first parameter named `self` of type `T`/`*T` as an instance method (`T#method`); otherwise use `T.method`.
- •`comptime` parameters: record allowed values in `SPEC.md`, and represent them as ordinary fields in `tests.yaml` inputs.
- •Allocators/buffers: if the API takes `std.mem.Allocator` or caller-provided buffers, specify ownership and mutation rules; assume allocations succeed unless tests cover OOM.
- •Errors: keep `tests.yaml` strict (`error: true` only); in a Zig adapter, treat "any error return" as a passing error case, and rely on `SPEC.md` to pin the exact error conditions.
- •YAML tooling: Zig stdlib has JSON but not YAML; for adapters/implementations it’s fine to convert `tests.yaml` to JSON (or JSONL) as an intermediate and have a Zig runner parse it via `std.json`.
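The JSONL intermediate mentioned in the last note could be produced with a sketch like this one. It starts from a suite that some YAML parser has already loaded, and assumes the same layout as before (a `version` key plus one list of cases per operation id). The record fields `op` and `version` are hypothetical names chosen for this example.

```python
import json
from typing import Any, Dict, Iterator


def suite_to_jsonl(suite: Dict[str, Any]) -> Iterator[str]:
    """Yield one compact JSON line per case, tagged with version and operation id."""
    version = suite["version"]
    for op_id, cases in suite.items():
        if op_id == "version":
            continue
        for case in cases:
            record = {"version": version, "op": op_id, **case}
            # Sorted keys and fixed separators make the output byte-stable.
            yield json.dumps(record, sort_keys=True, separators=(",", ":"))
```

One object per line lets a runner in the target language read and parse cases one at a time without loading the whole suite.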
Activation cues
- •"ghost" / "ghost library" / "ghostify" / "spec-ify" / "spec package"
- •"extract language-agnostic spec/tests"
Resources
- •`references/templates.md` (artifact outlines and YAML format)
- •`references/verification.md` (verification checklist + `VERIFY.md` template)