readable-hash-models

从语料文件（models/corpus）中，在此仓库中创建或更新可读哈希生成模型。当用户被要求构建新模型、生成 Rust 模块、推出新的哈希函数，或为基于语料的可读哈希添加特定任务的 n-gram 大小测试时，可使用此技能。

SKILL.md

--- frontmatter

name: readable-hash-models
description: Create or update readable-hash generator models in this repo from corpus files (models/corpus). Use when asked to build new models, generate Rust modules, expose new hash functions, or add tests for corpus-based readable hashes with task-specific n-gram sizes.

Readable Hash Model Creation

Quick start

•Confirm requirements

•Identify corpus file(s) in models/corpus/.
•Identify target n-gram size for this task (any integer).
•Identify desired Rust module + function names and output behavior (word list vs sentence, separators, max length).

•Prepare corpus

•Ensure the corpus is plain text in English.
•If the input is Project Gutenberg, use models/bin/prepare_gutenberg_corpus.sh to strip headers/footers.

•Train model

•Run python3 models/tokenize.py <inputs> -o models/training-data/.
•If the task requires a different n-gram size, update models/tokenize.py to accept a CLI option (e.g. --max-ngram) and use a descriptive variable name. Avoid single-letter names except i, j, k.
•Update models/README.md if new flags or behaviors are added.

•Generate Rust

•Run python3 models/generate_rust.py models/training-data/<name>-model.json.
•Write output to src/<model_name>.rs alongside src/english_word.rs.
•Ensure the generated Rust file embeds the model data and exposes a function that generates the readable hash output for that model.

•Wire public API

•Expose a new function in src/lib.rs that mirrors english_word_hash style.
•Keep API naming consistent with the requested model.

•Tests

•Add Cucumber feature files under tests/features/.
•Update tests/cucumber.rs step definitions if the new API needs coverage.

•Verify

•Run cargo fmt --all and cargo test.

References to read when needed

•models/README.md for model format and generation flow.
•models/tokenize.py for tokenization + model training behavior.
•models/generate_rust.py for Rust code generation details.
•src/english_word.rs and src/lib.rs for API patterns.
•tests/features/*.feature and tests/cucumber.rs for testing patterns.