Import Model Skill
You are helping the user import new model evaluation results into the ReasonScape m12x dataset structure.
Workflow
When the user invokes /import <model-name>, follow these steps:
Important: Use only the AskUserQuestion tool for all user interactions to keep the flow smooth.
1. Search for Results
Search results/ directory for folders matching the model name pattern:
ls -1 results/ | grep -i "<model-name>"
Extract all unique model variants found and group by distinct configurations:
- •Quantization (e.g., fp16, AWQ, FP8)
- •Sampler variants (e.g., base vs sglang)
- •Context variants (e.g., default vs 16k)
- •Template and sampler combinations
Handle ambiguity:
- •If only ONE distinct variant found (with 3 replicas) → Proceed with that variant
- •If MULTIPLE distinct variants found → Use AskUserQuestion to confirm which to import
- •Example: Found both GLM-4.7-Flash-fp16 and GLM-4.7-Flash-sglang-fp16
- •Ask: "Which variant(s) should I import?" with options for each
- •Allow selecting multiple if user wants to import several at once
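The grouping step above can be sketched in Python. This is a minimal sketch, assuming result folders are named `<date>_<model>_<template>_<sampler>_<rest>` with underscores only between fields (inferred from the eval glob used later in this document). The function name `group_result_folders`, the dates, and the trailing replica index are illustrative, not part of ReasonScape.

```python
from collections import defaultdict


def group_result_folders(folder_names):
    """Group folder names by (model, template, sampler).

    Assumed layout: <date>_<model>_<template>_<sampler>_<rest>.
    Folders that do not have at least four fields are skipped.
    """
    groups = defaultdict(list)
    for name in folder_names:
        parts = name.split("_")
        if len(parts) < 4:
            continue  # not a recognizable result folder
        _date, model, template, sampler = parts[:4]
        groups[(model, template, sampler)].append(name)
    return dict(groups)


# Hypothetical folder names for illustration only.
folders = [
    "2025-01-10_GLM-4.7-Flash-fp16_zeroshot-nosys_greedy-max_0",
    "2025-01-10_GLM-4.7-Flash-fp16_zeroshot-nosys_greedy-max_1",
    "2025-01-10_GLM-4.7-Flash-fp16_zeroshot-nosys_greedy-max_2",
    "2025-01-11_GLM-4.7-Flash-sglang-fp16_zeroshot-nosys_greedy-max_0",
]
variants = group_result_folders(folders)
for key, members in variants.items():
    print(key, len(members))
```

More than one key in the result corresponds to the "MULTIPLE distinct variants" case, where the user should be asked which to import.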
2. Find HuggingFace Model ID
Search for the model using the HF CLI:
hf models ls --search "<model-name>" --limit 10
Parse the output to extract model IDs (format: author/model-name).
Use AskUserQuestion to confirm the HF ID:
- •If 1 clear match: Ask "Confirm HF model?" with the top result as default option
- •If multiple matches: Present top 3-5 as options and let user pick
- •Use the AskUserQuestion tool for smooth confirmation without breaking flow
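Extracting candidate IDs from the CLI output might look like the sketch below. The exact output layout of the HF CLI is not specified here, so the sketch only assumes that each model ID appears as a whitespace-separated `author/model-name` token. The helper name `extract_hf_ids` and the second repo in the sample text are made up for illustration.

```python
import re

# One "author/model-name" token: two path segments, no scheme or extra slashes.
HF_ID = re.compile(r"[A-Za-z0-9][\w.\-]*/[A-Za-z0-9][\w.\-]*")


def extract_hf_ids(cli_output, limit=5):
    """Return up to `limit` unique author/model tokens, in order of appearance."""
    ids = []
    for line in cli_output.splitlines():
        for token in line.split():
            if HF_ID.fullmatch(token) and token not in ids:
                ids.append(token)
    return ids[:limit]


# Hypothetical CLI output; the real format may differ.
sample = """\
miromind-ai/MiroThinker-v1.5-30B
someone/MiroThinker-v1.5-30B-GGUF
miromind-ai/MiroThinker-v1.5-30B
"""
print(extract_hf_ids(sample))
```

The resulting list can be passed directly as the options for the AskUserQuestion confirmation.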
3. Determine Cohort and Examine Existing Structure
CRITICAL STEP: Before fetching metadata, determine the target cohort directory and examine what's already there.
3a. Determine Cohort Directory Name
From the model name and results folders, determine the cohort directory name.
What is a cohort? A cohort groups equivalent models together, including:
- •The base model and its variants (e.g., Qwen3-30B-A3B-Thinking + Qwen3-30B-A3B-Instruct)
- •All quantizations of the same model (fp16, AWQ, FP8, GPTQ)
- •Context extensions/REAPs (e.g., GLM-4.7 is a REAP of GLM-4.5)
A cohort does NOT include:
- •Significantly different model sizes (GLM-4.5-Air is 1/3 the size → separate cohort)
- •Different model architectures or generations
Examples:
- •Qwen3-30B-A3B/ - Contains both Thinking and Instruct variants, all quants
- •GLM-4.5/ - Contains base model + GLM-4.7 REAP variant
- •GLM-4.5-Air/ - Separate cohort (different size, not just a quant/variant)
- •MiroThinker-v1.5-30B/ - All quants of this specific model
Determining cohort name:
- •Look at folder name patterns (e.g., 2025-01-*_MiroThinker-v1.5-30B-fp16_*)
- •Extract base model identifier: MiroThinker-v1.5-30B, Qwen3-30B-A3B, GLM-4.5, etc.
- •Check existing data/m12x/ directories for similar models
- •Target cohort: data/m12x/<ModelIdentifier>/
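Checking existing cohort directories for a match can be approximated with a prefix test. The sketch below is a heuristic only. The helper name `candidate_cohorts` is hypothetical, and it assumes cohort names are hyphen-delimited prefixes of the model name. It cannot detect relationships such as GLM-4.7 belonging to the GLM-4.5 cohort, so an empty result means "decide manually or ask", not "new cohort".

```python
def candidate_cohorts(model_name, existing_cohorts):
    """Return existing cohort names that prefix model_name, longest first.

    A cohort matches when the model name equals it or continues with "-",
    so "GLM-4.5" does not match a name like "GLM-4.50".
    """
    hits = [
        cohort
        for cohort in existing_cohorts
        if model_name == cohort or model_name.startswith(cohort + "-")
    ]
    return sorted(hits, key=len, reverse=True)


existing = ["GLM-4.5", "GLM-4.5-Air", "Qwen3-30B-A3B"]
print(candidate_cohorts("GLM-4.5-Air-fp16", existing))
print(candidate_cohorts("Qwen3-30B-A3B-Thinking-2507-fp16-16k", existing))
print(candidate_cohorts("GLM-4.7-fp16", existing))
```

In practice the `existing` list would come from the directory names under data/m12x/. Preferring the longest match keeps GLM-4.5-Air results out of the GLM-4.5 cohort.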
3b. Check if Cohort Exists
python ./cohort.py list --search '<ModelFamilyRegExp>'
Note that you can use both simple matching and regexp here.
3c. Examine Existing Structure (if cohort exists)
If the cohort directory exists, examine its current state:
# List existing evals in cohort
python ./cohort.py list "data/m12x/<ModelFamily>"
# Read existing evals.json
cat "data/m12x/<ModelFamily>/evals.json" | jq .
Extract and understand:
- •Existing variants: What quantizations/contexts are already imported?
- •Example: fp16 (default context), fp16-16k, AWQ, etc.
- •Glob patterns: What naming conventions are used?
- •Example: *-fp16_* vs *-fp16-16k_*
- •Check if old globs need refinement to avoid matching new variants
- •Faceting conventions: What groups are assigned?
- •Are quant:* facets present?
- •Are ctx:* facets used?
- •What families/arch/size tags exist?
- •HF IDs: Are there multiple HF repos for different variants?
- •Example: Thinking vs Instruct variants may have separate repos
- •Tags: Does this cohort have "tags": ["leaderboard"]?
3d. Report Context
Before proceeding to metadata fetch, report:
- •Cohort directory: data/m12x/<ModelFamily>/
- •Status: NEW or EXISTS
- •If EXISTS:
- •List existing variants and their facets
- •Note glob patterns that may need refinement
- •Identify any missing facets on existing evals (e.g., quant:fp16)
- •Flag if new variant needs separate HF ID
This context is essential for correctly fetching metadata and generating the migration script.
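The examination and report can be supported by a small summary over the loaded evals.json list. This is a sketch, assuming each entry has the shape used by the migration script in step 7 (`evaluate.glob`, `filters.model`, `groups`, optional `tags`, `hf_id`). The helper name `summarize_cohort` and the sample entry are illustrative.

```python
def summarize_cohort(evals):
    """Summarize each eval entry: variant, glob, facet coverage, HF ID, tags."""
    summary = []
    for entry in evals:
        groups = entry.get("groups", [])
        summary.append({
            "model": entry.get("filters", {}).get("model"),
            "glob": entry.get("evaluate", {}).get("glob"),
            "has_quant": any(g.startswith("quant:") for g in groups),
            "has_ctx": any(g.startswith("ctx:") for g in groups),
            "hf_id": entry.get("hf_id"),
            "leaderboard": "leaderboard" in entry.get("tags", []),
        })
    return summary


# Hypothetical existing entry that is missing its quant facet.
existing_evals = [{
    "evaluate": {"glob": "data/m12x/Qwen3-30B-A3B/*-fp16*/*"},
    "filters": {"model": "Qwen3-30B-A3B-Thinking-2507-fp16"},
    "groups": ["family:qwen3", "arch:moe", "size:large"],
    "tags": ["leaderboard"],
    "hf_id": "Qwen/Qwen3-30B-A3B-Thinking-2507",
}]
for row in summarize_cohort(existing_evals):
    print(row)
```

Rows with `has_quant` set to False are the ones that need a quant facet added during the update, and the `glob` column shows which patterns may need refinement.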
4. Fetch Model Metadata
Once hf_id is confirmed, run modelinfo:
python analyze.py modelinfo --hf-id <hf_id> --output-dir /tmp/import-modelinfo
Read the generated MODELINFO.md and extract:
- •base_model: Extract base architecture family (e.g., "Qwen/Qwen3-30B" → family:qwen3)
- •Architecture: Extract arch type:
- •*MoeForCausalLM → arch:moe
- •*ForCausalLM (non-MoE) → arch:dense
- •Look for SSM/Mamba/hybrid indicators → arch:ssm or arch:hybrid
- •Model Type: Confirm the architecture classification
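The architecture mapping above can be written as a small classifier. This is a sketch under stated assumptions: the helper name `arch_facet` is hypothetical, and the substring checks for "hybrid" and "mamba" are guesses at what an SSM or hybrid indicator looks like in a class name. Anything unrecognized returns None so it can be reviewed manually.

```python
def arch_facet(architecture):
    """Map an architecture class name to an arch:* facet, or None if unclear."""
    name = architecture.lower()
    if "hybrid" in name:
        return "arch:hybrid"  # assumed indicator; confirm against MODELINFO.md
    if "mamba" in name:
        return "arch:ssm"  # assumed indicator; hybrids may also mention Mamba
    if name.endswith("moeforcausallm"):
        return "arch:moe"
    if name.endswith("forcausallm"):
        return "arch:dense"
    return None


for arch in ("Qwen3MoeForCausalLM", "LlamaForCausalLM", "MambaForCausalLM"):
    print(arch, "->", arch_facet(arch))
```

The MoE check runs before the dense check because every `*MoeForCausalLM` name also ends in `ForCausalLM`.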
5. Extract Metadata from Folder Names
Parse the folder names to extract:
- •Model name with quant: e.g., MiroThinker-v1.5-30B-fp16, Qwen3-30B-A3B-Thinking-2507-fp16-16k
- •Template: e.g., zeroshot-nosys, zerocot-nosys
- •Sampler: e.g., qwen3-think-max, greedy-max
- •Quantization: from model name suffix (fp16, AWQ, FP8, GPTQ)
- •Context length: from suffix like -16k → ctx:16k facet (ONLY if explicitly in folder name, otherwise omit)
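Splitting the quantization and context suffixes off the model name could look like the sketch below. It assumes the context suffix, when present, comes after the quantization suffix (as in Qwen3-30B-A3B-Thinking-2507-fp16-16k). The helper name `split_model_name` is hypothetical.

```python
import re

QUANTS = ("fp16", "fp8", "awq", "gptq")


def split_model_name(model_with_quant):
    """Return (base_name, quant, ctx) parsed from trailing suffixes.

    quant and ctx are None when the corresponding suffix is absent.
    """
    name = model_with_quant
    ctx = None
    ctx_match = re.search(r"-(\d+k)$", name)
    if ctx_match:
        ctx = ctx_match.group(1)
        name = name[: ctx_match.start()]
    quant = None
    for candidate in QUANTS:
        if name.lower().endswith("-" + candidate):
            quant = candidate
            name = name[: -(len(candidate) + 1)]
            break
    return name, quant, ctx


print(split_model_name("MiroThinker-v1.5-30B-fp16"))
print(split_model_name("Qwen3-30B-A3B-Thinking-2507-fp16-16k"))
```

A ctx value of None means no context suffix was found, in which case the ctx:* facet is omitted.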
6. Determine Faceting
Build the groups array:
Size (infer from parameter count in model name):
- •tiny: <4B
- •small: 4-8B
- •mid: >8B and <20B
- •large: 20B to <100B
- •xlarge: ≥100B
Architecture (from modelinfo):
- •arch:dense / arch:moe / arch:ssm / arch:hybrid
Quantization (from folder name - ALWAYS include):
- •quant:fp16 / quant:fp8 / quant:awq / quant:gptq
- •Add this facet even for "non-quantized" fp16 models
Context Length (from folder name - ONLY if explicit):
- •ctx:16k if folder name contains -16k
- •ctx:32k if folder name contains -32k
- •OMIT this facet if no context suffix in folder name
Families:
- •Base family from base_model (e.g., family:qwen3, family:llama)
- •Finetune family from model name (e.g., family:mirothinker)
If base_model is missing or unclear, use AskUserQuestion with common options: qwen3, llama, mistral, phi, gemma
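Assembling the groups array from the pieces above might look like this sketch. The helper names are hypothetical. The `size:` prefix follows Example 1 later in this document, and the band boundaries follow the Size Band Reference (8B is small, 100B and up is xlarge). The parameter count is read from the first standalone `<number>B` token in the name, which for MoE names such as Qwen3-30B-A3B picks the total (30B), not the active count.

```python
import re


def size_band(params_b):
    """Map a parameter count in billions to a size band."""
    if params_b < 4:
        return "tiny"
    if params_b <= 8:
        return "small"
    if params_b < 20:
        return "mid"
    if params_b < 100:
        return "large"
    return "xlarge"


def params_from_name(model_name):
    """Return the first standalone '<number>B' in the name as a float, or None."""
    match = re.search(
        r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)B(?![A-Za-z0-9])", model_name
    )
    return float(match.group(1)) if match else None


def build_groups(families, arch, params_b, quant, ctx=None):
    """Build the groups list; ctx is added only when explicitly given."""
    groups = [f"family:{family}" for family in families]
    groups.append(arch)
    groups.append(f"size:{size_band(params_b)}")
    groups.append(f"quant:{quant.lower()}")
    if ctx:
        groups.append(f"ctx:{ctx}")
    return groups


params = params_from_name("MiroThinker-v1.5-30B")
print(build_groups(["qwen3", "mirothinker"], "arch:moe", params, "fp16"))
```

If `params_from_name` returns None, the size cannot be inferred from the name and should be confirmed with the user.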
7. Generate Migration Script
Create a Python script that does the following for each quantization variant found:
#!/usr/bin/env python3
import json
import shutil
from pathlib import Path
from glob import glob

# Model: <model-name>-<quant>
cohort_dir = Path("data/m12x/<ModelFamily>")
eval_json = cohort_dir / "evals.json"

print("Processing <model-name>-<quant>...")

# Create cohort directory if needed
cohort_dir.mkdir(parents=True, exist_ok=True)

# Move result folders. The underscores delimit the model field so that
# "-fp16" does not also pick up "-fp16-16k" folders.
for src in glob("results/*_<model-name>-<quant>_*"):
    dest = cohort_dir / Path(src).name
    shutil.move(src, str(dest))
    print(f" Moved {src} -> {dest}")

# Build new eval entry
new_eval = {
    "evaluate": {"glob": "data/m12x/<ModelFamily>/*_<model-name>-<quant>_<template>_<sampler>_*/*"},
    "filters": {"model": "<model-name>-<quant>", "template": "<template>", "sampler": "<sampler>"},
    "label": "<Human Readable Label>",
    "groups": [<facet-array>],
    <tags-line-if-new-model>
    "hf_id": "<hf_id>"<,quant_id-if-applicable>
}

# Create or append to evals.json
if eval_json.exists():
    with open(eval_json) as f:
        evals = json.load(f)
    evals.append(new_eval)
else:
    evals = [new_eval]

with open(eval_json, "w") as f:
    json.dump(evals, f, indent=2)

print("✓ Imported <model-name>-<quant>")
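The placeholder lines in the template (`<tags-line-if-new-model>` and `<,quant_id-if-applicable>`) are not valid Python until they are filled in. One way to avoid editing the dict literal by hand is to build the entry with a helper, as sketched below. The function `build_eval_entry` and its parameters are hypothetical, and the label and quantized repo ID in the usage lines are made-up example values.

```python
def build_eval_entry(cohort, model, template, sampler, label, groups,
                     hf_id, cohort_is_new, hf_quant_id=None):
    """Build one evals.json entry with optional tags and hf_quant_id."""
    entry = {
        "evaluate": {
            "glob": f"data/m12x/{cohort}/*_{model}_{template}_{sampler}_*/*"
        },
        "filters": {"model": model, "template": template, "sampler": sampler},
        "label": label,
        "groups": list(groups),
    }
    if cohort_is_new:
        # Only brand-new cohorts are tagged; existing ones are promoted manually.
        entry["tags"] = ["leaderboard"]
    entry["hf_id"] = hf_id
    if hf_quant_id:
        entry["hf_quant_id"] = hf_quant_id
    return entry


entry = build_eval_entry(
    cohort="MiroThinker-v1.5-30B",
    model="MiroThinker-v1.5-30B-fp16",
    template="zeroshot-nosys",
    sampler="qwen3-think-max",
    label="MiroThinker v1.5 30B (fp16)",  # illustrative label
    groups=["family:qwen3", "family:mirothinker", "arch:moe",
            "size:large", "quant:fp16"],
    hf_id="miromind-ai/MiroThinker-v1.5-30B",
    cohort_is_new=True,
)
print(entry["evaluate"]["glob"])
```

Note that `cohort_is_new` has to be determined before the script calls `cohort_dir.mkdir(...)`, for example with `not cohort_dir.exists()`, since the directory always exists afterwards.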
Important:
- •For <tags-line-if-new-model>: Include "tags": ["leaderboard"], if cohort_dir did NOT exist before (brand new cohort), otherwise omit
- •If cohort directory already exists, omit tags (manual promotion)
- •Add "hf_quant_id": "<quant_id>" field if this is a quantized version
- •When adding variants to existing models: Refine old globs to prevent matching new variants
- •Example: Change *-fp16* to *-fp16_* so it won't match -fp16-16k_ variants
- •Each eval entry should match ONLY its intended variant
- •Always add missing facets to existing evals when updating (e.g., add quant:fp16 if missing)
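The glob-refinement rule can be checked mechanically before writing evals.json. The sketch below uses `fnmatch.fnmatchcase` on folder names and on the folder-level part of each pattern (the real globs also carry the data/m12x/ prefix and a trailing `/*`). The helper name and the sample folder names are illustrative.

```python
from fnmatch import fnmatchcase


def folders_with_ambiguous_match(folder_names, patterns):
    """Return {folder: matching_patterns} for folders not matched exactly once."""
    ambiguous = {}
    for name in folder_names:
        hits = [p for p in patterns if fnmatchcase(name, p)]
        if len(hits) != 1:
            ambiguous[name] = hits
    return ambiguous


# Hypothetical folder names: one default-context run, one 16k run.
folders = [
    "2025-01-10_Qwen3-30B-A3B-Thinking-2507-fp16_zeroshot-nosys_qwen3-think-max_0",
    "2025-01-12_Qwen3-30B-A3B-Thinking-2507-fp16-16k_zeroshot-nosys_qwen3-think-max_0",
]

old_patterns = ["*-fp16*", "*-fp16-16k_*"]
refined_patterns = ["*-fp16_*", "*-fp16-16k_*"]

print(folders_with_ambiguous_match(folders, old_patterns))      # 16k folder matches both
print(folders_with_ambiguous_match(folders, refined_patterns))  # no ambiguity
```

With the old pattern the 16k folder is claimed by two entries; after refinement each folder matches exactly one, which is the condition every eval entry should satisfy.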
8. Execute
Save the generated script to /tmp/import-<model-name>.py and execute it immediately:
python3 /tmp/import-<model-name>.py
9. Verify
After execution, verify:
python ./cohort.py list "data/m12x/<ModelFamily>"
Report success and show the newly imported eval entries from the cohort listing.
Edge Cases
- •Cohort directory exists: This is normal - append to evals.json, don't error
- •Check if you need to refine existing globs to prevent matching new variants
- •Add missing facets to existing evals (e.g., quant:fp16)
- •Multiple HF matches: Show top 3, ask user to pick
- •No results found: Error and ask user to check the model name
- •Base model unclear: Ask user directly for base family
- •Multiple quantizations: Import all variants found (fp16, AWQ, FP8, etc.)
- •Multiple context lengths: Import all context variants (default, 16k, 32k, etc.)
- •Each context length gets its own eval entry
- •Only add ctx:* facet when context is explicitly in folder name
- •Different model variants with separate HF IDs: Some model families have separate HF repos for variants (e.g., Qwen3-30B-A3B-Thinking-2507 vs Qwen3-30B-A3B-Instruct-2507)
- •Fetch metadata for each unique HF ID
- •Store correct hf_id for each eval entry
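Backfilling a missing quant facet on existing entries can be done in place on the loaded evals list. This is a sketch, assuming the quantization can be read from a hyphen-delimited suffix in `filters.model`. The helper name `add_missing_quant_facets` is hypothetical, and entries where no known quantization is found are left untouched for manual review.

```python
import re

KNOWN_QUANTS = ("fp16", "fp8", "awq", "gptq")


def add_missing_quant_facets(evals):
    """Add quant:* to entries that lack it; return how many were changed."""
    changed = 0
    for entry in evals:
        groups = entry.setdefault("groups", [])
        if any(g.startswith("quant:") for g in groups):
            continue  # already faceted
        model = entry.get("filters", {}).get("model", "").lower()
        for quant in KNOWN_QUANTS:
            # Match "-fp16" only as a whole hyphen-delimited token.
            if re.search(rf"-{quant}(?:-|$)", model):
                groups.append(f"quant:{quant}")
                changed += 1
                break
    return changed


# Hypothetical existing entries.
evals = [
    {"filters": {"model": "Qwen3-30B-A3B-Thinking-2507-fp16"},
     "groups": ["family:qwen3"]},
    {"filters": {"model": "Qwen3-30B-A3B-Thinking-2507-AWQ"},
     "groups": ["family:qwen3", "quant:awq"]},
]
print(add_missing_quant_facets(evals), evals[0]["groups"])
```

Running it a second time changes nothing, so it is safe to include in a migration script that may be re-executed.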
Size Band Reference
Quick lookup for parameter counts:
- •1B, 2B, 3B → tiny
- •4B, 7B, 8B → small
- •9B-19B → mid
- •20B-99B → large
- •100B+ → xlarge
Example Invocations
Example 1: New Model Import
User: /import MiroThinker-v1.5-30B

1. Found: 3 replicas of MiroThinker-v1.5-30B-fp16
2. HF: miromind-ai/MiroThinker-v1.5-30B ✓
3. Modelinfo:
   - base_model: Qwen/Qwen3-30B-A3B-Thinking-2507
   - Architecture: Qwen3MoeForCausalLM
4. Extracted:
   - Template: zeroshot-nosys
   - Sampler: qwen3-think-max
   - Quantization: fp16
   - Context: (none - no suffix)
5. Faceting:
   - family:qwen3 (base)
   - family:mirothinker (finetune)
   - arch:moe
   - size:large (30B)
   - quant:fp16
   - (no ctx facet)
6. Generate script → execute automatically
7. Verify with cohort.py list
8. ✓ Imported to cohort data/m12x/MiroThinker-v1.5-30B/
Example 2: Adding Context Variant to Existing Model
User: /import Qwen3-30B-A3B

1. Found: 6 replicas (2 model variants × 3 replicas each)
   - Qwen3-30B-A3B-Thinking-2507-fp16-16k
   - Qwen3-30B-A3B-Instruct-2507-fp16-16k
2. HF: Qwen/Qwen3-30B-A3B-Thinking-2507, Qwen/Qwen3-30B-A3B-Instruct-2507
3. Cohort directory already exists at data/m12x/Qwen3-30B-A3B/
4. Extracted:
   - Quantization: fp16
   - Context: 16k (from "-16k" suffix)
   - Templates: zeroshot-nosys, zerocot-nosys
   - Samplers: qwen3-think-max, greedy-max
5. Faceting:
   - family:qwen3
   - arch:moe
   - size:large
   - quant:fp16
   - ctx:16k (NEW - because of -16k suffix)
6. Update existing evals.json:
   - Refine old globs: *-fp16* → *-fp16_* (prevent matching -16k variants)
   - Add missing quant:fp16 to existing entries
   - Add 2 new eval entries with ctx:16k facet
7. Execute automatically
8. Verify with cohort.py list
9. ✓ Imported 2 new variants to cohort data/m12x/Qwen3-30B-A3B/
Notes
- •Always work from the ReasonScape root directory
- •Activate venv before running analyze.py: source venv/bin/activate
- •Use /tmp/ for temporary modelinfo cache
- •Be explicit about what you're doing at each step
- •If uncertain about anything, ask the user before proceeding