Import Model Skill

You are helping the user import new model evaluation results into the ReasonScape m12x dataset structure.

Workflow

When the user invokes /import <model-name>, follow these steps:

Important: Use only the AskUserQuestion tool for user interactions to keep the flow smooth.

1. Search for Results

Search results/ directory for folders matching the model name pattern:

bash

ls -1 results/ | grep -i "<model-name>"

Extract all unique model variants found and group by distinct configurations:

•Quantization (e.g., fp16, AWQ, FP8)
•Sampler variants (e.g., base vs sglang)
•Context variants (e.g., default vs 16k)
•Template and sampler combinations

Handle ambiguity:

•If only ONE distinct variant found (with 3 replicas) → Proceed with that variant
•If MULTIPLE distinct variants found → Use AskUserQuestion to confirm which to import
  •Example: Found both GLM-4.7-Flash-fp16 and GLM-4.7-Flash-sglang-fp16
  •Ask: "Which variant(s) should I import?" with options for each
  •Allow selecting multiple if user wants to import several at once
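The grouping step can be sketched in Python as below. This is a minimal sketch, not part of the ReasonScape tooling: the helper names `group_variants` and `needs_confirmation` are hypothetical, and the code assumes folder names follow the `<date>_<model>_<template>_<sampler>_<rest>` layout implied by the glob in the migration script, with underscores appearing only between fields.

```python
from collections import defaultdict


def group_variants(folder_names, model_name):
    """Group result folders by (model, template, sampler).

    Assumes each name looks like <date>_<model>_<template>_<sampler>_<rest>,
    where <model> already carries the quant/context suffix.
    """
    groups = defaultdict(list)
    needle = model_name.lower()
    for name in folder_names:
        fields = name.split("_")
        if len(fields) < 4:
            continue  # does not follow the assumed layout; skip it
        model, template, sampler = fields[1], fields[2], fields[3]
        if needle in model.lower():  # case-insensitive, like grep -i
            groups[(model, template, sampler)].append(name)
    return dict(groups)


def needs_confirmation(groups):
    """True when more than one distinct configuration was found."""
    return len(groups) > 1
```

Each key in the returned dict is one distinct configuration and the list length is its replica count, so a single key with three folders corresponds to the "proceed without asking" case.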

2. Find HuggingFace Model ID

Search for the model using the HF CLI:

bash

hf models ls --search "<model-name>" --limit 10

Parse the output to extract model IDs (format: author/model-name).

Use AskUserQuestion to confirm the HF ID:

•If 1 clear match: Ask "Confirm HF model?" with the top result as default option
•If multiple matches: Present top 3-5 as options and let user pick
•Use the AskUserQuestion tool for smooth confirmation without breaking flow

3. Determine Cohort and Examine Existing Structure

CRITICAL STEP: Before fetching metadata, determine the target cohort directory and examine what's already there.

3a. Determine Cohort Directory Name

From the model name and results folders, determine the cohort directory name.

What is a cohort? A cohort groups equivalent models together, including:

•The base model and its variants (e.g., Qwen3-30B-A3B-Thinking + Qwen3-30B-A3B-Instruct)
•All quantizations of the same model (fp16, AWQ, FP8, GPTQ)
•Context extensions/REAPs (e.g., GLM-4.7 is a REAP of GLM-4.5)

A cohort does NOT include:

•Significantly different model sizes (GLM-4.5-Air is 1/3 the size → separate cohort)
•Different model architectures or generations

Examples:

•Qwen3-30B-A3B/ - Contains both Thinking and Instruct variants, all quants
•GLM-4.5/ - Contains base model + GLM-4.7 REAP variant
•GLM-4.5-Air/ - Separate cohort (different size, not just a quant/variant)
•MiroThinker-v1.5-30B/ - All quants of this specific model

Determining cohort name:

•Look at folder name patterns (e.g., 2025-01-*_MiroThinker-v1.5-30B-fp16_*)
•Extract base model identifier: MiroThinker-v1.5-30B, Qwen3-30B-A3B, GLM-4.5, etc.
•Check existing data/m12x/ directories for similar models
•Target cohort: data/m12x/<ModelFamily>/ (where <ModelFamily> is the base model identifier extracted above)
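One way to check a model name against existing cohort directories is a longest-prefix match, sketched below. The function `match_cohort` is hypothetical and only covers the simple case where the cohort name is a literal prefix of the model name; cases like GLM-4.7 belonging to the GLM-4.5 cohort return None and still need a human decision.

```python
def match_cohort(model_name, existing_cohorts):
    """Return the longest cohort name that prefixes model_name at a '-' boundary.

    Returns None when no existing cohort matches, which should be treated
    as "new cohort or ask the user", not as an error.
    """
    best = None
    for cohort in existing_cohorts:
        is_prefix = model_name == cohort or model_name.startswith(cohort + "-")
        if is_prefix and (best is None or len(cohort) > len(best)):
            best = cohort
    return best
```

Preferring the longest match keeps GLM-4.5-Air results out of the GLM-4.5 cohort even though both names are prefixes.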

3b. Check if Cohort Exists

bash

python ./cohort.py list --search '<ModelFamilyRegExp>'

The --search pattern can be either a plain substring or a regular expression.

3c. Examine Existing Structure (if cohort exists)

If the cohort directory exists, examine its current state:

bash

# List existing evals in cohort
python ./cohort.py list "data/m12x/<ModelFamily>"

# Read existing evals.json
jq . "data/m12x/<ModelFamily>/evals.json"

Extract and understand:

•Existing variants: What quantizations/contexts are already imported?
  •Example: fp16 default, fp16-16k, AWQ, etc.
•Glob patterns: What naming conventions are used?
  •Example: *-fp16_* vs *-fp16-16k_*
  •Check if old globs need refinement to avoid matching new variants
•Faceting conventions: What groups are assigned?
  •Are quant:* facets present?
  •Are ctx:* facets used?
  •What families/arch/size tags exist?
•HF IDs: Are there multiple HF repos for different variants?
  •Example: Thinking vs Instruct variants may have separate repos
•Tags: Does this cohort have "tags": ["leaderboard"]?
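The checklist above can be approximated with a small summary pass over the loaded evals.json list. This is a sketch under the assumption that entries have the shape used by the migration script in step 7 (`evaluate.glob`, `label`, `groups`, `hf_id`, optional `tags`); `summarize_evals` is a hypothetical helper, not part of cohort.py.

```python
def summarize_evals(evals):
    """Collect globs, facets, HF IDs, and entries lacking a quant:* facet."""
    summary = {
        "globs": [],
        "facets": set(),
        "hf_ids": set(),
        "missing_quant": [],   # labels of entries without any quant:* facet
        "leaderboard": False,  # True if any entry carries the leaderboard tag
    }
    for entry in evals:
        groups = entry.get("groups", [])
        summary["globs"].append(entry.get("evaluate", {}).get("glob"))
        summary["facets"].update(groups)
        if entry.get("hf_id"):
            summary["hf_ids"].add(entry["hf_id"])
        if not any(g.startswith("quant:") for g in groups):
            summary["missing_quant"].append(entry.get("label"))
        if "leaderboard" in entry.get("tags", []):
            summary["leaderboard"] = True
    return summary
```

More than one element in `hf_ids` is a hint that variants in this cohort use separate HF repos.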

3d. Report Context

Before proceeding to metadata fetch, report:

•Cohort directory: data/m12x/<ModelFamily>/
•Status: NEW or EXISTS
•If EXISTS:
  •List existing variants and their facets
  •Note glob patterns that may need refinement
  •Identify any missing facets on existing evals (e.g., quant:fp16)
  •Flag if new variant needs separate HF ID

This context is essential for correctly fetching metadata and generating the migration script.

4. Fetch Model Metadata

Once hf_id is confirmed, run modelinfo:

bash

python analyze.py modelinfo --hf-id <hf_id> --output-dir /tmp/import-modelinfo

Read the generated MODELINFO.md and extract:

•base_model: Extract base architecture family (e.g., "Qwen/Qwen3-30B" → family:qwen3)
•Architecture: Extract arch type:
  •*MoeForCausalLM → arch:moe
  •*ForCausalLM (non-MoE) → arch:dense
  •Look for SSM/Mamba/hybrid indicators → arch:ssm or arch:hybrid
•Model Type: Confirm the architecture classification
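The architecture mapping above can be written as a small classifier. This is a sketch only: `classify_arch` is a hypothetical helper, the Mamba/SSM substring test is an assumed heuristic, and hybrid architectures are deliberately not auto-detected, so an unrecognized name returns None and should be confirmed with the user.

```python
def classify_arch(architecture):
    """Map an architecture class name to an arch:* facet, or None if unclear."""
    name = architecture.lower()
    # Check SSM indicators first so e.g. a Mamba*ForCausalLM class
    # is not misclassified as dense by the generic suffix test below.
    if "mamba" in name or "ssm" in name:
        return "arch:ssm"
    if name.endswith("moeforcausallm"):
        return "arch:moe"
    if name.endswith("forcausallm"):
        return "arch:dense"
    return None  # hybrid or unknown: ask instead of guessing
```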

5. Extract Metadata from Folder Names

Parse the folder names to extract:

•Model name with quant: e.g., MiroThinker-v1.5-30B-fp16, Qwen3-30B-A3B-Thinking-2507-fp16-16k
•Template: e.g., zeroshot-nosys, zerocot-nosys
•Sampler: e.g., qwen3-think-max, greedy-max
•Quantization: from model name suffix (fp16, AWQ, FP8, GPTQ)
•Context length: from suffix like -16k → ctx:16k facet (ONLY if explicitly in folder name, otherwise omit)
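A minimal parsing sketch for these fields follows. It assumes the underscore-separated layout `<date>_<model>_<template>_<sampler>_<rest>` and that the model field ends in one of the four quant suffixes listed above, optionally followed by a context suffix such as -16k; `parse_folder` and the regular expression are illustrative, not taken from the ReasonScape code.

```python
import re

# <base>-<quant>[-<ctx>], e.g. Qwen3-30B-A3B-Thinking-2507-fp16-16k
MODEL_RE = re.compile(
    r"^(?P<base>.+?)-(?P<quant>fp16|fp8|awq|gptq)(?:-(?P<ctx>\d+k))?$",
    re.IGNORECASE,
)


def parse_folder(name):
    """Split a result folder name into its metadata fields.

    Returns None when the name does not follow the assumed layout.
    """
    fields = name.split("_")
    if len(fields) < 4:
        return None
    match = MODEL_RE.match(fields[1])
    if match is None:
        return None
    ctx = match.group("ctx")
    return {
        "model": fields[1],                     # model name including quant suffix
        "base": match.group("base"),            # model name without quant/context
        "quant": match.group("quant").lower(),  # normalized for the quant:* facet
        "ctx": ctx.lower() if ctx else None,    # None means: omit the ctx facet
        "template": fields[2],
        "sampler": fields[3],
    }
```

Returning `ctx` as None when no suffix is present mirrors the rule that the context facet is only added when it is explicit in the folder name.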

6. Determine Faceting

Build the groups array:

Size (infer from parameter count in model name; emitted as size:<band>, e.g. size:large):

•tiny: <4B
•small: 4-8B
•mid: 9-19B
•large: 20-99B
•xlarge: 100B+

Architecture (from modelinfo):

•arch:dense / arch:moe / arch:ssm / arch:hybrid

Quantization (from folder name - ALWAYS include):

•quant:fp16 / quant:fp8 / quant:awq / quant:gptq
•Add this facet even for "non-quantized" fp16 models

Context Length (from folder name - ONLY if explicit):

•ctx:16k if folder name contains -16k
•ctx:32k if folder name contains -32k
•OMIT this facet if no context suffix in folder name

Families:

•Base family from base_model (e.g., family:qwen3, family:llama)
•Finetune family from model name (e.g., family:mirothinker)

If base_model is missing or unclear, use AskUserQuestion with common options: qwen3, llama, mistral, phi, gemma
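Assembling the groups array from the pieces above might look like the sketch below. `build_groups` is a hypothetical helper, and the ordering of facets within the list is an assumption; the source does not state that order matters.

```python
def build_groups(size, arch, quant, families, ctx=None):
    """Build the facet list for one eval entry.

    size: band name such as "large"; arch: "dense", "moe", "ssm" or "hybrid";
    quant: always required, even for fp16; ctx: e.g. "16k", or None to omit.
    """
    groups = [f"family:{family}" for family in families]
    groups += [f"arch:{arch}", f"size:{size}", f"quant:{quant.lower()}"]
    if ctx:  # only present when the folder name carried an explicit suffix
        groups.append(f"ctx:{ctx}")
    return groups
```

For the MiroThinker example later in this document, `build_groups("large", "moe", "fp16", ["qwen3", "mirothinker"])` yields the five facets listed there and no ctx facet.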

7. Generate Migration Script

Create a Python script that:

For each quantization variant found:

python

#!/usr/bin/env python3
import json
import shutil
from pathlib import Path
from glob import glob

# Model: <model-name>-<quant>
cohort_dir = Path("data/m12x/<ModelFamily>")
eval_json = cohort_dir / "evals.json"

print("Processing <model-name>-<quant>...")

# Create cohort directory if needed
cohort_dir.mkdir(parents=True, exist_ok=True)

# Move result folders (the trailing "_" keeps e.g. -fp16 from also matching -fp16-16k)
for src in glob("results/*<model-name>-<quant>_*"):
    dest = cohort_dir / Path(src).name
    shutil.move(src, dest)
    print(f"  Moved {src} -> {dest}")

# Build new eval entry
new_eval = {
    "evaluate": {"glob": "data/m12x/<ModelFamily>/*_<model-name>-<quant>_<template>_<sampler>_*/*"},
    "filters": {"model": "<model-name>-<quant>", "template": "<template>", "sampler": "<sampler>"},
    "label": "<Human Readable Label>",
    "groups": [<facet-array>],
    <tags-line-if-new-model>
    "hf_id": "<hf_id>"<,quant_id-if-applicable>
}

# Create or append to evals.json
if eval_json.exists():
    with open(eval_json) as f:
        evals = json.load(f)
    evals.append(new_eval)
else:
    evals = [new_eval]

with open(eval_json, 'w') as f:
    json.dump(evals, f, indent=2)

print("✓ Imported <model-name>-<quant>")

Important:

•For <tags-line-if-new-model>: Include "tags": ["leaderboard"], only if cohort_dir did NOT exist before (brand new cohort). If the cohort directory already exists, omit tags; promotion to the leaderboard is then manual
•Add "hf_quant_id": "<quant_id>" field if this is a quantized version
•When adding variants to existing models: Refine old globs to prevent matching new variants
  •Example: Change *-fp16* to *-fp16_* so it won't match -fp16-16k_ variants
  •Each eval entry should match ONLY its intended variant
•Always add missing facets to existing evals when updating (e.g., add quant:fp16 if missing)
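The effect of the glob refinement can be checked with Python's standard `fnmatch` module. The two folder names below are made up for illustration and only follow the naming pattern described in this document.

```python
from fnmatch import fnmatch

# Hypothetical folder names: one default-context run, one 16k run.
default_dir = "2025-01-05_Qwen3-30B-A3B-Thinking-2507-fp16_zeroshot-nosys_qwen3-think-max_0"
ctx16k_dir = "2025-01-06_Qwen3-30B-A3B-Thinking-2507-fp16-16k_zeroshot-nosys_qwen3-think-max_0"

loose = "*-fp16*"   # old pattern: also matches the -fp16-16k folder
tight = "*-fp16_*"  # refined pattern: requires "_" right after the quant

loose_hits = [fnmatch(d, loose) for d in (default_dir, ctx16k_dir)]
tight_hits = [fnmatch(d, tight) for d in (default_dir, ctx16k_dir)]

print(loose_hits)  # [True, True]
print(tight_hits)  # [True, False]
```

Note that `fnmatch` lets `*` match path separators, unlike filesystem globbing, so this check is only meaningful on bare folder names, not on the full glob stored in evals.json.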

8. Execute

Save the generated script to /tmp/import-<model-name>.py and execute it immediately:

bash

python3 /tmp/import-<model-name>.py

9. Verify

After execution, verify:

bash

python ./cohort.py list "data/m12x/<ModelFamily>"

Report success and show the newly imported eval entries from the cohort listing.
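Alongside the cohort listing, a quick structural check of each new entry can catch omissions before reporting success. This is an optional sketch: `check_entry` is hypothetical, and the list of required keys is inferred from the entry template in step 7, not from a published schema.

```python
# Keys used by the entry template in step 7 (assumed to be the required set).
REQUIRED_KEYS = ("evaluate", "filters", "label", "groups", "hf_id")


def check_entry(entry):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if "glob" not in entry.get("evaluate", {}):
        problems.append("evaluate.glob not set")
    if not any(g.startswith("quant:") for g in entry.get("groups", [])):
        problems.append("no quant:* facet")
    return problems
```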

Edge Cases

•Cohort directory exists: This is normal - append to evals.json, don't error
  •Check if you need to refine existing globs to prevent matching new variants
  •Add missing facets to existing evals (e.g., quant:fp16)
•Multiple HF matches: Show the top 3-5, ask user to pick
•No results found: Error and ask user to check the model name
•Base model unclear: Ask user directly for base family
•Multiple quantizations: Import all variants found (fp16, AWQ, FP8, etc.)
•Multiple context lengths: Import all context variants (default, 16k, 32k, etc.)
  •Each context length gets its own eval entry
  •Only add ctx:* facet when context is explicitly in folder name
•Different model variants with separate HF IDs: Some model families have separate HF repos for variants (e.g., Qwen3-30B-A3B-Thinking-2507 vs Qwen3-30B-A3B-Instruct-2507)
  •Fetch metadata for each unique HF ID
  •Store correct hf_id for each eval entry

Size Band Reference

Quick lookup for parameter counts:

•1B, 2B, 3B → tiny
•4B, 7B, 8B → small
•9B-19B → mid
•20B-99B → large
•100B+ → xlarge
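The lookup can be expressed as a pair of small functions, sketched below. Both names are hypothetical. Two assumptions go beyond the table: the first `<number>B` token in the model name is taken as the total parameter count (so Qwen3-30B-A3B reads as 30B, not 3B), and fractional counts that fall between bands, such as 8.5B, round down into the lower band.

```python
import re

# A number followed by "B" that is not glued to a preceding letter, digit, or dot,
# so "v1.5" and the "3B" inside "A3B" are skipped.
PARAM_RE = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)B(?![A-Za-z0-9])")


def param_count_b(model_name):
    """Return the first parameter count (in billions) found in the name, or None."""
    match = PARAM_RE.search(model_name)
    return float(match.group(1)) if match else None


def size_band(params_b):
    """Map a parameter count in billions to its size band."""
    if params_b < 4:
        return "tiny"
    if params_b < 9:
        return "small"
    if params_b < 20:
        return "mid"
    if params_b < 100:
        return "large"
    return "xlarge"
```

When `param_count_b` returns None (for example, GLM-4.5 has no count in its name), the size has to come from the model metadata or from the user.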

Example Invocations

Example 1: New Model Import

code

User: /import MiroThinker-v1.5-30B

1. Found: 3 replicas of MiroThinker-v1.5-30B-fp16
2. HF: miromind-ai/MiroThinker-v1.5-30B ✓
3. Modelinfo:
   - base_model: Qwen/Qwen3-30B-A3B-Thinking-2507
   - Architecture: Qwen3MoeForCausalLM
4. Extracted:
   - Template: zeroshot-nosys
   - Sampler: qwen3-think-max
   - Quantization: fp16
   - Context: (none - no suffix)
5. Faceting:
   - family:qwen3 (base)
   - family:mirothinker (finetune)
   - arch:moe
   - size:large (30B)
   - quant:fp16
   - (no ctx facet)
6. Generate script → execute automatically
7. Verify with cohort.py list
8. ✓ Imported to cohort data/m12x/MiroThinker-v1.5-30B/

Example 2: Adding Context Variant to Existing Model

code

User: /import Qwen3-30B-A3B

1. Found: 6 replicas (2 model variants × 3 replicas each)
   - Qwen3-30B-A3B-Thinking-2507-fp16-16k
   - Qwen3-30B-A3B-Instruct-2507-fp16-16k
2. HF: Qwen/Qwen3-30B-A3B-Thinking-2507, Qwen/Qwen3-30B-A3B-Instruct-2507
3. Cohort directory already exists at data/m12x/Qwen3-30B-A3B/
4. Extracted:
   - Quantization: fp16
   - Context: 16k (from "-16k" suffix)
   - Templates: zeroshot-nosys, zerocot-nosys
   - Samplers: qwen3-think-max, greedy-max
5. Faceting:
   - family:qwen3
   - arch:moe
   - size:large
   - quant:fp16
   - ctx:16k (NEW - because of -16k suffix)
6. Update existing evals.json:
   - Refine old globs: *-fp16* → *-fp16_* (prevent matching -16k variants)
   - Add missing quant:fp16 to existing entries
   - Add 2 new eval entries with ctx:16k facet
7. Execute automatically
8. Verify with cohort.py list
9. ✓ Imported 2 new variants to cohort data/m12x/Qwen3-30B-A3B/

Notes

•Always work from the ReasonScape root directory
•Activate venv before running analyze.py: source venv/bin/activate
•Use /tmp/ for temporary modelinfo cache
•Be explicit about what you're doing at each step
•If uncertain about anything, ask the user before proceeding