# Notion LLM Config Table Management

## Database Information

- Database ID: `ddfd95bd-109a-4ac6-955c-90541cc53d5e`
- Data Source ID: `0a9fafd6-2cc2-4d6b-b6f0-3797ea777421`
- Location: Family Notes workspace → "LLM config table"
- Purpose: Track LLM model configurations across multiple providers (Llama, GPT-2, Qwen, others)
## Data Sources

### Llama Models

Primary source: `~/projects/github/llama-models/models/sku_list.py`

- Contains all registered Llama models with architecture parameters
- Constants: `LLAMA2_VOCAB_SIZE = 32000`, `LLAMA3_VOCAB_SIZE = 128256`
- WARNING: `sku_list.py` has incorrect `ffn_dim_multiplier` values for the Llama 2 7B and 13B models
  - `sku_list.py` shows 1.3 for all Llama 2 models, but the actual checkpoints differ
  - Always verify d_ff values against HuggingFace configs, not just `sku_list.py` (see the sketch below)
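To make that verification concrete, here is a minimal sketch. Assumptions: the HF `config.json` has been downloaded locally (Llama repos on HuggingFace are gated), `intermediate_size` is the config field holding d_ff, and the file name and helper names are hypothetical.

```python
import json

def d_ff_from_formula(dim, multiple_of, ffn_dim_multiplier=None):
    """d_ff per the Llama reference formula (see Calculation Formulas below)."""
    hidden_dim = int(2 * (4 * dim) / 3)
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

def verify_d_ff(config_path, dim, multiple_of, ffn_dim_multiplier=None):
    """Compare a downloaded HF config.json against the formula-derived value."""
    with open(config_path) as f:
        hf_d_ff = json.load(f)["intermediate_size"]
    formula_d_ff = d_ff_from_formula(dim, multiple_of, ffn_dim_multiplier)
    if hf_d_ff != formula_d_ff:
        print(f"MISMATCH: HF={hf_d_ff}, formula={formula_d_ff}")
    return hf_d_ff

# Llama 2 7B: sku_list.py's multiplier (1.3) yields 14336, but HF reports 11008
# verify_d_ff("llama-2-7b-config.json", dim=4096, multiple_of=256, ffn_dim_multiplier=1.3)
```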
### GPT-2 Models

Primary source: HuggingFace `config.json` files

- Base repos: `openai-community/gpt2`, `gpt2-medium`, `gpt2-large`, `gpt2-xl`
- All variants: vocab_size=50257, max_context=1024, BPE tokenizer
- Uses learned positional embeddings (not RoPE)
## User Preferences

- No emojis in any updates or content
- Batch operations preferred over sequential updates
- Focus on practical utility: add a "Main" column for filtering to representative models
- Verify calculations: the user will catch errors, so double-check formulas
## Database Schema: Key Columns

### Identity

- Model Name (title), Model Family, Model Type, Core Model ID
- HuggingFace Repo, Variant
### Architecture Parameters

- d_hidden = model hidden dimension (`dim` from Llama arch_args)
- d_ff = feed-forward hidden dimension (see calculation formulas below)
- d_ff / d_hidden Ratio = d_ff / d_hidden, rounded to 3 decimals
  - Llama 2: 2.688 (7B), 2.7 (13B), 3.5 (70B)
  - Llama 3+: 2.667 (3B), 3.125 (Guard INT4), 3.25 (405B), 3.5 (most common), 4.0 (1B)
  - GPT-2: 4.0 (all variants; standard transformer)
  - Null for models without d_ff (Llama 4 has empty arch_args)
- Gated MLP (checkbox) = gated MLP/activation (SwiGLU for Llama, standard GELU for GPT-2)
- n_layers, n_heads, n_kv_heads, head_dim
- Multiple Of, Norm Eps
### Tokenization

- Tokenizer: Llama 2 Tokenizer (32k vocab) or Llama 3 Tokenizer (128k vocab)
- Vocab Size: 32000 (Llama 2), 128256 (Llama 3+)
### MoE Architecture

- Is MoE (checkbox)
- Num Experts: Llama 4 has 16 (Scout), 128 (Maverick)
- Top K Experts: number of routed experts per token
- MoE Routing: "Token Choice" (Llama 4) vs "Expert Choice"
- Has Shared Expert: Llama 4 has 1 shared + 1 routed per token
- Activated Params (B): Llama 4: 17B
- Total Params (B): Llama 4: 109B (Scout), 400B (Maverick)
### RoPE Configuration

- RoPE Theta, RoPE Freq Base
- Use Scaled RoPE (checkbox)
### Other

- Quantization Format, PTH File Count, Max Context Size
- Main (checkbox): marks representative models for filtering
## Update Patterns

### Property Updates (Preferred)

```
mcp__notion__notion-update-page({
  "page_id": "page-id-here",
  "command": "update_properties",
  "properties": {
    "Tokenizer": "Llama 3 Tokenizer",
    "Vocab Size": 128256,
    "Is MoE": "__YES__",  # Checkboxes use __YES__/__NO__
    "Num Experts": 16
  }
})
```
### Batch Updates

- Use parallel tool calls when updating multiple independent pages
- Group by model family for logical batching (e.g., all Scout models together)
### Search & Fetch

```
# Search within the database
mcp__notion__notion-search({
  "query": "Llama 4",
  "query_type": "internal",
  "data_source_url": "collection://0a9fafd6-2cc2-4d6b-b6f0-3797ea777421"
})

# Fetch page details
mcp__notion__notion-fetch({"id": "page-id-or-url"})
```
## Common Tasks

### Adding New Models

- Parse `~/projects/github/llama-models/models/sku_list.py` to get model data
- Extract arch_args and calculate the derived fields (head_dim, d_ff using the Llama formula, d_ff/d_hidden ratio); see the sketch after this list
- Determine the tokenizer from the model family (see Model Family Mappings)
- Set the MoE fields for Llama 4 models
- Set "Gated MLP" to Yes (all Llama models use SwiGLU)
- Create pages in batches using `mcp__notion__notion-create-pages`
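A minimal sketch of the derived-field step. Assumptions: `arch_args` uses the key names from `models/llama{2,3}/args.py` (`dim`, `n_heads`, `multiple_of`, `ffn_dim_multiplier`), and the helper name is hypothetical.

```python
def derived_fields(arch_args):
    """Compute head_dim, d_ff, and the d_ff/d_hidden ratio from Llama arch_args.

    Returns {} for models with empty arch_args (e.g. Llama 4). For Llama 2
    7B/13B, override ffn_dim_multiplier to None first (see Important Notes).
    """
    if not arch_args:
        return {}
    dim, n_heads = arch_args["dim"], arch_args["n_heads"]
    multiple_of = arch_args.get("multiple_of", 256)
    multiplier = arch_args.get("ffn_dim_multiplier")  # None → pure 8d/3 formula
    hidden_dim = int(2 * (4 * dim) / 3)
    if multiplier is not None:
        hidden_dim = int(multiplier * hidden_dim)
    d_ff = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
    return {
        "head_dim": dim // n_heads,
        "d_ff": d_ff,
        "d_ff / d_hidden Ratio": round(d_ff / dim, 3),
    }
```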
### Marking Representative Models

Representative "Main" models (user preference):

- Llama 2: 7b chat, 70b chat
- Llama 3.1: 8b instruct, 70b instruct, 405b instruct (FP8)
- Llama 3.2: 1b instruct, 3b instruct, 11b vision instruct (user trains small models)
- Llama 3.3: 70b instruct
- Llama 4: Scout instruct, Maverick instruct
- GPT-2: all variants (117M, 345M, 774M, 1.5B) marked as Main
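Marking follows the same update shape as in Property Updates above, one call per page, issued in parallel per the batching preference (the page ID here is a placeholder):

```
mcp__notion__notion-update-page({
  "page_id": "<representative-model-page-id>",
  "command": "update_properties",
  "properties": {"Main": "__YES__"}
})
```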
Schema Updates
mcp__notion__notion-update-database({
"database_id": "ddfd95bd-109a-4ac6-955c-90541cc53d5e",
"properties": {
"New Column": {"type": "number", "number": {}}
}
})
## Important Notes

### Pitfalls to Avoid

- Don't clear existing fields: only specify the properties you're updating
- Column ordering: cannot be changed via the API (it's a view-level setting in the UI)
- Checkbox format: must use `"__YES__"` or `"__NO__"`, not a boolean
- Llama 4 models: have empty `arch_args={}` in sku_list.py, so no d_hidden/d_ff is available
### Model Family Mappings

#### Llama

- llama2: Llama 2 Tokenizer, 32k vocab
- Actual d_ff values (verified from HuggingFace):
  - 7B: d_ff=11008 (NOT 14336 from the sku_list.py formula)
  - 13B: d_ff=13824 (NOT 17920 from the sku_list.py formula)
  - 70B: d_ff=28672 (correct in sku_list.py)
- llama3, llama3_1, llama3_2, llama3_3, llama4, safety: Llama 3 Tokenizer, 128k vocab
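The same mappings as a lookup table (a sketch; the dict name is hypothetical):

```python
# Family → (Tokenizer select value, Vocab Size), per the mappings above
TOKENIZER_BY_FAMILY = {
    "llama2": ("Llama 2 Tokenizer", 32000),
    "llama3": ("Llama 3 Tokenizer", 128256),
    "llama3_1": ("Llama 3 Tokenizer", 128256),
    "llama3_2": ("Llama 3 Tokenizer", 128256),
    "llama3_3": ("Llama 3 Tokenizer", 128256),
    "llama4": ("Llama 3 Tokenizer", 128256),
    "safety": ("Llama 3 Tokenizer", 128256),
}
```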
#### GPT-2

- gpt2: BPE tokenizer, 50257 vocab, 1024 max context
- Naming: "GPT-2", "GPT-2 Medium", "GPT-2 Large", "GPT-2 XL"
- Model Type: "base" (all are base models)
- Parameter counts: 117M (base), 345M (medium), 774M (large), 1.5B (XL)
- Config parameter mappings: `n_embd` → d_hidden, `n_layer` → n_layers
### Llama MoE Architecture (Llama 4 only)

- Only Llama 4 models are MoE
- Architecture: 1 shared expert (always active) + 1 routed expert (top_k=1)
- Effective: 2 experts per token (shared + routed)
- Routing: Token Choice (each token selects its expert via router scores)
### Llama Gated MLP / Activation Function

- All Llama models use a gated MLP with SwiGLU activation
- Implementation: `w2(F.silu(w1(x)) * w3(x))` (from `models/llama3/model.py`)
- Uses 3 weight matrices (w1, w2, w3) instead of the standard 2-matrix FFN
- SwiGLU = Swish-Gated Linear Unit (Swish is the same as SiLU)
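A minimal PyTorch sketch of that gated FFN (the w1/w2/w3 naming mirrors the quoted Llama code; bias-free projections as in Llama):

```python
import torch
import torch.nn.functional as F
from torch import nn

class SwiGLUFeedForward(nn.Module):
    """Gated MLP as in Llama: w2(silu(w1(x)) * w3(x)), three weight matrices."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # up projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

# e.g. Llama 2 7B shapes: SwiGLUFeedForward(dim=4096, hidden_dim=11008)
```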
## Calculation Formulas

### General Derived Fields

```
head_dim = d_hidden / n_heads
d_ff_d_hidden_ratio = round(d_ff / d_hidden, 3)
```
### Llama d_ff Calculation

From the actual Llama code (`models/llama3/model.py`):

```python
# Initial: hidden_dim = 4 * dim
hidden_dim = int(2 * hidden_dim / 3)  # = int(8 * dim / 3)
if ffn_dim_multiplier is not None:
    hidden_dim = int(ffn_dim_multiplier * hidden_dim)
# Round up to a multiple of multiple_of
hidden_dim = multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)
# The result is d_ff
```
#### IMPORTANT: Llama 2 Model-Specific Behavior

Llama 2 models use different ffn_dim_multiplier values than what's in sku_list.py:

- 7B & 13B: use `ffn_dim_multiplier=None` (the pure 8d/3 formula)
  - 7B: d_ff=11008, ratio=2.688
  - 13B: d_ff=13824, ratio=2.7
- 70B: uses `ffn_dim_multiplier=1.3` (as specified in sku_list.py)
  - 70B: d_ff=28672, ratio=3.5

This was verified against the actual HuggingFace checkpoint configs; sku_list.py incorrectly shows ffn_dim_multiplier=1.3 for all Llama 2 models.
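Putting the formula and the verified values together as a self-contained sanity check (the function restates the reference formula above; multiple_of=4096 for 70B comes from its checkpoint params):

```python
def llama_d_ff(dim, multiple_of=256, ffn_dim_multiplier=None):
    hidden_dim = int(2 * (4 * dim) / 3)  # int(8 * dim / 3)
    if ffn_dim_multiplier is not None:
        hidden_dim = int(ffn_dim_multiplier * hidden_dim)
    return multiple_of * ((hidden_dim + multiple_of - 1) // multiple_of)

assert llama_d_ff(4096) == 11008   # 7B, no multiplier
assert llama_d_ff(5120) == 13824   # 13B, no multiplier
assert llama_d_ff(8192, multiple_of=4096, ffn_dim_multiplier=1.3) == 28672  # 70B
assert llama_d_ff(4096, ffn_dim_multiplier=1.3) == 14336  # the wrong sku_list value for 7B
```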
An alternative formula from HuggingFace transformers (equivalent for models with multiple_of=256):

```python
import math

# For Llama 2 7B and 13B (no multiplier)
def compute_intermediate_size(n, multiple_of=256):
    return int(math.ceil(n * 8 / 3) + multiple_of - 1) // multiple_of * multiple_of
```
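As a quick check, `compute_intermediate_size(4096)` returns 11008 and `compute_intermediate_size(5120)` returns 13824, matching the verified 7B and 13B values above.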
### GPT-2 d_ff Calculation

Standard transformer architecture:

```
d_ff = 4 * d_hidden  # always 4x for all GPT-2 variants
```

- Not explicitly in config.json, but defined by the model architecture
- All GPT-2 models have a d_ff/d_hidden ratio of exactly 4.0
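A quick check across the four variants (n_embd and n_head values taken from the HF configs):

```python
# (name, n_embd, n_head) from the HuggingFace configs
GPT2_VARIANTS = [
    ("GPT-2", 768, 12),
    ("GPT-2 Medium", 1024, 16),
    ("GPT-2 Large", 1280, 20),
    ("GPT-2 XL", 1600, 25),
]

for name, n_embd, n_head in GPT2_VARIANTS:
    d_ff = 4 * n_embd
    # every variant comes out to ratio 4.0 and head_dim 64
    print(f"{name}: d_ff={d_ff}, ratio={d_ff / n_embd}, head_dim={n_embd // n_head}")
```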
## Helpful Scripts & References

### Llama

Working directory: `~/projects/github/llama-models/`

Scripts:

- Parse models: `parse_llama_models.py`
- Prepare Notion data: `prepare_notion_data.py`
- Update tokenizers: `update_tokenizers.py`
- Bulk updates: `bulk_update_llama3.py`
- Calculate d_ff: `fix_d_ff.py` (correct d_ff calculation using the actual Llama formula)
- Calculate ratios: `calculate_ratios.py` (d_ff/d_hidden ratios and gated MLP status)
- Data files: `d_ff_corrections.json`, `complete_notion_updates.json`, `ratio_updates.json`
- Documentation: `COMPLETION_SUMMARY.md` (d_ff update history), `STATUS.md` (current status)
References:

- Llama models repo: `~/projects/github/llama-models/`
- Model definitions: `models/sku_list.py`
- Architecture files: `models/llama{2,3,4}/args.py`
- MoE implementation: `models/llama4/moe.py`
### GPT-2

HuggingFace references:

- Base: https://huggingface.co/openai-community/gpt2
- Medium: https://huggingface.co/gpt2-medium
- Large: https://huggingface.co/gpt2-large
- XL: https://huggingface.co/gpt2-xl
- Config location: `/raw/main/config.json` (append to the repo URL)
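For example, the base model's config is at https://huggingface.co/openai-community/gpt2/raw/main/config.json.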
Architecture notes:

- Uses learned positional embeddings (not RoPE); add to the Notes field
- Activation: GELU (the gelu_new variant), not a gated MLP
- Standard attention (n_kv_heads = n_heads, no GQA)
- All variants have head_dim = 64
- Norm epsilon: 1e-05 (`layer_norm_epsilon` in config)