data-standards

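Data classification, naming standards, and enum integrity guardrail. Always activate it before creating any new chart, filter, or data display. It enforces consistent use of centralized constants for regions, bedroom counts, floor levels, sale types, tenures, and age bands, which prevents hardcoded strings and enum drift. It was merged from "data-standards" and "enum-integrity-guardrails".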

SKILL.md
--- frontmatter
name: data-standards
description: Data classification, naming standards, and enum integrity guardrail. ALWAYS activate before creating ANY new chart, filter, or data display. Enforces consistent use of centralized constants for regions, bedrooms, floor levels, sale types, tenures, and age bands. Prevents hardcoded strings and enum drift. Merged from data-standards and enum-integrity-guardrails.

Data Standards Guardrail

Purpose

Ensure all data classifications, labels, and naming conventions are consistent across backend and frontend. All new features MUST use centralized constants. Prevent taxonomy/enum drift.


Part 1: Single Source of Truth Files

| Domain | Backend | Frontend |
| --- | --- | --- |
| All Classifications | backend/constants.py | frontend/src/constants/index.js |
| Enums (Sale Type, Tenure) | backend/api/contracts/contract_schema.py | frontend/src/schemas/apiContract.js |

RULE: If a constant doesn't exist in these files, ADD IT THERE FIRST.


Part 2: Region/Segment Standards

Canonical Values

| Code | Full Name | Districts |
| --- | --- | --- |
| CCR | Core Central Region | D01, D02, D06, D07, D09, D10, D11 |
| RCR | Rest of Central Region | D03, D04, D05, D08, D12, D13, D14, D15, D20 |
| OCR | Outside Central Region | D16-D19, D21-D28 |
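
As a rough illustration of how the table above could be held in one place, here is a minimal Python sketch. The names `CCR_DISTRICTS` and `get_region_for_district` come from this document's imports; `RCR_DISTRICTS`, `OCR_DISTRICTS`, `_DISTRICTS_BY_REGION`, and the function body are assumptions, not the actual contents of backend/constants.py.

```python
from typing import Optional

# District sets copied from the canonical table (sketch, not the real module).
CCR_DISTRICTS = frozenset({"D01", "D02", "D06", "D07", "D09", "D10", "D11"})
RCR_DISTRICTS = frozenset(
    {"D03", "D04", "D05", "D08", "D12", "D13", "D14", "D15", "D20"}
)
# OCR covers D16-D19 and D21-D28.
OCR_DISTRICTS = frozenset(
    f"D{n:02d}" for n in (*range(16, 20), *range(21, 29))
)

REGIONS = ["CCR", "RCR", "OCR"]

# Hypothetical lookup table; insertion order follows REGIONS.
_DISTRICTS_BY_REGION = {
    "CCR": CCR_DISTRICTS,
    "RCR": RCR_DISTRICTS,
    "OCR": OCR_DISTRICTS,
}


def get_region_for_district(district: str) -> Optional[str]:
    """Return 'CCR', 'RCR' or 'OCR' for a code like 'D09', else None."""
    code = district.strip().upper()
    for region, districts in _DISTRICTS_BY_REGION.items():
        if code in districts:
            return region
    return None
```

Keeping the district sets next to the lookup function means a boundary change is made once and every caller picks it up.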

Usage

javascript
// CORRECT - Use constants
import { REGIONS, CCR_DISTRICTS, getRegionForDistrict, REGION_BADGE_CLASSES } from '../constants';

REGIONS.forEach(region => { /* ... */ });  // REGIONS is ['CCR', 'RCR', 'OCR']
const region = getRegionForDistrict(district);
const badgeClass = REGION_BADGE_CLASSES[region];

// FORBIDDEN - Hardcoded
const regions = ['CCR', 'RCR', 'OCR'];  // Use REGIONS constant instead
if (district === 'D01') region = 'CCR';  // Use getRegionForDistrict()

Backend

python
# CORRECT
from constants import get_region_for_district, CCR_DISTRICTS

region = get_region_for_district(district)

# FORBIDDEN
if district in ['D01', 'D02', 'D06', ...]:  # Hardcoded list
    region = 'CCR'

Part 3: Bedroom Standards

Canonical Values

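| Count | Short Label | Full Label | API Value |
| --- | --- | --- | --- |
| 1 | 1BR | 1-Bedroom | 1 |
| 2 | 2BR | 2-Bedroom | 2 |
| 3 | 3BR | 3-Bedroom | 3 |
| 4 | 4BR | 4-Bedroom | 4 |
| 5+ | 5BR+ | 5-Bedroom+ | 5 |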

Bedroom Classification (Three-Tier System)

SINGLE SOURCE OF TRUTH:

  • Frontend: frontend/src/constants/index.js
  • Backend: backend/services/classifier.py

URA data doesn't include bedroom count. We estimate based on unit area (sqft) with three tiers:

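| Tier | Context | 1BR | 2BR | 3BR | 4BR | 5BR+ |
| --- | --- | --- | --- | --- | --- | --- |
| Tier 1 | New Sale ≥ Jun 2023 (Post-Harmonization) | <580 | <780 | <1150 | <1450 | ≥1450 |
| Tier 2 | New Sale < Jun 2023 (Pre-Harmonization) | <600 | <850 | <1200 | <1500 | ≥1500 |
| Tier 3 | Resale (any date) | <600 | <950 | <1350 | <1650 | ≥1650 |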

Why three tiers?

  • Tier 1 (Ultra Compact): After June 2023 AC ledge removal rules, developers build smaller units
  • Tier 2 (Modern Compact): Pre-2023 new sales still had AC ledges counted in GFA
  • Tier 3 (Legacy): Resale units are typically larger (older developments)
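
The tier selection and threshold lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in backend/services/classifier.py: the cut-off is taken to be 1 June 2023, every non-new-sale transaction (including Sub Sale) is assumed to fall into Tier 3, and `HARMONIZATION_DATE` and the tuple layout are hypothetical.

```python
from bisect import bisect_right
from datetime import date

SALE_TYPE_NEW = "New Sale"

# Exclusive upper bounds in sqft for 1BR..4BR; larger units are 5BR+.
BEDROOM_THRESHOLDS_TIER1 = (580, 780, 1150, 1450)  # new sale, post-harmonization
BEDROOM_THRESHOLDS_TIER2 = (600, 850, 1200, 1500)  # new sale, pre-harmonization
BEDROOM_THRESHOLDS_TIER3 = (600, 950, 1350, 1650)  # resale

# Assumption: "Jun 2023" means on or after 1 June 2023.
HARMONIZATION_DATE = date(2023, 6, 1)


def classify_bedroom_three_tier(
    area_sqft: float, sale_type: str, transaction_date: date
) -> int:
    """Estimate the bedroom count (1-5, where 5 means 5BR+) from unit area."""
    if sale_type == SALE_TYPE_NEW:
        if transaction_date >= HARMONIZATION_DATE:
            thresholds = BEDROOM_THRESHOLDS_TIER1
        else:
            thresholds = BEDROOM_THRESHOLDS_TIER2
    else:
        # Assumption: anything that is not a new sale uses the resale tier.
        thresholds = BEDROOM_THRESHOLDS_TIER3
    # Count how many upper bounds the area has reached, then shift to 1-based.
    return bisect_right(thresholds, area_sqft) + 1
```

A 750 sqft unit sold new on 2024-01-15 lands in Tier 1 and classifies as 2 bedrooms, because 580 ≤ 750 < 780.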

Usage

javascript
// CORRECT - Use constants
import {
  BEDROOM_ORDER,           // ['1BR', '2BR', '3BR', '4BR', '5BR+']
  BEDROOM_ORDER_NUMERIC,   // [1, 2, 3, 4, 5]
  BEDROOM_THRESHOLDS_TIER1,
  BEDROOM_THRESHOLDS_TIER2,
  BEDROOM_THRESHOLDS_TIER3,
  classifyBedroom,         // Simple fallback
  classifyBedroomThreeTier, // Full 3-tier logic
  getBedroomLabelShort,
} from '../constants';

// Classify a unit
const bedroom = classifyBedroomThreeTier(750, 'New Sale', '2024-01-15');
// Returns: 2 (Tier 1: 750 < 780)

// For display/sorting
BEDROOM_ORDER.forEach(br => { /* ... */ });
const label = getBedroomLabelShort(2);  // "2BR"

// FORBIDDEN - Hardcoded
const BEDROOM_ORDER = ['1BR', '2BR', '3BR', '4BR', '5BR+'];  // Use constant
if (area < 580) bedroom = 1;  // Use classifyBedroomThreeTier()

Backend Usage

python
# CORRECT - Use services/classifier.py
from datetime import date

from services.classifier import classify_bedroom_three_tier, classify_bedroom

bedroom = classify_bedroom_three_tier(750, 'New Sale', date(2024, 1, 15))

# FORBIDDEN
if area < 580:
    bedroom = 1  # Hardcoded threshold

Part 4: Floor Level Standards

Canonical Values

| Level | Floor Range | Sort Order |
| --- | --- | --- |
| Low | 01-05 | 0 |
| Mid-Low | 06-10 | 1 |
| Mid | 11-20 | 2 |
| Mid-High | 21-30 | 3 |
| High | 31-40 | 4 |
| Luxury | 41+ | 5 |
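
To make the floor ranges and sort order concrete, here is a small Python sketch. `FLOOR_LEVELS` appears in this document's imports; `FLOOR_LEVEL_BANDS`, `classify_floor_level`, and `floor_level_sort_key` are hypothetical names introduced only for illustration.

```python
# (level, lowest floor in the band), listed in canonical sort order.
FLOOR_LEVEL_BANDS = [
    ("Low", 1),
    ("Mid-Low", 6),
    ("Mid", 11),
    ("Mid-High", 21),
    ("High", 31),
    ("Luxury", 41),
]

FLOOR_LEVELS = [name for name, _ in FLOOR_LEVEL_BANDS]


def classify_floor_level(floor: int) -> str:
    """Map a floor number to its level using the canonical ranges."""
    if floor < 1:
        raise ValueError(f"Floor must be 1 or higher, got {floor}")
    level = FLOOR_LEVEL_BANDS[0][0]
    for name, lowest_floor in FLOOR_LEVEL_BANDS:
        if floor >= lowest_floor:
            level = name  # keep the highest band whose lower bound is reached
    return level


def floor_level_sort_key(level: str) -> int:
    """Return the sort order (0 for Low up to 5 for Luxury)."""
    return FLOOR_LEVELS.index(level)
```

Sorting chart categories with `sorted(levels, key=floor_level_sort_key)` keeps them in canonical order instead of alphabetical order.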

Usage

javascript
// CORRECT
import { FLOOR_LEVELS, FLOOR_LEVEL_LABELS, getFloorLevelColor } from '../constants';

FLOOR_LEVELS.forEach(level => {
  const label = FLOOR_LEVEL_LABELS[level];
  const color = getFloorLevelColor(level);
});

// FORBIDDEN
const levels = ['Low', 'Mid-Low', 'Mid', 'Mid-High', 'High', 'Luxury'];  // Hardcoded

Part 5: Sale Type Standards

Canonical Values

| Enum Key | DB Value | Display Label |
| --- | --- | --- |
| NEW_SALE | New Sale | New Sale |
| RESALE | Resale | Resale |
| SUB_SALE | Sub Sale | Sub Sale |
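
One way the table above might be expressed as an enum is sketched below. `SaleType` is named in this document; the member layout, `SALE_TYPE_LABELS`, and `parse_sale_type` are assumptions and may not match backend/api/contracts/contract_schema.py.

```python
from enum import Enum


class SaleType(str, Enum):
    """Sketch of a sale type enum whose values are the DB values."""

    NEW_SALE = "New Sale"
    RESALE = "Resale"
    SUB_SALE = "Sub Sale"


# Display labels happen to equal the DB values in the canonical table.
SALE_TYPE_LABELS = {member: member.value for member in SaleType}


def parse_sale_type(raw: str) -> SaleType:
    """Case-insensitive lookup that rejects anything outside the enum."""
    normalized = raw.strip().lower()
    for member in SaleType:
        if member.value.lower() == normalized:
            return member
    raise ValueError(f"Unknown sale type: {raw!r}")
```

Parsing at the boundary and comparing enum members afterwards avoids the case-mismatch bugs that raw string comparisons invite.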

Usage

javascript
// CORRECT - Use enum helpers
import { isSaleType, SaleType, SaleTypeLabels } from '../schemas/apiContract';

if (isSaleType.newSale(row.saleType)) { /* ... */ }
const label = SaleTypeLabels[SaleType.RESALE];

// FORBIDDEN - Hardcoded strings
if (row.sale_type === 'New Sale') { /* ... */ }
const label = 'Resale';

Backend

python
# CORRECT
from constants import SALE_TYPE_NEW, SALE_TYPE_RESALE
from api.contracts.contract_schema import SaleType

if sale_type == SALE_TYPE_NEW:
    ...

# FORBIDDEN
if sale_type == 'New Sale':
    ...

Part 6: Tenure Standards

Canonical Values

| Enum Key | DB Value | Full Label | Short Label |
| --- | --- | --- | --- |
| FREEHOLD | Freehold | Freehold | FH |
| LEASEHOLD_99 | 99-year | 99-year Leasehold | 99yr |
| LEASEHOLD_999 | 999-year | 999-year Leasehold | 999yr |

Usage

javascript
// CORRECT
import { isTenure, TenureLabelsShort } from '../schemas/apiContract';

const shortLabel = TenureLabelsShort[row.tenure];

// FORBIDDEN
const label = row.tenure === 'Freehold' ? 'FH' : '99yr';

Part 7: Property Age Band Standards

Canonical Values

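| Key | Label | Age Range | Source |
| --- | --- | --- | --- |
| new_sale | New Sale | N/A | sale_type |
| recently_top | Recently TOP | 4-8 yrs | age |
| young_resale | Young Resale | 8-15 yrs | age |
| resale | Resale | 15-25 yrs | age |
| mature_resale | Mature Resale | 25+ yrs | age |
| freehold | Freehold | N/A | tenure |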

Usage

javascript
// CORRECT
import { getAgeBandKey, AGE_BAND_LABELS_SHORT } from '../constants';

const band = getAgeBandKey(age, isFreehold, isNewSale);
const label = AGE_BAND_LABELS_SHORT[band];

// FORBIDDEN
if (age < 5) band = 'new';  // Wrong classification

Part 8: Filter Parameter Standards

Two-Layer Naming Convention

Filter parameters use different naming at different layers:

code
┌─────────────────────────────────────────────────────────────────┐
│  FRONTEND                                                        │
│  buildApiParams() → { district: 'D01,D02', bedroom: '2,3' }     │
│                           ↓ SINGULAR                             │
├─────────────────────────────────────────────────────────────────┤
│  API BOUNDARY (HTTP Request)                                     │
│  ?district=D01,D02&bedroom=2,3&segment=CCR                      │
│                           ↓ SINGULAR                             │
├─────────────────────────────────────────────────────────────────┤
│  ROUTE HANDLER (routes/*.py)                                     │
│  Parses & normalizes → { districts: [...], bedrooms: [...] }    │
│                           ↓ PLURAL                               │
├─────────────────────────────────────────────────────────────────┤
│  SERVICE LAYER (services/*.py)                                   │
│  filters.get('districts'), filters.get('bedrooms')              │
│                           = PLURAL                               │
└─────────────────────────────────────────────────────────────────┘

API Parameters (Singular) — Frontend & HTTP

| Concept | API Param | Format | Example |
| --- | --- | --- | --- |
| District | district | Comma-separated | district=D01,D02 |
| Bedroom | bedroom | Comma-separated | bedroom=2,3 |
| Segment | segment | Comma-separated | segment=CCR,RCR |
| Sale Type | sale_type | Single value | sale_type=Resale |
| Date Range | date_from, date_to | ISO date | date_from=2024-01-01 |
| PSF Range | psf_min, psf_max | Number | psf_min=1500 |
| Size Range | size_min, size_max | Number | size_min=800 |
| Tenure | tenure | Single value | tenure=Freehold |
| Project | project | String | project=Parc |

Service Filters (Plural) — Backend Services

| Concept | Service Key | Type | Example |
| --- | --- | --- | --- |
| Districts | districts | List[str] | ['D01', 'D02'] |
| Bedrooms | bedrooms | List[int] | [2, 3] |
| Segments | segments | List[str] | ['CCR', 'RCR'] |
| Sale Type | sale_type | str | 'Resale' |
| Date Range | date_from, date_to | date | Python date objects |

Frontend Usage

javascript
// CORRECT - API params are SINGULAR, comma-separated strings
const params = {
  district: 'D01,D02',      // Singular key, comma-separated
  bedroom: '2,3',           // Singular key, comma-separated
  segment: 'CCR',           // Singular key
  sale_type: 'Resale',      // snake_case
};

// FORBIDDEN - Never use plural or arrays in API params
const params = {
  districts: ['D01', 'D02'],  // Wrong: plural, array
  bedrooms: '2,3',            // Wrong: plural
  region: 'CCR',              // Wrong: use 'segment'
};

Backend Route Handler

python
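# Route handler normalizes singular → plural for services
from flask import request

# `app` (the Flask application) and `service` are defined elsewhere in the backend.
@app.route('/api/data')
def get_data():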
    # Parse singular API params
    district_param = request.args.get('district', '')
    bedroom_param = request.args.get('bedroom', '')

    # Normalize to plural for service layer
    filters = {
        'districts': [d.strip() for d in district_param.split(',') if d.strip()],
        'bedrooms': [int(b) for b in bedroom_param.split(',') if b.strip()],
    }

    return service.get_data(filters)

Backend Service

python
# Services expect PLURAL keys with list values
def get_data(filters: dict):
    districts = filters.get('districts', [])  # List[str]
    bedrooms = filters.get('bedrooms', [])    # List[int]
    segments = filters.get('segments', [])    # List[str]
    sale_type = filters.get('sale_type')      # str (singular - not a list)

Why This Convention?

  1. API params are singular: this follows a common query-string style (?id=1,2 rather than ?ids=1,2)
  2. Service filters are plural — Semantically correct for lists (districts contains multiple districts)
  3. Route handlers bridge the gap — Single place for normalization/validation
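
The singular-to-plural normalization can also be written as a framework-free function, which makes it easy to unit test. The sketch below is an assumption-laden illustration: `normalize_filters` and `_split_csv` are hypothetical names, and a real handler would also need to decide how to report a non-numeric `bedroom` value (here `int()` simply raises `ValueError`).

```python
from typing import Any, Dict, List, Mapping


def _split_csv(raw: str) -> List[str]:
    """Split a comma-separated value, dropping blanks and whitespace."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def normalize_filters(args: Mapping[str, str]) -> Dict[str, Any]:
    """Convert singular API params into the plural service filter dict."""
    filters: Dict[str, Any] = {}

    districts = _split_csv(args.get("district", ""))
    if districts:
        filters["districts"] = districts

    # int() raises ValueError for input such as "two".
    bedrooms = [int(value) for value in _split_csv(args.get("bedroom", ""))]
    if bedrooms:
        filters["bedrooms"] = bedrooms

    segments = _split_csv(args.get("segment", ""))
    if segments:
        filters["segments"] = segments

    # sale_type is a single value, so its key stays singular.
    sale_type = args.get("sale_type", "").strip()
    if sale_type:
        filters["sale_type"] = sale_type

    return filters
```

Because the function accepts any mapping, a route handler could pass `request.args` directly while tests pass a plain dict.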

Part 9: Response Field Naming

API Response Conventions

| v1 (snake_case) | v2 (camelCase) | Description |
| --- | --- | --- |
| median_psf | medianPsf | Median price per sqft |
| sale_type | saleType | Transaction type |
| bedroom_count | bedroomCount | Number of bedrooms |
| floor_level | floorLevel | Floor classification |

Adapter Pattern

javascript
// Adapters normalize v1/v2 responses - components use camelCase
const transformedData = transformTimeSeries(response.data);
// Now: data.medianPsf (never data.median_psf)
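
The v1-to-v2 key mapping in the table is mechanical, so an adapter can derive it rather than list every field. The Python sketch below shows the idea only; `snake_to_camel` and `to_camel_keys` are hypothetical helpers, not the project's `transformTimeSeries` adapter.

```python
from typing import Any, Dict


def snake_to_camel(name: str) -> str:
    """Convert 'median_psf' to 'medianPsf'; camelCase input is unchanged."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_camel_keys(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a response row with every key in camelCase."""
    return {snake_to_camel(key): value for key, value in row.items()}
```

Since keys without underscores pass through untouched, the same function accepts both v1 and v2 rows.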

Part 10: Color Standards

Chart Colors (from palette)

| Element | Color | Hex |
| --- | --- | --- |
| CCR | Deep Navy | #213448 |
| RCR | Ocean Blue | #547792 |
| OCR | Sky Blue | #94B4C1 |
| Background | Sand/Cream | #EAE0CF |

Bedroom Colors

javascript
// Defined in constants - DO NOT hardcode
import { getBedroomColor } from '../constants'; // Add if missing

Part 10b: Common Mistakes Quick Reference

| Anti-Pattern | Symptom | Grep to Find | Fix |
| --- | --- | --- | --- |
| Hardcoded region array | Out-of-sync with constants | `grep -rn "\['CCR'.*'RCR'.*'OCR'\]" frontend/src/` | Use REGIONS from constants |
| Hardcoded bedroom threshold | Wrong classification | `grep -rn "< 580\|< 600\|< 780" frontend/src/` | Use classifyBedroomThreeTier() |
| String enum comparison | Case mismatch breaks filter | `grep -rn "=== 'New Sale'\|=== 'Resale'" frontend/src/` | Use isSaleType.newSale() |
| Plural API params | Backend doesn't parse | `grep -rn "districts=\|bedrooms=" frontend/src/` | Use singular: district=, bedroom= |
| Hardcoded floor levels | Missing levels, wrong order | `grep -rn "\['Low'.*'Mid'\]" frontend/src/` | Use FLOOR_LEVELS from constants |
| Inline color hex | Inconsistent palette | `grep -rn "#[0-9A-Fa-f]\{6\}" frontend/src/components/` | Use constants (REGION_COLORS) |

Quick Audit Commands

bash
# Find hardcoded region arrays
grep -rn "'CCR'\|'RCR'\|'OCR'" frontend/src/components/ | grep -v "import\|from"

# Find hardcoded bedroom thresholds
grep -rn "< 580\|< 600\|< 780\|< 950" frontend/src/

# Find hardcoded sale type strings
grep -rn "'New Sale'\|'Resale'\|'Sub Sale'" frontend/src/ backend/

# Find plural API params (should be singular)
grep -rn "districts=\|bedrooms=\|segments=" frontend/src/

# Find hardcoded color hex
grep -rn "\"#[0-9A-Fa-f]\{6\}\"" frontend/src/components/powerbi/

# Find floor level hardcoding
grep -rn "'Low'\|'Mid'\|'High'\|'Luxury'" frontend/src/ | grep -v import
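
The same audit could be scripted, for example to run in CI. The following Python sketch checks source text against a few of the patterns above; `FORBIDDEN_PATTERNS` and `find_violations` are hypothetical, and the regexes are approximate translations of the grep commands rather than an exhaustive rule set.

```python
import re
from typing import List, Tuple

# Approximate Python equivalents of three of the grep patterns above.
FORBIDDEN_PATTERNS = {
    "hardcoded sale type": re.compile(r"'(New Sale|Resale|Sub Sale)'"),
    "plural API param": re.compile(r"\b(districts|bedrooms|segments)="),
    "hardcoded bedroom threshold": re.compile(r"<\s*(580|600|780|950)\b"),
}


def find_violations(source: str) -> List[Tuple[int, str]]:
    """Return (line number, rule label) pairs for each suspicious line."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in FORBIDDEN_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits
```

Like the grep commands, this will flag the constants files themselves, so a real check would need to exclude the single-source-of-truth paths.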

Part 11: Pre-Commit Checklist

Before any chart/filter/data change:

code
[ ] No hardcoded region strings - use REGIONS constant
[ ] No hardcoded bedroom labels - use BEDROOM_ORDER constant
[ ] No hardcoded bedroom thresholds - use classifyBedroomThreeTier()
[ ] No hardcoded floor levels ('Low', 'Mid', etc.)
[ ] No hardcoded sale types ('New Sale', 'Resale')
[ ] No hardcoded tenure strings ('Freehold', '99-year')
[ ] All filter params use singular form (district, bedroom, segment)
[ ] Colors from constants/palette, not hardcoded hex
[ ] New constants added to both backend AND frontend files

Part 12: Adding New Classifications

When you need a new classification (e.g., new age band, new floor tier):

Step 1: Add to Backend Constants

python
# backend/constants.py
NEW_CLASSIFICATION = 'value'
NEW_CLASSIFICATION_LABELS = { ... }

Step 2: Add to Frontend Constants

javascript
// frontend/src/constants/index.js
export const NEW_CLASSIFICATION = 'value';
export const NEW_CLASSIFICATION_LABELS = { /* ... */ };

Step 3: Add to API Contract (if enum)

python
# backend/api/contracts/contract_schema.py
from enum import Enum

class NewEnum(str, Enum):
    VALUE = 'value'

javascript
// frontend/src/schemas/apiContract.js
export const NewEnum = { VALUE: 'value' };

Step 4: Update This Document

Add the new classification to the appropriate section above.


Quick Reference Card

code
DATA STANDARDS CHECKLIST

BEFORE WRITING ANY DATA CODE:
[ ] Check constants/index.js for existing values
[ ] Check apiContract.js for enum helpers
[ ] Use helper functions (getRegionForDistrict, isSaleType, etc.)
[ ] Never hardcode classification strings
[ ] Response fields: use adapters for v1/v2 normalization

FILTER NAMING (Two-Layer Convention):
┌──────────────────────────────────────────────────────┐
│ API PARAMS (singular)  →  SERVICE FILTERS (plural)   │
│ district               →  districts                  │
│ bedroom                →  bedrooms                   │
│ segment                →  segments                   │
│ sale_type              →  sale_type (stays singular) │
└──────────────────────────────────────────────────────┘

Frontend: params.district = 'D01,D02'     // SINGULAR
Route:    filters['districts'] = [...]    // PLURAL
Service:  filters.get('districts', [])    // PLURAL

ADDING NEW CLASSIFICATION:
1. Add to backend/constants.py
2. Add to frontend/src/constants/index.js
3. Add to contract_schema.py if enum
4. Update this skill document

PART 2: ENUM INTEGRITY (Merged from enum-integrity-guardrails)

Trigger: Before modifying age bands, sale types, regions, or any categorical bucket keys.

Core Rule

All categorical "bucket" keys (age bands, sale type, region, labels, etc.) MUST come from the canonical enums in:

  • backend/api/contracts/contract_schema.py

No other file (SQL, routes, frontend, utils) may invent, rename, or extend bucket keys.


Enum Single Source of Truth

Canonical: PropertyAgeBucket (and other enums) in backend/api/contracts/contract_schema.py

Allowed:

  • Backend computes age_band using the canonical enum keys
  • API returns age_band values that are exactly one of the enum keys

Forbidden:

  • Adding new bucket keys not present in the enum (e.g., just_top)
  • Duplicating bucket definitions in SQL or frontend
  • Hardcoding bucket strings outside contract_schema.py

Classification Location (Backend Only)

Age band classification MUST happen in backend code, using PropertyAgeBucket.classify().

Required pattern:

python
# Use canonical classifier
from api.contracts.contract_schema import PropertyAgeBucket

age_band = PropertyAgeBucket.classify(
    age=property_age,
    sale_type=sale_type,
    tenure=tenure
)

Forbidden patterns:

  • SQL returns string bucket keys (e.g., literal('recently_top'))
  • Frontend computes or overrides age_band
  • Duplicating classification logic in multiple places
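
For orientation, a classifier of this shape might look like the sketch below. It is not the real `PropertyAgeBucket`: the precedence (new sale, then freehold, then age) is inferred from the SQL example in this document, the ranges come from the age band table and are treated as lower-inclusive and upper-exclusive, and returning `'unknown'` for a missing age or an age under 4 is an assumption.

```python
from typing import Optional


class PropertyAgeBucket:
    """Illustrative stand-in for the canonical age band enum."""

    NEW_SALE = "new_sale"
    RECENTLY_TOP = "recently_top"
    YOUNG_RESALE = "young_resale"
    RESALE = "resale"
    MATURE_RESALE = "mature_resale"
    FREEHOLD = "freehold"
    UNKNOWN = "unknown"  # assumption: fallback key, as used in the SQL example

    # bucket -> (min_age inclusive, max_age exclusive); None means open-ended
    AGE_RANGES = {
        RECENTLY_TOP: (4, 8),
        YOUNG_RESALE: (8, 15),
        RESALE: (15, 25),
        MATURE_RESALE: (25, None),
    }

    @classmethod
    def classify(
        cls, age: Optional[float], sale_type: Optional[str], tenure: Optional[str]
    ) -> str:
        """Return exactly one bucket key for a transaction."""
        if sale_type and sale_type.strip().lower() == "new sale":
            return cls.NEW_SALE
        if tenure and "freehold" in tenure.lower():
            return cls.FREEHOLD
        if age is None:
            return cls.UNKNOWN
        for bucket, (min_age, max_age) in cls.AGE_RANGES.items():
            if age >= min_age and (max_age is None or age < max_age):
                return bucket
        return cls.UNKNOWN
```

Because `AGE_RANGES` lives on the class, the dynamic SQL builder and the Python classifier can share the same boundaries.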

Dynamic SQL from Enums

When SQL needs to GROUP BY bucket, build the CASE dynamically from PropertyAgeBucket:

python
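from sqlalchemy import and_, case, func, literal

# `Transaction` is the ORM model and `property_age` is a SQL expression for the
# property's age; both are defined elsewhere.
# Build CASE conditions from PropertyAgeBucket.AGE_RANGES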
age_conditions = [
    (func.lower(Transaction.sale_type) == 'new sale', literal(PropertyAgeBucket.NEW_SALE)),
    (Transaction.tenure.ilike('%freehold%'), literal(PropertyAgeBucket.FREEHOLD)),
    (Transaction.lease_start_year.is_(None), literal('unknown')),
]

for bucket, (min_age, max_age) in PropertyAgeBucket.AGE_RANGES.items():
    if max_age is None:
        condition = property_age >= min_age
    else:
        condition = and_(property_age >= min_age, property_age < max_age)
    age_conditions.append((condition, literal(bucket)))

age_band_case = case(*age_conditions, else_=literal('unknown'))

Enum Integrity Contract Tests

Tests in tests/test_contract_schema.py enforce:

  • test_classify_returns_valid_keys_only - classify() only returns valid keys
  • test_enum_key_snapshot - enum keys don't change without intent
  • test_age_band_boundaries - exact age range behavior

If enum keys change, tests fail immediately.
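
A snapshot test of this kind can be very small. The sketch below illustrates the pattern with a hypothetical stand-in enum (`AgeBand`) and a frozen key set (`EXPECTED_KEYS`); it is not the content of tests/test_contract_schema.py.

```python
from enum import Enum


class AgeBand(str, Enum):
    """Hypothetical stand-in for the canonical age band enum."""

    NEW_SALE = "new_sale"
    RECENTLY_TOP = "recently_top"
    YOUNG_RESALE = "young_resale"
    RESALE = "resale"
    MATURE_RESALE = "mature_resale"
    FREEHOLD = "freehold"


# The snapshot: changing the enum without updating this set fails the test.
EXPECTED_KEYS = frozenset(
    {
        "new_sale",
        "recently_top",
        "young_resale",
        "resale",
        "mature_resale",
        "freehold",
    }
)


def test_enum_key_snapshot() -> None:
    """Fail if a key was added, removed, or renamed without intent."""
    assert {member.value for member in AgeBand} == EXPECTED_KEYS


def is_valid_age_band(value: str) -> bool:
    """True only for keys that exist in the canonical enum."""
    return value in {member.value for member in AgeBand}
```

The duplication between the enum and the snapshot is deliberate: a key change then requires two edits, which makes the change visible in review.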


Enum Integrity Checklist (Before Merge)

code
[ ] No new bucket strings added outside `contract_schema.py`
[ ] SQL uses PropertyAgeBucket constants (not hardcoded strings)
[ ] Backend is the only place mapping age → age_band
[ ] All tests in TestPropertyAgeBucket pass
[ ] Frontend AGE_BAND_LABELS_* match PropertyAgeBucket.LABELS

Mental Model

code
Enums define reality.
Backend classifies.
SQL provides numbers (or uses enum constants).
Frontend displays.
No one invents categories.