Hard rules (public sources only)
- •
Every authority/proposition outcome MUST be one of:
- •VERIFIED_CORRECT (public source retrieved and supports)
- •VERIFIED_ERROR (public source retrieved and contradicts OR shows mismatch)
- •UNVERIFIABLE_PUBLIC (not found / ambiguous / blocked / missing coverage)
- •
NEVER state "does not exist" unless you have retrieved material proving mismatch. If you cannot retrieve the authority text from public sources, label UNVERIFIABLE_PUBLIC.
- •
Every conclusion MUST include:
- •retrieval URL(s) attempted,
- •timestamp,
- •cached artefact path + SHA256,
- •short reason for any failure.
Source Acquisition Policy (v0.2)
Permitted Sources (Priority Order)
- •
Find Case Law (National Archives) - Primary for UK case law
- •Preferred for all UK neutral citations
- •Deterministic URL construction before searching
- •Atom feed search in restricted mode (max 10 results per query)
- •Document retrieval via
/{document_uri}/data.xml
- •
BAILII - Secondary fallback for UK case law
- •Used only when FCL fails or not applicable
- •Case-scoped, targeted retrieval only
- •Per-job cap: 25-50 fetches maximum (configurable)
- •
User-provided documents and links - Always permitted
Prohibited Activities
NEVER:
- •Crawl, mirror, or bulk-download from BAILII or Find Case Law
- •Build a general corpus from any public source
- •Walk indices, pagination, or "related cases" links
- •Perform systematic computational analysis without licensing
Find Case Law Specific Rules
- •
Open Justice Licence Compliance:
- •Default mode: URL fetch + API retrieval for specific documents only
- •No computational analysis without explicit permission
- •If bulk searching required: mark as
FCL_SEARCH_MODE=RESTRICTED
- •
Rate Limiting (mandatory):
- •Global limit: 1,000 requests per 5 minutes per IP (API spec)
- •Polite limit: 1 request per second (system default)
- •Handle 429 responses: backoff + retry (3 attempts max)
- •After retries exhausted: mark remaining
UNVERIFIABLE_PUBLIC
- •
Resolution Algorithm (conservative):
codeStep 1: Deterministic URL Construction Try: GET /{court}/{year}/{num}/data.xml If 200: RESOLVED If 404: Proceed to Step 2 Step 2: Atom Feed Search (Restricted) Build 2-3 targeted queries (court filter, neutral citation, party) Fetch: per_page=10, page=1 only (max 2 pages) Score: exact identifier match > party+year+court match If exactly one high-confidence: RESOLVED If multiple: UNVERIFIABLE_PUBLIC (ambiguous) If none: Proceed to Step 3 Step 3: BAILII Fallback (Optional) Only if within per-job BAILII limit Try pattern match Respect rate limit - •
Caching Requirements:
- •Store raw XML/HTML artefact
- •Compute SHA256 hash locally
- •Store FCL contenthash from Atom (if available)
- •Store updated timestamp from Atom
- •Create metadata JSON with all retrieval details
BAILII Specific Rules
- •
Hard Limits (enforced per job):
- •Per-job cap: 25-50 fetches (configurable)
- •Rate limit: 1 request per second minimum
- •Strict caching: no refetching within job if cached
- •
Pattern Matching Only:
- •Support 9 UK court types with deterministic URL templates
- •No web search or creative guessing
- •If pattern doesn't match: mark unresolvable
Evidence Requirements
Every VERIFIED_* conclusion MUST include:
- •Source URL(s) - All URLs attempted and which succeeded
- •Retrieval timestamp - ISO 8601 format
- •Cached artefact - Path relative to project root
- •Content hash - SHA256 of cached content
- •Metadata - HTTP status, content type, content length
- •Quoted snippets - From retrieved artefact only (no fabrication)
Every UNVERIFIABLE_PUBLIC MUST include:
- •URLs attempted - All resolution attempts
- •HTTP status codes - Or resolution failure reasons
- •Reason - No pattern match / 404 / ambiguous / rate limited
- •Timestamp - When attempts were made
Rate Limit Handling
When rate limits encountered (429 responses):
- •
Immediate action:
- •Backoff with exponential delay (1s, 2s, 4s)
- •Retry up to 3 times total
- •Log each retry attempt
- •
After exhaustion:
- •Stop making requests to that source for this job
- •Mark remaining citations as
UNVERIFIABLE_PUBLIC - •Include clear note in report:
code
"Rate limit reached for [source]. Remaining N citations could not be verified. Consider running audit again after delay or increasing per-job limits."
- •
Never:
- •Continue hammering the API
- •Skip rate limiting to "get results faster"
- •Hide rate limit failures from user