PWC Audit Intelligence Skill
Purpose
This skill packages audit domain expertise for the ground-truth project, enabling Claude to understand audit concepts, compliance requirements, and quality validation without re-explanation in each session.
When to Use This Skill
- •Extracting ground truth data from SEC filings (10-K, 10-Q, DEF 14A)
- •Analyzing Critical Audit Matters (CAMs)
- •Validating Internal Controls over Financial Reporting (ICFR)
- •Ensuring PCAOB independence compliance (Rule 3520)
- •Building audit intelligence systems (RAG, ML models, partner dashboards)
Core Audit Concepts
1. Critical Audit Matters (CAMs)
Definition: Matters arising from the current period audit of financial statements that were communicated (or required to be communicated) to the audit committee and that:
- •Relate to accounts/disclosures material to the financial statements
- •Involved especially challenging, subjective, or complex auditor judgment
Why They Matter:
- •High audit risk: Areas requiring significant professional judgment
- •Complexity indicator: More CAMs = more complex client
- •Partner resource allocation: CAM-heavy clients require senior staff
- •Churn risk: Clients with increasing CAM count may be at risk
Examples (from ground-truth extractions):
- •PFSI (PennyMac): Mortgage Servicing Rights (MSRs) valuation
- •Why critical: "Involved especially challenging, subjective, or complex judgments"
- •Sector-specific: Unique to mortgage banking
- •ILMN (Illumina): Revenue recognition for complex multi-element arrangements
- •Banking sector: Loan loss reserves (CECL calculations)
Typical CAM Counts:
- •0-1 CAMs: Low complexity (straightforward audits)
- •2-3 CAMs: Moderate complexity (industry-standard)
- •4+ CAMs: High complexity (partner-intensive)
Red Flags:
- •Increasing CAM count year-over-year
- •Same CAM repeated for 3+ years (unresolved issues)
- •CAMs related to management estimates (subjectivity risk)
2. Internal Controls over Financial Reporting (ICFR)
Definition (SOX 404): Process designed to provide reasonable assurance regarding reliability of financial reporting and preparation of financial statements.
Effectiveness Conclusion: Binary (Effective / Ineffective)
- •Effective: No material weaknesses identified
- •Ineffective: One or more material weaknesses exist
Why It Matters:
- •Audit quality: Clean ICFR = lower detection risk
- •Client health: ICFR failures signal governance issues
- •Churn risk: Material weaknesses increase partner workload, may lead to resignation
- •Regulatory risk: ICFR failures attract SEC scrutiny
Material Weakness Examples:
- •Inadequate segregation of duties
- •Ineffective IT general controls (ITGC)
- •Lack of evidence for key control execution
- •Management override of controls
Ground Truth Extraction (sec_10k.controls):
{
"icfr_effective": true,
"auditor": "Deloitte & Touche LLP",
"opinion_date": "2025-02-19",
"opinion": "In our opinion, the Company maintained, in all material respects, effective internal control over financial reporting..."
}
ICFR Status Distribution (across 13 companies):
- •Effective: 13/13 (100%) — all test companies have clean controls
- •Ineffective: 0/13 — no material weaknesses found (expected, these are mature public companies)
3. PCAOB Independence Compliance (Rule 3520)
Rule: Auditors must be independent in fact and appearance
Key Prohibitions:
- •Cannot audit own firm's clients (independence conflict)
- •Cannot perform certain non-audit services (e.g., bookkeeping)
- •Cannot have financial interest in audit client
- •Rotation requirements (lead partner: 5 years, reviewing partner: 5 years)
Why It Matters for ground-truth:
- •Testing restriction: Can ONLY use non-PWC clients for development
- •Deployment restriction: Must get Independence Office approval before using PWC client data
- •Provenance requirement: Must document that all test data is independence-compliant
Permitted Test Companies (127 total):
- •Audited by: EY, Deloitte, KPMG (NOT PWC)
- •Primary test company: PFSI (PennyMac Financial Services) — Deloitte client
- •Full list:
config/test_companies_permitted.csv
Verification Process:
- •Check PCAOB Form AP (auditor registration)
- •Cross-reference company CIK with Form AP client list
- •If PWC is auditor → FAIL (cannot use)
- •If EY/Deloitte/KPMG → PASS (permitted)
Deployment Checklist (before using PWC clients):
- • Independence Office consultation
- • Legal/Compliance sign-off
- • Client consent (where required)
- • Bias audit completed
- • PCAOB consultation (recommended)
4. SEC Filing Types
10-K (Annual Report):
- •Item 1A: Risk Factors
- •Item 7: MD&A (Management's Discussion & Analysis)
- •Item 9A: Controls and Procedures (ICFR opinion)
- •Critical Audit Matters: In auditor's report (near end of 10-K)
10-Q (Quarterly Report):
- •Condensed financials (no full CAM disclosure)
- •ICFR disclosure only if material change
DEF 14A (Proxy Statement):
- •Auditor information (fees, tenure)
- •Executive compensation
- •Board composition
- •Related party transactions
8-K (Current Report):
- •Material events (M&A, executive changes, restatements)
Ground Truth Extraction Workflow
Phase 1: Company Resolution
Input: Ticker (e.g., "PFSI") or CIK (e.g., "0001745916")
Process:
- •Resolve ticker → CIK via SEC API
- •Fetch company profile (includes SIC code, auditor, fiscal year end)
- •Check independence: Is company a PWC client? (FAIL if yes)
Output: Company metadata
{
"ticker": "PFSI",
"cik": "0001745916",
"company_name": "PENNYMAC FINANCIAL SERVICES INC",
"sic_code": "6162",
"sector": "mortgage",
"auditor": "Deloitte & Touche LLP"
}
Phase 2: Data Extraction
Sources:
- •SEC XBRL: Financial statements (Assets, Liabilities, Equity, Net Income, EPS)
- •SEC 10-K: CAMs, ICFR, Risk Factors
- •Market data: Current price, market cap
- •Sector-specific: HMDA (mortgage), FDIC (banking), EIA (utilities)
Provenance Requirements:
- •source_url: Direct link to SEC filing
- •file_sha256: Cryptographic hash of source document
- •extraction_method: How fact was extracted (XBRL API, regex, table parser)
- •confidence: Score 0.0-1.0 (1.0 = deterministic, <1.0 = heuristic)
Phase 3: Validation
Balance Sheet Reconciliation:
Assets = Liabilities + Stockholders' Equity
- •PASS: Difference < $1M or < 0.1% of assets
- •FAIL: Significant imbalance (check for off-balance-sheet items, extraction errors)
EPS Consistency:
Calculated EPS = Net Income / Shares Outstanding
- •PASS: Within $0.01 of reported diluted EPS
- •FAIL: Material difference (check for share count errors, preferred dividends)
Data Quality:
- •Required facts present (Assets, Liabilities, Equity, Net Income)
- •Numeric plausibility (no negative equity for going concerns)
- •Date validity (ISO 8601 format)
Provenance Completeness:
- •All facts have source_url
- •All facts have SHA-256 checksums
- •Evidence files archived locally
Phase 4: Sector Routing
SIC Code Classification:
- •Banking (6000-6099): FDIC call reports, summary of deposits
- •Mortgage (6100-6199): HMDA originations, PMMS rates, FHFA HPI
- •Utilities (4900-4999): EIA data, EPA CAMD emissions
- •Airlines (4500-4599): BTS Form 41, on-time performance
- •Tech/Other: Base extractors only (XBRL, 10-K, market data)
Routing Command:
python -m ground_truth.cli classify PFSI # Output: Sector: mortgage, Extractors: sec_xbrl, sec_10k, market_data, hmda, pmms, fhfa_hpi
RAG Integration (Phase 2)
Chunking Strategy
10-K Sections to Embed:
- •Critical Audit Matters (keep each CAM as separate chunk)
- •ICFR controls (single chunk)
- •Risk Factors (split if >1000 tokens)
- •MD&A (split by subsection)
Chunk Metadata (required):
{
"chunk_id": "PFSI_2024_CAM_01",
"ticker": "PFSI",
"cik": "0001745916",
"section": "Critical Audit Matters",
"filing_date": "2024-12-31",
"source_sha256": "f52e532ba...",
"chunk_index": 0,
"total_chunks": 2
}
Query Patterns
Factual Retrieval:
- •"What are PFSI's Critical Audit Matters?"
- •"Is PFSI's ICFR effective?"
- •"What is EWBC's total assets?"
Comparative:
- •"Compare PFSI and RKT risk factors"
- •"Which has more CAMs: PFSI or RKT?"
Analytical:
- •"Which mortgage companies have ICFR failures?"
- •"Which companies have 3+ CAMs?"
- •"Show all companies audited by Deloitte"
Temporal (requires multi-year data):
- •"Did PFSI's net income increase in 2024?"
- •"What CAMs did PFSI have in 2023 vs 2024?"
LTV Prediction Model (Phase 3)
Features (Churn Risk Indicators)
Complexity Indicators:
- •CAM count (0-1 = low, 2-3 = medium, 4+ = high)
- •ICFR effectiveness (effective = -20 pts, ineffective = +50 pts)
- •Restatement history (each restatement = +30 pts)
Financial Health:
- •Profitability (net income < 0 = +20 pts)
- •Leverage (debt/equity > 3.0 = +15 pts)
- •Liquidity (current ratio < 1.0 = +10 pts)
Sector Risk:
- •Mortgage (cyclical, sensitive to rates) = +10 pts
- •Banking (regulatory-heavy) = +5 pts
- •Tech (fast-changing, valuation risk) = +5 pts
Heuristic Score (0-100):
def calculate_client_risk_score(ground_truth):
score = 50 # Base score
# CAM complexity
cam_count = len(ground_truth['critical_audit_matters'])
if cam_count >= 3:
score += 30
elif cam_count == 0:
score -= 30
# ICFR status
if not ground_truth['controls']['icfr_effective']:
score += 50
else:
score -= 20
# Profitability
if ground_truth['xbrl']['NetIncomeLoss'] < 0:
score += 20
return max(0, min(100, score)) # Clamp to 0-100
Risk Bands:
- •0-30: Low risk (retain, minimal partner time)
- •31-60: Medium risk (monitor, standard engagement)
- •61-100: High risk (churn candidate, consider resignation)
Quality Standards
Audit-Grade Provenance
Every fact must include:
- •Source URL: Direct link to SEC filing
- •SHA-256: Cryptographic hash for verification
- •Extraction method: How it was obtained
- •Confidence score: Reliability estimate
Example:
{
"matter": "Mortgage Servicing Rights (MSRs)",
"why_critical": "Involved especially challenging, subjective, or complex judgments",
"provenance": {
"source_url": "https://www.sec.gov/Archives/edgar/data/1745916/000155837025001148/pfsi-20241231x10k.htm",
"file_sha256": "f52e532ba113920525cf682c979c52e107904c847d6532ef0d922b71ba684e6a",
"extraction_method": "section_locator",
"confidence": 0.7
}
}
Verification Process
Partners must be able to:
- •Download original 10-K from SEC
- •Compute SHA-256 hash
- •Verify it matches provenance metadata
- •Manually locate cited text in filing
If mismatch:
- •Flag for investigation (corruption or fabrication)
- •Re-run extraction
- •Update evidence files
PWC Pitch Framework
Problem
Partners manually review 5,000+ client 10-Ks annually:
- •200 pages per 10-K × 2 hours = 10,000 hours/year
- •No institutional memory (knowledge locked in partner minds)
- •Reactive (find issues after they occur) vs proactive
Solution
Automated ground truth extraction + RAG query interface:
- •Extract CAMs, ICFR, financials in 3 minutes
- •Natural language queries: "Which clients have material weaknesses?"
- •Full provenance (SHA-256 verification)
- •Proactive alerts (CAM count increasing, ICFR failures)
Proof
- •13 companies extracted (100% success rate)
- •43.8 MB evidence archived with SHA-256 checksums
- •All 127 test companies are independence-compliant
- •RAG POC: Partners query in 2 seconds vs 2 hours
Business Impact
Efficiency:
- •200 hours → 2 hours per partner per year (100x gain)
Risk Detection:
- •Automated alerts for high-risk clients
- •Early warning for churn candidates
- •Proactive partner-client matching
Compliance:
- •Audit trail maintained (SHA-256 provenance)
- •Independence safeguards built-in
- •PCAOB-compliant validation
Ask
6 weeks + AWS/Azure credits for 50-company pilot
Common Failure Modes
Extraction Errors
CAM extraction returns 0 CAMs (but company likely has them):
- •Cause: Section locator regex doesn't match 10-K format
- •Fix: Debug section headers, test on 3-5 known CAM companies
Balance sheet doesn't balance:
- •Cause: XBRL period inconsistency (mixing quarterly/annual)
- •Fix: Consistent period selection logic (see PFSI bug fix)
"Revenues" fact missing for banks:
- •Cause: Banks use different XBRL tags (InterestAndDividendIncomeOperating)
- •Fix: Add sector-specific tag mappings
Validation Failures
EPS consistency check fails:
- •Possible causes: Preferred dividends, share count timing, rounding
- •Action: Check 10-K footnotes, verify share count source
Data quality FAIL:
- •Cause: Required fact missing (Assets, Liabilities, etc.)
- •Action: Check XBRL API response, add fallback tag mappings
Independence Violations
Extracted JPM data (PWC client):
- •Risk: PCAOB Rule 3520 violation
- •Action: Delete immediately, add to blocked list, re-check permitted companies
Reference Files
In ground-truth repo:
- •
config/test_companies_permitted.csv— 127 independence-compliant companies - •
INDEPENDENCE.md— Full PCAOB compliance framework - •
RAG_ARCHITECTURE.md— Technical deep dive on RAG system - •
MILESTONE_01_PFSI_SUCCESS.md— PFSI case study (first successful extraction)
Vocabulary
Terms to use consistently:
- •Ground truth: Authoritative data from primary sources (SEC filings, regulatory databases)
- •Provenance: Metadata chain (source URL → SHA-256 → extraction method)
- •CAM: Critical Audit Matter (NOT "significant audit matter")
- •ICFR: Internal Control over Financial Reporting (NOT "internal controls")
- •Extraction: Automated data retrieval (NOT "scraping")
- •Validation gates: Quality checks (balance sheet, EPS, provenance)
Avoid:
- •"Scraping" (implies unstructured/aggressive data collection)
- •"AI-generated" (use "LLM-assisted" or "RAG-powered")
- •"Black box" (emphasize explainability, provenance, human-in-the-loop)
Notes
Baseline date: October 25, 2025
Current status:
- •Phase 1 (Extraction): ✅ COMPLETE (13 companies)
- •Phase 2 (RAG): 🔴 IN PROGRESS (3-week timeline)
- •Phase 3 (ML): ⚪ NOT STARTED (4-6 weeks after RAG)
Key decisions:
- •ChromaDB selected over Pinecone (cost, simplicity)
- •OpenAI embeddings selected over open-source (quality)
- •RAG-first strategy (not scale-first)
- •Supervised ML (not RL)
Audit context:
- •Owner: Nirvan Chitnis (PWC Audit Associate, started Oct 3, 2025)
- •Public repo: https://github.com/nirvanchitnis-cmyk/ground-truth
- •Professional reputation protection: No inappropriate content, audit-grade standards