education-data-explorer

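Explore the tables and variables in the Urban Institute Education Data Portal. Use when identifying available education datasets, understanding what variables exist for schools, districts, and colleges, or planning data queries for education research.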

SKILL.md
---
name: education-data-explorer
description: Explore Urban Institute Education Data Portal tables and variables. Use when identifying available education datasets, understanding what variables exist for schools/districts/colleges, or planning data queries for education research.
metadata:
  audience: data-analysts
  domain: education-data
---

Education Data Explorer

Discover available education data from the Urban Institute Education Data Portal for research planning and query design.

What is the Education Data Portal?

  • Comprehensive education data from Urban Institute - free and publicly available
  • Three data levels: schools, school-districts, college-university
  • Multiple data sources: CCD, IPEDS, CRDC, College Scorecard, EDFacts, SAIPE, FSA, MEPS, PSEO, etc.
  • Coverage: 1980-2023 depending on source
  • Access: Mirror downloads (parquet/CSV) via education-data-query skill
  • Documentation: https://educationdata.urban.org/documentation/

Note: This workflow uses mirror-based file downloads, not paginated API calls. See education-data-query skill for fetch patterns and datasets-reference.md for file paths.

Reference File Structure

| File | Purpose | When to Read |
|------|---------|--------------|
| schools-endpoints.md | All school-level endpoints and variables | Researching K-12 schools |
| districts-endpoints.md | All district-level endpoints and variables | Researching school districts |
| colleges-endpoints.md | All college-level endpoints and variables | Researching higher education |
| variable-codes.md | Code values for states, grades, race, etc. | Interpreting or filtering data |
| metadata-api.md | Programmatic endpoint/variable discovery | Dynamic exploration |

Decision Trees

What data level do I need?

```
What entity am I researching?
├─ Individual K-12 schools → schools level
│   └─ See ./references/schools-endpoints.md
├─ School districts / LEAs → school-districts level
│   └─ See ./references/districts-endpoints.md
├─ Colleges / Universities → college-university level
│   └─ See ./references/colleges-endpoints.md
└─ Not sure
    ├─ Need school-specific data (discipline, AP, demographics) → schools
    ├─ Need aggregate district data (finance, poverty) → school-districts
    └─ Need postsecondary data (enrollment, aid, outcomes) → college-university
```

What topic am I researching?

```
Research topic?
├─ Enrollment / Demographics
│   ├─ K-12 public schools → CCD enrollment endpoints
│   ├─ Civil rights indicators → CRDC enrollment
│   └─ Colleges → IPEDS enrollment
├─ School Finance
│   ├─ District revenue/expenditure → CCD finance
│   └─ College finance → IPEDS finance
├─ Student Outcomes
│   ├─ K-12 assessments → EDFacts
│   ├─ Graduation rates (K-12) → EDFacts
│   ├─ College completion → IPEDS completions
│   └─ Post-college earnings → College Scorecard
├─ Student Aid / Loans
│   ├─ College financial aid → IPEDS aid
│   ├─ Federal loans/grants → FSA
│   └─ Debt/repayment → College Scorecard
├─ Discipline / Civil Rights
│   └─ K-12 discipline, harassment, restraint → CRDC
├─ Poverty Estimates
│   └─ District-level → SAIPE
└─ Directory / Location
    ├─ K-12 schools → CCD directory
    ├─ Districts → CCD directory
    └─ Colleges → IPEDS directory
```

How do I find specific variables?

```
Finding variables?
├─ Know the endpoint → Check reference file for variable list
├─ Know the topic → Use topic index below
├─ Need to search programmatically → See ./references/metadata-api.md
└─ Need code definitions → See ./references/variable-codes.md
```

Quick Reference: Data Levels

| Level | Key Sources | Primary ID | ID Format |
|-------|-------------|------------|-----------|
| schools | CCD, CRDC, EDFacts, MEPS, NHGIS | ncessch | 12-char string |
| school-districts | CCD, SAIPE, EDFacts | leaid | 7-char string |
| college-university | IPEDS, Scorecard, FSA, PSEO, EADA | unitid | 6-digit integer |
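
The string IDs above lose their leading zeros whenever a tool reads them as integers, which then breaks joins. The following is a minimal sketch of restoring the expected width; the helper name `normalize_id` is hypothetical, and it assumes the IDs contain digits only.

```python
# Expected widths come from the table above; names here are illustrative only.
ID_WIDTHS = {"ncessch": 12, "leaid": 7}


def normalize_id(kind: str, value) -> str:
    """Return the ID as a zero-padded string of the expected width."""
    width = ID_WIDTHS[kind]
    text = str(value).strip()
    if not text.isdigit():
        # Assumption: IDs are digits only; anything else is flagged, not guessed.
        raise ValueError(f"{kind} should contain only digits, got {value!r}")
    if len(text) > width:
        raise ValueError(f"{kind} is longer than {width} characters: {text}")
    return text.zfill(width)


print(normalize_id("leaid", 600001))           # district ID read as an integer
print(normalize_id("ncessch", "60000100001"))  # school ID missing its leading zero
```

Applying this to both sides of a join before matching avoids silent mismatches for states whose FIPS code starts with zero.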

Quick Reference: Data Sources

| Source | Level | Description | Years |
|--------|-------|-------------|-------|
| CCD | Schools, Districts | Public K-12 directory, enrollment, finance | 1986-2023 |
| CRDC | Schools | Civil rights indicators, discipline, AP courses | 2011-2021 |
| EDFacts | Schools, Districts | Assessments, graduation rates | 2009-2020 |
| IPEDS | Colleges | Enrollment, completions, finance, institutional data | 1980-2023 |
| College Scorecard | Colleges | Earnings, debt, student outcomes | 1996-2020 |
| SAIPE | Districts | Census poverty estimates for school-age children | 1995-2023 |
| FSA | Colleges | Federal student aid, loans, grants, 90/10 | 1999-2021 |
| MEPS | Schools | School poverty measure | 2006-2019 |
| NHGIS | Schools | Census geography crosswalks | 1990, 2000, 2010, 2020 |

Quick Reference: Common Endpoints

Schools

| Endpoint | Description |
|----------|-------------|
| /schools/ccd/directory/{year}/ | School directory (location, type, enrollment) |
| /schools/ccd/enrollment/{year}/{grade}/ | Enrollment by grade |
| /schools/crdc/discipline/{year}/ | Discipline incidents |
| /schools/crdc/ap-ib-enrollment/{year}/race/sex/ | AP/IB enrollment (requires disaggregation) |
| /schools/edfacts/assessments/{year}/{grade}/ | Assessment results |

Districts

| Endpoint | Description |
|----------|-------------|
| /school-districts/ccd/directory/{year}/ | District directory |
| /school-districts/ccd/enrollment/{year}/{grade}/ | District enrollment |
| /school-districts/ccd/finance/{year}/ | Revenue and expenditure |
| /school-districts/saipe/{year}/ | Poverty estimates |

Colleges

| Endpoint | Description |
|----------|-------------|
| /college-university/ipeds/directory/{year}/ | Institution directory |
| /college-university/ipeds/admissions-enrollment/{year}/ | Admissions data |
| /college-university/ipeds/enrollment-full-time-equivalent/{year}/ | FTE enrollment |
| /college-university/ipeds/fall-enrollment/{year}/{level}/ | Fall enrollment |
| /college-university/ipeds/graduation-rates/{year}/ | Graduation rates |
| /college-university/scorecard/earnings/{year}/ | Post-college earnings |

Exploration Workflow

Follow these steps to identify data for a research question:

  1. Identify data level

    • Schools: individual K-12 school records
    • Districts: school district / LEA records
    • Colleges: postsecondary institution records
  2. Identify relevant data source(s)

    • Use the data sources table above
    • Multiple sources may be needed (e.g., CCD + CRDC)
  3. Check available endpoints

    • Read the appropriate reference file
    • Note endpoint URL pattern and variables
  4. Review variables and filters

    • Check variable lists in reference files
    • Note which variables can be used as filters
  5. Check years available

    • Each endpoint has different year coverage
    • Use metadata API to get exact years
  6. Understand source context (RECOMMENDED)

    • Load the appropriate education-data-source-* skill for deep context
    • Understand data collection methodology and limitations
    • Review variable definitions and coding schemes
  7. Plan query

    • Load education-data-query skill for query construction
    • Or use metadata API to build query programmatically

URL Pattern Structure

All endpoints follow this pattern:

```
/api/v1/{level}/{source}/{topic}/{year}/[{disaggregation}/]
```

Examples:

  • /api/v1/schools/ccd/directory/2022/
  • /api/v1/schools/ccd/enrollment/2022/grade-5/
  • /api/v1/schools/ccd/enrollment/2022/grade-5/race/
  • /api/v1/school-districts/ccd/finance/2021/
  • /api/v1/college-university/ipeds/fall-enrollment/2022/undergraduate/
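
The pattern above can be assembled mechanically. This is a minimal sketch, not part of the portal's tooling: the function name `build_endpoint` is hypothetical, and the base URL is the one that appears in this document's own examples. Passing `None` as the topic covers endpoints such as SAIPE that have no topic segment.

```python
BASE_URL = "https://educationdata.urban.org/api/v1"


def build_endpoint(level, source, topic, year, *disaggregations):
    """Join the path segments in pattern order and add the trailing slash."""
    parts = [level, source, topic, str(year), *disaggregations]
    # Skip empty segments so that topic=None yields /{level}/{source}/{year}/.
    path = "/".join(part.strip("/") for part in parts if part)
    return f"{BASE_URL}/{path}/"


print(build_endpoint("schools", "ccd", "directory", 2022))
print(build_endpoint("schools", "ccd", "enrollment", 2022, "grade-5", "race"))
print(build_endpoint("school-districts", "saipe", None, 2021))
```

The helper only formats strings; it does not check that a given combination of level, source, topic, and year actually exists.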

Filtering

Query Parameters

| Parameter | Description | Example |
|-----------|-------------|---------|
| fips | State FIPS code | ?fips=6 (California) |
| leaid | District ID | ?leaid=0600001 |
| ncessch | School ID | ?ncessch=060000100001 |
| unitid | College ID | ?unitid=110635 |
| year | Filter by year | ?year=2022 |
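
Filters like these are easier to keep correct when they are encoded rather than concatenated by hand. A minimal sketch using the standard library follows; the helper name `with_filters` is hypothetical, and sorting the parameters is only a choice made here so the output is deterministic.

```python
from urllib.parse import urlencode


def with_filters(endpoint: str, **filters) -> str:
    """Append filters to an endpoint URL as an encoded query string."""
    if not filters:
        return endpoint
    # Sorted so the same filters always produce the same URL.
    return f"{endpoint}?{urlencode(sorted(filters.items()))}"


url = with_filters(
    "https://educationdata.urban.org/api/v1/schools/ccd/directory/2022/",
    fips=6,
    charter=1,
)
print(url)
```

String IDs such as `leaid` should be passed as strings (for example `leaid="0600001"`) so the leading zero survives encoding.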

Response Format

```json
{
  "count": 12345,
  "next": "https://educationdata.urban.org/api/v1/...?page=2",
  "previous": null,
  "results": [
    {"ncessch": "...", "school_name": "...", ...},
    ...
  ]
}
```
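
Because each response carries a `next` link, a direct API read has to follow that link until it is `null`. The sketch below shows the loop in isolation, assuming only the `results` and `next` fields from the response format above; `iter_results` and the in-memory `PAGES` stand-in are hypothetical and exist so the example runs without network access.

```python
from typing import Callable, Iterator


def iter_results(first_url: str, fetch_page: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every record, following the `next` link until it is empty."""
    url = first_url
    while url:
        page = fetch_page(url)
        yield from page["results"]
        url = page.get("next")  # None on the last page ends the loop


# Stand-in pages with made-up values, shaped like the response above.
PAGES = {
    "page-1": {"count": 3, "next": "page-2", "previous": None,
               "results": [{"ncessch": "A"}, {"ncessch": "B"}]},
    "page-2": {"count": 3, "next": None, "previous": "page-1",
               "results": [{"ncessch": "C"}]},
}

rows = list(iter_results("page-1", PAGES.__getitem__))
print(len(rows), "records collected")
```

For a live call, `fetch_page` could be something like `lambda url: requests.get(url, timeout=30).json()`. As noted earlier, the mirror-based downloads in the education-data-query skill avoid pagination altogether.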

Cross-Reference to Related Skills

| Skill | Purpose | When to Use |
|-------|---------|-------------|
| education-data-query | Download data from mirrors | After identifying endpoints/variables |
| education-data-context | Interpret data, understand limitations | After retrieving data |

Deep-Dive Data Source Skills

For comprehensive understanding of each data source beyond the portal documentation, load the appropriate source-specific skill:

| Skill | Data Source | Key Topics |
|-------|-------------|------------|
| education-data-source-ccd | Common Core of Data | K-12 directory, enrollment, finance, staffing surveys |
| education-data-source-crdc | Civil Rights Data Collection | Discipline, harassment, course access, civil rights context |
| education-data-source-saipe | Small Area Income & Poverty | District poverty estimates, model methodology |
| education-data-source-edfacts | EDFacts | State assessments, graduation rates, accountability |
| education-data-source-ipeds | IPEDS | College enrollment, graduation, finance, completions |
| education-data-source-scorecard | College Scorecard | Post-college earnings, debt, repayment |
| education-data-source-nhgis | NHGIS | Census geography, demographic crosswalks |
| education-data-source-fsa | Federal Student Aid | Pell, loans, financial responsibility, 90/10 |
| education-data-source-nacubo | NACUBO | College endowment data |
| education-data-source-nccs | NCCS | Nonprofit data for private colleges |
| education-data-source-meps | MEPS | Model-based school poverty (superior to FRPL) |
| education-data-source-eada | EADA | College athletics equity data |
| education-data-source-campus-safety | Campus Safety | Campus crime, Clery Act data |
| education-data-source-pseo | PSEO | Post-graduation employment outcomes |

When to load source skills:

  • Need deeper understanding of data collection methodology
  • Encountering unexpected values or patterns
  • Planning analysis that requires understanding source limitations
  • Working with less common data elements not covered in this skill

Topic Index

| Topic | Reference File | Section |
|-------|----------------|---------|
| School directory | schools-endpoints.md | CCD Directory |
| School enrollment | schools-endpoints.md | CCD Enrollment |
| Discipline data | schools-endpoints.md | CRDC Discipline |
| AP/IB courses | schools-endpoints.md | CRDC AP-IB-GT |
| K-12 assessments | schools-endpoints.md | EDFacts |
| District directory | districts-endpoints.md | CCD Directory |
| District finance | districts-endpoints.md | CCD Finance |
| District poverty | districts-endpoints.md | SAIPE |
| College directory | colleges-endpoints.md | IPEDS Directory |
| College enrollment | colleges-endpoints.md | IPEDS Enrollment |
| College graduation | colleges-endpoints.md | IPEDS Graduation |
| Financial aid | colleges-endpoints.md | IPEDS Aid, FSA |
| Post-college earnings | colleges-endpoints.md | Scorecard |
| Student debt | colleges-endpoints.md | Scorecard, FSA |
| State FIPS codes | variable-codes.md | State FIPS |
| Grade codes | variable-codes.md | Grade Codes |
| Race/ethnicity codes | variable-codes.md | Race Codes |
| Locale codes | variable-codes.md | Urban-Centric Locale |
| Programmatic discovery | metadata-api.md | All |

Example: Planning a Research Query

Research question: "What is the relationship between school poverty and AP course offerings in California high schools?"

  1. Data level: Schools (individual school records)

  2. Data sources needed:

    • CRDC for AP course data
    • MEPS or CCD for poverty measure
  3. Endpoints:

    • /schools/crdc/ap-ib-enrollment/{year}/race/sex/ - AP enrollment (requires disaggregation)
    • /schools/meps/{year}/ - School poverty measure
  4. Key variables:

    • ncessch - school identifier (for joining)
    • fips=6 - California filter
    • AP enrollment variables from CRDC
    • Poverty measure from MEPS
  5. Years: Check overlap (CRDC: 2011-2021, MEPS: 2006-2019)

  6. Next step: Load education-data-query skill to construct the actual queries
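
The year check in step 5 is a set intersection. The sketch below uses the CRDC collection years and the MEPS range quoted in this document; treating every year from 2006 to 2019 as available for MEPS is an assumption, so the exact years should still be confirmed per endpoint.

```python
# CRDC collection years as listed in this document.
crdc_years = {2011, 2013, 2015, 2017, 2020, 2021}

# Assumption: MEPS has a file for every year in its stated 2006-2019 range.
meps_years = set(range(2006, 2020))

# Years for which both sources can be joined.
overlap = sorted(crdc_years & meps_years)
print(overlap)  # [2011, 2013, 2015, 2017]
```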

Common Pitfalls

  • Year coverage varies: Always check years available for each endpoint
  • Different ID formats: ncessch (12-char), leaid (7-char), unitid (6-digit)
  • Disaggregation in URL: Grade, race, sex are often URL path components, not query params
  • Missing data codes: -1, -2, -3 have specific meanings (see variable-codes.md)
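
Missing-data codes will distort sums and averages if they are left as numbers. The following is a minimal sketch that blanks them out before analysis; it deliberately does not interpret the individual codes (see variable-codes.md for that), and the helper names and sample record are hypothetical.

```python
# Codes named in the pitfall above; their meanings live in variable-codes.md.
MISSING_CODES = {-1, -2, -3}


def clean_value(value):
    """Return None for a missing-data code, otherwise the value unchanged."""
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if is_number and value in MISSING_CODES:
        return None
    return value


def clean_record(record: dict) -> dict:
    """Apply clean_value to every field of one result row."""
    return {key: clean_value(value) for key, value in record.items()}


# Made-up row for illustration.
row = {"ncessch": "060000100001", "enrollment": -2, "charter": 1}
print(clean_record(row))
```

Variables that can legitimately be negative would need to be excluded from this cleaning, so in practice it is safer to apply it to a chosen list of columns.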

Pre-Query Validation

CRITICAL: Variable Name Discrepancies

The Education Data Portal API often uses different variable names than documentation suggests. Always fetch a sample first:

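```python
import requests

# Test query to verify actual column names before building the full query
response = requests.get(
    "https://educationdata.urban.org/api/v1/college-university/ipeds/directory/2023/",
    timeout=30,
)
response.raise_for_status()  # fail early on HTTP errors
data = response.json()
print("Actual columns:", list(data["results"][0].keys()))
```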

Known discrepancies:

| Documented | Actual API Field | Endpoint |
|------------|------------------|----------|
| inst_level | institution_level | IPEDS Directory |
| applicants_total | number_applied | IPEDS Admissions |
| admissions_total | number_admitted | IPEDS Admissions |
| grad_rate_150pct | completion_rate_150pct | IPEDS Graduation Rates |
| school_poverty | meps_poverty_pct | MEPS |
| population_5_17_poverty | est_population_5_17_poverty | SAIPE |

See the relevant education-data-source-* skill for comprehensive variable mappings per source.
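
The known discrepancies can be encoded once and checked against the columns of a sample fetch. This is a minimal sketch: the mapping holds only the six pairs listed above, and `resolve_columns` and the sample column list are hypothetical.

```python
# Documented name -> actual field, taken from the known discrepancies above.
DOCUMENTED_TO_ACTUAL = {
    "inst_level": "institution_level",
    "applicants_total": "number_applied",
    "admissions_total": "number_admitted",
    "grad_rate_150pct": "completion_rate_150pct",
    "school_poverty": "meps_poverty_pct",
    "population_5_17_poverty": "est_population_5_17_poverty",
}


def resolve_columns(requested, actual_columns):
    """Split requested names into (resolved actual names, still-missing names)."""
    resolved, missing = [], []
    for name in requested:
        # Prefer the name as given; fall back to the known actual field.
        candidate = name if name in actual_columns else DOCUMENTED_TO_ACTUAL.get(name)
        if candidate in actual_columns:
            resolved.append(candidate)
        else:
            missing.append(name)
    return resolved, missing


# Made-up column list standing in for the keys of a sample record.
actual = ["unitid", "number_applied", "number_admitted"]
print(resolve_columns(["unitid", "applicants_total", "sat_math"], actual))
```

Anything returned in the second list needs a manual look at the sample record, since it matches neither the documented nor the known actual name.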

Metadata API Limitations

The metadata API has undocumented limitations:

  • ?section=schools works to filter by data level
  • ?source=ipeds does NOT work - filter client-side instead
  • Response field names differ: source is actually class_name, source_name is actually label
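
Since `?source=` is not honored, the source filter has to run on the client after fetching the metadata. The sketch below assumes each metadata record is a dict with the `class_name` and `label` fields named above; the sample records and the function name are hypothetical, and the comparison ignores case because the casing of real values is not confirmed here.

```python
def filter_by_source(endpoints, source):
    """Keep metadata records whose class_name matches the source, ignoring case."""
    wanted = source.lower()
    return [
        record for record in endpoints
        if str(record.get("class_name", "")).lower() == wanted
    ]


# Made-up records shaped like the fields described above.
SAMPLE_ENDPOINTS = [
    {"class_name": "ipeds", "label": "IPEDS directory"},
    {"class_name": "ccd", "label": "CCD directory"},
    {"class_name": "ipeds", "label": "IPEDS fall enrollment"},
]

for record in filter_by_source(SAMPLE_ENDPOINTS, "IPEDS"):
    print(record["label"])
```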

Data Source Details

Quick summaries below. For comprehensive documentation including methodology, variable definitions, data quality issues, and historical changes, load the corresponding education-data-source-* skill.

CCD (Common Core of Data)

Coverage: All public elementary and secondary schools and districts in the U.S.

| Topic | Schools | Districts |
|-------|---------|-----------|
| Directory | Yes | Yes |
| Enrollment | Yes (by grade, race, sex) | Yes (by grade, race, sex) |
| Finance | No | Yes (revenue, expenditure) |

Key Variables:

  • ncessch: 12-character NCES school ID
  • leaid: 7-character NCES district ID
  • enrollment: Total enrollment count
  • free_or_reduced_price_lunch: FRPL-eligible students (poverty proxy)
  • charter: Charter school indicator
  • urban_centric_locale: Urban/suburban/town/rural classification

Deep dive: Load education-data-source-ccd for survey components, data collection process, variable coding, state variations, and historical changes (e.g., 2006 locale code revision, 2010 race category changes).

CRDC (Civil Rights Data Collection)

Coverage: Survey of public schools, generally biennial (2011, 2013, 2015, 2017, 2020, 2021)

Topics:

  • Discipline (suspensions, expulsions, arrests)
  • Chronic absenteeism
  • Harassment and bullying
  • Restraint and seclusion
  • Advanced courses (AP, IB, gifted)
  • Course offerings
  • Teacher qualifications
  • Retention
  • COVID impacts (2020 only)

Key Feature: Disaggregation by race, sex, disability, and LEP status

Deep dive: Load education-data-source-crdc for civil rights legal context (Title VI, IX, Section 504), collection methodology, underreporting issues, and year-to-year changes.

EDFacts

Coverage: State assessment and accountability data

Topics:

  • Assessment proficiency rates (reading, math)
  • Graduation rates (4-year adjusted cohort)

Key Feature: Data available by special populations (disability, economically disadvantaged, LEP, homeless, migrant, foster care)

CRITICAL: State assessment scores CANNOT be compared across states (different tests, cut scores).

Deep dive: Load education-data-source-edfacts for ESSA/NCLB accountability context, why cross-state comparison is invalid, ACGR methodology, and subgroup reporting rules.

IPEDS (Integrated Postsecondary Education Data System)

Coverage: All Title IV-eligible postsecondary institutions

Topics:

  • Institutional characteristics and directory
  • Admissions and enrollment
  • Student charges (tuition, fees, room, board)
  • Financial aid
  • Finance (revenue, expenditure, assets)
  • Graduation rates
  • Completions (degrees awarded by CIP code)
  • Human resources (salaries, faculty)

Key Variables:

  • unitid: 6-digit IPEDS institution ID
  • inst_control: 1=Public, 2=Private nonprofit, 3=Private for-profit
  • institution_level: 1=Less-than-2-year, 2=2-year, 4=4-year (no code 3)
  • hbcu: Historically Black college indicator

Deep dive: Load education-data-source-ipeds for critical graduation rate limitations (first-time full-time only), GASB vs FASB finance accounting, survey components, and identifier changes.

College Scorecard

Coverage: Title IV institutions with outcome data

Topics:

  • Post-college earnings (6 and 10 years after entry)
  • Student debt and repayment
  • Default rates
  • Completion rates by income level

Key Feature: Links education to labor market outcomes

CRITICAL: Only covers Title IV aid recipients (selection bias toward lower-income students).

Deep dive: Load education-data-source-scorecard for earnings methodology (IRS data), population coverage limitations, suppression rules, and field-of-study data.

SAIPE (Small Area Income and Poverty Estimates)

Coverage: Census Bureau poverty estimates for school districts

Key Variables:

  • population_5_17_poverty: Children 5-17 in poverty
  • population_5_17_poverty_pct: Percent in poverty
  • median_household_income: District median income

Deep dive: Load education-data-source-saipe for model-based estimation methodology, confidence intervals (not available at district level), and comparison to other poverty measures.

FSA (Federal Student Aid)

Coverage: Title IV institutions receiving federal aid

Topics:

  • Pell grants
  • Direct loans (subsidized, unsubsidized, PLUS)
  • Campus-based aid (Perkins, work-study)
  • Financial responsibility scores
  • 90/10 revenue (for-profit institutions)

Deep dive: Load education-data-source-fsa for Title IV program details, financial responsibility composite scores, and 90/10 rule compliance.

Additional Data Sources

| Source | Coverage | Deep Dive Skill |
|--------|----------|-----------------|
| MEPS | School-level poverty estimates (superior to FRPL for cross-state comparison) | education-data-source-meps |
| NHGIS | Census geography crosswalks for schools | education-data-source-nhgis |
| NACUBO | College endowment data | education-data-source-nacubo |
| NCCS | Nonprofit data for private colleges (Form 990) | education-data-source-nccs |
| EADA | College athletics equity data | education-data-source-eada |
| Campus Safety | Campus crime statistics (Clery Act) | education-data-source-campus-safety |
| PSEO | Post-graduation employment outcomes | education-data-source-pseo |

Joining Data Across Sources

School-Level Joins

Join school data across sources using ncessch:

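| Source 1 | Source 2 | Join Key | Use Case |
|----------|----------|----------|----------|
| CCD | CRDC | ncessch | Enrollment + discipline |
| CCD | EDFacts | ncessch | Directory + assessments |
| CCD | MEPS | ncessch | Enrollment + poverty |
| CRDC | MEPS | ncessch | AP courses + poverty |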

Note: Match on year when joining (years may not align perfectly)
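
A school-level join that respects the year note can be sketched in plain Python as follows. It assumes both inputs are lists of dicts that each carry `ncessch` and `year`; the function name and the sample values are hypothetical, while `ap_enrollment` and `meps_poverty_pct` are the variable names used elsewhere in this document.

```python
def join_on_school_year(left, right):
    """Inner-join two lists of records on the (ncessch, year) pair."""
    index = {(row["ncessch"], row["year"]): row for row in right}
    joined = []
    for row in left:
        match = index.get((row["ncessch"], row["year"]))
        if match is not None:
            # Fields from the left record win if both sides share a name.
            joined.append({**match, **row})
    return joined


# Made-up rows for illustration.
crdc = [
    {"ncessch": "060000100001", "year": 2017, "ap_enrollment": 120},
    {"ncessch": "060000100002", "year": 2017, "ap_enrollment": 45},
]
meps = [
    {"ncessch": "060000100001", "year": 2017, "meps_poverty_pct": 41.2},
    {"ncessch": "060000100002", "year": 2015, "meps_poverty_pct": 18.9},
]

for row in join_on_school_year(crdc, meps):
    print(row)
```

The second school drops out because its only MEPS row is from a different year, which is the misalignment the note warns about. The same pattern applies to district joins on `leaid` and college joins on `unitid`.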

District-Level Joins

Join district data using leaid:

| Source 1 | Source 2 | Join Key | Use Case |
|----------|----------|----------|----------|
| CCD Directory | CCD Finance | leaid | Characteristics + spending |
| CCD | SAIPE | leaid | Enrollment + poverty |
| CCD | EDFacts | leaid | Enrollment + outcomes |

College-Level Joins

Join college data using unitid:

| Source 1 | Source 2 | Join Key | Use Case |
|----------|----------|----------|----------|
| IPEDS Directory | IPEDS Finance | unitid | Characteristics + finance |
| IPEDS | Scorecard | unitid | Enrollment + earnings |
| IPEDS | FSA | unitid | Enrollment + aid data |

Disaggregation Patterns

URL Path Disaggregation

Some disaggregations are part of the URL path:

```
/schools/ccd/enrollment/{year}/{grade}/           # By grade
/schools/ccd/enrollment/{year}/{grade}/race/      # By grade and race
/schools/ccd/enrollment/{year}/{grade}/race/sex/  # By grade, race, and sex
```

Query Parameter Disaggregation

Other filters are query parameters:

```
?fips=6                    # California only
?charter=1                 # Charter schools only
?school_level=3            # High schools only
?urban_centric_locale=11   # Large cities only
```

Available Disaggregations by Source

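| Source | Grade | Race | Sex | Disability | Econ Status | LEP |
|--------|-------|------|-----|------------|-------------|-----|
| CCD | Yes | Yes | Yes | No | No | No |
| CRDC | No | Yes | Yes | Yes | No | Yes |
| EDFacts | Yes | Yes | Yes | Yes | Yes | Yes |
| IPEDS | Level | Yes | Yes | No | No | No |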

Year Coverage Quick Reference

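| Source | Earliest | Latest | Update Frequency |
|--------|----------|--------|------------------|
| CCD Directory | 1986 | 2023 | Annual |
| CCD Finance | 1989 | 2021 | Annual (2-year lag) |
| CRDC | 2011 | 2021 | Biennial |
| EDFacts | 2009 | 2020 | Annual |
| IPEDS | 1980 | 2023 | Annual |
| Scorecard | 1996 | 2020 | Annual |
| SAIPE | 1995 | 2023 | Annual |
| FSA | 1999 | 2021 | Annual |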

Example Research Scenarios

| Scenario | Data Sources | Key Variables |
|----------|--------------|---------------|
| Charter vs traditional school outcomes | CCD directory + EDFacts assessments | charter, read_test_pct_prof_midpt |
| College affordability by income | IPEDS directory + net-price-by-income | inst_control, income_level, avg_net_price |
| Discipline disparities by race | CRDC discipline + enrollment (by race) | race, oss_one, expulsions_* |
| Spending and graduation rates | CCD finance + EDFacts grad-rates | exp_current_per_pupil, grad_rate_midpt |
| School poverty and AP access | CRDC ap-ib-enrollment + MEPS | ap_enrollment, meps_poverty_pct |
| College earnings by major | IPEDS completions + Scorecard earnings | cip_code, earn_median_wne_p10 |