AgentSkillsCN

education-data-source-ipeds

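In-depth reference for IPEDS (Integrated Postsecondary Education Data System), the primary federal data source on U.S. colleges and universities. Use when analyzing postsecondary data, understanding graduation rate limitations, comparing institution finances, interpreting enrollment metrics, or working with UNITID/OPEID identifiers.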

SKILL.md
--- frontmatter
name: education-data-source-ipeds
description: >-
  Deep reference for IPEDS (Integrated Postsecondary Education Data System) -
  the primary federal data source on U.S. colleges and universities. Use when
  analyzing postsecondary data, understanding graduation rate limitations,
  comparing institution finances, interpreting enrollment metrics, or working
  with UNITID/OPEID identifiers.
metadata:
  audience: data-analysts
  domain: education-data

IPEDS Data Source Reference

Comprehensive guide to understanding and using IPEDS data correctly. IPEDS is the most widely used source for postsecondary education data, but it has significant complexities that users must understand, including sector-specific accounting standards, cohort-limited graduation rates, and integer-encoded categorical variables.

CRITICAL: Value Encoding

This document describes Education Data Portal integer encodings, which differ from NCES raw file string codes. The Portal converts categorical variables to integers for consistency across sources.

| Context | Race White | Race Black | Sex Male | Sector Public 4-yr |
| --- | --- | --- | --- | --- |
| Portal (integers) | 1 | 2 | 1 | 1 |
| NCES raw files | EFFY_WHITE | EFFY_BKAA | M | varies |

Always verify codes against Portal codebooks (available alongside each dataset in the Portal mirrors).
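As a rough illustration of working with the integer encodings, the sketch below decodes a few Portal codes into labels and flags anything it does not recognize. `PORTAL_LABELS` and `decode` are hypothetical names, and the label subsets are copied from the tables in this document, so they are assumptions to be checked against the Portal codebook rather than an authoritative mapping.

```python
# Hypothetical lookup built from the code tables in this document.
# Verify every entry against the Portal codebook before relying on it.
PORTAL_LABELS = {
    "race": {1: "White", 2: "Black", 3: "Hispanic", 99: "Total"},
    "sex": {1: "Male", 2: "Female", 99: "Total"},
    "sector": {1: "Public, 4-year or above", 4: "Public, 2-year"},
}


def decode(variable: str, code: int) -> str:
    """Return the label for a Portal integer code, or flag it as undocumented."""
    labels = PORTAL_LABELS.get(variable, {})
    return labels.get(code, f"UNDOCUMENTED({variable}={code})")


print(decode("race", 2))
print(decode("sector", 1))
print(decode("sex", 7))
```

Returning a visible marker for unknown codes, instead of raising or silently passing the integer through, makes drift between the data and the documented labels easy to spot in output.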

What is IPEDS?

IPEDS (Integrated Postsecondary Education Data System) is a system of 12+ interrelated survey components:

  • Administered by: National Center for Education Statistics (NCES)
  • Coverage: ~6,500 Title IV-participating postsecondary institutions
  • Frequency: Annual collection in three periods (Fall, Winter, Spring)
  • Mandate: Required for Title IV federal student aid participation
  • Available years: 1980-present (varies by component)
  • Primary identifier: UNITID (6-digit institution ID)

Reference File Structure

| File | Purpose | When to Read |
| --- | --- | --- |
| survey-components.md | All 12+ IPEDS surveys with collection periods | Understanding data structure |
| graduation-rates.md | CRITICAL GRS limitations and who is tracked | Any graduation rate analysis |
| enrollment-data.md | Fall vs 12-month, FTE calculations | Enrollment comparisons |
| finance-data.md | GASB vs FASB accounting standards | Cross-sector finance analysis |
| financial-aid.md | Net price, aid types, populations | Aid and cost analysis |
| institution-identifiers.md | UNITID, OPEID, mergers, closures | Data linking and longitudinal work |
| completions-data.md | Degrees awarded, CIP codes | Completions and outcomes |
| data-quality.md | Known issues, sector comparisons | Quality assurance |

Decision Trees

What data am I working with?

code
Working with IPEDS data?
├─ Graduation rates → ./references/graduation-rates.md (READ FIRST!)
├─ Enrollment counts → ./references/enrollment-data.md
├─ Finance/revenue/expenses → ./references/finance-data.md
├─ Financial aid/net price → ./references/financial-aid.md
├─ Degrees/completions → ./references/completions-data.md
├─ Institutional info → ./references/survey-components.md (IC section)
├─ Human resources/salaries → ./references/survey-components.md (HR section)
└─ Linking to other data → ./references/institution-identifiers.md

Is my analysis valid?

code
Cross-sector comparison?
├─ Comparing grad rates across sectors
│   └─ CAUTION: Different populations → ./references/graduation-rates.md
├─ Comparing finances across sectors
│   └─ CAUTION: GASB vs FASB → ./references/finance-data.md
├─ Comparing net price across sectors
│   └─ CAUTION: Aid populations differ → ./references/financial-aid.md
└─ Time series analysis
    └─ Check for institutional changes → ./references/institution-identifiers.md

Finding specific variables?

code
Need variable definitions?
├─ Survey component overview → ./references/survey-components.md
├─ Graduation cohort definitions → ./references/graduation-rates.md
├─ Enrollment level/status → ./references/enrollment-data.md
├─ Revenue/expense categories → ./references/finance-data.md
├─ Aid types and populations → ./references/financial-aid.md
└─ CIP codes for programs → ./references/completions-data.md

Quick Reference: Survey Components

| Component | Abbrev | Collection | Key Content |
| --- | --- | --- | --- |
| Institutional Characteristics | IC | Fall | Directory, tuition, mission |
| 12-Month Enrollment | E12 | Fall | Unduplicated headcount, FTE |
| Completions | C | Fall | Degrees by CIP, demographics |
| Cost | CST | Fall/Winter | Cost of attendance, net price |
| Admissions | ADM | Winter | Applications, admits, enrollees |
| Student Financial Aid | SFA | Winter | Aid counts and amounts |
| Graduation Rates | GR | Winter | 150% completion rates |
| Graduation Rates 200% | GR200 | Winter | 200% completion rates |
| Outcome Measures | OM | Winter | Part-time and transfer outcomes |
| Fall Enrollment | EF | Spring | Point-in-time enrollment |
| Finance | F | Spring | Revenue, expenses, assets |
| Human Resources | HR | Spring | Employees, salaries |
| Academic Libraries | AL | Spring | Library resources (biennial) |

Key Identifiers

| ID | Format | Level | Example | Notes |
| --- | --- | --- | --- | --- |
| unitid | 6-digit integer | Institution | 100654 | Unique, persistent across years; changes on merger |
| opeid | 8-digit string | Institution (Title IV) | 00100200 | Links to FSA/NSLDS; shared across branches |
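Because opeid is an 8-digit string, reading it as a number silently drops leading zeros (00100200 becomes 100200) and breaks joins to FSA data. The sketch below shows one way to normalize both identifiers before linking. The helper names `normalize_opeid` and `normalize_unitid` are hypothetical, and the validation rules are assumptions drawn only from the formats in the table above.

```python
def normalize_opeid(value) -> str:
    """Return an OPEID as an 8-character, zero-padded string."""
    text = str(value).strip()
    if not text.isdigit() or len(text) > 8:
        raise ValueError(f"not a valid OPEID: {value!r}")
    return text.zfill(8)


def normalize_unitid(value) -> int:
    """Return a UNITID as an integer, rejecting values that are not 6 digits."""
    unitid = int(value)
    if not 100000 <= unitid <= 999999:
        raise ValueError(f"not a 6-digit UNITID: {value!r}")
    return unitid


# An OPEID that lost its leading zeros when parsed as an integer
print(normalize_opeid(100200))
print(normalize_unitid("100654"))
```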

Institution Type Codes

| Variable | Value | Meaning |
| --- | --- | --- |
| inst_control | 1 | Public |
| | 2 | Private nonprofit |
| | 3 | Private for-profit |
| | -1 | Missing/not reported |
| institution_level | 1 | Less than 2-year |
| | 2 | 2-year (at least 2 but less than 4) |
| | 4 | 4-year or above |
| | -1 | Missing/not reported |
| sector | 0 | Administrative unit |
| | 1 | Public, 4-year or above |
| | 2 | Private not-for-profit, 4-year or above |
| | 3 | Private for-profit, 4-year or above |
| | 4 | Public, 2-year |
| | 5 | Private not-for-profit, 2-year |
| | 6 | Private for-profit, 2-year |
| | 7 | Public, less-than 2-year |
| | 8 | Private not-for-profit, less-than 2-year |
| | 9 | Private for-profit, less-than 2-year |
| | -1 | Sector unknown (not active) |
| hbcu | 1 | Historically Black College/University |
| | 0 | Not HBCU |
| | -1 | Missing/not reported |
| tribal_college | 1 | Tribal College |
| | 0 | Not Tribal College |
| | -1 | Missing/not reported |
| degree_granting | 1 | Degree-granting |
| | 0 | Non-degree-granting |

Note: There is no code 3 for institution_level. The Portal uses codes 1, 2, 4 (not 1, 2, 3).
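In the table above, sector codes 1 through 9 are a combination of institution_level and inst_control, which gives a cheap consistency check. The sketch below derives the sector implied by the other two fields. `expected_sector` is a hypothetical helper, and the arithmetic is inferred from this document's table rather than from an official rule, so confirm it against the codebook.

```python
def expected_sector(inst_control: int, institution_level: int):
    """Derive the sector code implied by control and level.

    Returns None when either input is missing or undocumented, and for
    sector values (0, -1) that cannot be derived from these two fields.
    """
    # Offsets follow the sector table: 4-year rows are 1-3,
    # 2-year rows are 4-6, less-than-2-year rows are 7-9.
    level_offset = {4: 0, 2: 3, 1: 6}
    if inst_control not in (1, 2, 3) or institution_level not in level_offset:
        return None
    return level_offset[institution_level] + inst_control


print(expected_sector(1, 4))  # public, 4-year or above
print(expected_sector(2, 2))  # private not-for-profit, 2-year
print(expected_sector(1, 3))  # level 3 does not exist in the Portal
```

Rows where the reported sector differs from the derived value are worth inspecting, since they may point to a data issue or to a stale assumption in this check.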

inst_size Categories

| Code | Meaning |
| --- | --- |
| 1 | Under 1,000 |
| 2 | 1,000 - 4,999 |
| 3 | 5,000 - 9,999 |
| 4 | 10,000 - 19,999 |
| 5 | 20,000 and above |

Note: inst_size is a category code (1-5), not an actual enrollment count.
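To make the distinction concrete, the sketch below maps an enrollment count onto the category bands listed above. `size_category` is a hypothetical helper. Which enrollment measure NCES uses to assign the category is not stated here, so treat the function as an illustration of the bands and not as a way to reproduce inst_size exactly.

```python
def size_category(enrollment: int) -> int:
    """Return the inst_size band (1-5) that an enrollment count falls into."""
    if enrollment < 0:
        raise ValueError("enrollment must be non-negative")
    # (inclusive upper bound, category code), taken from the table above
    upper_bounds = [(999, 1), (4999, 2), (9999, 3), (19999, 4)]
    for upper, code in upper_bounds:
        if enrollment <= upper:
            return code
    return 5


for count in (850, 1000, 12500, 20000):
    print(count, size_category(count))
```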

Race/Ethnicity Codes (Portal Integer Encoding)

| Code | Category | Notes |
| --- | --- | --- |
| 1 | White | Single race, non-Hispanic |
| 2 | Black | Single race, non-Hispanic |
| 3 | Hispanic | Any race |
| 4 | Asian | Single race, non-Hispanic |
| 5 | American Indian/Alaska Native | Single race, non-Hispanic |
| 6 | Native Hawaiian/Pacific Islander | Single race, non-Hispanic |
| 7 | Two or more races | Multiple races selected, non-Hispanic |
| 8 | Nonresident alien | International students |
| 9 | Unknown | Race/ethnicity unknown |
| 20 | Other | Other race/ethnicity |
| 99 | Total | All races combined |
| -1 | Missing/not reported | |
| -2 | Not applicable | |
| -3 | Suppressed | Privacy protection |

Historical note: Prior to 2010, Asian included Pacific Islanders (code 6 did not exist), and "Two or more races" (code 7) was not collected.

Sex Codes (Portal Integer Encoding)

| Code | Category |
| --- | --- |
| 1 | Male |
| 2 | Female |
| 3 | Nonbinary/Another gender |
| 4 | Unknown/Prefer not to say |
| 9 | Unknown |
| 99 | Total |
| -1 | Missing/not reported |
| -2 | Not applicable |
| -3 | Suppressed |

Note: Codes 3 and 4 are recent additions for non-binary gender reporting. Historical data may only have codes 1, 2, and 99. The exact meaning of codes 3 vs 4 may vary by endpoint — check the specific codebook.

Missing Data Codes

| Code | Meaning | When Used |
| --- | --- | --- |
| -1 | Missing/not reported | Data not submitted by institution |
| -2 | Not applicable | Item doesn't apply to this institution type |
| -3 | Suppressed | Data suppressed for privacy |
| null | Not available | Field not collected for this survey year |
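Because the missing-data codes are ordinary negative integers, they distort any sum or average computed on the raw column. The sketch below shows the effect and one way to avoid it. `MISSING_CODES`, `clean`, and `mean_ignoring_missing` are hypothetical names, and the sample values are made up for illustration.

```python
from statistics import mean

# Sentinel values from the missing-data table above
MISSING_CODES = {-1, -2, -3}


def clean(value):
    """Map Portal missing-data sentinels to None; pass other values through."""
    return None if value in MISSING_CODES else value


def mean_ignoring_missing(values):
    """Average only real observations; return None if there are none."""
    kept = [v for v in (clean(v) for v in values) if v is not None]
    return mean(kept) if kept else None


rates = [50, -1, 70, -3, None]  # illustrative values only

naive = mean([v for v in rates if v is not None])  # sentinels treated as data
print(naive)                         # pulled down by the -1 and -3 codes
print(mean_ignoring_missing(rates))  # average of 50 and 70 only
```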

Year Field Meanings

| Data Type | Year Field Meaning |
| --- | --- |
| Institutional characteristics | As of fall of indicated year |
| Fall enrollment | As of fall census date |
| 12-month enrollment | July 1 to June 30 academic year |
| Completions | Awarded during academic year |
| Graduation rates | Cohort entered in indicated year |
| Finance | Fiscal year ending in indicated year |
| Student financial aid | For indicated academic year |

Data Access

Datasets for IPEDS are available via the mirror system. See datasets-reference.md for canonical paths, mirrors.yaml for mirror configuration, and fetch-patterns.md for fetch code patterns.

Key datasets:

| Dataset | Type | Path | Codebook |
| --- | --- | --- | --- |
| Directory | Single | ipeds/colleges_ipeds_directory | ipeds/codebook_colleges_ipeds_directory |
| Admissions | Single | ipeds/colleges_ipeds_admissions-enrollment | ipeds/codebook_colleges_ipeds_admissions-enrollment |
| Enrollment FTE | Single | ipeds/colleges_ipeds_enrollment-fte | ipeds/codebook_colleges_ipeds_enrollment-fte |
| Graduation Rates | Single | ipeds/colleges_ipeds_grad-rates | ipeds/codebook_colleges_ipeds_grad-rates |
| Finance | Single | ipeds/colleges_ipeds_finance | ipeds/codebook_colleges_ipeds_finance |

32 IPEDS datasets exist in the mirror (5 shown above). See datasets-reference.md for the complete list with all paths and codebook paths.

Codebooks are .xls files co-located with data in all mirrors. Use get_codebook_url() from fetch-patterns.md to construct download URLs:

python
url = get_codebook_url("ipeds/codebook_colleges_ipeds_directory")

Truth Hierarchy: When interpreting variable values, apply this priority:

  1. Actual data file (what you observe in the parquet/CSV) — this IS the truth
  2. Live codebook (.xls in mirror) — authoritative documentation, may lag
  3. This skill documentation — convenient summary, may drift from codebook

If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.
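One way to apply this hierarchy in practice is to compare the values actually present in a column against the documented code list and investigate anything left over. The sketch below assumes the column has already been pulled into a plain Python list. `undocumented_codes` is a hypothetical helper, not part of any fetch tooling referenced here.

```python
def undocumented_codes(observed, documented):
    """Return observed values that the documentation does not list, sorted.

    None is ignored because null means "not collected", not an unknown code.
    """
    return sorted(set(observed) - set(documented) - {None})


# Documented institution_level codes from this reference
documented_levels = [1, 2, 4, -1]

# Illustrative column values, including one code the docs do not mention
observed_levels = [4, 4, 2, 1, 3, None, -1]

print(undocumented_codes(observed_levels, documented_levels))
```

A non-empty result does not say which side is wrong. It only signals that the data, the codebook, and this summary need to be reconciled in the order given above.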

Filtering

python
import polars as pl

# Admissions data is disaggregated by sex, so the raw file has several rows
# per institution-year (~26K rows). Do not aggregate this frame directly.
df = pl.read_parquet("data/raw/admissions.parquet")

# Keep only the sex == 99 total rows: one row per institution-year (~8K rows)
df_totals = df.filter(pl.col("sex") == 99)

# Calculate admission rate (not provided directly) on the totals only.
# Rows with zero applicants or negative missing-data codes get a null rate.
valid_counts = (pl.col("number_applied") > 0) & (pl.col("number_admitted") >= 0)
df_totals = df_totals.with_columns(
    pl.when(valid_counts)
    .then(pl.col("number_admitted") / pl.col("number_applied") * 100)
    .otherwise(None)
    .alias("admit_rate")
)

# Filter to degree-granting, 4-year public institutions (sector 1)
df_public_4yr = df_totals.filter(
    (pl.col("sector") == 1) &
    (pl.col("degree_granting") == 1)
)
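The duplicate-row problem described above can be verified directly by counting rows per institution-year before and after filtering. The sketch below does this on plain dictionaries so it runs without any dataframe library. `duplicate_keys` is a hypothetical helper, and the sample rows and applicant counts are invented for illustration.

```python
from collections import Counter


def duplicate_keys(rows, keys=("unitid", "year")):
    """Return key tuples that appear on more than one row, sorted."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative admissions rows: one institution-year, split by sex plus a total
rows = [
    {"unitid": 100654, "year": 2022, "sex": 1, "number_applied": 400},
    {"unitid": 100654, "year": 2022, "sex": 2, "number_applied": 600},
    {"unitid": 100654, "year": 2022, "sex": 99, "number_applied": 1000},
]

# Keep only the total rows (sex == 99)
totals = [row for row in rows if row["sex"] == 99]

print(duplicate_keys(rows))    # the unfiltered data repeats the key
print(duplicate_keys(totals))  # the filtered data should have no repeats
```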

Data Availability & Lag Times

IPEDS data becomes available with significant lag. Always verify year availability before committing to a year range.

| Survey Component | Typical Lag | Latest Available (as of Jan 2026) |
| --- | --- | --- |
| Directory | ~1 year | 2023 |
| Admissions-Enrollment | ~2 years | 2022 |
| Fall Enrollment | ~2-3 years | 2022 |
| Completions | ~2 years | 2022 |
| Finance | ~4+ years | 2017 (see warning below) |
| Graduation Rates | ~2-3 years | 2022 |
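A requested year range can be checked against these cutoffs before any data is fetched. The sketch below hard-codes the "latest available" column from the table above, which will go stale and should be refreshed from the mirrors. `LATEST_AVAILABLE`, its keys, and `clip_year_range` are hypothetical names chosen for this example.

```python
# Snapshot of the table above (as of Jan 2026); refresh before relying on it.
LATEST_AVAILABLE = {
    "directory": 2023,
    "admissions-enrollment": 2022,
    "fall-enrollment": 2022,
    "completions": 2022,
    "finance": 2017,
    "grad-rates": 2022,
}


def clip_year_range(component: str, start: int, end: int):
    """Return (start, usable_end, missing_years) for a requested range.

    If usable_end is earlier than start, none of the requested years
    are available for that component.
    """
    latest = LATEST_AVAILABLE[component]
    usable_end = min(end, latest)
    missing = list(range(max(start, latest + 1), end + 1))
    return start, usable_end, missing


print(clip_year_range("finance", 2015, 2020))
print(clip_year_range("directory", 2020, 2022))
```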

CRITICAL: IPEDS Finance Data Cutoff. As of January 2026, IPEDS Finance data is only available through 2017 in the Portal mirrors. This affects endowment values (endowment_end), revenue/expense data, and any financial ratios. Options: (1) limit analysis to available years, (2) use NCCS 990 data for private institutions as an alternative, or (3) forward-fill with a documented caveat and indicator column.
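Option (3) can be sketched as follows: carry the last observed value forward and record, row by row, whether the value was observed or imputed. `forward_fill_with_flag` is a hypothetical helper, and the endowment figures are invented. This is a minimal illustration of the indicator-column idea, not a recommendation that forward-filling suits any particular analysis.

```python
def forward_fill_with_flag(values_by_year, years):
    """Forward-fill missing years and mark which values were imputed."""
    filled = []
    last_value = None
    for year in years:
        observed = values_by_year.get(year)
        if observed is not None:
            last_value = observed
            filled.append({"year": year, "value": observed, "imputed": False})
        else:
            # Before the first observation there is nothing to carry forward,
            # so the value stays None and is not marked as imputed.
            filled.append(
                {"year": year, "value": last_value, "imputed": last_value is not None}
            )
    return filled


# Illustrative endowment_end values; nothing observed after 2017
endowment = {2016: 100.0, 2017: 110.0}

for row in forward_fill_with_flag(endowment, range(2016, 2020)):
    print(row)
```

Keeping the `imputed` flag alongside the value lets downstream steps exclude or separately report the filled years.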

Variable Name Mappings

The Portal uses different names than NCES raw file documentation. The table below lists commonly confused mappings:

| NCES Raw File Name | Actual Portal Name | Notes |
| --- | --- | --- |
| INSTNM | inst_name | Institution name |
| STABBR | state_abbr | State abbreviation |
| CONTROL | inst_control | Institutional control |
| ICLEVEL | institution_level | Level of institution |
| DEGGRANT | degree_granting | Degree-granting status |
| CYACTIVE | currently_active_ipeds | Currently active flag |
| DEATHYR | year_deleted | Year institution closed |
| APPLCN | number_applied | Total applicants |
| ADMSSN | number_admitted | Total admitted |
| EFTOTLT | enrollment_fall | Fall enrollment (in fall-enrollment-race dataset) |
| various GR* | completion_rate_150pct, completers_150pct, etc. | Grad rate variables |

Note: Portal variable names are always lowercase with underscores. NCES documentation often uses UPPERCASE or CamelCase. When in doubt, fetch a sample of the actual data and inspect its column names.
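When adapting code written against NCES raw files, the mappings above can be applied as a simple rename table. In the sketch below, `NCES_TO_PORTAL` is copied from the table and `to_portal_names` is a hypothetical helper. Lowercasing unmapped names is an assumption made for this example only, so unmapped columns are returned separately for manual review.

```python
# Copied from the mapping table above; extend only from the Portal codebook.
NCES_TO_PORTAL = {
    "INSTNM": "inst_name",
    "STABBR": "state_abbr",
    "CONTROL": "inst_control",
    "ICLEVEL": "institution_level",
    "DEGGRANT": "degree_granting",
    "CYACTIVE": "currently_active_ipeds",
    "DEATHYR": "year_deleted",
    "APPLCN": "number_applied",
    "ADMSSN": "number_admitted",
    "EFTOTLT": "enrollment_fall",
}


def to_portal_names(columns):
    """Translate NCES column names; return (renamed, unmapped) lists."""
    renamed, unmapped = [], []
    for name in columns:
        key = name.upper()
        if key in NCES_TO_PORTAL:
            renamed.append(NCES_TO_PORTAL[key])
        else:
            # Assumption: fall back to lowercase and flag for manual review.
            renamed.append(name.lower())
            unmapped.append(name)
    return renamed, unmapped


print(to_portal_names(["UNITID", "INSTNM", "Control", "APPLCN"]))
```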

Enrollment Dataset Clarification

IPEDS has multiple enrollment-related datasets in the Portal:

| Dataset | Key Columns | Best For |
| --- | --- | --- |
| fall-enrollment-race (yearly) | enrollment_fall, race, sex, level_of_study, ftpt, class_level, degree_seeking | Detailed demographic breakdowns |
| fall-enrollment-age (yearly) | Enrollment by age group | Age distribution analysis |
| enrollment-fte (single) | est_fte, rep_fte | FTE-based comparisons |
| enrollment-headcount (single) | Headcount data | Headcount-based analysis |
| fall-retention (single) | Retention rates | Retention analysis |

Note: The fall-enrollment-race yearly dataset provides the most granular enrollment data, disaggregated by multiple dimensions. For institution-level totals, filter to race == 99, sex == 99, ftpt == 99, level_of_study == 99.
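The totals filter in the note above can be sketched on plain rows as shown below. `TOTAL_DIMENSIONS` lists only the four columns named in the note, `institution_totals` is a hypothetical helper, and the sample rows are invented. Whether other dimensions such as class_level or degree_seeking also need a total filter is not stated here and should be checked in the codebook.

```python
TOTAL = 99
# The four dimensions named in the note above
TOTAL_DIMENSIONS = ("race", "sex", "ftpt", "level_of_study")


def institution_totals(rows):
    """Keep rows where every listed dimension is set to the total code."""
    return [
        row for row in rows
        if all(row.get(dim) == TOTAL for dim in TOTAL_DIMENSIONS)
    ]


# Illustrative rows for one institution: a grand total and two breakdowns
rows = [
    {"unitid": 100654, "race": 99, "sex": 99, "ftpt": 99,
     "level_of_study": 99, "enrollment_fall": 6000},
    {"unitid": 100654, "race": 1, "sex": 99, "ftpt": 99,
     "level_of_study": 99, "enrollment_fall": 1500},
    {"unitid": 100654, "race": 99, "sex": 2, "ftpt": 99,
     "level_of_study": 99, "enrollment_fall": 3300},
]

totals = institution_totals(rows)
print(len(totals), totals[0]["enrollment_fall"])
```

Summing enrollment_fall over the unfiltered rows would count the same students several times, which is why the filter comes before any aggregation.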

Common Pitfalls

| Pitfall | Issue | Solution |
| --- | --- | --- |
| Using string codes | Portal uses integer encodings, not NCES string codes | Always verify against Portal codebooks; see encoding table above |
| Grad rates as sole quality metric | IPEDS tracks only first-time, full-time, fall-entering students; excludes transfer students (~40% of undergraduates) and part-time students (~40%) | Use Outcome Measures (OM) for part-time/transfer data; note limitations |
| Cross-sector finance comparison | Public (GASB) and private (FASB) institutions use different accounting standards | Compare within sector only; see ./references/finance-data.md for crosswalk |
| Net price for all students | Net price covers only first-time, full-time students who received Title IV aid | Document population limitation; excludes full-pay students |
| Admissions without sex filter | Admissions data is disaggregated by sex, so unfiltered data has duplicates | Filter to sex == 99 for institution totals |
| No institution_level 3 | Codes are 1, 2, 4, not sequential 1, 2, 3 | Use exact codes: 1=less-than-2yr, 2=2yr, 4=4yr+ |
| Ignoring mergers/closures | Institutions merge, close, or change sector over time | Check currently_active_ipeds and year_deleted; track UNITID changes; see ./references/institution-identifiers.md |
| inst_size as enrollment | inst_size is a 1-5 category code, not an enrollment count | Use enrollment endpoints for actual counts |

Critical Limitations

Graduation Rates (GRS)

CRITICAL: IPEDS graduation rates track ONLY first-time, full-time, fall-entering students.

| Excluded Population | Approximate % of Undergrads |
| --- | --- |
| Transfer students | ~40% |
| Part-time students | ~40% |
| Spring/summer starts | Varies |
| Students who transfer OUT | Counted as non-completers |

At community colleges, IPEDS grad rates may represent <25% of students.
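A quick way to gauge how much this matters for a given institution is to compute what share of its students falls inside the graduation rate cohort. The sketch below is plain arithmetic with invented numbers. `grs_coverage` is a hypothetical helper, and the choice of denominator (here, all new undergraduates in a year) is an assumption that should be stated in any analysis.

```python
def grs_coverage(cohort_size: int, comparison_population: int) -> float:
    """Share of a comparison population included in the GRS cohort."""
    if comparison_population <= 0:
        raise ValueError("comparison_population must be positive")
    return cohort_size / comparison_population


# Illustrative community college: 4,000 new undergraduates in a year,
# of whom 900 are first-time, full-time, fall-entering students.
coverage = grs_coverage(900, 4000)
print(f"{coverage:.1%} of new students are in the graduation rate cohort")
```

With these made-up figures the published graduation rate would describe fewer than one in four new students, which matches the scale of the caveat above.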

See ./references/graduation-rates.md for complete details.

Finance Data

CRITICAL: Public and private institutions use different accounting standards.

| Standard | Institution Type | Comparison |
| --- | --- | --- |
| GASB | Public | Compare within sector only |
| FASB | Private nonprofit | Different from GASB |
| FASB | Private for-profit | Different revenue treatment |

See ./references/finance-data.md for crosswalk guidance.
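A simple guard can keep cross-sector finance comparisons from slipping into an analysis unnoticed. The sketch below follows the table above and treats two institutions as directly comparable only when they share the same inst_control value. `accounting_standard` and `directly_comparable` are hypothetical helpers. Assigning a standard purely from inst_control is a simplification taken from this document, so the standard each institution actually reports under should be confirmed in the finance data.

```python
def accounting_standard(inst_control: int):
    """Return the accounting standard implied by inst_control, per the table above."""
    return {1: "GASB", 2: "FASB", 3: "FASB"}.get(inst_control)


def directly_comparable(control_a: int, control_b: int) -> bool:
    """True only when both institutions have the same documented control type.

    Nonprofit and for-profit privates both use FASB but differ in revenue
    treatment, so they are not treated as comparable here.
    """
    return control_a in (1, 2, 3) and control_a == control_b


print(accounting_standard(1), accounting_standard(3))
print(directly_comparable(1, 1))  # two public institutions
print(directly_comparable(2, 3))  # nonprofit vs for-profit
```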

Net Price

Net price is calculated ONLY for students who are:

  • First-time, full-time
  • Recipients of Title IV aid

Full-pay students are excluded.

See ./references/financial-aid.md for details.

Data Quality Checklist

python
import polars as pl

def ipeds_quality_check(df):
    """Basic IPEDS data quality checks using Portal variable names."""
    issues = []

    # Check graduation rates are 0-100, ignoring the Portal missing-data
    # codes (-1, -2, -3), which are sentinels rather than invalid rates
    if "completion_rate_150pct" in df.columns:
        rate = pl.col("completion_rate_150pct")
        is_missing_code = (rate == -1) | (rate == -2) | (rate == -3)
        bad = df.filter((rate > 100) | ((rate < 0) & ~is_missing_code))
        if bad.height > 0:
            issues.append(f"Invalid grad rates: {bad.height} rows")

    # Check for non-active institutions (directory dataset)
    if "currently_active_ipeds" in df.columns:
        inactive = df.filter(pl.col("currently_active_ipeds") != 1)
        if inactive.height > 0:
            issues.append(f"Non-active institutions: {inactive.height}")

    # Check that institutional control codes are documented values
    if "inst_control" in df.columns:
        invalid = df.filter(
            ~pl.col("inst_control").is_in([1, 2, 3, -1])
        )
        if invalid.height > 0:
            issues.append(f"Invalid control codes: {invalid.height}")

    return issues

Related Data Sources

| Source | Relationship | When to Use |
| --- | --- | --- |
| education-data-source-scorecard | Non-traditional student outcomes | Post-college earnings, broader student population |
| education-data-source-fsa | Detailed loan/grant data | Federal student aid analysis (link on OPEID) |
| education-data-source-nccs | Private institution 990 data | Financial data beyond IPEDS cutoff year |
| education-data-source-pseo | Post-college employment | State-level employment outcomes |
| education-data-source-eada | College athletics | Athletics equity and finance |
| education-data-source-nacubo | Endowment data | Endowment analysis beyond IPEDS |
| education-data-source-campus-safety | Campus crime statistics | Safety and compliance |
| education-data-explorer | Parent discovery skill | Finding available endpoints |
| education-data-query | Data fetching | Downloading parquet/CSV files |

Topic Index

| Topic | Reference File |
| --- | --- |
| Survey components overview | ./references/survey-components.md |
| Graduation rate cohort definition | ./references/graduation-rates.md |
| First-time full-time limitation | ./references/graduation-rates.md |
| Transfer-out rates | ./references/graduation-rates.md |
| Outcome Measures survey | ./references/graduation-rates.md |
| 150% vs 200% time | ./references/graduation-rates.md |
| Fall enrollment | ./references/enrollment-data.md |
| 12-month enrollment | ./references/enrollment-data.md |
| FTE calculations | ./references/enrollment-data.md |
| Enrollment by level | ./references/enrollment-data.md |
| GASB accounting | ./references/finance-data.md |
| FASB accounting | ./references/finance-data.md |
| Revenue categories | ./references/finance-data.md |
| Expense categories | ./references/finance-data.md |
| Net price definition | ./references/financial-aid.md |
| Pell grant data | ./references/financial-aid.md |
| Aid by income level | ./references/financial-aid.md |
| UNITID | ./references/institution-identifiers.md |
| OPEID | ./references/institution-identifiers.md |
| Institutional mergers | ./references/institution-identifiers.md |
| Sector changes | ./references/institution-identifiers.md |
| CIP codes | ./references/completions-data.md |
| Award levels | ./references/completions-data.md |
| Completers vs completions | ./references/completions-data.md |
| Data quality issues | ./references/data-quality.md |
| Missing data codes | ./references/data-quality.md |
| Sector comparisons | ./references/data-quality.md |