AgentSkillsCN

education-data-source-pseo

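Postsecondary Employment Outcomes (PSEO) data from the Census Bureau's LEHD program. Experimental tabulations linking college graduates to employment outcomes. Use when researching graduate earnings, employment by industry, geographic flows of graduates, or comparing employment outcomes across institutions and degree programs. Coverage limited to ~29% of graduates from participating states.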

SKILL.md
--- frontmatter
name: education-data-source-pseo
description: >-
  Postsecondary Employment Outcomes (PSEO) data source from the Census Bureau
  LEHD program. Experimental tabulations linking college graduates to employment
  outcomes. Use when researching graduate earnings, employment by industry,
  geographic flows of graduates, or comparing outcomes across institutions and
  degree programs. Coverage limited to ~29% of graduates from participating
  states.
metadata:
  audience: data-analysts
  domain: education-data

PSEO Data Source Reference

Postsecondary Employment Outcomes (PSEO) is an experimental data product from the U.S. Census Bureau that links college graduate records to national employment data, providing earnings and employment outcomes by institution, degree level, and field of study.

CRITICAL: Value Encoding

This document describes Education Data Portal integer encodings, which differ from Census API string codes. The Portal converts categorical variables to integers for consistency.

| Context | Baccalaureate | Associates | Masters | Census Division Pacific |
|---|---|---|---|---|
| Portal (integers) | 5 | 3 | 7 | 9 |
| Census API (strings) | "05" | "03" | "07" | "9" |

Key differences: Degree level uses simple integers (1-10), not string codes like "1C", "05". CIP codes are 2-digit integers (11 for Computer Science), not strings like "11.01".

See ./references/variable-definitions.md for complete encoding tables.
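As a rough illustration of the encoding difference, the sketch below translates the Census API degree-level strings shown in the table above into Portal integers. The dictionary and the function name `census_degree_to_portal` are hypothetical and cover only the three codes listed in this document; the complete tables live in variable-definitions.md.

```python
# Illustrative only: covers just the degree codes shown in this document.
CENSUS_TO_PORTAL_DEGREE = {
    "03": 3,  # Associates
    "05": 5,  # Baccalaureate
    "07": 7,  # Masters
}


def census_degree_to_portal(code: str) -> int:
    """Translate a Census API degree-level string into a Portal integer."""
    try:
        return CENSUS_TO_PORTAL_DEGREE[code]
    except KeyError:
        # Refuse to guess at codes this sketch does not know about.
        raise ValueError(f"No mapping known for Census degree code {code!r}") from None


print(census_degree_to_portal("05"))
```

Raising on unknown codes, rather than guessing, keeps a silent mis-mapping from reaching a filter such as degree_level == 5.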

What is PSEO?

  • Producer: U.S. Census Bureau, LEHD program (Longitudinal Employer-Household Dynamics)
  • Coverage: ~29% of all U.S. college graduates from 31 states + D.C. + Western Governors University
  • Content: Links university transcript data with national UI wage records to track graduate employment outcomes
  • Two data types: Graduate Earnings (percentile earnings) and Employment Flows (industry/geography)
  • Frequency: Updated periodically; cohorts span 3-year (Bachelor's) or 5-year (all others) windows
  • Primary identifiers: unitid (IPEDS Unit ID, integer), opeid (integer in Portal data)
  • Privacy method: Differential privacy mechanisms protect individual data

Reference File Structure

| File | Purpose | When to Read |
|---|---|---|
| lehd-methodology.md | How LEHD produces tabulations, data matching process | Understanding data creation |
| earnings-data.md | Percentile earnings, cohort definitions, labor attachment | Analyzing graduate earnings |
| geographic-flows.md | Where graduates work by Census Division | Studying migration patterns |
| industry-flows.md | What industries graduates enter by NAICS sector | Career pathway analysis |
| variable-definitions.md | All variables, codes, and status flags | Building queries or interpreting values |
| state-coverage.md | Participating states, coverage rates, data partners | Understanding limitations |

Decision Trees

What type of outcome am I researching?

```
Graduate outcomes research?
├─ Earnings by program/institution
│   ├─ Median earnings → `p50_earnings` column, filter by `years_after_grad`
│   ├─ Earnings distribution → `p25_earnings`/`p50_earnings`/`p75_earnings`
│   └─ See ./references/earnings-data.md
├─ Where graduates work (geography)
│   ├─ Census Division of employment → `census_division` column
│   ├─ In-state vs out-of-state → `employed_instate_grads_count`
│   └─ See ./references/geographic-flows.md
├─ What industries graduates enter
│   ├─ NAICS sector employment → `industry` column (String)
│   └─ See ./references/industry-flows.md
└─ How many graduates are employed
    ├─ Employment counts → `employed_grads_count_f`
    ├─ Non-employed/marginal → `jobless_m_emp_grads_count`
    └─ See ./references/variable-definitions.md
```

What degree level am I researching?

```
Degree level?
├─ Certificate (<1 year) → degree_level=1
├─ Certificate (1-2 years) → degree_level=2
├─ Certificate (2-4 years) → degree_level=4
├─ Associate's → degree_level=3
├─ Bachelor's → degree_level=5 (default, 3-year cohorts)
├─ Post-Bacc Certificate → degree_level=6
├─ Master's → degree_level=7 (2-digit CIP only)
├─ Post-Masters Certificate → degree_level=8
├─ Doctoral-Research → degree_level=9 (2-digit CIP only)
└─ Doctoral-Professional Practice → degree_level=10
```

Note: Portal uses integers 1-10. Census Bureau source data uses string codes like "05", "1C" -- these do not appear in Portal data.

Is my institution/state covered?

```
Checking data availability?
├─ Which states participate → ./references/state-coverage.md
├─ Which institutions have data → Check PSEO Explorer or mirror data
├─ Coverage rate for state → ./references/state-coverage.md
└─ Why data might be missing
    ├─ Institution not partnered
    ├─ Cell suppressed (count < 30)
    └─ Insufficient labor force attachment
```

Quick Reference: PSEO Variables

Earnings Variables

| Portal Variable | Description |
|---|---|
| `p25_earnings` | 25th percentile earnings (2022 dollars) |
| `p50_earnings` | Median earnings (2022 dollars) |
| `p75_earnings` | 75th percentile earnings (2022 dollars) |
| `years_after_grad` | Years post-graduation: 1, 5, or 10 |
| `employed_grads_count_e` | Graduate count with earnings data |
| `total_grads_count` | Total IPEDS-reported graduates |

Flows Variables

| Portal Variable | Description |
|---|---|
| `employed_grads_count_f` | Employed graduates count |
| `employed_instate_grads_count` | Employed in institution's state |
| `jobless_m_emp_grads_count` | Non-employed or marginally employed |
| `industry` | 2-digit NAICS sector (String, e.g., "54", "31-33") |
| `census_division` | Census Division of employment (1-9, 99) |

Note: Portal uses restructured schema with years_after_grad column instead of Census API's Y1_*/Y5_*/Y10_* naming. The industry column is String type because some NAICS sectors span ranges (e.g., "31-33" for Manufacturing, "44-45" for Retail Trade).
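For retention questions, the two employment counts above can be combined into an in-state share. The following is a minimal sketch, assuming both counts come from the same row and that `employed_instate_grads_count` is a subset of `employed_grads_count_f`. The function name `instate_share` is hypothetical.

```python
from typing import Optional


def instate_share(employed_instate: int, employed_total: int) -> Optional[float]:
    """Share of employed graduates working in the institution's state.

    Returns None when either count is a negative missing-data code,
    or when there are no employed graduates to divide by.
    """
    if employed_instate < 0 or employed_total <= 0:
        return None
    return employed_instate / employed_total


print(instate_share(60, 80))   # valid counts
print(instate_share(-3, 80))   # suppressed in-state count
```

Returning None for coded values keeps a suppressed cell from turning into a negative share.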

Key Identifiers

| ID | Format | Level | Example | Notes |
|---|---|---|---|---|
| `unitid` | Integer | Institution | 100751 | IPEDS Unit ID (University of Alabama) |
| `opeid` | Integer | Institution | 105100 | Portal stores as integer (Census uses 8-digit zero-padded string) |
| `fips` | Integer | State | 48 | State of institution (Texas) |
| `cipcode` | 2-digit integer | Field of study | 11 | Computer Science; Portal uses integers, not "11.01" |
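When matching Portal rows against Census source files, the opeid has to be converted between the two representations. A small sketch, assuming the only difference is the 8-digit zero padding described above (both function names are hypothetical):

```python
def portal_opeid_to_census(opeid: int) -> str:
    """Render a Portal integer opeid as an 8-digit zero-padded string."""
    return str(opeid).zfill(8)


def census_opeid_to_portal(opeid: str) -> int:
    """Parse a zero-padded Census opeid string into a Portal integer."""
    return int(opeid)


print(portal_opeid_to_census(105100))
print(census_opeid_to_portal("00105100"))
```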

Key Filters (Portal Integer Encoding)

| Parameter | Description | Example |
|---|---|---|
| `degree_level` | Degree type integer | 5 (Bachelor's) |
| `pseo_cohort` | Graduation cohort | "2016-2020" or "2019-2021" (string format, full year range) |
| `years_after_grad` | Years post-graduation | 1, 5, or 10 |

Cohort Definitions

| Degree Level | Cohort Years | Example Cohorts |
|---|---|---|
| Bachelor's | 3-year | "2001-2003", "2004-2006", "2007-2009", "2010-2012", "2013-2015", "2016-2018", "2019-2021" |
| All others | 5-year | "2001-2005", "2006-2010", "2011-2015", "2016-2020" |
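The cohort rule above can be checked mechanically before filtering. This sketch assumes `pseo_cohort` always has the "YYYY-YYYY" form shown in the examples and that degree_level 5 (Bachelor's) is the only level with 3-year cohorts. The helper names are hypothetical.

```python
def cohort_span(pseo_cohort: str) -> int:
    """Number of graduation years covered by a cohort such as "2019-2021"."""
    start, end = (int(part) for part in pseo_cohort.split("-"))
    return end - start + 1


def expected_span(degree_level: int) -> int:
    """3-year cohorts for Bachelor's (degree_level 5), 5-year for all others."""
    return 3 if degree_level == 5 else 5


def cohort_matches_degree(pseo_cohort: str, degree_level: int) -> bool:
    """True when the cohort's span is the one expected for the degree level."""
    return cohort_span(pseo_cohort) == expected_span(degree_level)


print(cohort_matches_degree("2019-2021", 5))  # Bachelor's with a 3-year cohort
print(cohort_matches_degree("2016-2020", 5))  # Bachelor's with a 5-year cohort
```

A False result usually means the cohort string was copied from a different degree level, which would yield an empty filter rather than an error.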

Missing Data Codes

| Code | Meaning | When Used |
|---|---|---|
| -1 | Missing/not reported | Primary missing data indicator; very common in earnings and flows columns |
| -3 | Suppressed | Cell count < 30 graduates (differential privacy suppression) |
| -2 | Not applicable | Item doesn't apply to this entity (Portal convention) |

Note: PSEO data has no null values in the parquet files. All missing or suppressed data uses negative integer codes (chiefly -1 and -3). To get valid earnings, filter with pl.col("p50_earnings") > 0 rather than .is_not_null(). PSEO protects published values with differential privacy rather than traditional cell suppression, but cells with fewer than 30 graduates are still withheld entirely (coded as -3). Earnings values coded -1 may indicate insufficient labor force attachment.
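To make the codes explicit when inspecting individual values, a decoder along these lines may help. It is a sketch that assumes valid earnings are always positive, as the filter above implies. The names `MISSING_CODES` and `decode_earnings` are hypothetical.

```python
from typing import Optional, Tuple

# Missing-data codes as listed in the table above.
MISSING_CODES = {
    -1: "missing/not reported",
    -2: "not applicable",
    -3: "suppressed",
}


def decode_earnings(value: int) -> Tuple[Optional[int], Optional[str]]:
    """Split a raw earnings value into (earnings, reason_missing)."""
    if value > 0:
        return value, None
    # Non-positive values are treated as codes, never as real earnings.
    return None, MISSING_CODES.get(value, "unrecognized non-positive value")


print(decode_earnings(52000))
print(decode_earnings(-3))
```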

Data Access

Datasets for PSEO are available via the mirror system. See datasets-reference.md for canonical paths, mirrors.yaml for mirror configuration, and fetch-patterns.md for fetch code patterns.

| Dataset | Type | Path | Codebook |
|---|---|---|---|
| Earnings and Flows | Yearly (2001-2021) | pseo/colleges_pseo_{year} | pseo/codebook_colleges_pseo |

Codebooks are .xls files co-located with data in all mirrors. Use get_codebook_url() from fetch-patterns.md to construct download URLs.

Truth Hierarchy: When interpreting variable values, apply this priority:

  1. Actual data file (what you observe in the parquet/CSV) -- this IS the truth
  2. Live codebook (.xls in mirror) -- authoritative documentation, may lag
  3. This skill documentation -- convenient summary, may drift from codebook

If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.
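One lightweight way to apply this hierarchy is to compare the values observed in a column against what the documentation lists and flag anything unexpected for investigation. The sketch below uses made-up observed values, and the function name `undocumented_values` is hypothetical.

```python
from typing import Iterable, List

# Degree levels this document describes (integers 1-10).
DOCUMENTED_DEGREE_LEVELS = set(range(1, 11))


def undocumented_values(observed: Iterable[int], documented: Iterable[int]) -> List[int]:
    """Return observed values the documentation does not list, sorted."""
    return sorted(set(observed) - set(documented))


observed = [5, 7, 5, 3, 17]  # made-up values for illustration
print(undocumented_values(observed, DOCUMENTED_DEGREE_LEVELS))  # [17]
```

An unexpected value is a prompt to check the codebook, not proof that the data is wrong.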

Fetching PSEO Data

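```python
import polars as pl

# fetch_yearly_from_mirrors and fetch_from_mirrors are the helpers described
# in fetch-patterns.md; define or import them as shown there before running.

# PSEO is a yearly dataset -- fetch several individual years at once
df_multi = fetch_yearly_from_mirrors(
    path_template="pseo/colleges_pseo_{year}",
    years=[2018, 2019, 2020],
)

# Or fetch a single year
df = fetch_from_mirrors("pseo/colleges_pseo_2020")
```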

Filtering

```python
# Each filter returns a new DataFrame; assign the result to keep it.

# Filter by institution
alabama = df.filter(pl.col("unitid") == 100751)  # University of Alabama

# Filter by field of study
cs = df.filter(pl.col("cipcode") == 11)  # Computer Science

# Filter by cohort (note: full year range format)
recent = df.filter(pl.col("pseo_cohort") == "2019-2021")

# Earnings rows only (exclude missing/suppressed negative codes)
with_earnings = df.filter(pl.col("p50_earnings") > 0)

# Filter by industry (String column, not integer)
prof_services = df.filter(pl.col("industry") == "54")  # Professional, Scientific, and Technical Services
```

Additional Access Methods (Census Bureau Source)

  1. PSEO Explorer: Interactive visualization tool at https://lehd.ces.census.gov/data/pseo_explorer.html
  2. Census bulk download: CSV/XLS files at https://lehd.ces.census.gov/data/pseo/
  3. Census API: https://api.census.gov/data/timeseries/pseo/earnings and .../flows (uses different variable naming and string codes; not used in this system)

Common Pitfalls

| Pitfall | Issue | Solution |
|---|---|---|
| Using Census string codes | Portal uses integers (e.g., 5 for Bachelor's), not Census strings ("05") | Always check encoding; see variable-definitions.md |
| Ignoring suppression | Cells with <30 graduates are suppressed; missing data looks like no program exists | Check total_grads_count to confirm the cell exists; earnings coded -3 indicate suppression (the data contains no nulls) |
| Cross-institution comparison without controlling degree/CIP | Institutions offer different program mixes; aggregate comparison is misleading | Always filter to same degree_level and cipcode when comparing institutions |
| Treating PSEO as comprehensive | Only ~29% of graduates covered; participating states differ systematically | Acknowledge selection bias; do not generalize to all U.S. graduates |
| Ignoring labor attachment | Workers need 3+ quarters above minimum wage threshold to appear in earnings data | Some graduates are employed but excluded; note this limitation |
| Treating Portal opeid as string | Portal stores opeid as integer (e.g., 105100), not Census's 8-digit zero-padded string ("00105100") | Use integer comparison in Portal data; only Census API uses string format |
| Mixing cohort spans | Bachelor's uses 3-year cohorts; all others use 5-year | Filter by degree_level first, then verify cohort format matches |
| Assuming inflation comparability | All earnings are in 2022 CPI-U dollars | No manual inflation adjustment needed; values are already real dollars |

PSEO vs Other Data Sources

| Feature | PSEO | College Scorecard | State Systems |
|---|---|---|---|
| Coverage | Graduates only | All enrollees | Graduates only |
| Geographic scope | National (cross-state) | National | In-state only |
| Sample | All graduates from partners | Federal aid recipients | All graduates |
| Earnings detail | 25th/50th/75th percentile | Median only | Varies |
| Industry data | Yes (NAICS sector) | No | Varies |
| Geographic flows | Yes (Census Division) | No | No |
| Privacy method | Differential privacy | Traditional suppression | Varies |

Common Use Cases

| Use Case | Data Needed | Key Considerations |
|---|---|---|
| Compare programs within institution | Earnings by CIPCODE | Check cell counts for suppression |
| Compare institutions for same program | Earnings by INSTITUTION | Ensure same degree level and CIP |
| Analyze brain drain/retention | Flows by division + in-state | Only 9 Census Divisions |
| Career pathway analysis | Flows by NAICS sector | 2-digit NAICS only |
| ROI by degree level | Earnings across DEGREE_LEVEL | Different cohort spans |

Important Limitations

  1. Experimental status: Not official Census statistics; methodology may change
  2. Partial coverage: Only ~29% of U.S. college graduates are covered, all from participating institutions
  3. Selection bias: Participating states/institutions may differ systematically
  4. Employment coverage: Excludes self-employed, independent contractors, military, some federal
  5. Labor attachment requirement: Workers must have 3+ quarters of earnings above minimum wage threshold
  6. Suppression: Cells with fewer than 30 graduates are suppressed
  7. Earnings inflation-adjusted: All earnings in 2022 dollars (CPI-U)

Related Data Sources

| Source | Relationship | When to Use |
|---|---|---|
| education-data-source-scorecard | Alternative earnings source (median only, all enrollees) | When PSEO coverage is insufficient or need non-graduate outcomes |
| education-data-source-ipeds | Institution characteristics, enrollment, graduation rates | Contextualizing PSEO institutions; join on unitid |
| education-data-explorer | Parent discovery skill | Finding available endpoints |
| education-data-query | Data fetching | Downloading parquet/CSV files |

Topic Index

| Topic | Reference File |
|---|---|
| LEHD program overview | ./references/lehd-methodology.md |
| Data matching process | ./references/lehd-methodology.md |
| Differential privacy | ./references/lehd-methodology.md |
| Percentile earnings | ./references/earnings-data.md |
| Labor force attachment | ./references/earnings-data.md |
| Cohort definitions | ./references/earnings-data.md |
| Census Division employment | ./references/geographic-flows.md |
| In-state employment | ./references/geographic-flows.md |
| NAICS sector employment | ./references/industry-flows.md |
| Industry code reference | ./references/industry-flows.md |
| Variable names and codes | ./references/variable-definitions.md |
| Status flags | ./references/variable-definitions.md |
| State participation | ./references/state-coverage.md |
| Coverage rates | ./references/state-coverage.md |
| Data partners | ./references/state-coverage.md |
| Mirror-based data download | Data Access section above |
| Bulk data download | Data Access section above |