
education-data-source-edfacts

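EDFacts state accountability data from the Urban Institute Education Data Portal, covering K-12 assessments, graduation rates, and federal reporting. Use when working with state proficiency data, ACGR graduation rates, or ESSA accountability indicators. Important: state assessment scores cannot be compared across states.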

SKILL.md
---
name: education-data-source-edfacts
description: >-
  EDFacts state accountability data for K-12 assessments, graduation rates, and
  federal reporting. Use when working with state proficiency data, ACGR graduation
  rates, or ESSA accountability indicators. CRITICAL - state assessment scores
  CANNOT be compared across states.
metadata:
  audience: data-analysts
  domain: education-data
---

EDFacts Data Source Reference

EDFacts is the U.S. Department of Education's centralized data collection system for pre-K through grade 12 education data from State Education Agencies (SEAs). It provides state assessment proficiency rates, graduation rates, and accountability indicators — the authoritative federal source for state-level K-12 outcome data.

CRITICAL: Value Encoding

The Urban Institute Education Data Portal converts NCES string codes (e.g., ALL, CWD, LEP) to integer codes. Always verify actual data values before filtering — do not rely on documentation labels alone.

| Context | Subgroup "All" | English Learner | Sex "Male" |
|---|---|---|---|
| Portal integer | 99 | 1 | 1 |
| NCES string | ALL | LEP | M |

See ./references/variable-definitions.md for complete encoding tables.
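One cheap way to "verify actual data values before filtering" is to check that every code you plan to filter on actually occurs in the column. A minimal plain-Python sketch, assuming rows are dicts keyed by Portal column names; `check_filter_values` is a hypothetical helper, not part of the Portal or the fetch tooling.

```python
from collections.abc import Iterable, Mapping


def check_filter_values(
    rows: Iterable[Mapping[str, object]], column: str, wanted: Iterable[object]
) -> set:
    """Return the values observed in `column`; raise if any wanted value never occurs.

    Catches mistakes such as filtering lep == "LEP" when the Portal stores 1.
    """
    observed = {row[column] for row in rows}
    missing = [value for value in wanted if value not in observed]
    if missing:
        raise ValueError(
            f"{column!r}: {missing} not found; observed {sorted(observed, key=repr)}"
        )
    return observed


rows = [{"lep": 1}, {"lep": 99}, {"lep": 99}]
check_filter_values(rows, "lep", [1, 99])  # passes: both integer codes occur
# check_filter_values(rows, "lep", ["LEP"]) raises ValueError (string code)
```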

What is EDFacts?

  • Collector: U.S. Department of Education, via State Education Agencies (SEAs)
  • Coverage: All public schools and districts in 50 states + DC
  • Content: State assessment proficiency rates, ACGR graduation rates, participation rates, accountability indicators
  • Frequency: Annual collection
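  • Available years: Assessments 2009-10 to present, except 2019-20 (no testing under COVID-19 waivers); graduation rates 2010-11 to present. See Key Datasets for the exact Portal years available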
  • Primary identifiers: ncessch (school ID, Int64), leaid (district ID, Int64), fips (state FIPS code, Int64)
  • Key limitation: State assessment scores CANNOT be compared across states (different tests, different cut scores)

Reference File Structure

| File | Purpose | When to Read |
|---|---|---|
| accountability-context.md | ESSA, NCLB history, accountability systems | Understanding policy context |
| assessment-data.md | Proficiency levels, test scores, limitations | Working with assessment data |
| graduation-rates.md | ACGR methodology, cohort definitions | Analyzing graduation data |
| variable-definitions.md | Key variables, suppression codes, special values | Interpreting specific variables |
| data-quality.md | Known issues, state variations, COVID impacts | Data cleaning, limitations |
| subgroup-reporting.md | Special populations, disaggregation | Analyzing by student groups |

Decision Trees

What type of analysis?

```text
What EDFacts data do you need?
├─ Assessment/proficiency data
│   ├─ Within-state trends → Valid analysis
│   ├─ Cross-state comparison → INVALID - use NAEP instead
│   └─ Subgroup gaps → See ./references/subgroup-reporting.md
├─ Graduation rates (ACGR)
│   ├─ Understand methodology → See ./references/graduation-rates.md
│   ├─ Extended rates (5-year, 6-year) → See ./references/graduation-rates.md
│   └─ Subgroup rates → See ./references/subgroup-reporting.md
├─ Understanding variables
│   ├─ Missing/suppressed values → See ./references/variable-definitions.md
│   ├─ Range vs. exact values → See ./references/variable-definitions.md
│   └─ Subgroup codes → See ./references/subgroup-reporting.md
└─ Data quality concerns
    ├─ COVID-19 impacts (2019-20) → See ./references/data-quality.md
    ├─ State reporting changes → See ./references/data-quality.md
    └─ Suppression rates → See ./references/data-quality.md
```

Is my comparison valid?

```text
What are you comparing?
├─ Same state, different years
│   ├─ Same assessment system? → Valid
│   └─ Different tests? → Break in time series
├─ Schools within same state → Valid
├─ Districts within same state → Valid
├─ Subgroups within same school → Valid (check suppression)
├─ Different states
│   ├─ Proficiency rates → INVALID
│   ├─ Graduation rates (ACGR) → More comparable (common formula; diploma requirements still vary by state)
│   └─ Use NAEP instead → Valid
└─ National ranking by proficiency → INVALID
```
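Ranking schools or districts is valid as long as each one is compared only with others in its own state. A minimal plain-Python sketch of a within-state rank, assuming rows are dicts already restricted to all-students, all-grades rows with missing-data codes removed; `rank_within_state` and the toy IDs are hypothetical. Because suppressed values enter as range midpoints, ranks near the middle of a state's distribution are approximate.

```python
from collections import defaultdict


def rank_within_state(rows, value_col="read_test_pct_prof_midpt"):
    """Rank schools 1..n inside each state; returns {(fips, ncessch): rank}.

    Higher values rank first. Tied values share the best rank ("1224" style).
    Ranks are only meaningful within a state, never across states.
    """
    by_state = defaultdict(list)
    for row in rows:
        by_state[row["fips"]].append(row)

    ranks = {}
    for fips, state_rows in by_state.items():
        ordered = sorted(state_rows, key=lambda r: r[value_col], reverse=True)
        for position, row in enumerate(ordered, start=1):
            previous = ordered[position - 2] if position > 1 else None
            if previous is not None and previous[value_col] == row[value_col]:
                ranks[(fips, row["ncessch"])] = ranks[(fips, previous["ncessch"])]
            else:
                ranks[(fips, row["ncessch"])] = position
    return ranks


# Toy rows with made-up IDs; the state-6 school ranks first in its own state
schools = [
    {"fips": 1, "ncessch": 101, "read_test_pct_prof_midpt": 62.5},
    {"fips": 1, "ncessch": 102, "read_test_pct_prof_midpt": 48.0},
    {"fips": 6, "ncessch": 601, "read_test_pct_prof_midpt": 40.0},
]
print(rank_within_state(schools))  # {(1, 101): 1, (1, 102): 2, (6, 601): 1}
```

The ranks deliberately never mix states: a rank of 1 in fips 6 and a rank of 1 in fips 1 say nothing about each other.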

Quick Reference: EDFacts Data Elements

Assessment Data

| Data Element | Description | Available Years |
|---|---|---|
| Proficiency rates | % meeting state standards in reading/math | 2009-10 to present |
| Participation rates | % of students assessed | 2012-13 to present |
| Achievement levels | Below Basic, Basic, Proficient, Advanced | Varies by state |
| Grade levels | Grades 3-8, high school (varies) | 2009-10 to present |

Graduation Data

| Data Element | Description | Available Years |
|---|---|---|
| 4-year ACGR | Adjusted Cohort Graduation Rate | 2010-11 to present |
| 5-year ACGR | Extended graduation rate | 2011-12 to present |
| 6-year ACGR | Further extended rate | 2012-13 to present |
| Diploma types | Regular diploma only in ACGR | All years |
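The ACGR arithmetic is simple once the cohort is adjusted: on-time graduates with a regular diploma, divided by first-time ninth graders plus students who transfer in, minus students who leave through documented transfers out, emigration, or death. The function below is an illustrative sketch of that arithmetic with hypothetical argument names; the Portal files already report the resulting rates, so you would normally not recompute them.

```python
def acgr(
    first_time_9th: int,
    transfers_in: int,
    transfers_out: int,
    emigrated: int,
    deceased: int,
    regular_diploma_grads: int,
) -> float:
    """Adjusted Cohort Graduation Rate, in percent.

    Denominator: adjusted cohort = first-time 9th graders + transfers in
    - transfers out - emigrants - deceased students.
    Numerator: cohort members earning a regular diploma within the window
    (4 years for the standard rate; 5 or 6 years for extended rates).
    """
    cohort = first_time_9th + transfers_in - transfers_out - emigrated - deceased
    if cohort <= 0:
        raise ValueError("adjusted cohort must be positive")
    return 100 * regular_diploma_grads / cohort


# 200 + 20 - 15 - 3 - 2 = 200 students in the adjusted cohort; 170 graduate
print(acgr(200, 20, 15, 3, 2, 170))  # 85.0
```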

Key Identifiers

Portal Data Types: All identifiers are Int64 in the Portal parquet files. The NCES source format (zero-padded strings) is shown for reference only. When joining with other Portal datasets, join on the integer columns directly.

| ID | Portal Type | NCES Source Format | Level | Example (Int64) |
|---|---|---|---|---|
| ncessch | Int64 | 12-char zero-padded | School | 10000500870 |
| ncessch_num | Int64 | Same as ncessch | School | 10000500870 |
| leaid | Int64 | 7-char zero-padded | District/LEA | 100005 |
| leaid_num | Int64 | Same as leaid | District/LEA | 100005 |
| fips | Int64 | 2-digit | State | 1 (Alabama) |
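Within the Portal you join on the integer columns directly; the zero-padded NCES form only matters when matching against a non-Portal file that stores IDs as strings. A minimal sketch of the conversion using the widths from the table above (function names are hypothetical). The example values also show that a school ID's first seven characters are its district ID.

```python
def to_nces_ncessch(ncessch: int) -> str:
    """Portal Int64 school ID -> NCES 12-character zero-padded string."""
    return f"{ncessch:012d}"


def to_nces_leaid(leaid: int) -> str:
    """Portal Int64 district ID -> NCES 7-character zero-padded string."""
    return f"{leaid:07d}"


def to_nces_fips(fips: int) -> str:
    """Portal Int64 state FIPS code -> 2-digit string."""
    return f"{fips:02d}"


print(to_nces_ncessch(10000500870))  # 010000500870
print(to_nces_leaid(100005))         # 0100005
print(to_nces_fips(1))               # 01
```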

Data Levels

| Level | Identifier | Dataset Path Pattern |
|---|---|---|
| School | ncessch (Int64) | edfacts/schools_edfacts_* |
| District/LEA | leaid (Int64) | edfacts/districts_edfacts_* |
| State | fips (Int64) | Aggregate from lower levels |
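Because there is no state-level file, a state figure has to be built from district (or school) rows. A minimal sketch, assuming all-students, all-grades district rows and a column holding the number of valid test-takers; the name `read_test_num_valid` is an assumption to check against the codebook. Weighting by test-takers keeps a tiny district from counting as much as a large one; since suppressed districts contribute range midpoints, the result is an approximation and is still only comparable within the same state over time.

```python
from collections import defaultdict


def state_proficiency(
    rows,
    pct_col="read_test_pct_prof_midpt",
    n_col="read_test_num_valid",  # assumed column name; verify in the codebook
):
    """Test-taker-weighted mean proficiency per state from district rows.

    Rows whose value or count is null or a negative missing-data code are skipped.
    Returns {fips: percent}.
    """
    weighted_sum = defaultdict(float)
    weight = defaultdict(float)
    for row in rows:
        pct, n = row[pct_col], row[n_col]
        if pct is None or n is None or pct < 0 or n <= 0:
            continue  # -1/-2/-3 codes, nulls, or districts with no test-takers
        weighted_sum[row["fips"]] += pct * n
        weight[row["fips"]] += n
    return {fips: weighted_sum[fips] / weight[fips] for fips in weight}


districts = [
    {"fips": 1, "read_test_pct_prof_midpt": 50.0, "read_test_num_valid": 300},
    {"fips": 1, "read_test_pct_prof_midpt": 80.0, "read_test_num_valid": 100},
    {"fips": 1, "read_test_pct_prof_midpt": -3, "read_test_num_valid": 8},
]
print(state_proficiency(districts))  # {1: 57.5}
```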

Subgroups Reported

Note: Not all subgroup columns are present in every dataset. Grad rates data does NOT have sex, migrant, or military_connected columns.

| Subgroup | NCES Code | Portal Integer | Column | Available In |
|---|---|---|---|---|
| All students | ALL | 99 | race, sex, lep, disability | Assessments, Grad Rates |
| Economically disadvantaged | ECODIS | 1 | econ_disadvantaged | Assessments, Grad Rates |
| Students with disabilities | CWD | 1 | disability | Assessments, Grad Rates |
| English learners | LEP | 1 | lep | Assessments, Grad Rates |
| Homeless | HOM | 1 | homeless | Assessments, Grad Rates |
| Foster care | FCS | 1 | foster_care | Assessments, Grad Rates |
| Migrant | MIG | 1 | migrant | Assessments only |
| Military connected | MIL | 1 | military_connected | Assessments only |
| Race/ethnicity | Multiple | 1-5, 7, 99 | race | Assessments, Grad Rates |
| Sex | M/F | 1, 2, 99 | sex | Assessments only |

EDFacts Filter Column Pattern:

  • Special population columns (lep, disability, homeless, etc.) use 1 = subgroup, 99 = total
  • Race column uses integer codes (1=White, 2=Black, etc.)
  • Sex column uses 1 = Male, 2 = Female, 99 = Total (assessments only)
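The 1/99 pattern means that selecting one subgroup is safest when every other subgroup column is pinned to 99 (and grade_edfacts to 99 in assessments); otherwise rows for other breakdowns can slip in. A minimal plain-Python sketch, assuming rows are dicts keyed by Portal column names; `subgroup_criteria` and `apply_criteria` are hypothetical helpers. In polars the same criteria dict could be turned into one combined `pl.col(...) == code` filter.

```python
SUBGROUP_COLS = (
    "race", "sex", "lep", "disability", "econ_disadvantaged",
    "homeless", "migrant", "foster_care", "military_connected",
)
TOTAL = 99


def subgroup_criteria(present_cols, grade=TOTAL, **wanted):
    """Build {column: code} for one subgroup, pinning every other subgroup column to 99.

    Columns the dataset lacks (e.g. sex in grad rates) are left out of the
    pinned totals; asking for one of them explicitly raises KeyError.
    """
    criteria = {col: TOTAL for col in SUBGROUP_COLS if col in present_cols}
    if "grade_edfacts" in present_cols:  # assessments only
        criteria["grade_edfacts"] = grade
    for col, code in wanted.items():
        if col not in present_cols:
            raise KeyError(f"{col!r} is not a column in this dataset")
        criteria[col] = code
    return criteria


def apply_criteria(rows, criteria):
    """Keep rows matching every {column: code} pair."""
    return [row for row in rows if all(row[c] == v for c, v in criteria.items())]


cols = list(SUBGROUP_COLS) + ["grade_edfacts"]
base = {col: TOTAL for col in cols}
rows = [base, {**base, "race": 2}, {**base, "race": 2, "grade_edfacts": 4}, {**base, "lep": 1}]
black_all_grades = apply_criteria(rows, subgroup_criteria(cols, race=2))
# keeps only the row with race == 2, grade_edfacts == 99, other subgroups == 99
```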

Grade Codes (grade_edfacts)

| Code | Grade Level |
|---|---|
| 3-8 | Grades 3-8 (individual) |
| 9 | Grades 9-12 combined |
| 99 | Total (all grades) |

Race Codes

Empirically verified from 2018 school assessment data. Only these values appear in the race column:

| Code | Category |
|---|---|
| 1 | White |
| 2 | Black |
| 3 | Hispanic |
| 4 | Asian |
| 5 | American Indian/Alaska Native |
| 7 | Two or More Races |
| 99 | Total |

Note: Code 6 (Native Hawaiian/Pacific Islander) is NOT observed in the data. Codes 8 (Nonresident alien), 9 (Unknown), 20 (Other), -1, -2, -3 are also not observed in the race column. These codes may exist in other Portal sources but are absent from EDFacts.

Sex Codes

| Code | Category |
|---|---|
| 1 | Male |
| 2 | Female |
| 9 | Unknown |
| 99 | Total |

Disability Codes

Empirically verified from 2018 school assessment and 2019 grad rate data. Only 1 and 99 are observed in the disability column. The expanded codes (0-4) documented in other Portal sources are NOT present in EDFacts datasets.

| Code | Category |
|---|---|
| 1 | Students with disabilities (IDEA-eligible) |
| 99 | Total (all students) |

LEP Codes

| Code | Category |
|---|---|
| 1 | Students who are limited English proficient |
| 99 | All students (total) |

Special Population Columns

For homeless, migrant, econ_disadvantaged, foster_care, military_connected:

| Code | Category |
|---|---|
| 1 | Yes (in subgroup) |
| 99 | Total (all students) |

Missing Data Codes

| Code | Meaning | When Used |
|---|---|---|
| -1 | Missing/not reported | Data not reported |
| -2 | Not applicable | Item doesn't apply to this entity |
| -3 | Suppressed for privacy | Data suppressed because of small N-size |
| -9 | Rounds to zero | Value rounds to zero |
| Range values | Exact value suppressed | Range provided instead of exact value |
| _midpt suffix | Calculated midpoint of suppressed range | Use for analysis when exact values are suppressed |

Always use _midpt variables for analysis when exact values are suppressed.
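The negative codes are ordinary numbers in the files, so a plain mean silently treats -3 as a real percentage. A minimal sketch of masking them before averaging, plus the midpoint of a range under the assumption that it is the simple average of the range bounds; all names are hypothetical. It treats -9 as missing as well; whether "rounds to zero" should count as 0 instead is a judgment call for your analysis.

```python
from statistics import fmean


def is_missing(value) -> bool:
    """True for nulls and the negative missing-data codes (-1, -2, -3, -9)."""
    return value is None or value < 0


def mean_ignoring_missing(values):
    """Average only real values; returns None if nothing usable remains."""
    kept = [v for v in values if not is_missing(v)]
    return fmean(kept) if kept else None


def range_midpoint(low: float, high: float) -> float:
    """Midpoint of a suppressed range, assuming a simple average of the bounds."""
    return (low + high) / 2


midpts = [40.0, 60.0, -3]             # -3 = suppressed for privacy
print(sum(midpts) / len(midpts))      # about 32.33, dragged down by the code
print(mean_ignoring_missing(midpts))  # 50.0
print(range_midpoint(40, 44))         # 42.0
```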

Data Access

In this skill, all EDFacts data is fetched through the mirror-based bulk download system rather than through an API.

Key references:

  • mirrors.yaml -- Mirror definitions, URL templates, read strategies
  • datasets-reference.md -- Canonical dataset paths (one path works for all mirrors)
  • fetch-patterns.md -- fetch_from_mirrors() and fetch_yearly_from_mirrors() patterns

Truth Hierarchy: When interpreting variable values, apply this priority:

  1. Actual data file (what you observe in the parquet/CSV) — this IS the truth
  2. Live codebook (.xls in mirror) — authoritative documentation, may lag
  3. This skill documentation — convenient summary, may drift from codebook

If this documentation contradicts the codebook, trust the codebook. If the codebook contradicts observed data, trust the data and investigate.

Key Datasets

| Dataset | Path | Type | Columns |
|---|---|---|---|
| School Assessments | edfacts/schools_edfacts_assessments_{year} | Yearly (2009-2018, 2020) | 26 cols |
| School Grad Rates | edfacts/schools_edfacts_grad_rates_{year} | Yearly (2010-2019) | 18 cols |
| District Assessments | edfacts/districts_edfacts_assessments_{year} | Yearly (2009-2018, 2020) | 23 cols |
| District Grad Rates | edfacts/districts_edfacts_grad_rates_{year} | Yearly (2010-2019) | 15 cols |

Note: 2019 assessment data is NOT available (at any level) due to COVID testing waivers.
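When requesting several years, it helps to generate the yearly paths from the table above so the missing 2019 assessment year is never requested. A minimal sketch with the year ranges hard-coded from that table (they will need updating if the Portal adds years); `dataset_paths` is hypothetical, and the resulting list would be handed to whatever fetch helper you use, such as fetch_yearly_from_mirrors() from fetch-patterns.md, whose exact signature is not reproduced here.

```python
YEARS = {
    "assessments": [y for y in range(2009, 2021) if y != 2019],  # 2019 missing (COVID)
    "grad_rates": list(range(2010, 2020)),
}


def dataset_paths(level: str, kind: str) -> list[str]:
    """Yearly EDFacts dataset paths per the Key Datasets table.

    level: "schools" or "districts"; kind: "assessments" or "grad_rates".
    """
    if level not in ("schools", "districts"):
        raise ValueError(f"unknown level {level!r}")
    if kind not in YEARS:
        raise ValueError(f"unknown kind {kind!r}")
    return [f"edfacts/{level}_edfacts_{kind}_{year}" for year in YEARS[kind]]


paths = dataset_paths("schools", "assessments")
print(paths[0], paths[-1], len(paths))
# edfacts/schools_edfacts_assessments_2009 edfacts/schools_edfacts_assessments_2020 11
```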

Codebooks

Codebook .xls files are available for both assessment and graduation rate datasets. Use get_codebook_url() from fetch-patterns.md:

```python
# get_codebook_url() is defined in fetch-patterns.md
# Assessment codebooks:
schools_assess_url = get_codebook_url("edfacts/codebook_schools_edfacts_assessments")
districts_assess_url = get_codebook_url("edfacts/codebook_districts_edfacts_assessments")

# Graduation rate codebooks:
schools_grad_url = get_codebook_url("edfacts/codebook_schools_edfacts_graduation")
districts_grad_url = get_codebook_url("edfacts/codebook_districts_edfacts_graduation")
```

Codebook naming note: Graduation rate codebooks use _graduation (not _grad_rates), while the data files use _grad_rates. This follows the same pattern as other Portal sources where codebook names differ from data file names. See datasets-reference.md for the authoritative path mapping.
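The naming difference can be captured in a small lookup so a data path always resolves to the right codebook name. A minimal sketch derived only from the four codebook names listed above (datasets-reference.md remains the authority); `codebook_path` is hypothetical, and its output is the kind of string you would pass to get_codebook_url().

```python
import re

# Data-file kind -> codebook kind (graduation codebooks use "_graduation")
CODEBOOK_KIND = {"assessments": "assessments", "grad_rates": "graduation"}


def codebook_path(data_path: str) -> str:
    """Map 'edfacts/schools_edfacts_grad_rates_2015' to
    'edfacts/codebook_schools_edfacts_graduation'."""
    match = re.fullmatch(
        r"edfacts/(schools|districts)_edfacts_(assessments|grad_rates)_\d{4}", data_path
    )
    if match is None:
        raise ValueError(f"not an EDFacts yearly data path: {data_path!r}")
    level, kind = match.groups()
    return f"edfacts/codebook_{level}_edfacts_{CODEBOOK_KIND[kind]}"


print(codebook_path("edfacts/districts_edfacts_grad_rates_2019"))
# edfacts/codebook_districts_edfacts_graduation
```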

Dataset Column Differences

Assessment and graduation rate datasets have different column sets:

| Column | Assessments | Grad Rates |
|---|---|---|
| sex | Yes (1, 2, 99) | No |
| migrant | Yes (1, 99) | No |
| military_connected | Yes (1, 99) | No |
| grade_edfacts | Yes (3-9, 99) | No |
| `read_test_*` / `math_test_*` | Yes | No |
| `grad_rate_*` | No | Yes |
| cohort_num | No | Yes |
| school_name / lea_name | Yes | Yes |

Filtering

```python
import polars as pl

# df: a polars DataFrame of EDFacts school assessment data
# Grade filtering: grade_edfacts uses integer codes
df_grade4 = df.filter(pl.col("grade_edfacts") == 4)  # Grade 4
df_all_grades = df.filter(pl.col("grade_edfacts") == 99)  # All grades combined

# Subgroup filtering: 99 = total, 1 = in subgroup (sex also uses 2 = Female)
df_total = df.filter(pl.col("sex") == 99)  # Total on the sex dimension
df_econ = df.filter(pl.col("econ_disadvantaged") == 1)  # Economically disadvantaged only

# Race filtering: integer codes
df_black = df.filter(pl.col("race") == 2)  # Black students
```

Common Pitfalls

| Pitfall | Issue | Solution |
|---|---|---|
| Ranking states by proficiency | Different tests and cut scores make comparisons meaningless | Use NAEP for cross-state comparisons |
| Comparing 2019-20 to other years | COVID testing waivers created data gaps | Note the data gap; exclude the year |
| Ignoring suppression | Results biased toward larger schools/subgroups | Document suppression rates; use _midpt variables |
| Assuming "proficient" means the same thing everywhere | State definitions of "proficient" vary widely | Clarify each state's definition |
| Pre/post ESSA comparison | Different accountability systems (NCLB vs. ESSA) | Note the policy change at the 2015 boundary |
| Using string codes for filtering | Portal uses integer encoding, not NCES strings | Always check actual data values; see encoding tables above |
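"Document suppression rates" can be as simple as counting, per subgroup, how many rows carry the suppression code. A minimal plain-Python sketch, assuming rows are dicts and that fully suppressed values appear as -3 in the column you pass; which column carries the code should be checked in your data, and `suppression_rates` is hypothetical.

```python
from collections import Counter


def suppression_rates(rows, value_col, group_col, codes=(-3,)):
    """Share of rows in each group whose `value_col` carries a suppression code.

    codes: values treated as suppressed (-3 by default).
    """
    total, suppressed = Counter(), Counter()
    for row in rows:
        key = row[group_col]
        total[key] += 1
        if row[value_col] in codes:
            suppressed[key] += 1
    return {key: suppressed[key] / total[key] for key in total}


rows = [
    {"race": 99, "read_test_pct_prof_midpt": 55.0},
    {"race": 2, "read_test_pct_prof_midpt": -3},
    {"race": 2, "read_test_pct_prof_midpt": 30.0},
    {"race": 5, "read_test_pct_prof_midpt": -3},
]
print(suppression_rates(rows, "read_test_pct_prof_midpt", "race"))
# {99: 0.0, 2: 0.5, 5: 1.0}
```

Small subgroups (here race code 5) are the ones most often suppressed, which is exactly the bias the pitfall describes.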

Key Policy Context

| Law | Years | Key Features |
|---|---|---|
| NCLB | 2002-2015 | AYP, 100% proficiency goal, highly qualified teacher (HQT) requirements |
| ESSA | 2015-present | State flexibility, multiple indicators |

  • AYP (Adequate Yearly Progress): NCLB requirement eliminated by ESSA
  • ESSA Accountability: States design their own systems within federal guardrails
  • N-size: Minimum students required for reporting (varies by state, typically 10-30)

CRITICAL WARNING: Cross-State Comparisons

State assessment proficiency rates CANNOT be compared across states.

| Factor | Why It Varies |
|---|---|
| Assessment content | Each state creates its own tests |
| Proficiency cut scores | Each state sets its own thresholds |
| Standards alignment | State academic standards differ |
| Test difficulty | Not calibrated nationally |

A student "proficient" in one state may score "below basic" in another state taking a harder test with higher cut scores. Rankings of states by proficiency rates are meaningless.

Use NAEP (National Assessment of Educational Progress) for valid cross-state comparisons.

Valid vs. Invalid Analysis Examples

Valid Analysis:

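```python
import polars as pl

# Within-state trend analysis; assumes df holds only all-students rows
# (grade_edfacts == 99 and every subgroup column == 99)
state_df = df.filter(pl.col("fips") == 6)  # California only
trend = (
    state_df.group_by("year")
    .agg(pl.col("read_test_pct_prof_midpt").mean())
    .sort("year")  # group_by does not preserve year order
)
# Valid: same state; still check for test changes that break the series
```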

INVALID Analysis:

```python
import polars as pl

# DO NOT DO THIS - Cross-state comparison
# This comparison is MEANINGLESS
state_comparison = df.group_by("fips").agg(
    pl.col("read_test_pct_prof_midpt").mean()
).sort("read_test_pct_prof_midpt", descending=True)
# INVALID: Different tests, different standards
```

Related Data Sources

| Source | Relationship | When to Use |
|---|---|---|
| education-data-source-ccd | CCD provides school/district demographics | Combining outcome data with school characteristics |
| education-data-source-crdc | CRDC has discipline, AP, school climate data | Analyzing school equity alongside achievement |
| education-data-source-saipe | SAIPE provides district poverty estimates | Linking poverty to achievement |
| education-data-source-meps | MEPS provides school poverty estimates | School-level poverty and assessment analysis |
| education-data-explorer | Parent discovery skill | Finding available endpoints |
| education-data-query | Data fetching | Downloading via mirrors |

Topic Index

| Topic | Reference File |
|---|---|
| NCLB to ESSA transition | ./references/accountability-context.md |
| State accountability systems | ./references/accountability-context.md |
| Federal reporting requirements | ./references/accountability-context.md |
| Proficiency levels | ./references/assessment-data.md |
| Why states can't be compared | ./references/assessment-data.md |
| NAEP comparison | ./references/assessment-data.md |
| Assessment system changes | ./references/assessment-data.md |
| ACGR calculation | ./references/graduation-rates.md |
| Cohort adjustments | ./references/graduation-rates.md |
| Extended graduation rates | ./references/graduation-rates.md |
| Diploma types | ./references/graduation-rates.md |
| Suppression codes | ./references/variable-definitions.md |
| Missing data values | ./references/variable-definitions.md |
| Range/midpoint variables | ./references/variable-definitions.md |
| Participation rates | ./references/variable-definitions.md |
| COVID-19 data gaps | ./references/data-quality.md |
| State reporting variations | ./references/data-quality.md |
| Known data issues | ./references/data-quality.md |
| Time series breaks | ./references/data-quality.md |
| Students with disabilities | ./references/subgroup-reporting.md |
| English learners | ./references/subgroup-reporting.md |
| Economically disadvantaged | ./references/subgroup-reporting.md |
| Race/ethnicity reporting | ./references/subgroup-reporting.md |
| Homeless/foster/migrant | ./references/subgroup-reporting.md |
| N-size requirements | ./references/subgroup-reporting.md |