John Baker

When Students Miss School, Reading Scores Follow: A New York State Dashboard

chronic absenteeism
K-12 analytics
educational equity
NYSED
Quarto dashboards
Python
An interactive look at chronic absenteeism and ELA proficiency across ~700 New York school districts, built responsibly with NYSED’s peer-grouping framework.
Published

May 4, 2026

Why This Matters Right Now

Two numbers tell most of the post-pandemic story in American K–12 education. The first is chronic absenteeism, which roughly doubled between the 2018–19 and 2021–22 school years and has remained stubbornly elevated since. The second is fourth-grade reading: on the 2024 Nation’s Report Card, 40 percent of fourth graders scored below the NAEP Basic level in reading, the largest share in more than two decades. These two trends are not a coincidence, and they are related.

I built this dashboard to look at how they move together across New York State’s roughly 700 public school districts, using the New York State Education Department’s publicly available Report Card data. The goal is not to rank districts or name a villain. It is to give an honest, district-contextualized view of a correlation that should shape how every EdTech product designed for K–12 thinks about its user base in 2026.

What You Can Do With It

→ Open the interactive dashboard

The dashboard has five pages:

  1. Overview — statewide value boxes and a short framing of how to read the data, plus a peer group snapshot for the latest school year
  2. The Correlation — a scatter plot of chronic absenteeism versus English Language Arts (ELA) proficiency, colored by the New York State Education Department’s (NYSED’s) Need-to-Resource Capacity category, so you are always comparing districts to their peers
  3. Trends Over Time — statewide and by peer group, with the canceled 2020 assessment year annotated
  4. District Explorer — a sortable, filterable table for anyone who wants to look up their own district, with K–12 enrollment, absenteeism rate, and both Elementary and Middle (EM) 3–8 and Regents ELA proficiency
  5. Methodology — a full accounting of data sources, proficiency formula choices, suppression rules, the K–2 literacy gap, and what the data cannot tell you

The Bias Mitigation Approach, in Plain English

When you plot district outcomes against each other, it is tempting to rank them on whichever axis looks bad. Districts with the highest absenteeism rates become “the worst,” and the conversation ends there. That framing is wrong, and the data itself tells you why.

NYSED already publishes a peer-grouping index called Need-to-Resource Capacity, or N/RC, which sorts districts into six categories based on estimated poverty and combined wealth: New York City, the four large cities (Buffalo, Rochester, Syracuse, Yonkers), high-need urban-suburban, high-need rural, average need, and low need. The index is not a political statement. It is a measure, built by the state, of how well a district can meet its students’ needs with the resources it has. When you color the scatter plot by N/RC category, the so-called “worst” districts almost always turn out to be the ones working with the least. Comparing a high-need rural district to a low-need suburban district on raw chronic absenteeism is a category error.

So the dashboard is built around three commitments:

  • Compare districts within their peer group. The scatter plot is colored by N/RC category, and the trend lines are drawn per category, not statewide only.
  • Use student-first language. “Students experiencing chronic absenteeism,” not “chronically absent students.” The distinction is small. The reason for it is not.
  • Name what the data cannot do. The Methodology page walks through suppression rules, the difference between correlation and causation, and what the data leaves out.
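
The first commitment can be sketched in a few lines of pandas. This is a minimal illustration, not the dashboard's actual code: the column names (nrc_category, absent_rate, ela_prof_rate) are hypothetical, and the six rows are invented toy values rather than NYSED data. It computes the correlation between absenteeism and ELA proficiency separately within each N/RC category instead of once across all districts.

```python
import pandas as pd

# Toy rows invented for illustration only; these are not NYSED values.
districts = pd.DataFrame({
    "nrc_category": ["High Need Rural"] * 3 + ["Low Need"] * 3,
    "absent_rate": [0.30, 0.35, 0.40, 0.08, 0.10, 0.12],
    "ela_prof_rate": [0.38, 0.33, 0.30, 0.72, 0.70, 0.65],
})

def correlation_by_peer_group(df):
    """Pearson r between absenteeism and ELA proficiency, one value per peer group."""
    return (
        df.groupby("nrc_category")[["absent_rate", "ela_prof_rate"]]
        .apply(lambda g: g["absent_rate"].corr(g["ela_prof_rate"]))
    )

print(correlation_by_peer_group(districts))
```

Pooling these toy rows into a single correlation would mostly measure the gap between the two groups. The per-group version asks how absenteeism and proficiency move together among comparable districts.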

Why New York State

Two reasons. First, I have worked directly with NYSED data before — three years as a data analyst in Hudson Valley school districts — so I know the quirks of the data. Second, New York is large and diverse enough that the N/RC peer-grouping story actually shows up in the data. A dashboard built on a smaller or more homogeneous state would flatten the contrast.

The EdTech Context

A growing segment of EdTech is organized around a shared premise: instruction, attendance, and engagement are closely linked. Research supports the connection: chronic absenteeism is associated with significantly lower reading proficiency, and school-level absenteeism predicts drops in ELA and math scores even for students who do show up. Attendance intervention platforms act on that link directly, using family outreach and behavioral nudges to reduce absences. District analytics tools surface attendance data in forms that school leaders can act on. Literacy platforms deliver the content. The underlying premise, visible across product categories if not always stated explicitly, is that improving reading outcomes requires also addressing who shows up. This dashboard is, in a sense, a visual argument for that premise.

How It Was Built

Data and extraction. All data comes from the NYSED public downloads at data.nysed.gov/downloads.php. Six tables were extracted from the Every Student Succeeds Act (ESSA) School Report Card (SRC) Microsoft Access database using Python and mdbtools:

  • Annual EM ELA — Grade 3–8 ELA proficiency by district and subgroup
  • Annual Regents Exams — filtered to Common Core English as the high school ELA analog
  • ACC EM Chronic Absenteeism — Grades 1–8
  • ACC HS Chronic Absenteeism — Grades 9–12
  • BOCES and N/RC — district peer-group classification
  • BEDS Day Enrollment (from the separate Enrollment Database) — true K–12 enrollment totals
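
One extraction step might look like the sketch below, assuming mdbtools is installed and on the PATH. mdb-export is the mdbtools command that writes a single table to stdout as CSV. The helper names, the database filename in the usage comment, and the ENTITY_CD and YEAR columns in the sample are assumptions for illustration, not taken from the project's pipeline.

```python
import csv
import io
import subprocess

def parse_mdb_csv(csv_text):
    """Parse mdb-export's CSV output into a list of dicts; every value stays a string."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def export_table(mdb_path, table_name):
    """Run `mdb-export <database> <table>` and parse what it prints to stdout."""
    result = subprocess.run(
        ["mdb-export", mdb_path, table_name],
        capture_output=True,
        text=True,
        check=True,  # raise if mdb-export exits with an error
    )
    return parse_mdb_csv(result.stdout)

# Hypothetical usage (the filename is an assumption):
# rows = export_table("SRC2024.mdb", "Annual EM ELA")

# The parser can be exercised without the database, using an invented sample row.
sample = 'ENTITY_CD,YEAR,NUM_PROF\n"000000000001","2024","s"\n'
rows = parse_mdb_csv(sample)
print(rows[0]["NUM_PROF"])  # s
```

Keeping the parsing separate from the subprocess call makes it possible to test the CSV handling on machines that do not have mdbtools installed.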

Three SRC releases (2019, 2022, 2024) plus their matching Enrollment releases were stitched into a single time series covering 2018–2024. Where a school year appears in more than one release, the later release’s values are kept.
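
The keep-the-later-release rule could be expressed in pandas roughly as follows. This is a sketch under stated assumptions: the column names (district_id, year, release, absent_rate) are hypothetical, and the values are invented so that the 2022 school year appears in two releases.

```python
import pandas as pd

# Invented values: school year 2022 appears in both the 2022 and 2024 releases.
release_2022 = pd.DataFrame({
    "district_id": ["A", "A"],
    "year": [2021, 2022],
    "absent_rate": [0.21, 0.30],
    "release": 2022,
})
release_2024 = pd.DataFrame({
    "district_id": ["A", "A", "A"],
    "year": [2022, 2023, 2024],
    "absent_rate": [0.31, 0.28, 0.27],
    "release": 2024,
})

def stitch_releases(frames):
    """Stack releases and keep one row per district-year, preferring the later release."""
    stacked = pd.concat(frames, ignore_index=True)
    return (
        stacked.sort_values("release")
        .drop_duplicates(subset=["district_id", "year"], keep="last")
        .sort_values(["district_id", "year"])
        .reset_index(drop=True)
    )

series = stitch_releases([release_2022, release_2024])
print(series)
```

Sorting by release before dropping duplicates means keep="last" always selects the most recent release for a given district and year, regardless of the order in which the frames were passed in.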

Cleaning. Cleaning is done in pandas, with explicit handling of NYSED’s "s" suppression values, the all-text-column quirk of the source Access database, Regents subject-label drift across releases, and the missing TOTAL_COUNT column in older ELA tables. Processed data is saved as Parquet.
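
To make the suppression handling concrete, here is a minimal sketch of one way to convert all-text count columns to numbers while treating "s" as missing. The function name and the _SUPPRESSED flag columns are assumptions for illustration, the sample values are invented, and the project's own cleaning code may differ.

```python
import pandas as pd

# Every column arrives as text; "s" marks a suppressed cell. Values are invented.
raw = pd.DataFrame({
    "NUM_PROF": ["12", "s", "30"],
    "TOTAL_COUNT": ["40", "s", "60"],
})

def clean_counts(df, count_columns):
    """Convert text count columns to numbers, recording which cells were suppressed."""
    out = df.copy()
    for col in count_columns:
        suppressed = out[col].str.strip() == "s"
        out[col + "_SUPPRESSED"] = suppressed
        # mask() blanks suppressed cells to missing; to_numeric raises on any other text
        out[col] = pd.to_numeric(out[col].mask(suppressed), errors="raise")
    return out

cleaned = clean_counts(raw, ["NUM_PROF", "TOTAL_COUNT"])
print(cleaned)
```

Blanking only the known "s" marker, rather than coercing every unparseable value to missing, means an unexpected string in a count column fails loudly instead of silently becoming a gap.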

Proficiency formula. The dashboard reports NUM_PROF / TOTAL_COUNT rather than NYSED’s published PER_PROF (NUM_PROF / NUM_TESTED). The difference is the treatment of non-testers: the dashboard’s denominator includes them; NYSED’s does not. Including non-testers in the denominator is the more conservative, opt-out-adjusted choice, and it means proficiency rates here will not match data.nysed.gov. The Methodology page explains the asymmetry in full.
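
A small worked example shows how far apart the two denominators can land. The counts below are invented for illustration, and the function and key names are hypothetical. Both rates are expressed as fractions.

```python
def proficiency_rates(num_prof, num_tested, total_count):
    """Return both proficiency rates, as fractions, for one district-year."""
    return {
        # Dashboard choice: non-testers stay in the denominator.
        "dashboard_rate": num_prof / total_count,
        # PER_PROF-style choice: only students who tested are counted.
        "tested_only_rate": num_prof / num_tested,
    }

# Invented counts: 100 students enrolled, 90 tested, 45 scored proficient.
rates = proficiency_rates(num_prof=45, num_tested=90, total_count=100)
print(rates)  # {'dashboard_rate': 0.45, 'tested_only_rate': 0.5}
```

With ten non-testers out of 100 students, the same 45 proficient scores read as 45 percent on the dashboard and 50 percent under the tested-only formula. The gap widens as the share of non-testers grows.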

Dashboard. Quarto dashboard format, Plotly for interactive charts, ITables for the District Explorer. The dashboard ships with a dual light/dark theme that respects the user’s OS color-scheme preference. Plotly figures re-theme on toggle via a small MutationObserver that watches Quarto’s color-scheme stylesheet swap and calls Plotly.relayout on every figure. The six-category N/RC palette and the two-line trends chart both use the Okabe-Ito colorblind-safe palette.
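
For reference, a sketch of how six N/RC categories could be mapped onto Okabe-Ito colors. The hex codes are the published Okabe-Ito values; which color goes with which category is an assumption made for this example and may not match the dashboard's actual assignment.

```python
# The eight Okabe-Ito colors.
OKABE_ITO = {
    "black": "#000000",
    "orange": "#E69F00",
    "sky_blue": "#56B4E9",
    "bluish_green": "#009E73",
    "yellow": "#F0E442",
    "blue": "#0072B2",
    "vermillion": "#D55E00",
    "reddish_purple": "#CC79A7",
}

# Hypothetical assignment of six of the eight colors to the N/RC categories.
NRC_COLORS = {
    "New York City": OKABE_ITO["blue"],
    "Large Cities": OKABE_ITO["vermillion"],
    "High Need Urban-Suburban": OKABE_ITO["orange"],
    "High Need Rural": OKABE_ITO["bluish_green"],
    "Average Need": OKABE_ITO["sky_blue"],
    "Low Need": OKABE_ITO["reddish_purple"],
}

for category, color in NRC_COLORS.items():
    print(f"{category}: {color}")
```

A mapping like this can be passed to Plotly Express through the color_discrete_map argument of px.scatter, which keeps each category's color stable across charts and pages.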

Source code. The full data pipeline and dashboard source are at github.com/baker-jr-john/chronic-absenteeism-dashboard.

A Note on What This Isn’t

The dashboard is descriptive, not predictive. I am not making claims about causation. I am not building a fairness audit of a classifier. I am not proposing policy. What I am doing is showing, carefully, what the public data says and setting up the conversation so that someone looking at it has the context they need to think about it responsibly.

That is the kind of work I would want to do in any role focused on K–12 educational data — designing tools that give educators an honest, contextualized view of the numbers they need to act on — and this dashboard is the closest thing I can build to a demonstration of how I approach it.
