Projecting Elementary and Secondary School Enrollment

Education Statistics

Enrollment Projections

Forecasting Methods

Data Analysis

Data Visualization

Educational Planning

Time Series Analysis

A comprehensive methodology for projecting elementary and secondary school enrollment using statistical techniques

Published

June 26, 2024

Modified

July 9, 2024

Introduction

According to the National Center for Education Statistics (NCES), total public and private elementary and secondary school enrollment was 56 million in fall 2019, representing a three percent increase since fall 2010. However, between fall 2019 and the first fall of the coronavirus pandemic in 2020, enrollment decreased two percent. From fall 2020 to fall 2030, enrollments are expected to decrease another six percent. Both public and private school enrollments are projected to be lower in 2030 than in 2019.

Accurate enrollment projections in elementary and secondary institutions are crucial for several reasons:

Budgeting and resource allocation: Enrollment projections help school districts plan their budgets effectively. By estimating the number of students expected to attend, districts can allocate resources appropriately, ensuring they have enough teachers, classrooms, and educational materials to meet the demand.
Short-term and long-term planning: Enrollment projections inform both short-term and long-term decision-making. In the short term, they help determine staffing needs and budgeting for specific programs. In the long term, they assist in planning for capital projects, such as building new schools or expanding existing facilities.
Public trust and support: Accurate enrollment projections can demonstrate the need for capital projects to the public, potentially influencing the outcome of school bond referenda. When the community understands the necessity of these projects based on reliable data, they are more likely to support them.

Accurate enrollment projections are vital for effective budgeting, resource allocation, short-term and long-term planning, maintaining public trust, ensuring financial stability, and informed decision-making in elementary and secondary institutions. With that said, many institutions lack a reliable means by which to make long-range enrollment forecasts. I’m here to help.

How to Project Enrollment

For this project, I attempted to emulate the work of the NCES, specifically their Projections of Education Statistics to 2030 (Irwin et al. 2024).

Detailed methodology for projecting student enrollment using various statistical techniques

Projection Techniques

Several methods can be used to project student enrollment in elementary and secondary institutions, including ratio-based methods, regression-based methods, the dwelling unit multiplier method, and the extended demographic model. After some research and experimentation, I elected to use an exponential smoothing technique.

Exponential Smoothing

Single exponential smoothing is a forecasting method suited for data that is relatively stable over time, where future values are expected to be around the same central value as observed historically, without significant shifts up or down. In developing projections of elementary and secondary enrollments, for example, the rate at which students progress from one particular grade to the next (e.g., from grade 2 to grade 3) can be projected using single exponential smoothing. Thus, this percentage is assumed to be constant over the forecast period.

Generally, exponential smoothing places more weight on recent observations than on earlier ones. The weights for observations decrease exponentially as one moves further into the past. As a result, the older data have less influence on the projections. The rate at which the weights of older observations decrease is determined by the smoothing constant.

When using single exponential smoothing for a time series, \(P_t\), a smoothed series, \(\hat{P}_t\), is computed recursively by evaluating \[\hat{P}_t = \alpha P_t + (1 - \alpha) \hat{P}_{t-1}\]where \(0 < \alpha \leq 1\) is the smoothing constant.

By repeated substitution, we can rewrite the equation as \[\hat{P}_t = \alpha \sum_{s=0}^{t-1} (1 - \alpha)^s P_{t-s}\]where the lag, \(s\), runs from \(0\) (the most recent observation) to \(t-1\) (the first period in the time series). The forecasts are constant for all years in the forecast period. The constant equals \[\hat{P}_{T+k} = \hat{P}_T\]where, in this section, \(T\) is the last year of actual data and \(k\) is the \(k^{th}\) year in the forecast period, with \(k > 0\).

These equations illustrate that the projection is a weighted average based on exponentially decreasing weights. For higher smoothing constants, weights for earlier observations decrease more rapidly than for lower smoothing constants.
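To make the recursion and the weights concrete, here is a minimal sketch in Python (the workflow described in this post uses R, so treat this as an illustration rather than the script itself). The function names, the made-up rates, and the choice to initialize the smoothed series at the first observation are all assumptions of the sketch.

```python
def single_exponential_smoothing(series, alpha):
    """Return the smoothed series for a smoothing constant 0 < alpha <= 1.

    Assumption: the first smoothed value is set to the first observation;
    other initializations are possible.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    smoothed = [series[0]]
    for value in series[1:]:
        # P_hat[t] = alpha * P[t] + (1 - alpha) * P_hat[t-1]
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def observation_weights(alpha, n_lags):
    """Weight on the observation s periods back, for s = 0 .. n_lags - 1."""
    return [alpha * (1 - alpha) ** s for s in range(n_lags)]


# Made-up progression rates (illustrative only, not NCES data).
rates = [1.00, 1.02, 0.98, 1.01, 0.99]
smoothed = single_exponential_smoothing(rates, alpha=0.5)

# The last smoothed value is the forecast for every year in the forecast period.
forecast = smoothed[-1]
print(forecast)

# Higher alpha: weights on older observations fall off faster.
print(observation_weights(0.5, 4))
print(observation_weights(0.9, 4))
```

Comparing the two weight lists shows the point made above: with the higher smoothing constant, almost all of the weight sits on the most recent observation.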

Approach Overview

I utilized the grade progression rate method to project grades 2 through 12. With this approach, a rate of progression from each grade (1 through 11) to the next grade (2 through 12) was projected using single exponential smoothing. For example, the rate of progression from grade 2 to grade 3 is the current year’s grade 3 enrollment expressed as a percentage of the previous year’s grade 2 enrollment. To calculate enrollment for each year in the forecast period, the progression rate for each grade was applied to the previous year’s enrollment in the previous grade.
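As a worked illustration with made-up numbers (not NCES figures), suppose 3,700 thousand students were enrolled in grade 2 last fall and 3,663 thousand are enrolled in grade 3 this fall. The observed progression rate is \[\frac{3{,}663}{3{,}700} = 0.99\]If the projected rate were also 0.99 and this fall's grade 2 enrollment were 3,650 thousand, the projected grade 3 enrollment for next fall would be \[0.99 \times 3{,}650 = 3{,}613.5\]thousand students.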

I also utilized the enrollment rate method to project prekindergarten, kindergarten, and first-grade enrollments as well as elementary and secondary ungraded enrollments. In this method, an enrollment rate for each grade (or ungraded level) was projected using single exponential smoothing. For example, the enrollment rate for grade 1 is the number of students enrolled in grade 1 divided by the number of 6-year-old children. To calculate enrollment for each year in the forecast period, the enrollment rate for each category was applied to the projected population in the appropriate age group.

Assumptions Underlying This Approach

The grade progression rate method assumes that past trends affecting public and private elementary and secondary school enrollments will continue over the forecast period. This assumption implies that all factors influencing enrollments will display future patterns consistent with past patterns. This method implicitly includes the net effect of such factors as migration, dropouts, deaths, non-promotion, and transfers between public and private schools.

Limitations of Projections

Projections are complicated by the onset of the coronavirus pandemic in 2020. Projections are based on the assumption that historical patterns will continue into the future. This presents challenges both for (1) using pre-pandemic historical data to predict unprecedented pandemic-era behaviors and (2) using pandemic-era data to predict post-pandemic behaviors. This exercise includes both scenarios.

Even without a pandemic, projections of a time series usually differ from the final reported data due to errors from many sources, such as the properties of the projection methodologies, which depend on the validity of many assumptions. These projections should be interpreted with caution.

Procedures and Equations

The notation and equations that follow describe the basic procedures used to project elementary and secondary enrollments in each of the three elementary and secondary enrollment projection models.

Let:

\(i\) = Subscript denoting age
\(j\) = Subscript denoting grade
\(t\) = Subscript denoting time
\(T\) = Subscript of the first year in the forecast period
\(N_t\) = Enrollment at the prekindergarten (nursery) level
\(K_t\) = Enrollment at the kindergarten level
\(G_{j,t}\) = Enrollment in grade \(j\)
\(E_t\) = Enrollment in elementary ungraded programs
\(S_t\) = Enrollment in secondary ungraded programs
\(P_{i,t}\) = Population of age \(i\)
\(R_{j,t}\) = Progression rate for grade \(j\)
\(RN_t\) = Enrollment rate for prekindergarten (nursery school)
\(RK_t\) = Enrollment rate for kindergarten
\(RG_{1,t}\) = Enrollment rate for grade 1
\(RE_t\) = Enrollment rate for elementary ungraded programs
\(RS_t\) = Enrollment rate for secondary ungraded programs.

Step 1. Calculate historical grade progression rates for each of grades 2 through 12. The first step in projecting the enrollments using the grade progression method was to calculate, for each grade, a progression rate for each year of actual data used to produce the projections except for the first year. The progression rate for grade \(j\) in year \(t\) equals \[R_{j,t} = \frac{G_{j,t}}{G_{j-1,t-1}}\]

Step 2. Produce a projected progression rate for each of grades 2 through 12. Projections for each grade’s progression rate were then produced for the forecast period using single exponential smoothing. A separate smoothing constant, chosen to minimize the sum of squared forecast errors, was used to calculate the projected progression rate for each grade. Single exponential smoothing produces a single forecast for all years in the forecast period. Therefore, for each grade \(j\), the projected progression rate, \(\hat{R}_j\), is the same for each year in the forecast period.
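Steps 1 and 2 can be sketched as follows. This is a hedged Python illustration, not the R script used for this post: the enrollment figures are made up, the function names are hypothetical, and the grid search over the smoothing constant is just one simple way to minimize the sum of squared one-step-ahead forecast errors.

```python
def progression_rates(prev_grade, this_grade):
    """R[j,t] = G[j,t] / G[j-1,t-1] for every year after the first.

    prev_grade and this_grade hold enrollments for grades j-1 and j
    over the same consecutive years.
    """
    return [this_grade[t] / prev_grade[t - 1] for t in range(1, len(this_grade))]


def one_step_sse(series, alpha):
    """Return (sum of squared one-step-ahead errors, final smoothed level)."""
    level = series[0]  # assumed initialization: first observation
    sse = 0.0
    for value in series[1:]:
        sse += (value - level) ** 2  # the forecast for this year is the prior level
        level = alpha * value + (1 - alpha) * level
    return sse, level


def best_alpha(series, step=0.01):
    """Grid search over alpha in (0, 1]; returns (alpha, projected rate)."""
    n_steps = int(round(1 / step))
    candidates = [round(k * step, 10) for k in range(1, n_steps + 1)]
    alpha = min(candidates, key=lambda a: one_step_sse(series, a)[0])
    return alpha, one_step_sse(series, alpha)[1]


# Made-up enrollments in thousands for five consecutive years (not NCES data).
grade2 = [100, 102, 101, 103, 104]
grade3 = [99, 101, 103, 100, 102]

rates = progression_rates(grade2, grade3)  # one rate per year after the first
alpha, projected_rate = best_alpha(rates)
print(rates)
print(alpha, projected_rate)
```

The projected rate returned here plays the role of \(\hat{R}_j\): one number per grade, reused for every year in the forecast period.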

Step 3. Calculate enrollment projections for each of grades 2 through 12. For the first year in the forecast period, \(T\), enrollment projections, \(\hat{G}_{j,T}\), for grades 2 through 12 were produced using the projected progression rates and the actual enrollments of grades 1 through 11 from the last year of actual data, \(T-1\). Specifically, \[\hat{G}_{j,T} = \hat{R}_j \cdot G_{j-1,T-1}\]This same procedure was then used to produce the projections for the following year, \(T+1\), except that enrollment projections for year \(T\) were used rather than actual numbers: \[\hat{G}_{j,T+1} = \hat{R}_j \cdot \hat{G}_{j-1,T}\]The enrollment projections for grades 2 through 11 for year \(T\) were those just produced using the grade progression method. The projection for grade 1 for year \(T\) was produced using the enrollment rate method as outlined in steps 4, 5, and 6 below.

The same procedure was used for the remaining years in the projection period.
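The year-by-year roll-forward in step 3 can be sketched like this. It is an illustrative Python version with made-up inputs; the function name and data layout are assumptions, and the grade 1 projections are simply passed in because they come from the enrollment rate method.

```python
def project_grades(last_actual, grade1_projected, rates):
    """Project grades 2-12 forward one year at a time.

    last_actual: dict grade -> enrollment in the last year of actual data (T-1)
    grade1_projected: grade 1 enrollments for forecast years T, T+1, ...
        (an input here; produced by the enrollment rate method)
    rates: dict grade j (2..12) -> projected progression rate into grade j
    """
    projections = []
    previous = dict(last_actual)
    for grade1 in grade1_projected:
        current = {1: grade1}
        for j in range(2, 13):
            # G_hat[j, year] = R_hat[j] * G[j-1, year-1]
            current[j] = rates[j] * previous[j - 1]
        projections.append(current)
        previous = current  # later years build on projected, not actual, values
    return projections


# Made-up inputs in thousands (illustrative only, not NCES data).
last_actual = {grade: 1000.0 for grade in range(1, 13)}
rates = {j: 0.99 for j in range(2, 13)}
grade1_projected = [1010.0, 1020.0]

projections = project_grades(last_actual, grade1_projected, rates)
print(projections[0][2])  # grade 2 in year T, from actual grade 1 in T-1
print(projections[1][2])  # grade 2 in year T+1, from projected grade 1 in T
print(projections[1][3])  # grade 3 in year T+1, from projected grade 2 in T
```

Note how the second forecast year draws on the first forecast year's projections, so any error in an early year carries forward through the cohort.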

Step 4. Calculate historical enrollment rates for prekindergarten, kindergarten, grade 1, elementary ungraded, and secondary ungraded. The first step in projecting prekindergarten, kindergarten, first-grade, elementary ungraded, and secondary ungraded enrollments using the enrollment rate method was to calculate enrollment rates for each enrollment category for each year of actual data, up to the last year, \(T-1\), where: \[RN_t = \frac{N_t}{P_{5,t}}\] \[RK_t = \frac{K_t}{P_{5,t}}\] \[RG_{1,t} = \frac{G_{1,t}}{P_{6,t}}\] \[RE_t = \frac{E_t}{\sum_{i=5}^{13}P_{i,t}}\] \[RS_t = \frac{S_t}{\sum_{i=14}^{17}P_{i,t}}\]

Step 5. Produce a projected enrollment rate for prekindergarten, kindergarten, grade 1, elementary ungraded, and secondary ungraded. Projections for each category’s enrollment rate were produced for the forecast period using single exponential smoothing. A separate smoothing constant, chosen to minimize the sum of squared forecast errors, was used to calculate the projected enrollment rate for each of these categories. Single exponential smoothing produces a single forecast for all years in the forecast period. These enrollment rates were then used as the projected enrollment rates for each year in the forecast period (\(\hat{RN}\), \(\hat{RK}\), \(\hat{RG}_1\), \(\hat{RE}\), and \(\hat{RS}\)).

Step 6. Calculate enrollment projections for prekindergarten through grade 1 and the ungraded categories. For each year in the forecast period, the enrollment rates were then multiplied by the appropriate population projections (\(\hat{P}_{i,t}\)) to calculate enrollment projections for prekindergarten (\(\hat{N}_t\)), kindergarten (\(\hat{K}_t\)), first grade (\(\hat{G}_{1,t}\)), elementary ungraded (\(\hat{E}_t\)), and secondary ungraded (\(\hat{S}_t\)). \[\hat{N}_t = \hat{RN} \cdot \hat{P}_{5,t}\] \[\hat{K}_t = \hat{RK} \cdot \hat{P}_{5,t}\] \[\hat{G}_{1,t} = \hat{RG}_1 \cdot \hat{P}_{6,t}\] \[\hat{E}_t = \hat{RE} \cdot \sum_{i=5}^{13}\hat{P}_{i,t}\] \[\hat{S}_t = \hat{RS} \cdot \sum_{i=14}^{17}\hat{P}_{i,t}\]

Step 7. Calculate total elementary and secondary enrollments by summing the projections for each grade and the ungraded categories. To obtain projections of total enrollment, projections of enrollments for the individual grades, elementary ungraded, and secondary ungraded were summed.
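Steps 6 and 7 amount to a handful of multiplications and a sum. The Python sketch below shows the arithmetic for a single forecast year; every number in it is made up, the dictionary keys are hypothetical labels, and the grades 2 through 12 values stand in for whatever the grade progression method would produce.

```python
def project_by_enrollment_rate(rate, population_by_age, ages):
    """Projected enrollment = projected rate * population summed over the ages."""
    return rate * sum(population_by_age[age] for age in ages)


# Made-up projected population by single year of age, in thousands.
population = {age: 4000.0 for age in range(5, 18)}

# Made-up projected enrollment rates (stand-ins for the smoothed rates).
rates = {"N": 0.30, "K": 0.90, "G1": 0.92, "E": 0.001, "S": 0.0005}

# Age groups matching the denominators in the step 4 equations.
age_groups = {
    "N": [5],              # prekindergarten uses the 5-year-old population
    "K": [5],              # kindergarten
    "G1": [6],             # grade 1
    "E": range(5, 14),     # elementary ungraded: ages 5 through 13
    "S": range(14, 18),    # secondary ungraded: ages 14 through 17
}

projected = {
    name: project_by_enrollment_rate(rates[name], population, age_groups[name])
    for name in rates
}

# Placeholder for grades 2-12 from the grade progression method.
grades_2_to_12 = {j: 3900.0 for j in range(2, 13)}

# Step 7: total enrollment is the sum across all grades and ungraded categories.
total = sum(projected.values()) + sum(grades_2_to_12.values())
print(projected)
print(total)
```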

By following these steps and utilizing the provided methodologies, I can project national public school enrollments from 2021 through 2030 with a reasonable degree of accuracy.

Tools Utilized

There are numerous tools to select from when tackling a project like this. Instead of delving into the various options, I decided to simply tell you what I used:

Microsoft Excel
Posit RStudio Desktop¹
- This required an installation of R, the programming language.

Getting Started With R

If you’re new to R, there are many resources out there to help you get started. I recommend R for the Rest of Us, particularly their Getting Started With R and Fundamentals of R courses.

Enrollment Projection Process

Process Overview

The process for forecasting student enrollment by grade level and in total begins by viewing the historical enrollment data in Excel and making some minor adjustments before loading it into RStudio. The R script calculates enrollment rates for various grades based on historical and projected population data, then applies exponential smoothing to the historical data to create a trend. Using historical progression rates, it projects future enrollments from 2021 to 2030. It then combines actual and projected data, filters for the years 2010 to 2030, and creates aggregate columns for PK-Grade 8, Grades 9-12, and Total enrollments. Finally, the data is reshaped for plotting, and a line plot is generated that visualizes enrollment trends, distinguishing between actual and projected data.
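The combine, filter, aggregate, and reshape stages can be sketched in a few lines. This is a plain-Python illustration rather than the R code behind this post: the column labels, the `Source` field, and the example rows are all hypothetical.

```python
# Hypothetical column labels for the grade-level enrollment columns.
ELEMENTARY = ["PK", "K"] + [f"G{j}" for j in range(1, 9)] + ["UG_PK8"]
SECONDARY = [f"G{j}" for j in range(9, 13)] + ["UG_912"]


def combine_and_aggregate(actual_rows, projected_rows, first_year=2010, last_year=2030):
    """Tag rows as actual or projected, filter by year, and add aggregate columns."""
    combined = []
    for source, rows in (("Actual", actual_rows), ("Projected", projected_rows)):
        for row in rows:
            if first_year <= row["Year"] <= last_year:
                out = dict(row, Source=source)
                out["PK-8"] = sum(row[col] for col in ELEMENTARY)
                out["9-12"] = sum(row[col] for col in SECONDARY)
                out["Total"] = out["PK-8"] + out["9-12"]
                combined.append(out)
    return sorted(combined, key=lambda r: r["Year"])


def to_long(rows, value_columns=("PK-8", "9-12", "Total")):
    """Reshape to one record per (Year, Source, Level), a convenient form for plotting."""
    return [
        {"Year": r["Year"], "Source": r["Source"], "Level": col, "Enrollment": r[col]}
        for r in rows
        for col in value_columns
    ]


def example_row(year, value):
    """Made-up row with the same value in every enrollment column."""
    row = {"Year": year}
    row.update({col: value for col in ELEMENTARY + SECONDARY})
    return row


actual = [example_row(2009, 1.0), example_row(2020, 1.0)]  # 2009 is filtered out
projected = [example_row(2021, 2.0)]

combined = combine_and_aggregate(actual, projected)
long_rows = to_long(combined)
print(combined[0]["Total"], combined[1]["Source"], len(long_rows))
```

Keeping a `Source` label on each row is what lets a plot draw actual and projected values in different styles.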

Data Used

For this exercise, I utilized data from the NCES’s Digest of Education Statistics’ Enrollment in public elementary and secondary schools, by level and grade: Selected years, fall 1980 through fall 2030. The dataset covers annual data from 1990 to 2020. It also includes data from 1980, 1985, and projections from 2021 through 2030, which I chose to ignore.

I also utilized Table B-1 and Table B-2 from the NCES’s Projections of Education Statistics to 2030. Table B-1 contains data on the population of prekindergarten- and kindergarten-age children from 2010 through 2030. Table B-2 contains data on the school-age population from 2010 through 2030. Both datasets include actual values for the years 2010 through 2020 and projected values from 2021 onward (Irwin et al. 2024).

Data Loading and Preparation

To use the data, I initially had to modify the spreadsheet in Excel. First, I un-merged cells A3:A4, B3:B4, and D4:E4. Then, I moved “Year” and “All” from cells A3 and B3 to cells A4 and B4. I also had to modify the remaining cells in row 4:

Cell C4 from “Total” to “Total PK-8”
Cell D4 from “Prekinder- garten” to “Prekindergarten”
Cell F4 from “ Kinder-garten” to “Kindergarten”
Cell G4 from “ 1st grade” to “1st grade”
Cell H4 from “ 2nd grade” to “2nd grade”
Cell I4 from “ 3rd grade” to “3rd grade”
Cell J4 from “ 4th grade” to “4th grade”
Cell K4 from “ 5th grade” to “5th grade”
Cell L4 from “ 6th grade” to “6th grade”
Cell M4 from “ 7th grade” to “7th grade”
Cell N4 from “ 8th grade” to “8th grade”
Cell O4 from “Un- graded\1” to “Ungraded PK-8”
Cell P4 from “Total” to “Total 9-12”
Cell Q4 from “ 9th grade” to “9th grade”
Cell R4 from “ 10th grade” to “10th grade”
Cell S4 from “ 11th grade” to “11th grade”
Cell T4 from “ 12th grade” to “12th grade”
Cell U4 from “Un- graded\1,2” to “Ungraded 9-12”

Additionally, I had to delete “\4” from cell A37. Finally, I deleted row 5 and I converted the spreadsheet from the .xls file format to .xlsx by saving it as .xlsx. After the data was in a usable state, I moved the project to RStudio.

Once in RStudio, the script begins by loading the readxl library for reading Excel files. The dplyr library is also loaded to facilitate data manipulation. The enrollment data is then read from an Excel file, specifically from a range of cells (A4:U37), with certain columns skipped, and the remaining columns are renamed for convenience. Similarly, population data, which includes projected school-age population by selected age groups from 2010 through 2030, is read from another Excel file. To calculate enrollment rates, the enrollment data is joined with the population data on the Year column. This allows the script to compute the enrollment rates for Prekindergarten, Kindergarten, 1st grade, and ungraded enrollments based on the respective age groups in the population data. The mean enrollment rates are then computed, which will be used for forecasting future enrollments.
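The join and mean-rate calculation described above can be sketched without any spreadsheet at hand. The Python below is an illustration only (the actual script uses readxl and dplyr in R); the column names `Kindergarten` and `Age5` and all of the values are made up.

```python
from statistics import mean


def join_on_year(enrollment_rows, population_rows):
    """Inner join of two lists of dicts on the 'Year' key."""
    population_by_year = {row["Year"]: row for row in population_rows}
    return [
        {**row, **population_by_year[row["Year"]]}
        for row in enrollment_rows
        if row["Year"] in population_by_year
    ]


def mean_enrollment_rate(joined_rows, enrollment_col, population_col):
    """Average of enrollment / population across the joined years."""
    return mean(row[enrollment_col] / row[population_col] for row in joined_rows)


# Made-up values in thousands (not NCES data). The enrollment series starts
# earlier than the population series, so the join drops the unmatched year.
enrollment = [
    {"Year": 2009, "Kindergarten": 3500.0},
    {"Year": 2010, "Kindergarten": 3600.0},
    {"Year": 2011, "Kindergarten": 3700.0},
]
population = [
    {"Year": 2010, "Age5": 4000.0},
    {"Year": 2011, "Age5": 4000.0},
]

joined = join_on_year(enrollment, population)
kindergarten_rate = mean_enrollment_rate(joined, "Kindergarten", "Age5")
print(len(joined), kindergarten_rate)
```

Because the population tables start in 2010, an inner join like this one limits the rate calculation to years present in both sources.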