Supplying Disadvantaged Schools with Effective Teachers: Experimental Evidence on Secondary Math Teachers from Teach For America
Abstract
Teach For America (TFA) is an important but controversial source of teachers for hard-to-staff subjects in high-poverty U.S. schools. We present findings from the first large-scale experimental study of secondary math teachers from TFA. We find that TFA teachers are more effective than other math teachers in the same schools, increasing student math achievement by 0.07 standard deviations over one school year. Addressing concerns about the fact that TFA requires only a two-year commitment, we find that TFA teachers in their first two years of teaching are more effective than more experienced non-TFA teachers in the same schools.
INTRODUCTION
The U.S. education system faces growing concerns about widening disparities in academic achievement and subsequent life outcomes between disadvantaged and nondisadvantaged students (Duncan & Murnane, 2011). In policy debates over how to improve the outcomes of disadvantaged students, ensuring a supply of effective teachers to high-poverty schools has been a central focus of attention. A key impetus has come from the accumulating body of empirical evidence demonstrating that teacher effectiveness is critical to students’ academic and life outcomes.1 Despite the importance of teacher quality to student success, school districts across the United States struggle with obtaining high-quality teachers for schools serving low-income students (Jacob, 2007; Monk, 2007). These challenges are more serious in particular academic subjects, especially math and science at the secondary level (Ingersoll & May, 2012; Ingersoll & Perda, 2009).
Views differ widely on how to increase the supply of effective teachers to high-poverty schools. One prominent view is that increasing the amount of formal education and preparation a teacher receives before entering the classroom will help ensure effective teaching (Darling-Hammond, 2000). Critics of this view contend that the traditional preparation offered by schools of education adds little value to teachers’ effectiveness in the classroom and, instead, imposes substantial costs that can deter talented individuals from entering teaching (Hess, 2001). In response, many states have lowered the barriers to entering teaching by allowing teachers to participate in alternative certification programs, which allow people to start teaching before completing the certification-related coursework and student teaching that constitute the traditional route into teaching. However, most of these alternative certification programs, like most traditional certification programs, admit nearly all applicants (Mayer et al., 2003; Walsh & Jacobs, 2007), so they raise the quantity of teachers without necessarily ensuring quality. In fact, evidence indicates that teachers from less selective alternative certification programs are no more or less effective than traditionally certified teachers at the elementary level (Constantine et al., 2009).
Teach For America (TFA) represents an innovative approach to supplying teachers to disadvantaged schools. Founded in 1989, the TFA program (i) invests heavily in recruiting and screening, (ii) provides a short but intensive teacher training program, and (iii) provides additional support to new teachers. Like other alternative certification programs, it recruits people who typically do not have an education degree or other formal training in education. However, it is much more selective than typical alternative or traditional certification programs.2 It recruits high-achieving college graduates who, through an intensive application and screening process, demonstrate characteristics that TFA believes are correlated with success in the classroom. It also differs from other certification programs in that it requires its participants, known as “corps members,” to commit to only two years of teaching (although they can choose to remain longer). This increases the pool of potential recruits by including people who do not want to commit to a long-term career in teaching.
TFA assigns corps members to the region where they will teach at the time of their acceptance into the program, and then corps members are hired by partner schools and districts. Before beginning their first teaching job, corps members participate in an intensive five-week training program. Regional TFA staff provide them with ongoing training and support during their two-year commitment. This includes one-on-one coaching support, group meetings customized by grade and subject, and access to additional classroom resources and assessments via an online portal. Corps members in most regions must also complete alternative certification programs, state-defined routes through which people can begin teaching before completing all the requirements for state certification. TFA encourages corps members who complete their two-year commitment (known as TFA alumni) to continue to work to address educational inequities, whether by continuing to teach or by assuming educational and other leadership positions.
TFA is a growing and important source of teachers in low-income schools. Since it placed its first cohort of approximately 500 corps members in the 1990 to 1991 school year, TFA has expanded considerably, and in the 2011 to 2012 school year, more than 9,000 first- and second-year TFA corps members were teaching in 43 urban and rural regions across the country. The program's growth is expected to continue—in 2010, TFA received a $50 million Investing in Innovation (i3) Scale-Up grant from the U.S. Department of Education to increase the size of its teacher corps by 80 percent and to expand to up to 54 regions by the 2014 to 2015 school year.
Despite its growth as a source of teachers for high-poverty schools, TFA is highly controversial. One strand of criticism is that TFA teachers—and teachers from alternative certification programs more generally—are underprepared for teaching relative to teachers who have completed a traditional university-based teacher education program (Darling-Hammond, 1990, 2000; Darling-Hammond et al., 2005). An additional criticism, not applicable to other alternative certification programs, is that, because TFA asks its teachers to make only a two-year commitment to teaching, its teachers are more likely to be inexperienced and therefore ineffective (Heilig & Jez, 2010).
Despite the controversy surrounding TFA, rigorous evidence on the effectiveness of teachers from this program has been sparse. Only two experimental studies—both focused exclusively on elementary schools—have estimated the effectiveness of TFA teachers, and they have come to somewhat inconsistent conclusions. In an early experiment (Decker, Mayer, & Glazerman, 2004; Glazerman, Mayer, & Decker, 2006), researchers randomly assigned nearly 1,800 students to either TFA teachers or teachers who received their certification through other routes. They find that students of TFA teachers perform as well as students of non-TFA teachers in reading and score better in math by about 0.15 standard deviations. A subsequent experiment with a similar design (Clark et al., 2015) finds no significant difference in either reading or math scores between students of TFA and non-TFA teachers.
Some nonexperimental studies have compared the achievement of secondary school students taught by TFA and non-TFA teachers. Using data from New York City, Boyd et al. (2006) find that students of middle school TFA teachers in their first year of teaching score higher in math, but lower in reading, than those of traditionally certified teachers in their first year of teaching, after controlling for students’ prior scores, demographic covariates, and school fixed effects. With similar methods, both Kane, Rockoff, and Staiger (2008) and Boyd et al. (2012) also use data from New York City and find that students of middle school TFA teachers score higher in math than students of traditionally certified teachers. In reading, Kane, Rockoff, and Staiger (2008) find no effect of TFA teachers relative to traditionally certified teachers. Henry et al. (2014) employ similar nonexperimental methods and find that TFA teachers in North Carolina have positive effects relative to traditionally certified teachers in math (within elementary, middle, and high school grades) and English-language arts (in elementary and high school grades). Using data from high schools in North Carolina, Xu, Hannaway, and Taylor (2011) exploit within-student, cross-subject variation in the certification route of students’ teachers and find that TFA teachers raise student achievement relative to non-TFA teachers, especially achievement in science. The concern with these nonexperimental studies is that findings could be biased if students with characteristics unobserved in the data and correlated with achievement scores are more likely to be assigned to TFA teachers.
In total, the existing empirical evidence on TFA has been neither large enough nor consistent enough to resolve public debates about whether hiring top college graduates with little experience and formal preparation is a desirable approach to staffing high-poverty schools. Although the existing evidence is, on net, somewhat favorable to TFA, it has not swayed some high-profile public intellectuals who remain critical of TFA on the grounds that “five weeks of training is insufficient” and “the weakest teachers are in their first two years of teaching” (Ravitch, 2013). Public perceptions of TFA also remain ambivalent, with recent news articles even suggesting that a recent downturn in the number of TFA recruits could be partially due to a “loss of luster for … Teach for America's belief that new college graduates can jump into teaching without much training” (Rich, 2015).
This paper seeks to broaden and deepen the evidence about the impacts of TFA teachers by presenting findings from a large-scale random assignment study of the effectiveness of math TFA teachers in middle and high schools.3 It makes three important contributions. First, it includes over 4,500 students and 140 teachers in 50 schools across 10 school districts in eight states.4 Unlike previous studies of TFA teachers at the secondary level, which focused on one particular state or school district, this study combines evidence from multiple school districts in multiple states, producing findings applicable to a broad cross section of high-poverty schools in which TFA teachers work.
Second, whereas prior estimates of the impact of TFA are potentially subject to bias from the sorting of students to teachers, our study is free of this threat due to its experimental design. Students were randomly assigned to a math class taught by a TFA teacher or to a math class taught by a teacher from some other program. To estimate the effectiveness of TFA teachers, we compare the end-of-year math achievement of students taught by TFA teachers with those taught by non-TFA teachers. This experimental design is particularly important at the high school level because, to date, there has been no formal validation of the claim that nonexperimental methods can identify the causal effects of high school teachers. This stands in contrast to the elementary and middle school grades, in which emerging evidence supports the causal validity of nonexperimental value-added models (Chetty, Friedman, & Rockoff, 2014a; Kane et al., 2013).5
Third, this study focuses on a subject area, secondary math, that high-poverty schools find particularly challenging to staff; previous experimental evidence at the elementary level does not specifically address hard-to-staff subject areas. The study focuses on TFA teachers teaching math in grades 6 through 12 (secondary math) for three reasons. First, school districts report greater difficulties filling vacancies in secondary math (as well as science and special education) than in other subjects (Ingersoll & Perda, 2009). For high-poverty schools, this challenge is compounded by a net tendency for math teachers to transfer from high- to low-poverty schools (Ingersoll & May, 2012). Second, poor math skills among U.S. students relative to those in other industrialized countries are a growing concern (Kelly et al., 2013). Third, a substantial number of TFA corps members—about 23 percent in the 2010 to 2011 school year—teach secondary math.
We find that math teachers from TFA are more effective than other teachers in the same schools, increasing student math achievement by an average of 0.07 standard deviations over the course of a school year. Addressing the concern about the limited experience of TFA teachers, we also find that inexperienced TFA math teachers (those in their first or second year of teaching) are more effective than experienced non-TFA math teachers (those with five or more years of teaching experience) in the same schools. In essence, our findings show that the TFA program model can simultaneously boost the quantity and quality of teachers in hard-to-staff subjects within high-poverty schools.
We also find that credentials typically found on a teacher resume, such as college quality, coursework taken, and professional experience, cannot explain why TFA teachers outperform their non-TFA colleagues. The remaining potential explanations, which this study cannot separate, are that TFA recruits teachers with better characteristics on less easily observed dimensions—such as persistence or interpersonal skills—or that TFA provides better training in a more compact time frame. Therefore, this study does not show that training is irrelevant. However, the evidence does show it is possible for a program to supply effective teachers to disadvantaged schools without providing lengthy periods of training or encouraging long-term commitments to teaching.
The rest of the paper is organized as follows. First, we provide more details on TFA and describe our research design and data collection. Then we describe the schools, teachers, and students in the sample, and our estimation methods. Next, we present the experimental findings on the effectiveness of TFA teachers and explore whether easily observed credentials can explain the difference in effectiveness between TFA and non-TFA teachers. Lastly, we provide some conclusions.
TEACH FOR AMERICA
The goal of TFA's recruitment process is to enroll people with characteristics that TFA believes are correlated with their becoming effective corps members: demonstrated leadership and achievement, perseverance, critical thinking skills, organizational ability, interpersonal skills, a strong dedication to TFA's mission, and respect for individuals’ diverse experiences and ability to work effectively with people from diverse backgrounds. In addition, all corps members must be U.S. citizens or permanent residents, have an undergraduate grade point average (GPA) of at least 2.5 (although the average GPA of admitted corps members is about 3.6), and have a bachelor's degree from an accredited college or university prior to beginning the TFA summer training program.
TFA's admission process is intensive and highly selective. The process has three stages: (i) an online application, (ii) a 25- to 45-minute telephone interview, and (iii) a full-day in-person “final interview” in which the applicant participates in a one-on-one interview and is observed presenting a lesson and participating in a group discussion. To determine who should be screened out at each stage of the application process, TFA relies heavily, although not completely, on a regression model of achievement growth of the students of current and previous corps members. The explanatory variables in the model comprise over 20 characteristics of the corps members collected during the application process.6 Of those applicants who submitted an online application in recent years, only about 12 percent were offered places in the program. Of those offered places, about 80 percent accepted.7
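Chaining the two rates quoted above gives a rough sense of how selective the pipeline is overall. The following is a back-of-the-envelope sketch only, assuming the two rounded percentages refer to the same applicant pool and can simply be multiplied; the constant and function names are illustrative.

```python
# Rounded figures from the text: about 12% of online applicants were offered
# places, and about 80% of those offered places accepted.
OFFER_RATE = 0.12
ACCEPT_RATE = 0.80


def enrollment_yield(offer_rate: float, accept_rate: float) -> float:
    """Share of all online applicants who end up accepting a place."""
    return offer_rate * accept_rate


def applicants_per_enrollee(offer_rate: float, accept_rate: float) -> float:
    """Number of online applications corresponding to one accepted offer."""
    return 1.0 / enrollment_yield(offer_rate, accept_rate)


if __name__ == "__main__":
    y = enrollment_yield(OFFER_RATE, ACCEPT_RATE)
    print(f"yield: {y:.1%}")  # 9.6%
    print(f"applications per enrollee: {applicants_per_enrollee(OFFER_RATE, ACCEPT_RATE):.1f}")  # 10.4
```

Under these assumptions, fewer than 1 in 10 online applicants ends up joining the corps.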
Before teaching, corps members must complete TFA's preservice training program. The core of this training is a five-week full-time “summer institute.” At this institute, corps members attend courses on lesson planning, content delivery, classroom management, student assessment, literacy, and effective interactions with diverse populations. Corps members also lead small-group or whole-class instruction in classes at a local school district's summer school program under the supervision of a regular classroom teacher. Training also involves self-directed assignments before the summer institute and several-day meetings before and after the summer institute in the region in which the corps member will teach.
Although the process by which corps members are placed in schools varies by region, in all regions TFA plays an active role. TFA assigns corps members to a region based on the corps members’ preferences, the needs of the region, and region-specific requirements for teachers. TFA staff direct corps members to schools at which to interview and may discuss with district officials and principals how best to assign the corps members to schools.
Once corps members begin teaching, TFA continues to provide training and support for two years. In the 10 regions in our study, TFA provides an average of just over 40 hours of formal training to each corps member after he or she begins teaching (Clark et al., 2013). In addition, TFA assigns each corps member to a TFA staff person who observes the corps member teaching and then meets one-on-one with the corps member to provide feedback. TFA also schedules group meetings with corps members to provide additional guidance.
TFA typically does not provide teacher certification, so most TFA corps members need to enroll in a state-authorized alternative certification program operated by another organization such as a local university or school district. (In a few regions, TFA is a state-authorized certification provider and certifies its own corps members.) These programs may require corps members to participate in coursework prior to entering the classroom, although this is typically not intensive; most programs require coursework during the first year of teaching, and some extend into the following summer or the second year of teaching.
Corps members are paid the same salary as other new teachers, but may receive additional financial support. As well as covering the costs of room and board during the summer institute and other meetings, TFA offers needs-based no-interest loans and grants to cover training, relocation, and testing and certification fees. Most TFA corps members at the time of our study were also eligible for AmeriCorps education awards of about $5,400 per year.
RESEARCH DESIGN AND DATA COLLECTION
Experimental Design
We conducted the study in the 2009 to 2010 and 2010 to 2011 school years on separate cross sections of teachers and students. Before each study school year, we identified schools in which TFA teachers and teachers from other certification routes were teaching different classes (or “sections”) covering the same math course. The classes typically needed to be in the same class period so that the random assignment of the students did not disrupt their schedules. For example, students could not be randomly assigned between period 1 and period 4 math classes if all students in band needed to participate in band practice in period 4. Just prior to the start of the school year, we randomly assigned students in each study school who signed up for a particular math course to a class taught by a TFA teacher or a class taught by a comparison teacher who entered teaching through a traditional education or alternative certification program. Students who were assigned to a TFA teacher constitute the treatment group; those who were assigned to a comparison teacher constitute the control group. The set of classes between which students were assigned formed a randomization block. Classes in the same randomization block covered the same course at the same level (for instance, honors Algebra I or remedial sixth-grade math).
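The within-block randomization described above can be illustrated with a short sketch. It is not the study's actual assignment procedure, whose implementation details are not given here; the function name, the data layout, and the shuffle-then-deal approach are assumptions chosen so that each student has the same chance of each class in the block and class sizes stay balanced.

```python
import random
from collections import defaultdict


def assign_within_blocks(roster, classes_by_block, seed=0):
    """Randomly assign each student to one class within their randomization block.

    roster: list of (student_id, block_id) pairs (hypothetical layout), e.g. the
        students a school wants placed in one of the classes of a block.
    classes_by_block: dict mapping block_id -> list of (class_id, is_tfa).
    Returns a dict student_id -> (class_id, is_tfa); is_tfa=True marks the
    treatment group (assigned to a TFA teacher).
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    students_by_block = defaultdict(list)
    for student_id, block_id in roster:
        students_by_block[block_id].append(student_id)

    assignment = {}
    for block_id, students in students_by_block.items():
        students = students[:]
        classes = classes_by_block[block_id][:]
        # Shuffle both the students and the order of the classes, then deal
        # students out round-robin. Class sizes differ by at most one, and
        # because the class order is also random, every student has the same
        # probability (1 / number of classes) of landing in each class.
        rng.shuffle(students)
        rng.shuffle(classes)
        for i, student_id in enumerate(students):
            assignment[student_id] = classes[i % len(classes)]
    return assignment


if __name__ == "__main__":
    roster = [(f"s{i}", "alg1_period2") for i in range(7)]
    classes = {"alg1_period2": [("room_101", True), ("room_102", False)]}
    result = assign_within_blocks(roster, classes, seed=42)
    for student, (cls, is_tfa) in sorted(result.items()):
        print(student, cls, "TFA" if is_tfa else "comparison")
```

Randomizing within blocks rather than across the whole school means treatment and control students always take the same course at the same level, which is what makes the block-level comparison meaningful.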
All secondary math teachers who entered teaching through TFA were potentially eligible to be included in the study sample. This included teachers who were still fulfilling their two-year commitment to the program (TFA corps members) and those who remained in teaching after completing their two-year commitment (TFA alumni). The comparison teachers could have entered teaching through a traditional route to certification or through an alternative route that was not highly selective in its admissions—this allowed the sample to reflect the typical mix of non-TFA math teachers in the study schools.8 Given that TFA teachers’ effects are estimated relative to non-TFA teachers currently teaching in the same schools, our study essentially treats those non-TFA teachers as the best approximation to the counterfactual teachers that students would have had if TFA teachers had not been teaching in the study schools.
We did not impose any restrictions on the amount of prior teaching experience that teachers in the study could possess. Therefore, TFA and comparison teachers who were compared in the study could (and did) have different experience levels. TFA teachers in the study had an average of two years of teaching experience, compared with an average of 10 years among the comparison teachers, consistent with the fact that TFA requires its teachers to make only a two-year commitment. Because we imposed no restrictions on teacher experience, the sample reflects differences in teaching experience of TFA and comparison teachers in the study schools. Therefore, the study design mimics the choice that a school administrator faces when selecting the type of teacher to fill a teaching position over the long run, given that relying on the group with higher expected turnover—TFA teachers—would imply that in steady state the position will be held by a less experienced teacher than otherwise would have occurred. Through this design, we can directly examine the common criticism that TFA teachers tend to be less effective than their counterparts from other programs due to their relative inexperience.
Although the comparison teachers realistically represent those who would teach students in high-need schools in the absence of TFA, their effectiveness may not necessarily resemble that of the average teacher in their districts or states. To the extent that high-need schools have trouble filling vacancies in secondary math (Ingersoll & Perda, 2009; Jacob, 2007), schools may be forced to staff those positions with less effective teachers. On the other hand, existing empirical evidence finds, at most, small gaps in effectiveness within districts between teachers of lower- and higher-income students (Isenberg et al., 2013). Nevertheless, the findings from this paper are not intended to predict how TFA teachers would perform relative to teachers in more affluent schools.
Even for high-need schools, the external validity of this experiment would be diminished if teachers’ participation in the experiment altered their behavior and effectiveness. For example, if TFA teachers who participated in the study received increased pressure or support from the TFA central organization to perform well, then we would find more positive effects of TFA teachers than the effects that would have been realized in the absence of the study. However, this sort of manipulation by TFA was generally not possible. Although TFA knew about and cooperated with the study's data collection efforts, it did not know which specific teachers were included in the study. On the other hand, teachers themselves were aware of their participation in the study, and we cannot rule out the possibility that it affected their behavior in ways that influenced the impact estimates.
A total of 5,790 students were randomized in 110 randomization blocks with 140 math teachers in 50 schools. To obtain this sample, we recruited 10 school districts with large concentrations of secondary math TFA teachers and then, within those districts, we contacted schools prior to each study year to determine their eligibility for the study and willingness to participate. Eligible schools were those with sets of TFA and comparison teachers teaching math classes that could form a randomization block for the study.9 Math courses eligible for inclusion included sixth-, seventh-, and eighth-grade math; general high school math; Algebra I; Algebra II; and Geometry.
Before the start of each new school year, schools sent us lists of students whom they wanted placed into one of the classes in an identified randomization block, and we randomly assigned these students to classes. Because in most cases the classes were in the same period, the random assignment did not affect the students’ class assignment or schedules for any other class. Schools could request specific assignments for a small number of students (for instance, students with disabilities whose Individualized Education Programs [IEPs] required them to be placed with particular teachers), in which case the students were excluded from the sample. In practice, this was rare, with fewer than 30 students who were enrolled at the start of the school year exempted from random assignment. After school began, we conducted random assignment for late-enrolling students, up through at least the first month of school.
We randomly assigned students between classes in a randomization block with equal probability, with a few exceptions. First, in randomization blocks in which a student had been exempted from random assignment and nonrandomly placed in a particular class, we randomly assigned the remaining students between the remaining available slots in the block—so they had a slightly lower probability of assignment to the class in which the exempted student had been placed. Second, after school began, if class sizes were imbalanced, we randomly assigned late-enrolling students with slightly higher probability to the smaller classes, with the goal of ensuring that final class sizes were roughly equivalent within blocks (both to accommodate schools’ preferences for balanced class sizes and to ensure comparability for the analysis). We adjusted for unequal probabilities of assignment within blocks by using sample weights, discussed further below.
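The first exception above, where an exempted student occupies a seat before randomization, can be made concrete with a sketch. The exact weights used in the study are described later and are not reproduced here; this sketch assumes hypothetical class capacities, assumes the remaining students are spread uniformly over the open seats, and uses inverse-probability weights as one standard choice. It does not model the tilted probabilities used for late enrollees.

```python
from fractions import Fraction


def open_seat_probabilities(capacity, pre_placed=None):
    """Probability that a randomly assigned student lands in each class of a block.

    capacity: dict class_id -> target number of seats (hypothetical).
    pre_placed: dict class_id -> number of exempted students already placed
        there outside of random assignment.
    Assumes the remaining students are spread uniformly over the open seats.
    """
    pre_placed = pre_placed or {}
    open_seats = {c: capacity[c] - pre_placed.get(c, 0) for c in capacity}
    total = sum(open_seats.values())
    if total <= 0:
        raise ValueError("no open seats left in this block")
    return {c: Fraction(n, total) for c, n in open_seats.items()}


def inverse_probability_weight(prob):
    """Weight for a student whose realized assignment had probability `prob`."""
    return 1 / prob


if __name__ == "__main__":
    # Two 15-seat classes; one exempted student was placed in the TFA class.
    probs = open_seat_probabilities({"tfa_class": 15, "comparison_class": 15},
                                    pre_placed={"tfa_class": 1})
    for cls, p in probs.items():
        print(cls, p, inverse_probability_weight(p))
```

In this example the 29 randomized students each have a 14/29 chance of the TFA class and a 15/29 chance of the comparison class. With weights of 29/14 and 29/15, the 14 students expected in the first class and the 15 expected in the second each stand in for the full block of 29, which is how the weights offset the slight imbalance in assignment probabilities.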
To monitor movement in and out of the study classes, we asked the schools to send us updated class lists—essentially, enrollment snapshots—for the study classes at three times during the school year. From these lists, we were able to track study students moving out of the study classes and non-study students moving into the study classes.
Data Collection
We measured student math achievement using scores from math assessments administered at the end of the school year in which the students were randomly assigned. For students in grades 6 to 8, we obtained scores on state-required assessments. For students in grades 9 to 12, because state-required assessments are not consistently available, we administered end-of-course computer adaptive math assessments developed by the Northwest Evaluation Association (NWEA) in the subject in which the student was enrolled (general high school math, Algebra I, Algebra II, or Geometry). We attempted to collect test data on all students in the study sample unless they moved out of the school district, including students who moved to a different class within the school and those who moved to a different school within the district. For comparability across tests, all scores were converted to z-scores.10 For middle school grades, the z-score was based on the statewide mean and standard deviation of scores in the grade level and year in which the assessment was administered; for the high school grades, the z-score was based on the national mean and standard deviation of scores for the NWEA assessments. We collected baseline reading and math scores (also converted to z-scores) from prior state assessments and demographic characteristics on all students from district records. Baseline scores were drawn from the most recent prior grade at which end-of-grade state assessments were administered.
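The z-score conversion described above amounts to subtracting the reference mean and dividing by the reference standard deviation. The following is a minimal sketch. The reference values in the example are hypothetical; in the study they come from the statewide grade-by-year distribution (grades 6 to 8) or the NWEA national norms (grades 9 to 12).

```python
def to_z_scores(raw_scores, ref_mean, ref_sd):
    """Express raw scale scores as z-scores relative to a reference distribution.

    ref_mean, ref_sd: mean and standard deviation of the reference population
        (statewide scores for the grade and year, or national NWEA norms).
    """
    if ref_sd <= 0:
        raise ValueError("reference standard deviation must be positive")
    return [(x - ref_mean) / ref_sd for x in raw_scores]


if __name__ == "__main__":
    # Hypothetical state test: statewide mean 700, SD 50 for this grade and year.
    print(to_z_scores([700, 650, 750, 675], ref_mean=700, ref_sd=50))
    # [0.0, -1.0, 1.0, -0.5]
```

Because the reference population is the state or the nation rather than the study sample, the sample's average z-score need not be zero. Under this scaling, an effect of 0.07 corresponds to 0.07 of the reference population's standard deviation.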
We asked all 140 teachers in the study in the spring of each of the study school years to complete a web-based survey; the response rate to the survey was 93 percent. We also collected teachers’ scores from either the Praxis II Mathematics Content Knowledge Test (taken by the high school teachers in the sample, along with a few middle school teachers in states that allowed or required middle school teachers to take this test) or the Praxis II Middle School Mathematics Test (taken by the remaining middle school teachers in the sample). We administered the Praxis test to teachers who had not taken it previously and gathered existing scores from those who had, obtaining scores for 84 percent of the study teachers.
Student Mobility and Attrition after Random Assignment
Nonrandom attrition from the randomization sample could threaten the internal validity of the estimates. Attrition occurred whenever we could not obtain the end-of-year math score of a student in the randomization sample. This occurred for four reasons: (i) parents did not provide consent for us to obtain state assessment scores (in middle schools) or administer the end-of-course test (in high schools); (ii) students left the participating school district; (iii) we were unable to administer the test to high school students because they were absent from class and did not show up for a make-up test; and (iv) school districts did not have state assessment data on the students.
We obtained end-of-year scores for 4,570 students (79 percent) of the 5,790 students who were randomly assigned, as shown in Table 1. Reassuringly, the shares of students with nonmissing outcome data are similar for treatment students (80 percent) and control students (79 percent), suggesting that student attrition from the study is unlikely to have been related to treatment status. Some students left their originally assigned classes during the school year, but slightly over three-fourths of students in the randomization sample were, as of the end of the study school year, still in the set of study classrooms and with their originally assigned type of teacher (TFA or non-TFA). Only 2 percent of students had switched to a study classroom with a different type of teacher—that is, students who were assigned to a TFA teacher switched to a non-TFA study teacher or students who were assigned to a non-TFA teacher switched to a TFA study teacher. The remaining sample members transferred to a nonstudy classroom in the same school or left their original school. Table 1 shows that each type of mobility occurred with strikingly similar frequencies in the treatment and control groups; moreover, within each of those mobility groups, similar percentages of treatment and control students have nonmissing outcome data.
| Percentages of students | Assigned to TFA teacher | Assigned to comparison teacher |
|---|---|---|
| All students | 100.0 | 100.0 |
| Has valid end-of-year score | 79.5 | 78.5 |
| Stayed in study classrooms and with originally assigned type of teacher | 77.5 | 77.4 |
| And has valid end-of-year score | 69.0 | 68.4 |
| Stayed in study classrooms but switched to opposite type of teacher | 2.2 | 2.4 |
| And has valid end-of-year score | 1.8 | 1.8 |
| Transferred to nonstudy classroom in the same school | 7.9 | 7.7 |
| And has valid end-of-year score | 4.4 | 3.6 |
| Left study school | 12.5 | 12.5 |
| And has valid end-of-year score | 4.2 | 4.6 |
| Number of students in the randomization sample | 2,880 | 2,910 |

Note. In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10.
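Overall and differential attrition can be computed directly from Table 1. The following is a minimal sketch using the rounded figures in the table, so the results are approximate. For example, the implied number of students with scores is about 4,574 rather than the reported 4,570 because of the NCES rounding. The function name is illustrative.

```python
def attrition_rates(n_treat, share_valid_treat, n_ctrl, share_valid_ctrl):
    """Overall and differential attrition given group sizes and response shares.

    Attrition here means no valid end-of-year math score.
    Returns (overall_rate, differential_rate), both as proportions.
    """
    n_valid = n_treat * share_valid_treat + n_ctrl * share_valid_ctrl
    overall = 1 - n_valid / (n_treat + n_ctrl)
    differential = abs(share_valid_treat - share_valid_ctrl)
    return overall, differential


if __name__ == "__main__":
    # Table 1 inputs: rounded sample sizes and shares with a valid score.
    overall, differential = attrition_rates(2880, 0.795, 2910, 0.785)
    print(f"overall attrition: {overall:.1%}")            # 21.0%
    print(f"differential attrition: {differential:.1%}")  # 1.0%
```

The differential rate matters more than the overall rate for internal validity, since it is differences in who goes missing across the two groups that could bias the treatment-control comparison.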
If assigned treatment status is random and attrition is unrelated to treatment status, treatment and control students in the final analysis sample should have similar average values of baseline characteristics. Table 2 shows that this is indeed the case. For 13 measures of students’ baseline achievement and demographic characteristics, none of the differences between treatment and control students are substantively meaningful or statistically significant at the 5 percent level. Taken together, the descriptive statistics for mobility rates, prevalence of nonmissing outcome data, and baseline covariate values strongly suggest that random assignment was properly implemented and attrition poses little threat to estimating the causal effects of TFA teachers. Later, in our analysis, we show that the maximal amount of selection bias that could have been introduced by attrition is not large enough to alter our main findings.
| Characteristic | Treatment mean | Control mean | Difference |
|---|---|---|---|
| Baseline achievement, expressed as z-score within the statewide distribution | | | |
| Baseline math score | −0.512 | −0.504 | −0.008 |
| | [0.870] | [0.853] | (0.013) |
| Baseline reading score | −0.514 | −0.510 | −0.005 |
| | [0.908] | [0.893] | (0.014) |
| Demographic group dummy variables | | | |
| Old for grade | 0.073 | 0.064 | 0.009* |
| | [0.261] | [0.245] | (0.005) |
| Grade is below modal grade in randomization block | 0.011 | 0.014 | −0.003 |
| | [0.113] | [0.117] | (0.001) |
| Grade is above modal grade in randomization block | 0.021 | 0.016 | 0.005 |
| | [0.143] | [0.127] | (0.003) |
| Retained in same grade between previous and current years | 0.022 | 0.024 | −0.002 |
| | [0.146] | [0.154] | (0.003) |
| Female | 0.486 | 0.500 | −0.015 |
| | [0.500] | [0.500] | (0.009) |
| Black, non-Hispanic | 0.621 | 0.625 | −0.004 |
| | [0.487] | [0.484] | (0.008) |
| Hispanic | 0.283 | 0.277 | 0.005 |
| | [0.452] | [0.448] | (0.008) |
| Non-black, non-Hispanic | 0.096 | 0.098 | −0.002 |
| | [0.300] | [0.297] | (0.006) |
| Receives free or reduced-price lunch | 0.899 | 0.905 | −0.007 |
| | [0.305] | [0.293] | (0.009) |
| English-language learner | 0.080 | 0.084 | −0.004 |
| | [0.276] | [0.277] | (0.006) |
| Has an individualized education program | 0.064 | 0.060 | 0.004 |
| | [0.245] | [0.237] | (0.005) |
| Number of students | 2,290 | 2,280 | |
- Note. In the columns for treatment and control means, standard deviations are in brackets; in the column for the treatment-control difference, standard errors are in parentheses. Means are regression-adjusted for randomization block fixed effects. Treatment-control differences and standard errors are based on a regression of the specified variable on a treatment dummy and randomization block dummies, accounting for sample weights and clustering at the teacher level. In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10. Statistical significance at the 1, 5, and 10 percent levels is denoted by ***, **, and *, respectively.
CHARACTERISTICS OF SCHOOLS, TEACHERS, AND STUDENTS IN THE SAMPLE
Characteristics of Schools in the Sample
Schools employing TFA teachers are considerably more disadvantaged than the typical secondary school nationwide. For instance, according to data from the U.S. Department of Education's Common Core of Data, both the schools in the sample and secondary schools employing TFA teachers nationwide predominantly serve students from racial and ethnic minority groups: 57 percent of students in both sets of schools are Black, and approximately 32 percent are Hispanic. Close to 80 percent of students at both types of schools are eligible for free or reduced-price lunch, compared with 51 percent at the typical secondary school nationwide.
The study schools are similar to secondary schools employing TFA teachers nationwide along many dimensions, and the few differences between them are likely due to study eligibility requirements. For instance, the average study school has significantly more students per grade than the average secondary school employing TFA teachers (240 vs. 184 students per grade), consistent with the fact that schools with more students per grade were more likely to have multiple classes in the same subject taught during the same period, as needed to form randomization blocks. Similarly, although 23 percent of secondary schools with TFA placements nationwide are charter schools, there are no charter schools in the study sample; charter schools are typically smaller than average and therefore less likely to have eligible randomization blocks.
Characteristics of Teachers in the Sample
The study TFA teachers differ from the comparison teachers in many ways, indicating that the program does bring a different set of candidates into teaching in high-poverty schools (Table 3). For instance, relative to comparison teachers, TFA teachers are younger (average age of 25 vs. 38) and less likely to be members of racial or ethnic minorities (89 percent of TFA teachers are White and non-Hispanic, compared with only 30 percent of comparison teachers). TFA teachers are also considerably more likely to have graduated from a selective college or university (81 vs. 23 percent) and from a highly selective college or university (30 percent vs. less than 5 percent).11
| Characteristic | Teach For America teachers | Comparison teachers | Difference |
|---|---|---|---|
| Demographic characteristics | | | |
| Age (average years) | 24.5 | 37.9 | −13.4*** |
| | | | (1.3) |
| Female | 60.9 | 79.4 | −18.4** |
| | | | (8.0) |
| Black, non-Hispanic | 7.8 | 57.1 | −49.3*** |
| | | | (7.1) |
| Hispanic | 4.7 | 12.7 | −8.0 |
| | | | (5.0) |
| White, non-Hispanic | 89.1 | 30.2 | 58.9*** |
| | | | (7.0) |
| Educational background | | | |
| Bachelor's degree from selective college or university | 81.3 | 22.7 | 58.5*** |
| | | | (8.1) |
| Bachelor's degree from highly selective college or university | 29.7 | <5.0a | a*** |
| Majored in math | 7.8 | 25.6 | −17.8** |
| | | | (7.5) |
| Majored in secondary math education | 0.0 | 16.3 | −16.3*** |
| | | | (5.7) |
| Majored in other math-related subject | 26.6 | 11.6 | 14.9** |
| | | | (7.4) |
| Average scores on math content knowledge tests | | | |
| Praxis II Mathematics Content Knowledge Test | 162.0 | 140.1 | 21.9** |
| | | | (7.9) |
| Praxis II Middle School Mathematics Test | 179.8 | 158.3 | 21.6*** |
| | | | (3.7) |
| Teaching experience at end of study year | | | |
| 1–2 years | 82.8 | 9.5 | 73.3*** |
| | | | (6.0) |
| 3–5 years | 17.2 | 20.6 | −3.4 |
| | | | (7.0) |
| More than 5 years | 0.0 | 69.8 | −69.8*** |
| | | | (5.8) |
| Average years | 1.9 | 10.1 | −8.3*** |
| | | | (0.9) |
| Coursework during school year | | | |
| Took coursework during school year | 50.0 | 20.6 | 29.4*** |
| | | | (8.1) |
| Average hours of coursework during school year | 89.4 | 49.9 | 39.5* |
| | | | (23.5) |
| Number of teachers | 60 | 60 | |
- Note. Standard errors in parentheses. In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10. Statistical significance at the 1, 5, and 10 percent levels is denoted by ***, **, and *, respectively. Selective colleges are those ranked by Barron's Profiles of American Colleges 2003 as being very competitive, highly competitive, or most competitive; highly selective colleges are those ranked as highly competitive or most competitive. Other math-related subjects include statistics, engineering, computer science, finance, economics, physics, and astrophysics. We have scores on the Praxis II Mathematics Content Knowledge Test for 20 TFA teachers and 10 comparison teachers. We have scores on the Praxis II Middle School Mathematics Test for 50 TFA teachers and 40 comparison teachers.
- a Exact percentage and difference not reported to protect respondent confidentiality in accordance with National Center for Education Statistics statistical standards (National Center for Education Statistics, 2000).
The TFA teachers display greater math content knowledge but are less likely to have majored in math (Table 3). TFA teachers who took the Praxis II Mathematics Content Knowledge Test outperformed comparison teachers by 22 points (or 0.93 standard deviations), and those who took the Praxis II Middle School Mathematics Test outperformed comparison teachers by 22 points (or 1.19 standard deviations). Yet TFA teachers in the sample are less likely than comparison teachers to have majored in math (8 vs. 26 percent) or secondary math education (0 vs. 16 percent), but more likely to have majored in some other math-related subject (statistics, engineering, computer science, finance, economics, physics, or astrophysics; 27 vs. 12 percent).
Not surprisingly, given that TFA asks its corps members to make only a two-year commitment to teaching, TFA teachers in the study have less teaching experience than comparison teachers (Table 3). TFA teachers in our sample have an average of about two years of experience, compared with about 10 years among the non-TFA teachers. Eighty-three percent of the TFA teachers are in their first or second year of teaching, compared with 10 percent of comparison teachers. Seventy percent of the comparison teachers have been teaching more than five years, while none of the TFA teachers have been teaching this long. Consistent with the fact that they are more likely to be in their first or second year of teaching and thus likely still fulfilling coursework requirements for certification, TFA teachers are more likely than comparison teachers to have taken coursework during the study year (50 vs. 21 percent). Fifty-nine percent of comparison teachers are from traditional education programs, while 41 percent are from alternative certification programs.12
Characteristics of Students in the Sample
Consistent with TFA's goal of serving disadvantaged students, students in the study face multiple academic and socioeconomic disadvantages (Table 2). Students in the analysis sample had baseline achievement levels far below the average for their peers statewide: both treatment and control group students scored, on average, about half a standard deviation below the mean achievement in their states in both reading and math prior to the study period. Mirroring the demographic characteristics of their schools, students in the analysis sample are predominantly non-white and eligible for subsidized school meals.
ESTIMATION METHODS
Main Estimation Model
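We estimate the impact of TFA teachers with the following regression model:

$$Y_{ijk} = \alpha_k + \beta_1 T_{ijk} + X_{ijk}\beta_2 + \varepsilon_{ijk} \tag{1}$$

where $Y_{ijk}$ is the end-of-year math test score of student $i$ assigned to teacher $j$ in randomization block $k$, $\alpha_k$ is a randomization block fixed effect, $T_{ijk}$ is a dummy variable for being randomly assigned to a TFA teacher, $X_{ijk}$ is a vector of student-level covariates, and $\varepsilon_{ijk}$ is an error term. We use Huber-White standard errors that are robust to clustering at the teacher level.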
We refer to the parameter of interest, β1, as the impact of TFA teachers relative to comparison teachers. This parameter is an intent-to-treat (ITT) effect, capturing the expected net difference in end-of-year math achievement from assigning a student to a TFA teacher rather than a comparison teacher at the beginning of the school year.
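To make the structure of equation 1 concrete, the following is a minimal sketch (not the study's code) of a weighted regression of scores on a treatment dummy plus block fixed effects, with teacher-clustered standard errors. It assumes plain Python lists as inputs and omits the student covariates for brevity. The block fixed effects are absorbed by demeaning within blocks (the Frisch-Waugh-Lovell theorem), which yields the same treatment coefficient as including block dummies. The function name `block_fe_itt` is hypothetical, and the standard error is the basic CR0 sandwich without small-sample corrections.

```python
import math
from collections import defaultdict


def block_fe_itt(y, t, block, cluster, w=None):
    """Weighted OLS of y on a treatment dummy t with block fixed effects.

    Returns (beta1, cluster-robust standard error). Covariates are omitted.
    """
    w = list(w) if w is not None else [1.0] * len(y)
    # Weighted within-block means; subtracting them absorbs the block effects.
    sw, sy, st = defaultdict(float), defaultdict(float), defaultdict(float)
    for yi, ti, bi, wi in zip(y, t, block, w):
        sw[bi] += wi
        sy[bi] += wi * yi
        st[bi] += wi * ti
    y_dm = [yi - sy[bi] / sw[bi] for yi, bi in zip(y, block)]
    t_dm = [ti - st[bi] / sw[bi] for ti, bi in zip(t, block)]

    sxx = sum(wi * x * x for wi, x in zip(w, t_dm))
    beta1 = sum(wi * x * yd for wi, x, yd in zip(w, t_dm, y_dm)) / sxx

    # CR0 sandwich: sum each teacher's score contributions before squaring.
    score = defaultdict(float)
    for wi, x, yd, g in zip(w, t_dm, y_dm, cluster):
        score[g] += wi * x * (yd - beta1 * x)
    se = math.sqrt(sum(s * s for s in score.values())) / sxx
    return beta1, se


# Toy data: two blocks, one TFA and one comparison teacher per block.
y = [1.0, 0.8, 0.5, 0.5, 0.2, 0.0]
t = [1, 1, 0, 0, 1, 0]
block = ["A", "A", "A", "A", "B", "B"]
teacher = ["A-tfa", "A-tfa", "A-cmp", "A-cmp", "B-tfa", "B-cmp"]
beta1, se = block_fe_itt(y, t, block, teacher)
```

In this toy example `beta1` equals 1/3: the within-block treatment-control differences (0.4 in block A, 0.2 in block B) averaged with weights equal to each block's within-block treatment variation (1.0 and 0.5). This weighting is a property of fixed-effects regression with a binary treatment.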
Although the student-level covariates are not necessary to ensure unbiased impact estimates within our experimental design, we include them in equation 1 to improve precision. The covariates include all variables shown in Table 2.13 Missing values of covariates are replaced with block-specific means; we also include a vector of dummy variables (one for each covariate) indicating that we replaced the missing value with the block-specific mean for the covariate.
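The missing-covariate treatment described above can be sketched as follows. This minimal illustration makes two assumptions that the text does not state: block means are computed without sample weights, and a block in which a covariate is missing for every student falls back to a hypothetical placeholder of 0.0. The function name `impute_block_means` is also hypothetical.

```python
from collections import defaultdict


def impute_block_means(values, block, fallback=0.0):
    """Fill None entries with the block-specific mean of observed values.

    Returns (filled_values, missing_flags); the flags become the extra
    dummy regressors that mark imputed entries.
    """
    total, count = defaultdict(float), defaultdict(int)
    for v, b in zip(values, block):
        if v is not None:
            total[b] += v
            count[b] += 1
    filled, flags = [], []
    for v, b in zip(values, block):
        if v is None:
            # Hypothetical fallback when no student in the block has a value.
            filled.append(total[b] / count[b] if count[b] else fallback)
            flags.append(1)
        else:
            filled.append(v)
            flags.append(0)
    return filled, flags


scores, missing = impute_block_means([1.0, None, 3.0, None, 5.0], ["a", "a", "a", "b", "b"])
```

Here `scores` becomes `[1.0, 2.0, 3.0, 5.0, 5.0]` and `missing` becomes `[0, 1, 0, 1, 0]`; both lists would then enter the regression as a covariate and its missing-value indicator.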
To ensure unbiased estimates of β1, it is necessary to account explicitly for within-block differences among students in the probability of being assigned to the treatment group. As discussed previously, late enrollees to the study classrooms typically had different probabilities of assignment to the treatment group than early enrollees did. Without any correction, differences in assignment probabilities can lead to the overrepresentation of particular types of students in the treatment group relative to the control group. We eliminate this threat to causal validity by weighting students according to the inverse of their probability of assignment to the treatment group. Horvitz and Thompson (1952) show that this method recovers unbiased estimates. We scale the weights so that, within each combination of treatment status and block, the weights sum to one-half of the total number of students in the block. In our sensitivity analysis, we show that the presence of weights does not discernibly influence the estimated effect.
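Below is a minimal sketch of the weighting and rescaling described above, assuming each student's probability of assignment to the treatment group is already known. It reads "inverse of the probability of assignment" as the inverse probability of the group each student was actually assigned to: 1/p for treatment students and 1/(1 − p) for control students, which is the standard Horvitz-Thompson form. The function name `scaled_ipw` is hypothetical.

```python
from collections import defaultdict


def scaled_ipw(p_treat, treated, block):
    """Inverse-probability weights rescaled so that, within each block,
    the treatment weights and the control weights each sum to n_block / 2."""
    # Raw Horvitz-Thompson weight: inverse probability of the realized assignment.
    raw = [1.0 / p if t else 1.0 / (1.0 - p) for p, t in zip(p_treat, treated)]
    block_size = defaultdict(int)
    cell_total = defaultdict(float)
    for r, t, b in zip(raw, treated, block):
        block_size[b] += 1
        cell_total[(b, bool(t))] += r
    return [r * (block_size[b] / 2) / cell_total[(b, bool(t))]
            for r, t, b in zip(raw, treated, block)]


# One block of four students; the last two had a 0.25 chance of treatment.
weights = scaled_ipw([0.5, 0.5, 0.25, 0.25], [1, 0, 1, 0], ["k"] * 4)
```

The treated student with a 0.25 assignment probability receives twice the weight of the treated student with a 0.5 probability. After rescaling, the treatment weights and the control weights in the block each sum to 2, half of the block's four students.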
There is a well-known tendency for cluster-robust standard errors, which we use in estimating equation 1, to be biased toward zero, with bias diminishing as the number of clusters increases (MacKinnon & White, 1985). This problem is unlikely to be serious in our analysis due to the large number (140) of clusters. Nevertheless, to guard against the possibility of inflating Type I error rates, we take the conservative approach to inference suggested by Donald and Lang (2007) and Angrist and Pischke (2009). Specifically, our tests of statistical significance use a t-distribution with degrees of freedom equal to the number of teachers minus the number of covariates varying only at the teacher level—namely the treatment dummy and the randomization block dummies.
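The degrees-of-freedom rule above can be illustrated with the rounded counts in Table 4. Subtracting one treatment dummy and 110 block dummies from 140 teachers leaves about 29 degrees of freedom. Because those counts are rounded to the nearest 10, the exact value in the study may differ. The sketch below avoids third-party libraries by integrating the Student t density numerically (Simpson's rule) and inverting it by bisection. It then compares the resulting two-sided 5 percent critical value with the large-sample value of about 1.96. All function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Student t density, computed on the log scale for numerical stability."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using symmetry of the density."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(alpha, df):
    """Two-sided critical value: solve t_cdf(c) = 1 - alpha/2 by bisection."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


n_teachers, n_blocks = 140, 110          # rounded counts from Table 4
dof = n_teachers - (1 + n_blocks)        # treatment dummy + block dummies
print(dof, round(t_critical(0.05, dof), 3))
```

This prints 29 and a critical value of about 2.045. Requiring a t-statistic above roughly 2.05 rather than 1.96 is what makes the approach conservative.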
Alternative Parameters of Interest
Our ITT analysis attributes to each teacher the scores of all students assigned to his or her class at the beginning of the year. Because not all students stayed in their originally assigned classes—as documented by Table 1—the ITT impacts are not equivalent to the impacts of being taught by a TFA teacher for a full school year.14 This paper focuses primarily on the ITT estimates for two main reasons. First, the ITT impact reflects the potential for mobility to dilute the effects of a student's initially assigned teacher, so it can be considered the most relevant parameter to inform a school administrator's choice between hiring different types of teachers. Second, as we discuss below, we have only imperfect measures of the amount of time for which a student was actually taught by a specified teacher, whereas a student's initial assignment is known with certainty.
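Despite our primary focus on the ITT impact, we also explore the estimation of an alternative parameter: the effect of the actual duration for which a student is taught by a TFA math teacher on his or her math achievement. This parameter more faithfully captures the instructional ability of TFA teachers relative to comparison teachers, independent of student mobility. We estimate it by two-stage least squares (2SLS), using random assignment to a TFA teacher as the instrument:

$$Y_{ijk} = \alpha_k + \delta_1 D_{ijk} + X_{ijk}\delta_2 + u_{ijk} \tag{2}$$

$$D_{ijk} = \lambda_k + \pi_1 T_{ijk} + X_{ijk}\pi_2 + v_{ijk} \tag{3}$$

where $D_{ijk}$ measures the student's duration with a TFA teacher, $T_{ijk}$ is the dummy for random assignment to a TFA teacher, $\alpha_k$ and $\lambda_k$ are randomization block fixed effects, $u_{ijk}$ and $v_{ijk}$ are error terms, and the remaining terms are defined as in equation 1.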
The coefficient δ1 in equation 2 is a local average treatment effect (LATE), capturing the average effect of duration with a TFA math teacher on students’ math achievement within a particular population of students: those whose duration with a TFA teacher was affected by their randomly assigned status (Angrist & Imbens, 1995; Imbens & Angrist, 1994).15 These students, known as “compliers,” either experienced a longer duration with a TFA teacher by being assigned to the treatment group than they would have experienced if assigned to the control group, or experienced a shorter duration with a TFA teacher by being assigned to the control group than they would have experienced if assigned to the treatment group.
One limitation in estimating equation 2 is that we collected a snapshot of students’ enrollment in math classes at only three points during the school year: (i) in the fall, about two to four weeks after random assignment, (ii) at the beginning of the spring semester, and (iii) toward the end of the spring semester. Given these data, we define $D_{ijk}$ as the fraction of enrollment snapshots in which a student is observed to be taught by a TFA teacher; the variable can take on the values 0, 1/3, 2/3, and 1. Therefore, the LATE coefficient, δ1, represents the expected difference in math achievement from being taught by a TFA teacher at all enrollment snapshots (loosely interpretable as a full school year) rather than at no enrollment snapshots.
To construct $D_{ijk}$, we must account for missing enrollment information. For students who left the set of study classrooms before the end of the school year, either by transferring to a nonstudy classroom or by leaving the school entirely, we do not know what types of teachers they had after their departure, even if we know their spring test scores. Twelve percent of students in the analysis sample are missing information from at least one snapshot. Nevertheless, we can make extreme assumptions about the teacher assignments that mobile students had after leaving the study classrooms, which imply upper and lower bounds for the degree to which students complied with their assigned treatment status throughout the school year. First, we assume that departing students in the treatment group were subsequently taught by a TFA teacher, and departing students in the control group were subsequently taught by a non-TFA teacher, leading to an upper bound for π1 in equation 3. Second, we assume that departing students in the treatment group were subsequently taught by a non-TFA teacher, and departing students in the control group were subsequently taught by a TFA teacher, leading to a lower bound for π1. Since δ1 = β1/π1, the upper and lower bounds for π1 lead, respectively, to lower and upper bounds for the LATE.
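The bounding logic can be sketched as follows. This minimal illustration assumes each student's three snapshots are recorded as "TFA", "non", or None (for snapshots after the student left the study classrooms). It fills the missing snapshots under each extreme assumption and then converts an ITT estimate into LATE bounds using δ1 = β1/π1. The names are hypothetical. The example plugs in the rounded estimates from Table 4, so the ratios only approximate the tabulated LATEs of 0.08 and 0.09.

```python
def snapshot_fraction(snapshots, assigned_tfa, bound):
    """Fraction of snapshots with a TFA teacher after filling in departures.

    bound="upper": leavers keep their assigned teacher type (max compliance).
    bound="lower": leavers switch to the opposite type (min compliance).
    """
    if bound == "upper":
        fill = "TFA" if assigned_tfa else "non"
    elif bound == "lower":
        fill = "non" if assigned_tfa else "TFA"
    else:
        raise ValueError("bound must be 'upper' or 'lower'")
    filled = [fill if s is None else s for s in snapshots]
    return sum(s == "TFA" for s in filled) / len(filled)


def late_bounds(itt, pi_upper, pi_lower):
    """LATE = ITT / first stage; the larger first stage gives the smaller LATE."""
    return itt / pi_upper, itt / pi_lower


# A treatment student observed with a TFA teacher only at the first snapshot.
leaver = ["TFA", None, None]
print(snapshot_fraction(leaver, True, "upper"), snapshot_fraction(leaver, True, "lower"))
print(late_bounds(0.07, 0.96, 0.80))
```

The leaver counts as fully compliant (1.0) under the upper-bound assumption but only one-third compliant under the lower-bound assumption. The rounded inputs give LATE bounds of roughly 0.073 and 0.088.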
EXPERIMENTAL FINDINGS
Main Estimates
The ITT estimate in column 1 in Table 4 represents our main estimate for the impact of TFA teachers. On average, TFA teachers are more effective than comparison teachers teaching the same math courses in the same schools. Students assigned to TFA teachers score 0.07 standard deviations higher on end-of-year math assessments than students assigned to comparison teachers.
| | Intent-to-treat | Assuming upper bound for compliance | | Assuming lower bound for compliance | |
|---|---|---|---|---|---|
| | End-of-year math score (intent-to-treat) | Fraction of snapshots enrolled with a TFA teacher (first stage) | End-of-year math score (LATE) | Fraction of snapshots enrolled with a TFA teacher (first stage) | End-of-year math score (LATE) |
| | (1) | (2) | (3) | (4) | (5) |
| Randomly assigned to TFA teacher (=1) | 0.07*** | 0.96*** | | 0.80*** | |
| | (0.02) | (0.01) | | (0.01) | |
| Fraction of snapshots enrolled with a TFA teacher | | | 0.08*** | | 0.09*** |
| | | | (0.02) | | (0.02) |
| Control group mean | −0.60 | 0.02 | −0.59 | 0.10 | −0.57 |
| First-stage F-statistic | | 20,868.8 | | 3,346.8 | |
| Number of blocks | 110 | 110 | 110 | 110 | 110 |
| Number of teachers | 140 | 140 | 140 | 140 | 140 |
| Number of students | 4,570 | 4,570 | 4,570 | 4,570 | 4,570 |
- Note. Standard errors clustered by teacher are in parentheses. All regressions control for randomization block dummies, the variables listed in Table 2, a set of dummy variables indicating the number of years that elapsed between the baseline and outcome test score, and a set of dummy variables (one for each main covariate) indicating that a missing value of a given covariate has been replaced by a placeholder constant. Control group means listed in the 2SLS columns are control complier means calculated from the approach specified in Imbens and Rubin (1997). In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10. Significance at the 1, 5, and 10 percent level is denoted by ***, **, and *, respectively.
The magnitude of the difference in effectiveness between TFA and comparison teachers can be interpreted in several ways. First, the effect size can be expressed as a change in percentiles of achievement within the statewide or national reference populations that took the same math assessment. If assigned to a comparison teacher, the average student in the study would have had a z-score of −0.60, equivalent to the 27th percentile of achievement in the reference population based on a normal distribution for test scores. If assigned to a TFA teacher, this student would instead have had a z-score of −0.52, equivalent to the 30th percentile. Thus, the average student in the study gains 3 percentile points from being assigned to a TFA teacher rather than a comparison teacher.
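The percentile conversion above assumes normally distributed scores in the reference population. Here is a minimal sketch using only the Python standard library (`statistics.NormalDist`, available since Python 3.8). The z-scores are the ones quoted in the text, and the function name is hypothetical.

```python
from statistics import NormalDist


def z_to_percentile(z):
    """Percentile rank of a z-score under a standard normal distribution."""
    return 100 * NormalDist().cdf(z)


control_z = -0.60   # average student if assigned to a comparison teacher
tfa_z = -0.52       # average student if assigned to a TFA teacher
gain = z_to_percentile(tfa_z) - z_to_percentile(control_z)
print(round(z_to_percentile(control_z)), round(z_to_percentile(tfa_z)), round(gain))
```

This prints 27, 30, and 3, matching the percentiles and the percentile-point gain in the text.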
Alternatively, the effect size can be compared with educationally relevant benchmarks. An illustrative benchmark is the average one-year gain in achievement exhibited by students on nationally normed assessments in grades 6 through 11, which Hill et al. (2008) calculate to be 0.27 standard deviations. On the basis of this benchmark, TFA teachers’ effect of 0.07 standard deviations on math scores amounts to 26 percent of an average year of learning by students nationwide, or 2.6 months of learning in a 10-month school year.
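The benchmark conversion is simple arithmetic on the figures above. Expressing the result in months implicitly assumes, as the text does, a 10-month school year over which learning accrues evenly:

$$\frac{0.07}{0.27} \approx 0.26 \text{ of an average year}, \qquad 0.26 \times 10 \text{ months} \approx 2.6 \text{ months}.$$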
The remaining columns of Table 4 show estimates using different approaches to scaling up the ITT estimate into a LATE estimate. The different approaches correspond to different assumptions about which types of teachers taught mobile students after they left the study classrooms. Under assumptions that imply the maximal level of compliance with assigned treatment status, the first-stage coefficient is 0.96 (column 2); that is, assignment to the treatment group, instead of the control group, increased by 96 percentage points the fraction of enrollment snapshots at which students were taught by a TFA teacher. With this high level of compliance, the LATE estimate of 0.08 standard deviations (column 3) differs little from the ITT estimate. The alternative assumptions that imply a lower bound for compliance yield a first-stage coefficient of 0.80 (column 4) and a resulting LATE of 0.09 standard deviations (column 5). That is, among compliers, being taught by TFA teachers at all enrollment snapshots raises math achievement by 0.09 standard deviations compared with being taught by comparison teachers at all snapshots. Whatever the assumptions regarding the types of teachers who taught leavers, compliance with assigned treatment status is high, leading to little discrepancy between the ITT and LATE estimates. For the remainder of this paper, we therefore focus on the ITT estimates, which do not rely on assumptions about the enrollment behavior of students who left the study classrooms during the school year.
Sensitivity Analyses
The key conclusion from the main estimates—that TFA teachers are more effective than comparison teachers—is robust to several changes in the specification of the estimation model or sample. In particular, it is robust to the exclusion of covariates, omission of analysis weights, exclusion of randomization blocks with large numbers of nonrandomly assigned students, exclusion of randomization blocks in which students took supplemental math courses, and estimation of upper and lower bounds on the treatment effects to account for nonrandom student attrition. The Appendix and Table A1 provide details on the rationale for and results of these sensitivity analyses.16
Effects within Teacher Subgroups
In a given hiring decision, a school administrator may be faced with specific choices between TFA and non-TFA teachers with particular characteristics. To shed light on these choices, we estimate the effects of TFA teachers within sets of randomization blocks in which the TFA and comparison teachers exhibit particular configurations of characteristics.
First, we consider the route through which the comparison teachers entered teaching. One criticism of alternative certification programs is that they provide insufficient preparation relative to traditional teacher preparation programs. To explore the validity of this criticism as it applies to TFA teachers, we estimate the effects of TFA teachers within randomization blocks in which TFA teachers are compared with teachers from traditional routes. We find no basis for this criticism; in fact, students of TFA teachers outperform those of traditionally certified teachers by 0.06 standard deviations (row 1 of Table 5). In a parallel analysis, we also find that students of TFA teachers outperform students of alternatively certified comparison teachers by 0.09 standard deviations (row 2 of Table 5).
| Type of comparison | Estimated effect of TFA teachers | Sample size: blocks | Sample size: teachers | Sample size: students |
|---|---|---|---|---|
| (1) TFA teachers versus comparison teachers from traditional routes | 0.06** | 60 | 80 | 2,480 |
| | (0.03) | | | |
| (2) TFA teachers versus comparison teachers from less selective alternative routes | 0.09*** | 50 | 60 | 2,100 |
| | (0.02) | | | |
| (3) TFA teachers in their first 2 years of teaching versus comparison teachers with more than 5 years of experience | 0.07** | 70 | 90 | 2,820 |
| | (0.03) | | | |
| (3a) TFA teachers in their first year of teaching versus comparison teachers with more than 5 years of experience | 0.01 | 30 | 50 | 1,430 |
| | (0.04) | | | |
| (3b) TFA teachers in their second year of teaching versus comparison teachers with more than 5 years of experience | 0.13*** | 30 | 40 | 1,380 |
| | (0.03) | | | |
| (4) TFA teachers in their first 2 years of teaching versus comparison teachers with no more than 5 years of experience | 0.07*** | 20 | 30 | 820 |
| | (0.02) | | | |
| (5) TFA teachers versus comparison teachers within middle school grades | 0.06*** | 80 | 100 | 3,370 |
| | (0.02) | | | |
| (6) TFA teachers versus comparison teachers within high school grades | 0.13*** | 30 | 30 | 1,200 |
| | (0.03) | | | |
- Note. Standard errors clustered by teacher are in parentheses. Each row of the table represents a different regression. Estimated effects are intent-to-treat effects. In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10. Significance at the 1, 5, and 10 percent level is denoted by ***, **, and *, respectively.
Another common criticism of TFA is that too many TFA teachers leave teaching before they accumulate the experience needed to be as effective as their counterparts from other routes (Heilig & Jez, 2010). We therefore consider a comparison that, based on the logic of this claim, ought to be most unfavorable to finding a positive effect of TFA teachers: inexperienced TFA teachers compared with experienced comparison teachers. From the perspective of a school administrator deciding how to fill a teaching position over the long run, this comparison mimics a worst case for hiring TFA teachers—one in which TFA teachers always leave at the end of two years and must be replaced by another inexperienced TFA teacher—versus the best case for hiring non-TFA teachers, in which non-TFA teachers will stay and become experienced.
We implement this analysis by restricting the sample to randomization blocks in which inexperienced TFA teachers, defined as those in their first two years of teaching, are compared with experienced non-TFA teachers, defined as those with more than five years of teaching experience. Estimates from these randomization blocks indicate that inexperienced TFA teachers raise student math achievement relative to experienced comparison teachers, with an estimated effect similar to that in the full sample (row 3 of Table 5). More disaggregated analyses reveal that this impact is driven by second-year TFA teachers. Whereas first-year TFA teachers are about as effective as experienced non-TFA teachers (row 3a of Table 5), second-year TFA teachers raise math achievement by a substantial 0.13 standard deviations relative to experienced non-TFA teachers (row 3b of Table 5).17 Schools that decide to fill vacancies repeatedly with TFA teachers will end up with a mix of first- and second-year teachers in steady state, as in the study sample. Therefore, on net, these findings imply that high-poverty secondary schools should expect math achievement that is no lower, and likely higher, as a result of hiring TFA teachers rather than non-TFA teachers for a given position, even if frequent turnover among TFA teachers would necessitate repeatedly filling the position with an inexperienced TFA teacher.
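As a rough check of this steady-state reasoning, suppose a position that cycles through TFA teachers is staffed by a first-year teacher half the time and a second-year teacher half the time. This even split is an illustrative assumption; the actual mix would depend on retention. Averaging the row 3a and row 3b estimates then gives

$$0.5 \times 0.01 + 0.5 \times 0.13 = 0.07,$$

which matches the pooled row 3 estimate. This back-of-the-envelope calculation ignores that rows 3a and 3b come from different sets of randomization blocks and have different standard errors.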
An alternative scenario is one in which non-TFA teachers also have considerable turnover, leading them to have relatively low experience levels in steady state. We mimic this scenario by comparing inexperienced TFA teachers with less experienced non-TFA teachers, namely those with no more than five years of teaching experience. The estimated effect, 0.07 standard deviations, is the same as in the full sample (row 4 of Table 5).
We also estimate effects separately within middle school grades (grades 6 to 8) and high school grades (grades 9 to 12), given the distinct contexts in the two grade spans. In particular, high school courses covered more advanced math, for which effective teaching might require different knowledge and skills than the teaching of less advanced math. Moreover, the assessments taken by middle school students in the study were high-stakes, in that they served as inputs into school accountability measures, whereas the study-administered assessments taken by high school students were low-stakes. Despite these differences, our basic conclusion holds in both grade spans: TFA teachers have positive impacts relative to comparison teachers in both middle schools and high schools. Students of TFA teachers outscore those of comparison teachers by 0.06 standard deviations in middle schools (row 5 of Table 5) and 0.13 standard deviations in high schools (row 6 of Table 5).
Placing our estimates in the context of prior literature requires considering the role of experience. As discussed earlier, a key feature of this study's design is to compare TFA and non-TFA teachers without controlling for experience, which realistically reflects a long-run choice between hiring TFA teachers who will turn over every few years versus non-TFA teachers who are expected to remain longer. Of all prior studies of TFA teachers’ impacts on secondary math achievement, we are aware of only one estimate that does not control for experience, permitting a direct comparison with our estimates. Xu, Hannaway, and Taylor (2011) find that TFA math teachers in North Carolina high schools are more effective than non-TFA colleagues by a statistically insignificant increment of 0.05 standard deviations, less than the significant impact of 0.13 that we find for high school TFA teachers. One possible reason for the smaller impact found by Xu et al. is that the TFA and non-TFA teachers in their sample may have had a smaller contrast in certain background characteristics than in our sample. For example, the between-group difference in graduation from a selective college was 43 percentage points in their sample, compared to 58 percentage points in our sample, and unobserved characteristics could have followed a similar pattern. Another possibility is that our study's experimental design could have eliminated hidden sources of bias in the nonexperimental design used by Xu et al., such as scenarios where students with unmeasured difficulties in learning math were assigned to the classrooms of TFA teachers.
All other prior estimates of TFA teachers’ impacts on secondary math achievement are based on nonexperimental designs that control for experience. Although not directly comparable to those estimates, our experimental findings are nevertheless within their range. In middle schools, our estimate of a 0.06 standard deviation impact is slightly larger than the effect sizes of 0.05 for first-year TFA teachers (and no impact for second-year TFA teachers) in Boyd et al. (2006), 0.03 in Kane, Rockoff, and Staiger (2008), and 0.04 in Boyd et al. (2012), and smaller than the 0.14 effect size in Henry et al. (2014). In high schools, our estimate of a 0.13 standard deviation impact is slightly smaller than the 0.19 effect size in Henry et al. (2014) and similar to the 0.13 effect size in Xu, Hannaway, and Taylor (2011) when controlling for experience. On average, estimated impacts of TFA math teachers in prior literature are larger in high schools than in middle schools, consistent with our findings.
ACCOUNTING FOR THE EFFECT OF TFA TEACHERS
To what extent could the increased effectiveness of TFA teachers have been predicted solely based on differences in credentials of TFA and other teachers in the same schools? To address this question, we examine whether estimated differences in effectiveness between distinct groups of teachers can be statistically explained by differences in their credentials or other easily observed aspects of their education and training. Then we assess the degree to which TFA teachers differ from non-TFA teachers in the prevalence of characteristics that are correlated with effectiveness. This analysis could also inform the debate about whether the quality of teachers can be improved by toughening the credentials required for teaching.
We consider a set of characteristics that could be readily observable on a teacher's resume at the time that a school administrator is making a hiring decision. These characteristics belong to four broad categories: (i) teachers’ general academic ability, based on the selectivity ranking of their undergraduate institution; (ii) teachers’ exposure to and knowledge of mathematics, based on the quantity of completed math coursework, prior use of math in a nonteaching job, and scores on the Praxis II tests of math knowledge; (iii) teachers’ instructional training, including the extent of prior math pedagogy coursework, student teaching, and ongoing coursework during the school year; and (iv) length of teaching experience.
Table A2 lists the specific variables in the analysis along with their sample means and standard deviations within the student-level analysis sample.18 All of these variables are based on self-reports from the teacher survey or on Praxis II scores collected by the study team. Given abundant evidence that the gains to experience decline with total experience (Hanushek et al., 2005; Rivkin, Hanushek, & Kain, 2005), we capture teaching experience with a three-piece linear spline that allows for different marginal gains to experience in three different ranges of total experience—one to two years, three to five years, and more than five years.19
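A minimal sketch of how the three experience terms in Tables 6 and A2 could be coded follows. It rests on our own assumption that experience is recorded as whole years taught including the current year, so a first-year teacher has a value of 1; the function name is hypothetical and is not taken from the study's code.

```python
from __future__ import annotations


def experience_spline(total_years: int) -> tuple[int, int, int]:
    """Return the three experience terms used in Tables 6 and A2.

    Hypothetical coding: total_years counts the current school year,
    so a first-year teacher has total_years == 1.
    """
    if total_years < 1:
        raise ValueError("total_years must be at least 1")
    # Term 1: indicator for having more than one year of teaching experience
    more_than_one_year = 1 if total_years > 1 else 0
    # Term 2: additional years beyond two total years, capped at three
    # (it stops growing once the teacher has five total years)
    years_beyond_two = min(max(total_years - 2, 0), 3)
    # Term 3: additional years beyond five total years
    years_beyond_five = max(total_years - 5, 0)
    return more_than_one_year, years_beyond_two, years_beyond_five


for years in (1, 2, 3, 5, 8):
    print(years, experience_spline(years))
```

Under this coding the first term is a 0/1 indicator, so its coefficient captures the entire gain from the first to the second year, while the other two coefficients are per-year slopes within their ranges.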
In order for any of those characteristics to account for a positive portion of the difference in effectiveness between TFA and comparison teachers, two conditions are necessary. First, the characteristic must be associated with teacher effectiveness. Second, relative to non-TFA teachers, TFA teachers must show a greater extent of the characteristic if it is positively related to effectiveness, or a lesser extent if it is negatively related to effectiveness. We assess each of these two conditions in turn.
To assess the first condition, we estimate equation 4, which adds the full vector of teacher characteristics to the specification in equation 1; γ3 is the coefficient vector on those characteristics, and all other variables are defined as in equation 1.20 Although student math scores are the dependent variable, students were randomly assigned to classrooms within randomization blocks, so differences in student achievement across classrooms within a block are unbiased estimates of differences in teacher effectiveness. Therefore, because equation 4 includes block fixed effects, γ3 can be properly interpreted as the association between teacher characteristics and teacher effectiveness.
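The role of the block fixed effects can be illustrated with a deliberately simplified sketch: one teacher characteristic, no other covariates, no weights, and no clustering. Demeaning both the outcome and the characteristic within each randomization block removes anything common to the block, so the slope is identified only from comparisons between classrooms in the same block. The function name and the data are hypothetical; the study's actual estimates come from the full regression described in the note to Table 6.

```python
from collections import defaultdict


def within_block_slope(blocks, x, y):
    """OLS slope of y on x after subtracting block means
    (a one-regressor fixed-effects estimator)."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])  # block -> [sum of x, sum of y, count]
    for b, xi, yi in zip(blocks, x, y):
        s = sums[b]
        s[0] += xi
        s[1] += yi
        s[2] += 1
    num = den = 0.0
    for b, xi, yi in zip(blocks, x, y):
        sx, sy, n = sums[b]
        dx = xi - sx / n  # characteristic relative to its block mean
        dy = yi - sy / n  # score relative to its block mean
        num += dx * dy
        den += dx * dx
    if den == 0:
        raise ValueError("x does not vary within any block")
    return num / den


# Hypothetical data: block A is a low-scoring school, block B a high-scoring one,
# and experienced teachers (x = 1) are more common in block B. Scores are built
# as a block-specific level plus 0.14 for an experienced teacher.
blocks = ["A", "A", "A", "B", "B", "B"]
x = [0, 0, 1, 1, 1, 0]
y = [-0.50, -0.50, -0.36, 0.54, 0.54, 0.40]

print(round(within_block_slope(blocks, x, y), 3))       # within-block comparison
print(round(within_block_slope(["all"] * 6, x, y), 3))  # pooled, ignoring blocks
```

Treating all observations as one block reproduces a pooled regression, which here mixes the between-school difference into the slope; the within-block version recovers only the built-in classroom-level difference.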
Consistent with prior literature that uses larger and more general samples, few teacher characteristics readily observable on a resume are predictive of teacher effectiveness (Table 6). Of the 11 variables measuring teacher characteristics, only two have a statistically significant association with effectiveness. First, second-year teachers are more effective than first-year teachers: students assigned to second-year teachers are predicted to score 0.14 standard deviations higher than those assigned to first-year teachers, consistent with previous evidence that the largest gain in effectiveness from experience occurs between the first and second years of teaching (Boyd et al., 2006; Hanushek et al., 2005; Kane, Rockoff, & Staiger, 2008).
Independent variable | Dependent variable: Student's end-of-year math score (z-score) |
---|---|
Graduated from selective college or university (=1) | 0.028 (0.038) |
Number of college-level math courses taken is above sample median (=1) | −0.019 (0.033) |
Used college-level math in nonteaching job (=1) | −0.054 (0.044) |
Score on Praxis II Test in Math Content Knowledge (z-score) | 0.018 (0.037) |
Score on Praxis II Test in Middle School Math (z-score) | 0.023 (0.018) |
Number of hours of math pedagogy instruction during training is above sample median (=1) | −0.025 (0.034) |
Number of days of student teaching in math during training is above sample median (=1) | −0.009 (0.033) |
Hours of education-related coursework during the school year (divided by 10) | −0.003*** (0.001) |
Has more than one year of teaching experience (=1) | 0.142*** (0.041) |
Number of additional years of teaching experience beyond two total years (until teacher has five total years of experience) | −0.030 (0.021) |
Number of additional years of teaching experience beyond five total years | −0.001 (0.004) |
Number of blocks | 110 |
Number of teachers | 140 |
Number of students | 4,570 |
- Note. Standard errors clustered by teacher are in parentheses. Estimates come from a single regression that also controls for a treatment dummy, randomization block dummies, the variables listed in Table 2, a set of dummy variables indicating the number of years that elapsed between the baseline and outcome test score, and a set of dummy variables (one for each student-level covariate) indicating that a missing value of a given student-level covariate has been replaced by a placeholder constant. Missing values of the teacher-level variables shown in the table are accounted for with multiple imputation. In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10. Significance at the 1, 5, and 10 percent level is denoted by ***, **, and *, respectively.
Second, teachers’ effectiveness is negatively associated with the amount of job-related coursework (for certification or advanced degrees) that they take during the school year. A teacher who takes 180 hours of coursework during the year—the average for teachers in the study who take any coursework at all—is predicted to lower student math achievement by 0.05 standard deviations relative to a teacher who takes no concurrent coursework. Although we cannot directly examine why coursework was negatively related to teacher effectiveness, the findings are consistent with the hypothesis that coursework taken during the school year diverts teachers’ energy and attention from the classes they are teaching.21 We are unable to examine the effects of previous years’ coursework on current teaching effectiveness, as we only collected data on coursework completed during the current school year.
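Because the coursework variable in Table 6 is measured in units of 10 hours, the 0.05 figure follows directly from the coefficient of −0.003:

```latex
\frac{180 \text{ hours}}{10 \text{ hours per unit}} \times (-0.003) = 18 \times (-0.003) = -0.054 \approx -0.05 \text{ SD}
```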
Next, we assess the direction and extent to which TFA teachers differ from comparison teachers on each of the two characteristics found to be associated with effectiveness. We estimate variants of equation 1 in which each of the two teacher characteristics, rather than student test scores, serves as the dependent variable, producing estimates of the within-block difference in the given characteristic between TFA and comparison teachers.22 The first column of Table 7 shows estimates of the between-group differences. Consistent with the descriptive statistics discussed earlier, TFA teachers are significantly less likely to have acquired a second year of teaching experience and are taking, on average, more education-related coursework during the school year (albeit not by a statistically significant margin).
Teacher characteristic | (1) Difference in characteristic between TFA and comparison teachers | (2) Association between characteristic and teacher effectiveness (student z-score units) | (3) Predicted difference in effectiveness between TFA teachers and comparison teachers (student z-score units) |
---|---|---|---|
Hours of education-related coursework during the school year (divided by 10) | 2.64 (1.86) | −0.003*** (0.001) | −0.01 |
Has more than one year of teaching experience (=1) | −0.33*** (0.05) | 0.142*** (0.041) | −0.05 |
- Note. In columns 1 and 2, standard errors clustered by teacher are in parentheses. In column 1, differences between TFA and comparison teachers are estimated from a student-level regression in which the indicated teacher characteristic is regressed on a treatment dummy, randomization block dummies, the variables listed in Table 2, a set of dummy variables indicating the number of years that elapsed between the baseline and outcome test score, and a set of dummy variables (one for each covariate) indicating that a missing value of a given covariate has been replaced by a placeholder constant. Significance at the 1, 5, and 10 percent level is denoted by ***, **, and *, respectively.
The final column of Table 7 shows the predicted difference in effectiveness between TFA and comparison teachers, still expressed in student-level standard deviations, based solely on each of the two characteristics found to be related to effectiveness. Each predicted difference is the impact that would be expected from the given characteristic alone: the product of the between-group difference in the characteristic (column 1) and the characteristic's association with effectiveness (redisplayed in column 2).
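The column 3 calculation can be reproduced from the published coefficients. This sketch uses the rounded values shown in Tables 6 and 7, so its results may differ slightly from computations on unrounded estimates; the dictionary keys are our own labels. The last line, which adds the two predicted differences, is our illustration rather than a figure reported in the paper.

```python
# Column 1 of Table 7: TFA minus comparison difference in each characteristic
difference = {"coursework_per_10_hours": 2.64, "more_than_one_year": -0.33}
# Column 2: association with effectiveness, in student z-score units (Table 6)
association = {"coursework_per_10_hours": -0.003, "more_than_one_year": 0.142}

# Column 3: predicted difference in effectiveness = column 1 x column 2
predicted = {k: difference[k] * association[k] for k in difference}
for name, value in predicted.items():
    print(f"{name}: {value:+.3f}")

# Illustration only: combined prediction from both characteristics
print(f"combined: {sum(predicted.values()):+.3f}")
```

Both predicted differences are negative, which is the point made in the following paragraph.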
The negative values in the final column of Table 7 indicate that the observed credentials do not explain why TFA teachers are more effective than comparison teachers. Although teachers who acquire a second year of teaching experience are generally more effective than those who have not yet done so, TFA teachers are less likely than comparison teachers to have acquired a second year of teaching experience. Similarly, although the amount of concurrent coursework that teachers take is negatively related to effectiveness, TFA teachers take more concurrent coursework than comparison teachers (the difference is not statistically significant). Based on these two characteristics alone, we would have predicted TFA teachers to be less effective than their counterparts from other routes into teaching, when in fact they are more effective.
One reason why credentials do not explain TFA teachers’ impact is that the credentials on which TFA teachers have an advantage relative to non-TFA teachers—in particular, college selectivity and math content knowledge—are unrelated to effectiveness within our sample. The impact would continue to be largely unexplained even if we extracted estimates of the relationships between credentials and teacher effectiveness from previous literature instead of our own sample. Prior work has generally found either no association or, at best, a very small association between college selectivity and teacher effectiveness (Aaronson, Barrow, & Sander, 2007; Boyd et al., 2008; Clotfelter, Ladd, & Vigdor, 2006, 2010; Rockoff et al., 2011). More studies have found evidence that teachers make greater contributions to student math achievement if they have greater math content knowledge, as measured by math SAT scores (Boyd et al., 2008), the Praxis II subject exams (Clotfelter, Ladd, & Vigdor, 2010), or a measure of mathematical knowledge for teaching (Rockoff et al., 2011; measure based on Hill, Rowan, & Ball, 2005). The most optimistic estimate from these studies, taken from Clotfelter, Ladd, and Vigdor (2010), is that a one standard deviation increase in teachers’ math knowledge is associated with an increase in student math achievement of about 0.05 standard deviations. Given our previous finding that TFA teachers’ math knowledge is about one standard deviation above that of non-TFA teachers, the gain to TFA teachers’ effectiveness from their math knowledge (+0.05 standard deviations) would only just offset the decrement to effectiveness (–0.05 standard deviations, shown in Table 7) stemming from the TFA teachers’ relative inexperience. That is, the relationship between math knowledge and teacher effectiveness is not large enough to explain why TFA teachers are substantially more effective than non-TFA teachers.
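Written out, this back-of-the-envelope comparison combines the most optimistic literature estimate (0.05 per standard deviation of math knowledge), the roughly one standard deviation TFA advantage in math knowledge, and the experience term from Table 7, and sets the result against the main estimated impact of 0.07:

```latex
\underbrace{0.05 \times 1.0}_{\text{math knowledge}} \;+\; \underbrace{(-0.05)}_{\text{experience (Table 7)}} \;\approx\; 0 \;<\; 0.07 \quad \text{(estimated TFA impact, SD units)}
```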
CONCLUSIONS
Our study provides experimental evidence from multiple school districts that TFA's distinctive model of recruiting, selecting, training, and supporting its teachers is capable of raising both the quantity and quality of teachers in a hard-to-staff subject area within high-poverty schools. Across a broad sample of schools from multiple states, we find that TFA math teachers in middle and high schools with highly disadvantaged students are more effective than the other math teachers teaching the same courses in the same schools. Although the difference is not large enough to raise disadvantaged students' end-of-year math scores to the mean for the wider population, it is meaningful: we estimate that it is equivalent to about 2.6 months of math instruction. This finding is robust to a range of alternative estimation specifications.
It is particularly striking that TFA teachers in their first two years of teaching are more effective than even the more experienced non-TFA teachers who are teaching in the same schools. Even TFA teachers in their first year of teaching are as effective as their more experienced counterparts. And by their second year of teaching, they are more effective on average than their experienced counterparts.
The results from this study indicate that the TFA program model for teacher recruitment and training is a viable approach to improving student achievement in disadvantaged schools. From the perspective of a principal of a high-poverty secondary school, hiring a TFA teacher would lead, on average, to higher student math achievement than hiring the typical non-TFA teacher willing to work in that type of school. Although TFA teachers currently represent only a small fraction of the teacher workforce, they can constitute a sizable fraction of an individual school's teaching staff, especially in districts that purposely cluster TFA teachers into specific schools (Hansen et al., 2015). Therefore, this type of program model may be an appealing source of teachers for turning around low-performing schools. It remains an open question whether this type of program model can be scaled up to have more systemic effects on the broader population of disadvantaged schools. Initial evidence from a recent TFA scale-up initiative indicates that elementary school teachers hired as part of the scale-up performed neither better nor worse than their non-TFA colleagues (Clark et al., 2015), but more evidence is needed over longer periods of time and across more grades and subjects.
Understanding why TFA teachers are more effective is of even greater policy interest as it may suggest more effective approaches to recruiting, screening, training, and supporting teachers in general. Our experimental study is unable to separate out the effect of TFA's recruiting and screening approach from the effect of its training and support. While we find TFA teachers have many different characteristics from non-TFA teachers in the same school, the reasons why TFA teachers are more effective than non-TFA teachers do not appear to lie in any of their credentials that would have been easily observed on a resume. Few credentials are associated with effectiveness at all, and for those that are—experience and not taking coursework while teaching—TFA teachers are at a distinct disadvantage relative to non-TFA teachers. There is some evidence from Dobbie (2011) that TFA's in-depth screening procedures and measures of candidate aptitude may successfully predict a teacher's effectiveness in ways that more readily observable credentials cannot. However, our own research cannot discern the extent to which TFA teachers’ effectiveness is due to TFA's recruitment strategies, its approaches to identifying and selecting candidates based on criteria that go beyond easily observed credentials, and its methods for training and supporting its teachers. Further research into the sources of TFA's effectiveness can help inform efforts to improve teacher preparation programs more generally.
ACKNOWLEDGMENTS
This paper is based on a study sponsored by the Institute of Education Sciences (IES) to examine the effectiveness of secondary math teachers from Teach For America and the Teaching Fellows programs (Clark et al., 2013). The study plans were submitted for review to the Public/Private Ventures Institutional Review Board (IRB) and deemed exempt from IRB review. We are grateful to IES project officers Elizabeth Warner and Stefanie Schmidt for their support throughout this study. Tim Silva, Kathy Sonnenfeld, Eric Zeidman, Nancy Duda, Mary Grider, Michael Puma, Alison Wellington, Daniel Player, and Philip Gleason made important contributions to the study. We also thank the study's technical working group, Julie Greenberg, Paul Holland, Tim Sass, Jeff Smith, Suzanne Wilson, and Jim Wyckoff, for valuable input on the study design and analysis. Finally, we thank the many school staff members who allowed us to implement the study in their schools. The authors are responsible for all errors.
APPENDIX
Sensitivity Analyses
The key conclusion from the main estimates—that TFA teachers are more effective than comparison teachers—is robust to several changes in the specification of the estimation model or sample (Table A1). First, we explore two simple changes to our estimation approach: excluding all covariates except randomization block dummies (row 1 of Table A1) and omitting the analysis weights that account for treatment assignment probabilities (row 2 of Table A1). The estimated effects do not discernibly change with these modifications to the estimation approach.
Model | Estimated effect of TFA teachers | Sample size: blocks | Sample size: teachers | Sample size: students |
---|---|---|---|---|
Main model | 0.07*** (0.02) | 110 | 140 | 4,570 |
Alternative estimation approaches | | | | |
(1) No covariates except randomization block dummies | 0.07*** (0.02) | 110 | 140 | 4,570 |
(2) No analysis weights | 0.07*** (0.02) | 110 | 140 | 4,570 |
Dropping randomization blocks | | | | |
(3) Drop blocks in which percentage of students assigned nonrandomly in first month >10 percent or percentage on final enrollment snapshot who had entered nonrandomly >25 percent | 0.06** (0.02) | 80 | 110 | 3,430 |
(4) Drop blocks with supplemental math classes | 0.11*** (0.03) | 60 | 70 | 2,460 |
Accounting for selection bias from attrition | | | | |
(5) Drop blocks in which more than 10 percent of students have missing outcome data | 0.12*** (0.04) | 40 | 50 | 1,590 |
(6) Lower bound for effect, based on Lee (2009) | 0.05*** (0.02) | 110 | 140 | 4,550 |
(7) Upper bound for effect, based on Lee (2009) | 0.11*** (0.02) | 110 | 140 | 4,550 |
- Note. Standard errors clustered by teacher are in parentheses. Each row of the table represents a different regression. Estimated effects are intent-to-treat effects. In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10. Significance at the 1, 5, and 10 percent level is denoted by ***, **, and *, respectively.
Next, we consider the threat to internal validity posed by the presence of students in the study classrooms who were not randomly assigned. Students in the study classrooms who had not been randomly assigned were excluded from the analysis, but their presence could, in theory, have affected the achievement of randomly assigned students via peer effects. We were largely successful in minimizing nonrandom placements into the study classrooms during the random assignment period, which lasted through the first month of school. Of students who enrolled in the study classes during this time, only 2 percent were nonrandomly placed into those classes, usually as a result of schools’ requests to exempt students from random assignment or the schools’ failure to request assignments for late-enrolling students. However, after the first month of school, schools were free to place newly enrolling students without random assignment. At the final enrollment snapshot, 20 percent of the students enrolled in the study classes had not been randomly assigned to those classes, with nearly identical percentages for treatment and control classes.
Although treatment and control classes had similar proportions of nonrandomly placed students, there is still the possibility of unobserved differences in the types of students who were nonrandomly placed into those classes, which could threaten the internal validity of estimated effects on the randomization sample. Therefore, we conduct a robustness check to drop randomization blocks with the greatest potential for this threat based on high rates of nonrandom placement. Specifically, we drop blocks in which students who entered the study classes through a method other than random assignment constituted more than 10 percent of students enrolling before the end of the first month of school or more than 25 percent of students on the final enrollment snapshot. This criterion results in the exclusion of 30 percent of the blocks in the sample. Nevertheless, in the remaining blocks, the estimated effect of TFA teachers, 0.06 standard deviations, is similar to the full-sample estimate (row 3 of Table A1).
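The exclusion rule behind row 3 of Table A1 can be expressed as a simple filter. This is a sketch: the Block record and its field names are hypothetical, and we read "more than 10 percent" and "more than 25 percent" as strict inequalities, so a block exactly at either threshold is kept.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """Enrollment counts for one randomization block (hypothetical fields)."""
    block_id: str
    enrolled_first_month: int      # students enrolling by the end of the first month
    nonrandom_first_month: int     # of those, students placed without random assignment
    on_final_snapshot: int         # students on the final enrollment snapshot
    nonrandom_final_snapshot: int  # of those, students who entered nonrandomly


def keep_block(block: Block, first_month_max: float = 0.10, final_max: float = 0.25) -> bool:
    """Keep a block unless either nonrandom-placement share exceeds its threshold."""
    share_first = block.nonrandom_first_month / block.enrolled_first_month
    share_final = block.nonrandom_final_snapshot / block.on_final_snapshot
    return share_first <= first_month_max and share_final <= final_max


blocks = [
    Block("A", 50, 2, 48, 8),   # 4% and about 17%: kept
    Block("B", 40, 6, 40, 5),   # 15% in the first month: dropped
    Block("C", 60, 3, 55, 16),  # about 29% on the final snapshot: dropped
]
kept = [b.block_id for b in blocks if keep_block(b)]
print(kept)
```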
Another complicating factor in the study design is the presence of supplemental math classes—separate from the regular math classes included in this study—that some schools offered to reinforce material taught in the regular classes. We did not include in the study any schools that made decisions on which students to assign to supplemental math classes after the start of the school year, because assignments to supplemental classes could then be an endogenous response to compensate for poor teaching by students’ regular math teachers. However, 48 percent of the randomization blocks in the study included at least some students who were assigned by their schools to supplemental math classes before we conducted random assignment to the main classes. In most cases, we did not require students to take supplemental classes with the same type of teacher (TFA or non-TFA) as the teacher who taught their main classes, so the presence of supplemental classes generally diluted the treatment-control contrast in the types of math teachers to whom students were exposed. Therefore, as a robustness check, we remove all randomization blocks with supplemental math instruction. The estimated effect of TFA teachers rises to 0.11 standard deviations (row 4 of Table A1).
Our final set of sensitivity analyses assesses the degree to which attrition may have introduced selection bias into the main estimate by restricting the estimation sample to blocks in which fewer than 10 percent of students have missing outcome data. The estimated effect of TFA teachers continues to be positive and statistically significant (row 5 of Table A1).
A more formal approach to handling attrition, developed by Lee (2009), is to estimate lower and upper bounds for the estimated effect that account for the maximal possible amount of selection bias. Recall, from Table 1, that outcome data were obtained for a slightly higher percentage of the randomization sample in the treatment group (79.5 percent) than in the control group (78.5 percent). Thus, the analysis sample in the treatment group may have a slightly different mix of students than the analysis sample in the control group. Following the monotonicity assumption in Lee (2009), we assume that any student who would have outcome data if assigned to the control group would also have outcome data if assigned to the treatment group. This implies that the treatment analysis sample includes all students who would have been in the analysis sample if assigned to the control group, plus an “excess” set of individuals who have outcome data because they were assigned to the treatment group. These excess individuals constitute 1.2 percent [= (79.5 – 78.5)/79.5] of the treatment analysis sample.
Removing the excess individuals from the treatment analysis sample would restore treatment-control equivalence in the mix of individuals examined, but it is not possible to identify the excess individuals. However, by making extreme assumptions for the rankings of the excess individuals’ outcomes within the treatment analysis sample, we estimate bounds for the effects of TFA teachers. Specifically, we employ a two-step method. First, we estimate equation 1 on the full analysis sample and obtain the residuals. We then use two alternative ways to trim the treatment analysis sample: removing either students whose residuals are in the top 1.2 percent or those whose residuals are in the bottom 1.2 percent of the treatment analysis sample. Second, we re-estimate equation 1 on the two trimmed samples, obtaining lower and upper bounds for TFA teachers’ average effect.23
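The trimming step of the Lee (2009) bounds can be sketched as follows, assuming the equation 1 residuals for the treatment analysis sample are already in hand. The function names, the rounding of the number of trimmed observations, and the ten residuals in the example are hypothetical, and the example uses a 20 percent trim only so that the effect is visible. Applied to the rounded response rates in the text, trim_share returns about 0.0126, slightly above the 1.2 percent reported, which may reflect unrounded rates.

```python
def trim_share(p_treat: float, p_control: float) -> float:
    """Share of the treatment analysis sample to trim, (p_T - p_C) / p_T,
    assuming (Lee 2009 monotonicity) that response is higher under treatment."""
    if p_treat < p_control:
        raise ValueError("this sketch assumes the treatment response rate is higher")
    return (p_treat - p_control) / p_treat


def trimmed_indices(residuals, share, drop):
    """Indices of treatment observations kept after removing the given share
    with the highest (drop='top') or lowest (drop='bottom') residuals."""
    n_drop = round(share * len(residuals))
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])  # ascending
    if drop == "top":
        kept = order[: len(order) - n_drop]  # trimming high residuals -> lower bound
    elif drop == "bottom":
        kept = order[n_drop:]  # trimming low residuals -> upper bound
    else:
        raise ValueError("drop must be 'top' or 'bottom'")
    return sorted(kept)


print(f"{trim_share(0.795, 0.785):.4f}")

# Hypothetical residuals for ten treatment students
res = [0.3, -0.2, 0.1, 0.5, -0.4, 0.0, 0.2, -0.1, 0.4, -0.3]
lower_keep = trimmed_indices(res, 0.2, "top")
upper_keep = trimmed_indices(res, 0.2, "bottom")
lower_mean = sum(res[i] for i in lower_keep) / len(lower_keep)
upper_mean = sum(res[i] for i in upper_keep) / len(upper_keep)
print(lower_keep, upper_keep, round(lower_mean, 3), round(upper_mean, 3))
```

In the procedure described above, equation 1 would then be re-estimated on each trimmed sample rather than comparing mean residuals; dropping the highest residuals yields the lower bound and dropping the lowest yields the upper bound.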
Bounds for the estimated effect of TFA teachers yield the same qualitative conclusion as the main estimate: TFA teachers are more effective than comparison teachers even after accounting for attrition-induced bias. At worst, TFA teachers raise secondary students’ math achievement by an average of 0.05 standard deviations relative to comparison teachers (row 6 of Table A1); at best, TFA teachers raise achievement by an average of 0.11 standard deviations (row 7 of Table A1).24
Table A2 lists the specific variables in the analysis along with their sample means and standard deviations within the student-level analysis sample.
Teacher characteristic | Mean | SD |
---|---|---|
Measure of general academic ability | ||
Graduated from selective college or university (=1)a | 0.56 | 0.50 |
Measures of exposure to and knowledge of math | ||
Number of college-level math courses taken is above sample median (=1)b | 0.42 | 0.49 |
Used college-level math in nonteaching job (=1) | 0.23 | 0.42 |
Score on Praxis II Test in Math Content Knowledge (z-score) | 0.00 | 1.00 |
Score on Praxis II Test in Middle School Math (z-score) | 0.00 | 1.00 |
Measures of instructional training | ||
Number of hours of math pedagogy instruction during training is above sample median (=1)c | 0.41 | 0.49 |
Number of days of student teaching in math during training is above sample median (=1)d | 0.32 | 0.47 |
Hours of education-related coursework during the school year (divided by 10) | 6.35 | 12.67 |
Measures of teaching experience | ||
Has more than one year of teaching experience (=1) | 0.79 | 0.41 |
Number of additional years of teaching experience beyond two total years (until teacher has five total years of experience) | 1.40 | 1.37 |
Number of additional years of teaching experience beyond five total years | 3.63 | 6.66 |
Number of blocks | 110 | |
Number of teachers | 140 | |
Number of students | 4,570 |
- Note. Summary statistics are calculated from the student-level analysis sample. In accordance with NCES publication policy, sample sizes have been rounded to the nearest 10.
- a Selective colleges are those ranked by Barron's Profiles of American Colleges as very competitive, highly competitive, or most competitive.
- b Teacher at the median took five college-level math courses.
- c Teacher at the median had 21 to 40 hours of math pedagogy instruction.
- d Teacher at the median had 16 to 20 days of student teaching.
Biographies
HANLEY S. CHIANG is a Senior Researcher at Mathematica Policy Research, 955 Massachusetts Avenue, Suite 801, Cambridge, MA 02139.
MELISSA A. CLARK is a Senior Researcher at Mathematica Policy Research, P.O. Box 2393, Princeton, NJ 08543.
SHEENA MCCONNELL is a Vice President of Human Services Research at Mathematica Policy Research, 1100 1st Street, NE, 12th Floor, Washington, DC 20002.