Now that the hue and cry over the government’s proposed changes to the way schools are funded has subsided, it may be a good time for a sober reflection on the issues, because they are bound to resurface in the future.
First, common sense suggests that increasing class sizes is detrimental to student performance. More students per class mean more harried teachers, and therefore less attention per student. The causality here is clear.
But teacher quality certainly plays a crucial role as well. So what the government is essentially proposing is a trade-off: larger classes (bad) but better teachers (good).
The important question, however, is whether the additional value created by good teachers counteracts the negative consequences of larger class sizes. At this point the answer is not clear.
It is most likely that, in supporting this change in policy, the government is drawing inspiration from a recently released study by Raj Chetty and John Friedman of Harvard and Jonah Rockoff of Columbia. In this large and comprehensive study, which has been cited approvingly by many, including Barack Obama, the authors claim that students taught by high value-added (HVA) teachers, those who raise their standardized test scores, are “more likely to attend college, earn higher salaries, live in better neighborhoods and save more for retirement.”
But how reliable is this work? As Gary Gutting, professor of philosophy at Notre Dame, asks in the New York Times: how does this research compare with, say, the work of biochemists on the effects of light on plant growth? No one, for example, questions the validity of the physics on which our space programs are based. But even the best-developed social sciences, such as economics, have nothing like this status. Since humans are much more complex than plants, and biochemists have far more refined techniques for studying plants, we may well expect the biochemical work to be of greater validity.
Furthermore, when it comes to generating reliable scientific knowledge, the most important issue is the ability to predict future events. And the only way to make such informed predictions about teacher impact is to run randomized controlled experiments. Suppose that you could randomly assign some students to a teacher rated HVA, and other students to a teacher rated low value-added (LVA). If the students are identical on average, you could compare the test scores of the two groups. If the students with the HVA teacher do better, then we can draw reasonable conclusions about the teachers’ value added.
But such controlled experiments are nearly impossible to run, for obvious reasons. So instead Chetty and his colleagues looked at data on 2.5 million children in grades three to eight in one of the largest school districts in the United States over a 20-year period (1989-2009). They then used other public records to track the students after high school. With this massive data set of nearly 20 million observations, they searched for situations that were virtually identical to the controlled experiment described above.
But here is the next problem. Chetty and his colleagues do all of this while holding class size constant. What happens when class size changes? Now the required controlled experiment becomes more complex. For one thing, you would have to measure the impact of HVA and LVA teachers separately for large classes and small classes, and then measure whether the higher value added by HVA teachers counteracts the drop in performance in larger classes.
And underscoring the difficulties of making meaningful predictions, the authors themselves point out that while the impact of HVA teachers may be significant, a whole host of other factors also affect performance, including relationships with parents and peers.
Furthermore, how much faith can or should we have in the government’s ability to identify good teachers or a set of best practices for the classroom? Teaching is a multi-faceted and complex activity that defies easy quantification. If we measure teacher performance by improvements in students’ test scores, as the Chetty et al. study proposes, does that not leave us open to the possibility of teachers teaching to the test? Researchers at the University of Chicago have also shown that when student test scores are the only metric, teachers are not immune to cheating by changing student answers on standardized tests ex post.
In New Zealand we already have experience of the government’s attempts to measure the quality of university staff, through the Performance-Based Research Fund (PBRF). One would be hard pressed to find a single academic in the whole country who thinks that the PBRF actually does a good job of measuring quality, both because quality is an extremely elusive concept and because university staff engage in a diverse array of activities, including research, teaching and service. The PBRF is a cumbersome and costly process, in terms of both money and time, that has significantly increased the administrative burden at universities. The pool of money available for division remains unchanged; the universities are simply spending ever more resources chasing that constant sum.
Given the inherent constraints of social science research and the difficulties of making meaningful predictions, it seems rash to use the findings of such research to usher in sweeping changes to existing policy. Certainly we should expect world-class research to inform our policy decisions, but it cannot be a substitute for practical experience, empathy and common sense.