The Democratic Platform: Tying Testing to Teacher Evaluations

Despite how the Democratic education platform reads, some researchers say there’s no consensus against using test scores in teacher evaluations

by Matt Barnum
The 74
July 2016

The Democratic platform states, “We oppose … the use of student test scores in teacher and principal evaluations, a practice which has been repeatedly rejected by researchers.”

It’s not often that educational research is mentioned in a major party platform. But several researchers who study teacher evaluation say the suggestion that there is a scholarly consensus against using test scores in teacher evaluation is misleading.

The 74 contacted a number of researchers who have studied teacher evaluation or value-added measures, a common method for assessing teacher impact on student test-score growth.

“There are many ways in which the use of test scores to inform teacher evaluation and school accountability can and should be improved. But the wholesale rejection of using test scores to inform teacher evaluations is an unproductive reaction to the limitations of test-score-based evaluation metrics,” said Matthew Kraft of Brown. “A balanced reading of the literature suggests there is mixed evidence for and against using test-score-based evaluation metrics.”

Kirabo Jackson of Northwestern said he disagreed with the platform’s language and that “test scores measures are valid, albeit imperfect, measures of teacher impacts on student skills.”

“VAMs, for the teachers for whom they can be created, do provide a piece of information about teachers’ abilities to improve student test scores,” said Katharine Strunk of the University of Southern California. “I think the research suggests that we need multiple measures (test scores, observations, and others) to rigorously and fairly evaluate teachers.”

Matthew Steinberg of the University of Pennsylvania said, “My view is that there is not in fact a consensus among academic researchers, particularly economists, who do this work, that value-added scores should not be used in high stakes teacher evaluation systems.”

Jim Wyckoff (University of Virginia), Cory Koedel (University of Missouri), and Dan Goldhaber (University of Washington Bothell) also agreed that research does not support categorically rejecting test-based teacher evaluation.

Several of the researchers said that measures of test-score growth have significant limitations but also provide meaningful information about a teacher’s impact on long-run outcomes. Moreover, other ways to evaluate educators, particularly classroom observations, have some of the same flaws as value-added measures. Some studies have found that teacher evaluations that include test scores can lead to improved student outcomes.

However, Jesse Rothstein of the University of California, Berkeley, said that while there was not a “full consensus” on the issue, “I do think the weight of the evidence, and the weight of expert opinion, points to the conclusion that we haven’t figured out ways to use test scores in teacher evaluations that yield benefits greater than costs.”

Susan Moore Johnson of Harvard agreed: “Both standardized tests and value-added methods, widely used to calculate each teacher’s contribution to her students’ learning, fall far short of what is required to make sound, high-stakes decisions about individual teachers. Because standardized tests often are poorly aligned with state standards or a required curriculum, they fail to accurately measure what teachers teach and students learn… Combining standardized tests and VAMs for use in teacher assessment is unwise and indefensible.”

The platform may have been referring to statements from the American Statistical Association and the American Educational Research Association that raise concerns about, and describe limitations of, the use of value-added measures in teacher evaluation. (Notably, though, neither statement says that such scores should not be used at all in evaluation.) A 2010 position paper signed by several prominent scholars also raised concerns, though a response by other researchers argued that value-added measures have an important role in teacher evaluation.

It’s hard to say what level of agreement amounts to a consensus, and The 74’s poll of just nine researchers may not be a representative sample of expert opinion.

And while the scholarly debate has focused on value-added measures, teachers are actually more likely to be evaluated via “student learning objectives.” The 74 previously reported that such measures have limited research support and that several teachers say they can be easily gamed.

All told, though, the researchers’ responses highlight significant disagreement — rather than clear consensus — even among scholars on this important issue.

The Democratic platform is certainly right that some researchers reject test-based teacher evaluation — but that’s hardly the full picture.

