EdReform Revived Transcript - Center For Education Reform

EdReform: Revived
Thursday, November 3, 2016
The Mayflower Hotel
Washington, D.C.

Sessions:
1) Isomorphism in EdReform (coming soon)
2) The Gold Standard of Research
3) The Connection Between Knowledge and Assessment
4) Regulation Stifling Innovation (coming soon)
5) Unintended Consequences of Reform Policy and Practices (coming soon)

Session 1:
Isomorphism in EdReform

Session 2:
The Gold Standard of Research

Speakers:
SUSAN DYNARSKI, Professor of Education, Public Policy, and Economics, Gerald R. Ford School of Public Policy, School of Education and Department of Economics, University of Michigan
MARCUS WINTERS, Senior Fellow, Manhattan Institute for Policy Research

MARCUS WINTERS: I want to take the opportunity to kind of think about where charter school research is, some misconceptions that are out there, and what are the good studies and what are the bad studies. I think that there are some misconceptions out there about some important questions. I think for an audience like this, it’s worth kind of thinking about, and I want to take that opportunity.

First, kind of the case for why we need high-quality work on this. The most important reason is that in absence of high-quality research, anecdotes and simple observations are going to win the day. Sometimes anecdotes and simple observations win the day, anyway, but if we don’t have high-quality work on this, that’s what’s going to win is “I don’t like charter schools. I like public schools, and I think that they’re doing bad. And my friend in the teachers union says they’re bad; therefore, they are bad.” If we can combat that with high-quality research, I think that is our most important weapon in this conversation, and all these things are really just magnified by people’s personal assumptions and their motivations.

I’d add on to that that communicating this high-quality work is also obviously of huge importance. That’s going to be kind of a theme as I kind of talk about where we are in the research on these things is that we actually have a lot of high-quality evidence on charter schools, and I think it’s often misunderstood. I think there’s a lot of people out there who think that we have a lot of evidence on charter schools and it’s saying that charter schools are not doing very well. I think that’s an odd idea, given the way the research actually comes out.

Also, really quickly, we motivated this session by thinking about random assignment. There’s kind of two different ways that we look at the effectiveness of charter schools and lots of other interventions. One is with random assignment. This is when too many kids try for too few seats in charter schools. We flip a coin. The coin decides who gets in and who doesn’t. The nice thing about this, the strength of this, is that it gives us the most convincing estimate of a treatment effect because if a flip of a coin determines whether you get in or not, then what doesn’t determine whether you get in or not among those kids is did your parents notice there was a problem, do they think they can transfer you to the school, do they have the informational resources to know those schools are available to them. Those are all the things we worry about when we make these kinds of observational comparisons. So random assignment is the best way to kind of get out of that.

There are some weaknesses to it, though. One weakness is it can really only evaluate over-subscribed schools, so smaller schools, schools that don’t have too many kids going for too few seats, this really can’t tell us anything about them. And we need a lot of kids to be randomly assigned in and a lot of them to be randomly assigned out. So there are limitations in these studies that are worth thinking about.

The other ones we would think of are observational studies. Observational studies, if you think of CREDO, use kind of matching comparison groups. I look at charter school kids, and I use a statistical algorithm to find other kids in public schools who look very similar to them. So I don’t have random assignment, but I can make what look like stronger comparisons.

A nice thing about this is that it can look at all students whether or not they’re randomly assigned. Now we can look at the undersubscribed schools also. We can maybe look at a broader picture of what’s going on.

The weakness, though, is that the assumptions that we need in order to think of these estimates as the real effect of attending a charter school are stronger, so there’s greater potential for these estimates to be affected by unobserved differences between the kids attending these schools. That’s the broad framework of how we do this stuff.

So what should we prefer? Random assignment studies are the best. We call them the gold standard because they’re great, because they do give us the most convincing estimates of treatment effects, and so when we can do random assignment studies, we should do random assignment studies. They’re actually very important. They’re very valuable, and the nice thing about charter schools is we usually have a mechanism that forces us to do them because the fairest way to allocate these seats is randomly.

That said, I will say that I am pretty convinced by some of the well-done observational studies of these things. Probably the best evidence of this is that when we have a well-done observational study and use that technique on the same dataset that we use in the randomized field trials, we usually get a pretty similar answer. So that leads me to believe that overall, when this stuff is done well, when it’s done strongly, we can get a pretty good answer on the effects of charter schools from the observational analyses, which is nice because it opens up some doors.

I would argue, though, that the issue with that is some of the observational studies seem to be finding less positive effects for charter schools. I’d argue that’s not because of the research design. I don’t think that the real problem right now is in the research design. It has more to do with people omitting existing work from our conversations and also misunderstandings about what that work is actually saying.

I am going to try to challenge some broad conceptions about what we, quote/unquote, “know” about charter schools. A lot of people in this room might already know that these are misconceptions, but I think it’s worth it. I am going to take the opportunity to kind of think about what some of this evidence actually says, and I have three claims that, hopefully, I can get to.

The first one that I hear a lot is that research finds that charter schools are no better than traditional public schools. I just moved to Massachusetts. I live in Brookline, which is a suburb just outside of Boston. My neighbors and the suburbanites are walking around handing out leaflets saying that the research actually shows that charter schools are no better than public schools. This is something that I hear. In a college of education, my colleagues tell me this, and they have two places they really look to for support for this. One is the CREDO national charter school study, and the other one is the Mathematica national charter school study. I am going to focus on CREDO, with some limited time here. Quickly, the Mathematica study, I think the real limitation of it is the sample of charter schools they actually have, so it’s a randomized field trial of 30 schools, 36 schools across 15 states. Around that time that they were looking at it, there were about 26 schools in Newark, and so they weren’t very representative of what was going on.

CREDO, I think, is a really interesting one to think about, though, and it’s the one that we kind of hear the most of. When people say research shows that charter schools are not effective, they usually mean the CREDO national study. So the CREDO national study uses this observational approach, but I think their estimates are actually pretty convincing.

When CREDO does their estimates in Massachusetts, for instance, they get results that are pretty close to the Massachusetts results that Angrist and Sue and others have gotten. So they tend to match random assignment pretty well. Where I think it gets a little misleading, though, and what I think of as a misconception happens is that what they do is they run this analysis for a bunch of charter schools across a bunch of states, and then they combine them all together. And they say, “On average, we find a zero effect.” I think that’s actually pretty misleading, and the reason for that is we think of charter schools as a national reform, and in some ways, it is. We have these national organizations. We have the group of 300. It’s a national group that’s going to all these places. So, in that sense, it’s a national reform. We have an idea that it’s expanding across the nation.

But charter schools are really, really different from place to place. They’re different within localities, and they’re different across localities. So, when you combine them all together and say that there’s one charter school effect across the nation, I think we’re missing something really important. So, when we aggregate them all together as one national result, I think it’s not actually very helpful.
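The aggregation point can be made concrete with a small sketch. The locality names, effect sizes, and enrollment counts below are hypothetical numbers chosen for illustration; they are not CREDO's estimates. The sketch only shows how an enrollment-weighted national average can come out at zero while individual localities show sizable effects.

```python
# Hypothetical per-locality effects (test-score standard deviations per year)
# and student counts. Illustrative only; not taken from any study.
localities = {
    "urban_a": {"effect": 0.20, "students": 1000},
    "urban_b": {"effect": 0.10, "students": 1000},
    "suburban_a": {"effect": -0.10, "students": 2000},
    "suburban_b": {"effect": -0.05, "students": 2000},
}


def pooled_effect(groups):
    """Enrollment-weighted average effect across the given localities."""
    total_students = sum(g["students"] for g in groups.values())
    weighted_sum = sum(g["effect"] * g["students"] for g in groups.values())
    return weighted_sum / total_students


national = pooled_effect(localities)
urban = pooled_effect(
    {name: g for name, g in localities.items() if name.startswith("urban")}
)

print(f"pooled national effect: {national:.2f}")
print(f"pooled urban effect:    {urban:.2f}")
```

With these made-up inputs the pooled national figure is zero even though the urban localities average a clearly positive effect, which is the sense in which a single national number can hide what is happening locally.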

My interpretation of CREDO is that the estimates suggest that students tend to benefit from attending urban charter schools and particularly benefit from charters in some localities—and that often gets missed; that’s getting missed in Boston right now—and not from attending suburban charter schools on average, but I think there might be a reason for that, too, that I’ll throw out here.

So what I’ll show is my evidence for this. This is when CREDO looks at the results overall within cities. Break it out within cities. You see huge positive effects in a lot of these places. This is test scores, but it’s kind of what we have. What you see is a large positive effect in a lot of these places. In fact, if you look overall in the cities, the overall effect within the cities is positive, and in some of these cities, the effects are very, very large. I mean, you can’t look at the scale there, but those are per-year effects. When you add those together over time, they’re really life-changing for a lot of these kids.

And I would argue what we need to do is focus on our locality. Focus on the estimates that really matter. In Boston, that’s that line that’s the furthest up there. That’s the conversation we should be having in Boston is about the charter school impacts in Boston and not overall nationally, and combining them up with everyone else, I think, tends to be pretty misleading.

So the other thing I think is worth thinking about is why is it that urban charter schools might be more effective than what we’re seeing for suburban schools. One reason might be that the most successful CMOs tend to be operating in urban areas. That’s one reason.

I also think—and I’d actually be interested in what Sue has to say about this. I think that there’s something going on about the comparison here. So one thing we haven’t totally thought about yet is that all we can do is compare the kids attending charter schools to the school that they would have attended otherwise. So, when we look at them in the cities, the comparison schools they would be attending are usually really bad urban public schools. The comparison schools in the suburbs are usually pretty darn good suburban public schools. You take a school who is having an impact on students in the city. Pick it up and move it to the suburbs. Even if they are doing just as well for those kids as they were for the urban kids, the comparison group is higher. The bar is higher. There is some evidence for this.

This is a recent paper in the Journal of Economic Perspectives that has some evidence on this, and so what you see, the solid lines are the comparisons between charter schools and public schools in the cities, and this is in Massachusetts. The suburban lines are the dashed lines, and what you’ll see is that the top ones are proportion of kids scoring above Level 3 or whatever it is there, and then test score gains from baseline to tenth grade is what’s on the bottom. What you see is that the dashed lines actually get pretty close to each other in the suburbs. Part of what is going on here is we actually probably have some effective schools in the suburbs. They just don’t look like they’re effective because the schools they would be going to otherwise are also effective. I think that’s part of the story here—and when we combine them all together, we’re getting kind of a misleading overall analysis of what charter schools are doing.
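The comparison-group argument is simple arithmetic, sketched below. The scores are invented for illustration and do not come from the paper or from any dataset: the same hypothetical charter outcome is compared against two different fallback schools.

```python
# Hypothetical average scores; illustrative only, not from any study.
charter_outcome = 75  # the same charter performance in both settings

# Average score at the school students would otherwise have attended.
fallback_outcome = {"urban": 55, "suburban": 74}

# The measured "charter effect" is always relative to the fallback school.
measured_effect = {
    setting: charter_outcome - fallback
    for setting, fallback in fallback_outcome.items()
}

print(measured_effect)
```

Under these assumptions the identical school shows a large measured effect in the city and almost none in the suburb, purely because the bar it is compared against is higher.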

So I will quickly talk about a couple other claims that I think are worth thinking about and come up a lot, at least in my daily life. So Claim 2, when positive charter impacts are found, it’s only because they are systematically removing the lowest-performing kids. We have all heard this, and where does the support for this come from? From anecdotes from disgruntled charter school parents and the perception of high mobility rates out of charter schools.

I will also say I have talked to people who run charter schools who I respect, who say that there are schools that are pushing out the lowest-performing kids, that this happens. The question is, Is it systematic? Is it large enough to explain the positive effect that we’re seeing from the studies out of this? I would say no, and we have some research to back that up.

First off, these lottery studies that we’re looking at directly account for this. In fact, their estimates are really conservative for estimating the effect of actually attending a charter school. Why? Because the way that we identify attending a charter school is actually being offered the chance to go to a charter school. So, if you are offered the opportunity to go to a charter school and you don’t go, the analysis treats you as if you are a charter school kid. So our estimates are taking into account the fact that some of them are not going there. So those students who win the lottery and are then pushed out are still a part of that estimate there.

Also, the push-out claim just does not seem to be consistent with any of the evidence we actually see when we look at the characteristics of kids who are leaving these schools. So I’ve done some work on this in New York City and in Denver. Ron Zimmer and Cathy Guarino have looked at an unnamed school district, a place in the Midwest. And when you look at it, there really is no difference in the relationship between the probability that a kid exits out of a charter school or exits out of a public school and their test scores. So if the charter schools are systematically removing their lowest-performing kids in order to goose up their test scores, we would expect the low-performing kids to be more likely to exit out of charter schools than they are out of public schools. Actually, what we see is a very similar pattern of mobility of low-performing kids at charter schools and the public school system. Every time we look at this in the data, it doesn’t seem to show up.

So the Claim 3 that I’ll bring up, which is something that I’ve done some work on and is something I’ve been interested in a while is—so, three, charter schools are not serving kids with disabilities and those learning English, and so anecdotes claim—so support for this is anecdotal claims of disgruntled former charter parents, and observationally, it is true that charter schools serve lower proportions—lower proportions of students in charter schools have an IEP or are classified as ELL. That is absolutely true. The question is why.

I think that what we need to know is that there are several factors contributing to this. I’ve done some work really just kind of following cohorts of kids and mapping out, trying to explain why it is that smaller proportions of kids with these classifications are in charter schools than in public schools, and we have a couple basic findings. One, the most important factor is that those students are just less likely to apply to attend a charter school in the first place. That might be for good reason for some of these kids. These are kids who need a lot of services. They are really worried about experimental charter schools. They don’t know that they’re good for them. They feel like maybe these services they receive out of the public school are going to be better, particularly for some categories. Some of them are receiving public services before they actually enroll in kindergarten, so they already have experience in the public system. If you want to address these enrollment gaps, you have to address who is applying in the first place is the first thing that’s on there. Some of that could be on the charter schools. Maybe we should be doing better at recruiting those types of kids, but it’s hard to put it all on their shoulders.

It has very little, if anything, to do with the exiting of these kids, and this is the claim that’s often made is that what happens is those kids go into the charter school. The charter school does not want them, and they put in all their effort to try pushing them out. We just don’t see that in the data. So, again, there’s anecdotes to suggest that’s true, but I looked at that in Denver and in New York City, and it turns out that in both of those places, a student with disabilities or an ELL student who is attending a charter school is less likely to leave their school than is a student with disabilities or an ELL student in a public school. There’s a natural mobility of these kids, and so their exiting patterns are not explaining these differences.

The other thing that’s going on is that there’s differences in the classifications of these things. I have some work finding that attending a charter school in Denver reduces the probability that you get a new IEP classification, which then reduces the number of kids who are classified as having an IEP. What is going on here is that there’s a complicated story to explain these gaps, but it has very little to do with the common narrative that these schools are just simply not serving these kids.

On top of that, we do have some recent evidence that when these students are attending charter schools, they are benefiting just as well as do their regular enrollment classmates. So we have some recent evidence from Boston, in particular. This is something we need more high-quality work on, but so far, the evidence seems to be suggesting that when these kids are going to charter schools, they’re doing just as well.

So where do we sit now? Look, there is a lot more work to be done on charter schools, and kind of the most exciting stuff that’s going on, I think, because that’s some of what Jay is talking about, is looking at these kind of later outcomes. One of the reasons we haven’t done that is because we haven’t had the data to do that, and these kids haven’t been old enough to do that. And now that we are starting to look at that, I think we need to take some of those results seriously if they don’t end up as positive as we want them to be. The charter people that I’ve talked to seem to be taking that seriously.

So we want to know what characteristics and operations make charter schools more or less effective. I think that’s an important thing to know. We can define effective in a lot of different ways.

I think it’s particularly important to measure the impact of charter schools in our own locality. I think talking about charter schools as a national reform is really misleading. It’s really hard not to do it, particularly when we have a national forum to talk about charter schools, but when we are talking about them within our locality, what’s going on locally is what matters. What are the rules under which they operate? What do we know about the effectiveness of these schools? That’s the story that I think we need to be focused on, and often we can’t.

But we actually do know, I think, a lot about charter schools and how they operate today. I think we just need to use that evidence a little bit better and understand exactly what it’s saying. I think the trick is understanding and communicating what that research is actually saying and what else we need to find out about it.

SUSAN DYNARSKI: All right. So I am going to focus on—I’m a researcher. I am going to focus on research design and how we go about understanding the effects of charter schools and dive in a bit more into the lottery methodology, in particular, to sort of explain where those results come from.

The key question that motivates me is, Do charter schools improve student outcomes? So I come at this as a researcher that’s interested in reducing inequality in educational outcomes, however we can. I don’t care if it’s a charter school or a traditional public school or a private school. If it’s doing a good job with students, yay. I have evaluated programs in all of those settings.

In all of those settings, it’s hard to understand whether a given school actually improves outcomes. Say we compare test scores of charter schools and traditional public schools, and we find that charters do better. Well, you know what you are going to hear immediately. It’s that the charters are cream skimming. So only the children with the most motivated parents are the ones who are going to the charter schools, and that’s why we see higher test scores.

If we see charters do worse, by contrast, from the other direction, charter supporters might point out that charters tend to serve the neediest who tend to have the poorest performance. With both of these, the problem with just comparing the test scores of the two types of schools is we’ve got selection bias, and that makes results of many analyses very sensitive to how you do them. When you’ve got selection bias, methods actually matter quite a bit.

Here is one of my favorite examples of research at work in the charter sector. 2004 was a very exciting year for charter research. The 2003 NAEP came out, and several researchers leapt on it. So the AFT, the American Federation of Teachers, did an analysis with it, and they concluded that charters do worse than traditional public schools. The U.S. Department of Education stepped in with its own analysis and came out that it was inconclusive. So their conclusion was they couldn’t—not enough evidence, couldn’t say much. Caroline Hoxby, who is now at Stanford, concluded, again, using the same data—these are all the same datasets; these are not independent datasets—that charters were doing substantially better.

Now, the critical difference between these three studies, actually, was what the comparison group was. So you’ve got the kids at the charter schools. Who do you compare them to? Do you compare them to the entire nation, or do you try to find a set of students who are more comparable? AFT and the Department of Education both chose all traditional public school students as the comparison group. Hoxby, by contrast, chose students at the nearest comparable school, so schools with students who looked like—in the observable characteristics who looked like charter students, and then they proceeded to argue with each other about what the correct comparison group was. If I recall correctly, I think it ended up even with arguments in newspaper advertisements. I think it was the first time I saw research methodology being duked out in ad copy.

It is very hard to converge on what the truth is when you’re having to argue in this way, and it’s pretty unavoidable in most settings with observational analyses. I am glad to hear it is true that in the studies that researchers who have lottery data have done, when they’ve gone on to do observational work, they tend to get similar results. The problem is that everyone can do observational stuff, and many of them will do it wrong. It’s a lot easier to do an observational study wrong than a randomized trial wrong, and I’ll show you in a second why that’s the case.

Randomized trials. We’ve already heard them called the “gold standard of research.” I just wanted to step through for a second what a randomized trial is so we can then understand how the charter studies relate to what an actual randomized trial is. Think, for example, the Food and Drug Administration requires that all new drugs to be used in humans have to go through a randomized trial. That’s how we determine whether they’re safe and effective from a medical perspective.

What happens in this setting? Well, first, people volunteer for a study. They might volunteer through their doctor’s office. They might volunteer by seeing a flier someplace or an advertisement that says join this study about some drug, and then a coin flip or Excel or something like that decides who gets the treatment and who does not among the volunteers. The treatment group gets the treatment; the control group does not. And then you compare the results. The nice thing about this is that it gets past the selection bias problem. So the coin flip means that the treatment and control groups are identical on average in every way.

If you go back to the charter example, the cream-skimming issues should be identical in the two groups. The neediness of the students, poverty, parental education, all that stuff should be the same in both groups, and that means when you compare them, we know that the difference between them is not about their underlying characteristics but about the thing that we manipulated, the treatment that we manipulated. So that’s where we get the confidence. We don’t have to argue about whether we should be controlling for sex or picking out certain students or so forth because the coin flip basically did all that work for us, and it eliminates a lot, virtually all of the discussions that we have to have about how exactly we should do it.
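A small simulation can illustrate why the coin flip removes the cream-skimming problem. Everything here is an assumption made for the sketch: the "motivation" variable, the score formula, and the true effect of 5 points are invented, and every lottery winner is assumed to attend. It is not a model of any actual study.

```python
import random

TRUE_EFFECT = 5.0  # hypothetical test-score gain from attending a charter


def outcome(motivation, attends, rng):
    """Hypothetical score: depends on family motivation, attendance, and noise."""
    noise = rng.uniform(-5, 5)
    return 50 + 20 * motivation + (TRUE_EFFECT if attends else 0.0) + noise


def mean(values):
    return sum(values) / len(values)


def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    motivation = [rng.random() for _ in range(n)]

    # Observational comparison: the more motivated families choose the charter,
    # so the two groups differ before any schooling happens.
    charter = [outcome(m, True, rng) for m in motivation if m > 0.5]
    public = [outcome(m, False, rng) for m in motivation if m <= 0.5]
    naive_gap = mean(charter) - mean(public)

    # Lottery comparison: the same motivated applicants, but a coin flip
    # decides who attends, so winners and losers are alike on average.
    winners, losers = [], []
    for m in motivation:
        if m > 0.5:
            if rng.random() < 0.5:
                winners.append(outcome(m, True, rng))
            else:
                losers.append(outcome(m, False, rng))
    lottery_gap = mean(winners) - mean(losers)
    return naive_gap, lottery_gap


naive_gap, lottery_gap = simulate()
print(f"naive charter-vs-public gap: {naive_gap:.1f}")
print(f"lottery winner-vs-loser gap: {lottery_gap:.1f}")
```

In this setup the naive gap is roughly three times the true effect because it also picks up the motivation difference, while the lottery gap lands close to the 5 points that were built in.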

Those of you who have worked with data, who have done your own analyses, know that it’s quite possible to torture data until it delivers the answer you want it to, and that’s the big weakness of an observational study. The randomized trial, it’s very hands above the table. As long as the randomization was done correctly, which is verifiable by other analysts who can replicate the analysis, you’re going to get the treatment effect. So there’s less room for funny business, and when you’ve got groups of people who distrust each other, which unfortunately we have in the world of education reform, having some sort of hands above the table method for evaluating stuff helps a lot.

So what does this mean in the world of charter schools? Well, every charter school that uses a lottery to distribute scarce seats is running a little mini randomized trial. It wasn’t planned as such, but effectively, it’s a randomized trial. And the volunteers are those people who have applied to the charter school essentially. It’s not everybody in the school system is in these data. It’s the volunteers, the people who voluntarily said, “I want to attend this school. I’m going to apply,” and so researchers have been making use of these experiments, as you’ve heard already from the previous speakers.

The research to date that uses lotteries, making this research happen requires a state that is willing to cough up the data, the underlying data on test scores and where students go to school. That doesn’t happen everywhere, and it also requires that researchers have gone to charter schools to get their lottery data, and it also requires that the charter schools are willing to give up that data. All of those things have not come together in every state in the nation by far. They have come together in a few places. They’ve come together in Massachusetts; in New York City, Hoxby did some work in this area; the Harlem Children’s Zone; Mathematica’s national study that you heard about. And that’s about it so far. In theory, we could have many, many, many more lottery studies in place, but as it is right now, there’s a lot of shoe leather required to gather those lottery data.

We’re working on a research project in Michigan where we’ve gathered lottery data from about 80 charter schools, had to go school by school to collect those data. A key reform that would help with evaluations in this area would be if charter schools were integrated into the same choice mechanism that’s used in, say, New York or Boston, and then, automatically, you would have administratively the data gathered on who the winners and losers are, and you could, as a matter of course, very cheaply, be evaluating the schools. So Denver has been moving in this direction.

It also reduces the barriers to parents to applying to charter schools. I mean, it’s tough on parents to have to go school by school and submit separate applications. In Boston, there is an enforced choice, essentially, for the traditional public schools. At each transition grade, parents get a list of schools that their kids are eligible for, which is basically the whole district, and they go through and rank them. The charter schools are not on there. The parents have to step outside of that structure in order to apply to the charter schools, and bringing the charter schools in and doing the lottery centrally would mean we would automatically have that data. It would make it cheap and automatic to be able to evaluate the Boston charter schools every year automatically without the MIT and Harvard and Michigan researchers getting together and going out and finding those data.

Based on those data, what do we know? Well, you already heard it. The charter schools serving poor, nonwhite kids in urban areas appear to increase test scores quite a bit. There’s especially large effects for the subgroups who tend to enter with low test scores—ELL, SPED kids, kids from low-income families.

This is about test scores. In Boston, we have gone on to look at college attendance and at high school graduation. The SAT effects are quite large. The AP score effects are quite large. In terms of the high school graduation effects, there’s only so much room for improvement because 100 percent is the max. Test scores can increase pretty much continuously. There’s room for huge increases in test scores. The ceiling on high school graduation, last time I checked, was 100 percent, and in Boston, as in many places, you’re kind of close to it. You can’t proportionately increase high school graduation to the same degree. Ditto, for that matter, with college attendance. The relevant margin in college postsecondary attainment right now is not getting more kids to go to college. It’s getting them to go to good colleges and to persist. We don’t know yet what that effect is on college graduation because they’re not old enough yet. So one issue with ed reform and evidence about long-term effects is that ed reform hasn’t been around that long. You can’t get the long-term effects of a young program. So I think it’s not fair to criticize the research for not being able to measure things that have not yet occurred. We haven’t advanced our methods yet sufficiently to be able to measure the future.

By contrast, charters serving non-poor, white kids in suburbs don’t appear to increase test scores and, if anything, might decrease them in some cases. Marcus talked about whenever you’re getting a treatment effect, there’s three reasons at least why—there’s three reasons why you could see differences across settings in what the treatment effect is. The treatment could be different; that is, the charter schools could be differentially good or bad in different places. The control status could be different. The traditional public schools could be different in different places in their effectiveness, and the students themselves might respond differently to the charter treatment.

You emphasized the issue with the fallback option. I think there is also a difference in who selects into charter schools in urban versus suburban areas. So the sense I get in suburban areas is the parents who are choosing charters are the ones who want to get their kids away from the test-oriented culture in their suburban schools, and the charter schools actually tend to be the sort of groovy places where you can get away from all of that test-based focus.

In Massachusetts, all of the urban schools are No Excuses schools. When we went around and talked to the charter schools in the rest of the state and we asked, “Are you a No Excuses school?” at least one of the schools, the principal said, “We’re an any excuse school.”

And this pattern holds within Massachusetts, and then, again, it’s pointed out in the Mathematica study nationwide. The positive effects, strong positive effects, are in the urban areas for nonwhite kids. These are the settings and the children for which ed reform was born. To say that the effects are strongest and most successful for what was the target population, I think, is pretty terrific.

If suburban parents want to send their kids to schools that don’t focus on test scores, I’ve got no problem with it, but then if people complain that those schools don’t increase test scores and, therefore, it affects the viability of the charter school movement, that, of course, is a problem.

Here is an example of a lottery approach in Massachusetts. The first step is that we find the applicants in the Massachusetts statewide data, which the State of Massachusetts gave us access to, anonymized, privacy, secure, all that stuff. We identify the applicants to a given set of charters and take out anybody guaranteed admission. So siblings, for example, are typically guaranteed admission, so they’re not actually in a lottery. And we end up with a list of applicants who are in the lotteries, and we separate them into those who are offered a seat, the green, and those who are not offered a seat, the red. Even among those who are offered a seat, the greens, just 74 percent of them ended up attending a charter because some of them might win and decide to do something else, anyway. Maybe they got into a private school that they would want to attend, or maybe they got into the exam school down the street. But, in any case, not everybody who wins goes to a charter school, and by contrast, on the other side in the red, some of those who lose this lottery end up attending a charter school anyway. They go to an undersubscribed charter school, or they reapply to this same charter school in a future year. But comparing these two groups, how are they different? The only thing that’s different about them is the propensity, the likelihood that they’re going to a charter school, and that’s the essence of the lottery comparison. It’s not a comparison of those who do and don’t go to charter schools. It’s a comparison of those who are given the opportunity to go to a charter school versus those who have been denied the opportunity to go to a charter school.
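
[Editor's note, not part of the remarks: the comparison described here groups applicants by whether they were offered a seat, not by whether they attended. A minimal Python sketch of that grouping follows. The records are invented toy values for illustration only, not study data, and the names `applicants` and `group_mean` are hypothetical.]

```python
from statistics import mean

# Hypothetical toy records, invented for illustration (NOT study data):
# (offered_seat, attended_charter, test_score_in_sd_units)
applicants = [
    (True, True, 0.30),
    (True, True, 0.10),
    (True, False, -0.10),   # won the lottery but went elsewhere
    (True, True, 0.20),
    (False, False, -0.20),
    (False, True, 0.10),    # lost the lottery but attended a charter anyway
    (False, False, -0.10),
    (False, False, 0.00),
]

def group_mean(records, offered, field):
    """Average one field over applicants with the given offer status."""
    return mean(float(r[field]) for r in records if r[0] == offered)

# Gap in the likelihood of attending a charter, by offer status.
attend_gap = group_mean(applicants, True, 1) - group_mean(applicants, False, 1)

# Gap in test scores, by offer status (offered vs. not offered,
# regardless of where each student actually enrolled).
score_gap = group_mean(applicants, True, 2) - group_mean(applicants, False, 2)

print(f"attendance gap: {attend_gap:.3f}")
print(f"score gap: {score_gap:.3f}")
```

[Grouping by the offer rather than by attendance keeps the two groups comparable, since the offer is the only thing the lottery randomizes.]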

And this is a good place to point out that if charters are expanded sufficiently to meet demand, our ability to run these analyses disappears. So these are based on scarcity. You run a lottery because you have more applicants than you have seats. So if we get into a situation where there are enough charter seats for students who want them, no more lotteries. Now, it may well be that within the charter world, there will still be some schools that are sought out more than other schools. So you’d be able to compare some charter schools to other charter schools, but you sort of lose the ability to compare the charter sector to the non-charter sector.

And that’s actually—so now I’m working—I started out doing work on charters in Massachusetts, which has a cap on charter schools, and there’s scarcity, lotteries that we could work with. In Michigan, when we stepped into that setting, a lot of schools that we talked to in Detroit, for example, had been oversubscribed 5, 10 years ago, no longer were, and were no longer running lotteries because the state now has 240 charter schools. And some of the markets are saturated.

Once we do this, when we compare the students who get an offer of a charter versus those who don’t and we sort of take test scores and give them a mean of zero in the state, so we’re looking at deviations from zero, we found that those who are offered a seat had test scores, standardized test scores, .11 standard deviations above the mean, and those who were not, .09 below the mean. And the difference between those two at the top is going to be—that’s the difference in test scores. You can just look at that difference in test scores, but you can also try to norm it to scale it in some way, scale it by how much time kids spend in the charter schools, and that’s what’s on the bottom. These are middle-schoolers. So the maximum number of years they can spend in a school is 3. That’s why we have just 1.27 versus .43.

You take the estimated effect on scores, which is .2 standard deviations, and divide it by the effect on years spent in a charter, which is about .8 years, and the ratio is .24 standard deviations. For each year spent in a charter school, test scores go up by .24 standard deviations, which is big as far as at-scale educational interventions go.
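
[Editor's note, not part of the remarks: the scaling described here is a ratio of two differences, often called a Wald or instrumental-variables estimate. A minimal Python sketch of the arithmetic follows, using only the figures quoted in the talk. The function and variable names are illustrative and do not come from the study.]

```python
def wald_estimate(score_offered, score_not_offered,
                  years_offered, years_not_offered):
    """Score gap between offer groups divided by the gap in charter years."""
    score_gap = score_offered - score_not_offered
    years_gap = years_offered - years_not_offered
    return score_gap / years_gap

# Figures quoted in the talk: scores of .11 and -.09 standard deviations,
# and 1.27 versus .43 years spent in a charter.
per_year_effect = wald_estimate(0.11, -0.09, 1.27, 0.43)

print(round(per_year_effect, 2))  # 0.24
```

[The numerator is the .2 standard deviation difference in scores and the denominator is the .84-year difference in time spent in a charter, which gives the .24 ratio mentioned in the talk.]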

Using those methods, what we found in Boston was that charter schools increased math scores by .4 of standard deviations a year. The results I just showed you were for the entire state. This now is just for Boston. As context in Boston, the black-white gap in high school in test scores is .8 standard deviations. So it’s big by that scaling as well. For reading, the results were also large, but somewhat less large, still significant, .2 standard deviations, which is still huge, by the way, in the realm of education interventions.

It looks like basically each additional year increases scores. This is the difference in the scores between the people who win versus those who lose in the middle schools in Boston. So you’ve got their baseline difference, which is essentially zero, and then year by year, you see the difference going up. The math is on top, and the ELA is on the bottom. It’s not like it’s just a 1-year blip. It’s that the difference is marching up over time.

In New York City, Hoxby found overall increases across the city of about .08 standard deviations a year. HCZ, the effects were close to our Boston estimates. We also did an evaluation of the only KIPP in Massachusetts, which is in Lynn, and there, we found effects that were similarly large. And KIPP Lynn was one of the few places where we actually had a charter school with a good number of English learners. Lynn is a heavily Hispanic area, and the effects were largest among the ELL and special ed kids.

I am going to end with this. We need more research in more places on what charter schools’ effects are for the reasons that we mentioned. It’s going to depend on context. In a place with very strong traditional public schools, you’d expect to see more muted effects. That said, Massachusetts has some of the strongest traditional public schools in the country, and that the charter schools are getting the effects that they do relative to those traditional public schools means that those charter schools in Boston are just hitting it out of the park. If you dropped them someplace like Detroit, the estimates would be absolutely enormous because the counterfactual, the comparison would be much, much worse. So we do need to see these research—these lottery studies done in more places. It’s not going to happen unless it’s easier to do them. It’s just they’re too expensive, and there’s too much shoe leather involved going school by school by school and gathering up their lottery data.

Having it routinized and sort of built into the life of charter schools that these data are generated, this group might not like this idea, but the idea that the lotteries should be monitored and checked for whether they’re actually random. That the data needs to be uploaded to the state, for example, to check for whether they’re actually random, that should be fine. That’s not about school practices. That’s about you’re running a fair lottery or not, and that should be something just like fair voting should be something that I think the charter sector should embrace. And when the data is then uploaded, easy enough to do the comparison, take those kids’ test scores, those who lost, those who won, and compare them. And I think it would give the movement a lot of credibility to be pushing that forward, “Yes, please check what we’re up to and see that what we’re doing is above the board.”
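
[Editor's note, not part of the remarks: one common way to check whether a lottery looks random is to compare winners and losers on something measured before the lottery, such as baseline test scores. A fair lottery should leave only chance differences. The sketch below uses a simple permutation test. It is an assumed illustration, not the procedure used by the speakers, and the scores are invented.]

```python
import random
from statistics import mean

def permutation_p_value(winners, losers, n_shuffles=2000, seed=0):
    """Share of random relabelings whose baseline gap is at least
    as large as the gap actually observed between winners and losers."""
    rng = random.Random(seed)
    observed = abs(mean(winners) - mean(losers))
    pooled = list(winners) + list(losers)
    n_win = len(winners)
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:n_win]) - mean(pooled[n_win:]))
        if gap >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    return extreme / n_shuffles

# Hypothetical baseline (pre-lottery) scores, invented for illustration.
baseline_winners = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2]
baseline_losers = [0.0, -0.1, 0.2, -0.3, 0.1, 0.25]

p_value = permutation_p_value(baseline_winners, baseline_losers)
print(f"p-value for baseline balance: {p_value:.3f}")
```

[A small p-value would suggest that winners and losers differed before the lottery by more than chance alone would explain, which is a signal worth investigating rather than proof of an unfair lottery.]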

Moving on from like what is the average effect of charter school nationwide, district by district, I think learning from charter schools—the initial idea was a set of laboratories, that you have a place where people can try new things and see if they work, and so the next step in the research is trying to understand why, why we see the effects we do in some places and not in other places. Is it that the students vary from place to place and some students just respond better to the charter treatment? Is it that the school practices, the charter school practices are varying from place to place? Is it the regulatory enforcement? Is it the money? So the amount of money that goes to charter schools on a student-by-student basis relative to traditional public schools, as we know, varies a lot by state, and I haven’t seen that enter into the research at all.

Massachusetts, the traditional public schools and the charter schools basically get the same average per cap. In California and Michigan, the charter schools get a lot less, so we’d expect to see smaller effects. If we see zero effect, then we’ve got evidence that the charters are getting the same done with less money. So bringing that sort of information into the analysis, I think, is an important next step.

And then the other reason to find out what’s working is to try to extrapolate the results to other settings, find practices that work and pass them on to other charter schools, pass them on to other traditional public schools, and we’ve started to see some of this experimentation. Roland Fryer has taken some of the No Excuses practices and sort of, one by one, experimented with them in Texas schools. So is it a longer school day? Well, let’s experiment with a longer school day and see if that’s it.

Now, some of these things might be complements with each other and they don’t work by themselves, so the next step is to put some together. Is it a longer school day plus tutors together? Separately, they don’t do much, but you put them together, you get something different altogether.

These can happen in both traditional public schools, but they can also happen in charter schools, and that would be another way for charter schools to be sort of in service to education more broadly is if the greater freedom that they have is used to experiment purposefully, using randomized trials to learn from specific methods and see if those methods work, and if they do, help to spread them.

Thank you.

JEANNE ALLEN: Fantastic, Susan and Marcus.

I want to kick off with one question and invite those who also want to engage up to the microphone. So you talked and you touched on this towards the end, Susan, sort of goes to the issue of quality of data. What do we know from these? What are the specific kinds of practices in a school that might be working?

I remember when the 2009 study first came out, the CREDO study you both talked about, I reached out to John Chubb and said, “What do you think? There’s just something wrong with this. I’m not understanding how we could possibly have thrown all of these data points from all these different states into a Cuisinart, shook them up, and had this result,” and he wrote me and said, “The study, ambitious as it is, covers a few years of a child’s education, three max. Given that school switching of any kind tends to yield a first-year drop from which students need to recover, it’s misleading to ask how a sector of schools is performing if one only looks at the initial years. The study finds kids learning increasingly more in charters relative to traditional public schools in years two and three. A more fair look at relative performance would have covered a school career for a kid, not just the early experience. And making matters worse, the sample of kids with one year of experience is larger than the multiyear experiences.”

So I raise that issue from John in 2009 because when the states give you data as researchers, something that I’m just learning a little bit now—and I know most of us don’t think about—is we assume that the data has all this stuff. Like we assume that when you’re looking at these schools, you know how long they’ve been in existence. You know, for example, you refer to differences in state law. Some schools start in fourth grade. Some start in ninth grade. Some are starting like in second grade. I mean, it’s weird, right? So are there those factors as part of the data, and how much does quality vary or even intensity of data vary state to state?

MARCUS WINTERS: A lot. A lot of the stuff we do know, I think, in most of the datasets, and if you don’t know it from the state kind of handing it to you, you can find out pretty easily how long the school has been around and you can see how long the student has been in there, and that’s something that their estimates do.

It would be hard for me to think a tempered result in charter schools is coming from school switching. We do see kind of year one effects that kind of grow over time. So I don’t think that that would really explain to me kind of the negative results, the less positive results we’re seeing in CREDO.

Some datasets are really rich and robust. Some are less so. Some states have been at this longer than other states, and some states have the lottery information easier to get than others. So there’s variation in that, and I think you’re right. Having a little more uniformity would be helpful to us, which is not always their main priority.

ATTENDEE: Hi. I’m from Philadelphia. I run a charter in Philadelphia. One of the things that I see is that for the high schools at least, they have so many selective admission high schools. There’s about a third of the public high schools are—in fact, a little more than that. You have to have certain qualifications to get in, and they’re pretty extensive qualifications.

If you look at the comparison between the top charters and the top public schools, the college enrollment data is not very far apart. It’s probably 5 percentage points, which isn’t too far.

So I look at these comparisons. When they kind of loaded their side and we still come up comparatively, that really shows that we’re doing a good job, but when you put that in a paper, it never looks like a good job. It looks like they’re doing better than you, and that’s all anybody ever sees. How do you address something like that? How do we get around that data problem?

SUSAN DYNARSKI: I touched on this, but I think that college enrollment is not a very useful metric at this point. College enrollment rates are very high, and part of the college enrollment is at, say, for-profit colleges, like ITT or Corinthian, and when you’re looking at just a college attendance percentage, that’s in there with the 4-year colleges, for example. So academic—but the dropout rate on average across all college entrants is 50 percent. Fifty percent of all of those who attempt college actually never get out, and it varies enormously by type of institution. So it’s 95 percent for some community colleges and some for-profits, and it’s 5 percent for some excellent 4-year schools. Focusing on the quality of the schools attended by the students and their persistence, I think, is more important. How many of them show up for the second year or the third year or the fourth year and get a degree?

Just about all states have the capacity to provide this kind of information to districts and to schools. During the recession, the feds made a condition of state fiscal stabilization funds that states determine what share of their high school graduates go to college, what share of their high school graduates complete 18 months of college, and to do this, states matched on either data from their public systems, their public university and college systems, or from the National Student Clearinghouse. And that’s what we’ve used in Massachusetts and in Michigan to track whether these students have gone to college.

ATTENDEE: I think that’s a great idea. In Philadelphia, they told us they can’t track the chain of custody.

SUSAN DYNARSKI: You can do it for your own students.

ATTENDEE: We do it. We have it, but we take our report to them, and that’s what they said: They can’t track the chain of custody. I never heard anything so ridiculous, but that’s what they said.

But the point here is this. You’re right. If it’s tracking how people go through college, then it’s meaningful. If it’s just that one first fall thing, that could be anything. That could be trade school. That could be barber college. That could be anything, and we need to track the quality of things. And that’s just not happening. Thank you.

TIM KELLY: Hi, Susan. I’m Tim Kelly, State Representative from Michigan. I chair K-12 appropriations in the House. I sit on ed policy as well. How come I have to come to Washington, D.C., to hear about a study done in Michigan charter schools?

SUSAN DYNARSKI: Come on down to Ann Arbor. No, I’m kidding. We’re still working on it. We don’t have the results yet is what’s going on, but we have worked closely with the statewide charter organization and visited every charter school in the state to get this information.

TIM KELLY: Has there been any influence in trying to sit on this study?

SUSAN DYNARSKI: No.

TIM KELLY: All right. Thank you.

RON RUSSO: Good morning. I’m Ron Russo from Caesar Rodney Institute in Delaware, and I would like to piggyback on the very last thing you had brought up about the traditional schools benefiting from charters.

In Delaware, it was the governor, who was Tom Carper at the time. It was the Department of Public Instruction, and it was a consortium of six businesses headed up by the DuPont company who actually got together and passed the charter school law in Delaware in 1995.

Now, it’s interesting in that I was hired to open the first charter school, and we’re talking about education reform this morning, but I just want you to understand, at least in Delaware, the main thing they were talking about—and this goes to—I believe you’re involved with economics. I saw your title. And if you take a look, I was hired and I was told flat out the reason for what they were doing in Delaware was for the state’s economic well-being. They said they wanted to attract business. They wanted to retain business. They were worried about real estate values, taxes, crime rates. You go on and on, and what this does is it broadens the support for educational reform.

Why? If you look at charter schools or traditional schools, if you don’t have a kid in school, what real interest do you have in what’s going on? Well, in Delaware, they were trying to give you the idea that if you lived in Delaware, if you worked in Delaware, if you had an investment in Delaware, you had an interest because I was told the existing public school system was a liability to the economic health of the state. So that helps to bring it up.

Now, we’re talking about measuring success. I just gave you some indicators I haven’t heard anybody talk about. We’re talking about how the kids do. Well, you know, they go through the elementary school. They go through middle school. They go through high school and college. That’s not what we were focusing on in Delaware in 1995. We were looking at, well, what is that going to do to people’s incomes, what’s that going to do to the taxes. That’s something that you can measure and you can measure regularly. You can see if you’re going up, down, sideways, or whatever.

So the idea for charter schools wasn’t to create a number of charter schools or a competitive school system. As a matter of fact, in Delaware, it was never intended to have a lot of charter schools. Why? And here’s where it comes. This is the big piece, if you will. The whole idea was something called “systemic change.” It had to do with changing the way the traditional schools operated. It was to customize schools, not standardize them. It was to shift operating authority and responsibility into the building with the building principal, which we would call CEO, chief education officer, working with—distribute education, working with the teachers. It was to take the authority from school boards, from the districts, and put it into the local building.

Now, how are you going to hold them accountable? We’re talking about—in charter schools, you take a look at what? Enrollment? Well, think about it. We’re saying to the parents, “You are the ones who will have control. You have say,” and by the way, the main customer of education is parents. They’re the primary clients. And I would offer for everyone’s consideration a big supporter or perhaps—perhaps an ally would be the best way to express it—would be the business community. If you get them involved—and I know we talk about—in education, you talk about how legislation is passed and how you talk to the people, and I certainly wouldn’t want anybody to think that I believe things are rigged. But I’m originally from Atlantic City.

JEANNE ALLEN: So, Ron, let me ask you: Are you asking, also, so where’s the economic piece for business? Is there impacts? Have you all studied those?

RON RUSSO: The answer is I happen to believe the economic piece is the critical piece. You said in the headline—or the heading—shocking, that economics is shocking. We ought to employ it.

SUSAN DYNARSKI: So here’s another place where you can help to make all this research more relevant. Very few states have connected their education data systems to their earnings data systems. So Massachusetts, which is, in general, good on a lot of ed reform issues, on research issues, has not done so, for example. So a demand for that kind of research might help to make it happen. We would love to in Massachusetts and in Michigan, for that matter, look at the effect of school quality, of charter schools on earnings. I agree that that is what matters, not what diploma they have, but their well-being as adults. And we’ve got a data shortage in that area.

MARCUS WINTERS: Yeah. It’s not because people like us don’t want to look at those things. In fact, there’s a huge race going on to see how many people can look at how many long-term outcomes, and part of it is because the data doesn’t match. It only matches in a few places, and part of it is because the reforms are just now kind of getting—have been around long enough that we can start to look at those things.

At AEFP last year, I could not count the number of papers that looked at these kinds of longer-term outcomes. It’s what we’re up to now. We’re trying to be.

JIM GOENNER: I want to go back to the base premise, and that’s the question: What’s better? Charter schools or traditional public schools? And I think Ted [Kolderie] would argue that it’s the wrong question, and he’s been putting it in the form of “What’s faster? Humans or animals?” So what’s faster? Humans or animals? And you think about that answer, it would be “It depends.” Right? Long race or short race? Lots of turns? Climbing? Can you fly? Can you use a car, et cetera? Do you think that in some ways, we’re trying to ask a question that’s just flawed on its premise?

SUSAN DYNARSKI: This comes back to—so say you do get an average effect for Massachusetts or for the country. Could you then take that and know anything about some randomly picked charter school that you then look at? Is it going to have the effect that you got on average? Probably not. Traditional public schools are hugely variable. Can you answer the question do traditional public schools work? And the answer is going to be, well, yeah, in some places, they do, and in some places, they don’t. And I do think it’s a coherent question, though, to ask in a given setting, in a given school district, in given states where you’ve got a set of traditional public schools, and you introduce charter schools. Do students at the charter schools perform better? That is a coherent question to ask, but it does need this perspective that you’re referring to that you can’t just sort of say the nationwide statistics, I would say, are not very helpful. Charters are located in different places than the rest of the country. We don’t have charters in many suburbs. You don’t have charters in many rural areas. So you have to drill in and look more closely at comparable school districts.

MARCUS WINTERS: Well, also, the only place where we have—one of the exciting things going on about charter schools as a reform, but also as far as research is concerned is we now have some pockets where the charter schools are much bigger players than they ever have been. The only place where we have a full, every school a charter school district is New Orleans, and so far, the evidence on that has been pretty positive. But it’s kind of the only example we have of that.

But now we have some cities where 30, 40 growing percentages are being—

JEANNE ALLEN: 46.

MARCUS WINTERS: So the kind of—the every school a charter school mantra is getting closer, and I think part of that—it would have been impossible. Like there was never a conversation during the early states. I doubt—like the goal could have been every school a charter school, but the idea that a whole state would become a charter schooling, I don’t think was ever really on the table. The first thing you have to do is adopt it and see how it’s going. If we give schools autonomy, are we seeing positive effects? But then the next part of that question is, well, what are they doing with that autonomy, and what is kind of, more or less, effective? And then have the conversation of can we really expand this, and that’s where we are in a lot of these places, I think, and that’s where we’re getting pushback in places like Massachusetts. And that’s, I think, where we are, and I think the conversation should be about trying to expand the sectors as we go.

JIM GOENNER: The follow-up then is that charters are—an institutional innovation is the way Ted would describe it versus a learning program, and so part of where I hear you going is we could actually compare college prep charter schools to college prep traditional schools versus the institutional structure, traditional public, versus charter. Am I hearing that right?

MARCUS WINTERS: Well, what we do is we compare students attending charter schools to the schools they would have—to how they would have performed had they gone to the school they otherwise would have attended, so that might be another college prep school. It might be a different type of school. Charter schools, because they are a choice reform, vary substantially from each other. So we can start looking at, well, where do we tend to find the more or less positive effects, with part of that idea being, what should we bring into the traditional public school sector, assuming that it continues to exist.

So overall comparisons of charter schools and public schools, we have to do them, but they are somewhat misleading, too, because charter schools are designed to be very different from each other.

SUSAN DYNARSKI: Charter schools are a license to try something different, and the fact that we have given them this name of “charter schools” makes it sound like a more solid thing than it is. But it really is just the freedom to try something different from what the local traditional public schools are doing, so it’s not a well-formed question to say what is that thing. But you can say what is the results. So you were given a license, and you did different things, depending on where you were and what your goals were. Were the outcomes good? So I would say focus on what outcomes are. Has the license to experiment led to better results for students? That is a coherent question.

JEANNE ALLEN: And the great segue of that last question and your response is then what is it measuring, and how are those assessments measuring learning and knowledge and whether it’s the art school or the back to basics or the college prep or fancy-schmancy science and tech school? What is the impact on knowledge, and do assessments actually know it?

Session 3:
The Connection Between Knowledge and Assessment

Panelists:
ROBERT PONDISCIO, Senior Fellow and Vice President for External Affairs, Thomas B. Fordham Institute
JAY GREENE, Distinguished Professor and Head of the Department of Educational Reform, University of Arkansas
GERARD ROBINSON, Resident Fellow, Education Policy Studies, American Enterprise Institute

Discussants:
TOM VANDER ARK, CEO and Partner, Getting Smart
ROBERT JACKSON, Chief Academic Officer, Great Hearts

GERARD ROBINSON: What we’ve heard so far are two things: (1) We know that assessments play a role, and (2) We know that school choice matters. But how exactly does it matter to knowledge and to learning? With that I’m going to turn it over to Robert, who will kick us off.

ROBERT PONDISCIO: At confabs like this I tend to be the guy who says “charters,” “choice,” “data,” “teacher quality.” That’s great. Can we talk about what kids actually do all day in the classroom? I bring the perspective of having been a fifth-grade teacher in the South Bronx for many years, and more recently a teacher part-time at Democracy Prep in New York City, so I try to keep the classroom and curriculum and instruction at the center of what I do.

I have a complicated relationship with testing. On the one hand, I value it. On the other hand, I refuse to pretend that it has caused no mischief in our schools, narrowing curriculum, encouraging large amounts of ill-conceived test prep, making schooling a joyless grind for our children. But neither can I deny that there have been real, if modest, gains in our present era of test-driven accountability, especially for low-income black and brown kids, which is all I’ve ever taught, as a teacher, and particularly in the early grades. In pieces I’ve written about this I’ve likened our relationship with testing to Jefferson’s famous quote about holding a wolf by the ears, which is to say we don’t much like it but we cannot let go.

The most reliable means we have of evaluating performance of schools and teachers is deeply unpopular—you know this. The more popular means are deeply unsatisfying—very squishy, easily manipulated. So America’s relationship with testing is also complicated. More than half of us, I think, based on last year’s PDK poll agreed that standardized tests are not helpful in letting teachers know what to teach, a figure that jumped, by the way, to roughly two-thirds when you count only public school parents. At the same time, there is strong support and far less controversy, speaking of our relationship with testing, with things like college entrance exams, tests to determine promotion from one grade to the next, AP testing towards college credit to high schoolers. So it’s not as if we don’t like testing, period. We just don’t like the tests that our kids take for performance.

Americans support testing when it’s in the service of clear, well-defined outcomes, but they don’t seem to regard the standardized testing, going back to the No Child Left Behind era and in Common Core, in the same way. I have no idea how to resolve this. Those of you who are expecting some clear vision on this, I’m going to disappoint you because this is the question that I’ve wrestled with on and off for more than 10 years and I’m no closer to a solution now than I’ve ever been.

One, without a doubt, and in the main, testing has done more good than harm in America’s schools, but it is long past time, I think, to acknowledge that reading tests, particularly the tests with stakes for individual schools and teachers, the kind of accountability Jay [Greene] insists that we’re not doing, by the way, I think it’s time to acknowledge that those tests are doing more harm than good, and I’ll speak briefly about what I mean. Again, all of this from the perspective of classroom practice.

A good test or accountability scheme should encourage instructional practices that are good, that we value. Reading tests, I would argue, do exactly the opposite. They encourage poor practice. They waste instructional time. They materially damage reading achievement, especially for our most vulnerable children. See, a test can tell you, for example, if a student has learned to add or subtract unlike fractions, can determine the hypotenuse of a triangle, understands the causes of the Civil War, and by reasonable extension, whether or not I have done a good job teaching the child that content.

But reading comprehension is not a skill or a body of content that can be taught. The annual reading tests that we administer to children in third through eighth grade are de facto tests of background knowledge. I’ve written deathlessly about this, and everything I’ve ever learned about this I’ve learned from E. D. Hirsch, whose work I assume you’re familiar with. Those reading tests are de facto tests of background knowledge and vocabulary, so they are not instructionally sensitive. Success or failure has little to do with what I do in the classroom on any given day.

There is, perhaps, a substantial body of research that says that reading comprehension relies on the reader knowing at least something, and sometimes quite a lot, about the subject he or she is reading about, so the effects of prior knowledge can be profound. A student who is ostensibly a poor reader suddenly looks like a rather good reader when he or she is reading about a subject with domain knowledge that he or she possesses. The most famous study, I think, was the Recht and Leslie study with baseball. Students who had low reading skills but high content knowledge of baseball, for example, easily outperformed ostensibly good readers with low content knowledge. And that’s a generalizable conclusion. If you have a lot of schema, as the reading experts say, about a topic, that compensates for your lack of reading comprehension skill. I’m painting, obviously, with a very, very broad brush.

But the reading tests that our children take, even the Common Core tests now, treat reading comprehension as a broad, generalized skill, again, like throwing a ball or riding a bike. So to be clear, by the way, decoding, which is the knowledge of the letter-sound relationships that enable you to pronounce words correctly, that is a skill. If I had more time I would put up nonsense words, as I’m fond of doing, and show you that we can all agree on decoding, that is a skill, and a transferable skill. Reading comprehension is not. But again, the key thing here is that reading tests treat reading comprehension as a transferable skill. So when we treat reading comprehension as a transferable skill, when we test it that way, when we incentivize teachers to teach it that way, students lose.

Math, if you like math, you can think of it as a hierarchical school-based subject, there is a logical progression of content to be taught, but reading comprehension is cumulative. Every cognitive input that a child has, from the day he or she comes home from the hospital, to the day he or she sits down for the fifth-grade reading test, builds that background knowledge and vocabulary, and not all of it, quite obviously, is school-based. So this is why affluent children who enjoy the benefits of educated parents, who speak in full sentences, who read to them, who fill their lives with concerted cultivation, why those kids do much better on reading tests, and, by the way, why it’s so difficult for schools and charter schools that serve low-income children to raise reading scores. It’s just harder to move that needle.

By treating reading as a collection of content-neutral skills, we make reading tests a minefield for both kids and for teachers. The test passages on reading comprehension tests are randomly chosen. They are not necessarily based on school-based knowledge, and even when they are, they are not necessarily pegged to any particular grade, and yet we’re using these for accountability for schools and teachers and whatnot. So, in short, the students who do well on reading tests tend to be those who have a lot of prior knowledge, read about a wide variety of subjects. That’s the wellspring of mature reading comprehension ability, not skills like finding the main idea, questioning the author, and whatnot.

As a practical matter, standards do not drive classroom practice. Assessments do. The first and perhaps only litmus test for any accountability scheme is simply does this test encourage the classroom practices we seek? In the case of annual reading tests with high stakes for kids and teachers, the answer, I’m afraid, is clearly no, they do not. Nothing in reading tests as currently conceived, whether before or now, during Common Core, encourages schools or teachers to make the urgently needed long-term investments in knowledge and vocabulary that, again, are the wellspring of mature reading comprehension and that drive language proficiency.

So what could replace them? This is where we reach the limits of my good ideas here. Options could include testing reading annually but eliminating stakes, testing decoding up until grade four and then stopping with reading tests altogether, substituting subject matter tests for reading tests. The best and most obvious solution, frankly, is a complete and total political nonstarter, and that would be curriculum-based tests. For good and obvious reasons that I’m sure Jay will be happy to remind you of, we can’t do things like have a national curriculum, and I’m not arguing for one. I did but I’m tired of getting beaten up so I don’t make that argument anymore.

But, look. Curriculum-based tests, in other words, if third grade is the year where you’d learn the Vikings, the water cycle, photosynthesis, ancient Greeks, and ten other topics, those are the topics that we’ll cull the reading test passages from, that would be a very elegant solution, but again, it’s not happening, for obvious reasons.

I’m going to conclude there, because this is the fundamental conundrum. On the one hand I value these tests. If there were no tests, no data, no Susan and Marcus sitting up here telling us what they’ve learned—without these tests, the moral imperative for reform goes away. We go back to neglecting the kind of students I’ve taught my entire career. But again, we can’t blind ourselves, as a reform community, to the damage that these tests are doing. They are, again—at the risk of repeating myself—incentivizing precisely the kind of literacy practices we should be actively disincentivizing. For 10 years now I’ve made fun of the way I was taught to teach reading comprehension to my struggling fifth-graders in the South Bronx, but if you tell me that I have to make a year’s growth in a year’s time, a concept I’m not even sure I understand when it comes to literacy, then I’m going to do exactly the things that I’ve criticized for the last 10 years.

I don’t know how to solve this problem. I’m hoping collectively we can, but it must be solved.

GERARD ROBINSON: I believe you can test knowledge and you can assess learning, but learning and knowledge aren’t the same thing, and I’ll give you an example of how I had this aha moment, working in three different states—California, Virginia, and Florida. In each state, we would brag, and rightfully so, about the number of high school students who passed our state exam and who were able to enroll in college. We particularly cheered for those who were first-generation students, those from rural areas, and those who came from very tough situations, independent of ZIP code, race, color, or creed. And the parents were excited, the superintendent was excited, and your state chief had a chance to brag, and these were all great things.

Then they arrived at college. Now, the admissions officers were pretty excited because they met all the right metrics. They had a high school diploma, they had taken all the requisite courses in reading, mathematics, and science, and they scored pretty well on SAT or ACT. Check. And then they arrive in the classroom, and it was talking to professors who said, “Gerard, I know what you’re saying at the high school level, and I know what they’re saying in the Admissions Office, but something isn’t coming through once they reach us in the classroom. It’s not happening, and these are your best and brightest.”

A few years ago there was a realization in Florida. We were spending $185 million a year on remediation—$185 million. Now let’s put this in perspective. These are for people who graduated with a diploma that we said made you college- and career-ready. Employers is another story. But they said they’re not. And I said, “Well, then, are we doing a disservice to taxpayers, are we lying to families, and are we letting some of our lawmakers off the hook?”

So I ended up putting together a commission, and it was a commission of higher ed leaders who, for the first time, said, “You know what? This is one of the few times I’ve ever been invited to a high school discussion.” I said, “Wait a minute. You’re accepting students and you’re not a part of the conversation?” “No.” So that was part one. Part two, it gave the superintendents an opportunity to speak to their peers across the lines, without competition. Third, for some of the superintendents who had a chance to take a look at our state exam, and many of the superintendents had been in the system for decades, it was the first time they had ever seen the FCAT. Now there are reasons for that. Number one, if anyone would have walked out of that room holding a test in his or her hand, it was a million-dollar fine, so there are factors that go into this.

But there are three things we did. Number one, we looked at international benchmarks. Number two, we looked at college benchmarks, not for how well we were doing in Florida schools, but competing against California and other states. And then third was to identify what did we use at the high school level—ACT, SAT, we even took a look at ways that we could with AP and others. And we crunched all of that and we came up with three numbers—one for reading, one for mathematics, and one for English language arts. And we identified that if you score X in reading or mathematics or English language arts, you can actually enroll into a Florida college without need for remediation.

Now immediately there was excitement from parents because this could either reduce by 1, 2, or 3 years the number of years you’re going to pay for tuition, because as we know, just because you’re in college and you’re enrolled in a college course, it doesn’t mean it’s a credit-bearing course, and too many of our school choice kids—charters, private school voucher kids—have gone to college and still are in need of remediation.

So for me, walking backwards—I’m an assessments guy; I think assessments matter—in the private school sector I don’t think we should force private schools to take the state test and then use that as a gauge to decide who is good or bad. That’s me speaking. I do think, in the public school sector, there should be a role for that but I’m also for multiple national norms of reference as well. For me, you can test knowledge and you can assess learning, and they’re not the same. Kids today, under I guess the age of 20, learn very differently than most of us over the age of 40, but we can test what they know.

I think we’re doing a great job in certain areas, but I think we need to broaden the conversation to talk to college professors and others who are working with our students. Don’t talk to the admissions advisors. They’re great but they’ve got boxes to check. At the end of the day, once they’re inside, it’s the professors, it’s Jay and others in the room where professors are working with the students.

Assessments matter. We need them in the public sector and we need them in the private sector, and if we’re going to say we want students to be college- and career-ready, let’s take a look at our state standards and see how many students are graduating with a public diploma and going into college with a need for remediation—that’s number one—and number two is the financial fact. How much money are we saddling on the backs of parents and taxpayers for an additional 1, 2, or 3 years, because students are taking courses in college that they should have mastered in high school?

With that I will end my part and turn it over to Jay.

JAY GREENE: Thanks. Since you’ve already heard from me I’ll try not to say too much more because I largely agree with what I’ve heard, just from Robert and Gerard. My only difference, really, is Robert says we have a wolf by the ears and we better not let it go, and actually, I think we have a puppy dog by the ears, and if we let it go it will lick our face.

[Laughter.]

JAY GREENE: So let’s let go of testing, would be my argument here, and let me make a few arguments for why the choice movement and the accountability movement should part ways. And in addition to everything that Robert and Gerard just said, is that the testing is actually narrowing the curriculum the most for disadvantaged kids, so the kind of rich content that disadvantaged kids need, they’re the most likely to have that content stripped from them by schools that feel extra pressure to focus narrowly on math and reading test score performance, so they are more likely to eliminate non-tested subjects, to more narrowly teach within those subjects, they’re the most likely to eliminate out-of-school experiences like field trips that might be enriching.

Again, I don’t think the puppy is going to bite us. Let go of the puppy; it will lick our face.

The other thing I can do here is create a small bit of mischief for Tom, and he may not want to, or feel comfortable responding to this. But let me try to illustrate where I think ed reform has taken a very wrong turn on this by describing what I think is the turn that occurred at the Gates Foundation, and I could be wrong because I’m not an insider there and I may not understand what really happened, and Tom may clarify or may not wish to. The Gates Foundation used to be really into small schools of choice, and I loved the Gates Foundation then.

And I think it’s an example of the faux scientific nature of test scores. So the Gates Foundation, and Bill Gates himself, present themselves as people of science, guided by facts, data, evidence, research, and what is shocking is how much that is not the case. So small schools of choice were dropped as a priority for the foundation and emphasis was shifted to measuring effective teachers as the main priority for the education efforts in the foundation. Small schools of choice were dropped before a randomized experiment on its effects produced its results. One was commissioned by MDRC, but small schools of choice were dropped before the results came in. When the results came in they were actually very positive, and very positive on later life outcomes. At first this was a bit of an embarrassment and the Gates Foundation didn’t really embrace these results, but they have come around to embracing them with a bizarro interpretation that this somehow reaffirms, actually, the importance of measuring effective teachers, and I don’t get that. That’s kind of a “mind goes to 11” sort of thing.

But then what happened in the measuring effective teachers, which itself claims to be the ultimate kind of scientific enterprise? They had a huge number of kids taking multiple assessments, with videos of classrooms so that there could be multiple classroom rubric observations, plus student surveys, and they were going to use all of this information to identify the scientifically valid way to evaluate teachers and for teachers to teach. They were going to find the recipe for effective teaching. They were going to figure it out using science.

And what happened? What happened was that nothing correlated with anything. I mean, this was not what was expected. What happened was, in the first round of results that were released, they just released the student survey responses and the test score responses, and Vicki Phillips put out the message that was carried in both the L.A. Times and the New York Times, that they had discovered that teaching to the test and drill-and-kill were bad for test scores, because they asked kids on surveys and they got what they said were measures of drill-and-kill and teaching to the test from that, and then they looked at test scores and they said that it was hurtful.

Now, that was wrong in at least two ways. One was they actually never asked, explicitly, about drill-and-kill and teaching to the test. They asked other questions, like, “Did you prepare for the test?” or something like that, which is not the same thing as drill-and-kill or teach to the test. So they took items that didn’t ask those things and they changed what they claimed they were about. That was one source of distortion.

The second distortion is that the correlation between those items and VAM was actually positive. Now, keep in mind, all of the correlations were between 0.1 and 0.2. Everything was uncorrelated, really low, but they were all positive. So to claim that that item showed that it hurt testing, the correlation would have to at least be negative, which it wasn’t. I’m against drill-and-kill and teaching to the test too, so substantively I agree with the conclusion but I just thought it was not an accurate representation of what they found. And when I tried to point this out, none of the researchers involved in the project would correct Vicki Phillips. They wouldn’t do it. So there was an intellectual corruption that occurred here that was very worrisome. This is supposed to be science, and instead what it was was power, and a corruption because of power. As much as I like money, I try very hard not to have it corrupt me, and, of course, there’s always the danger of being corrupted.

And then, in the second round of results that were released they had these three things—classroom observations, VAM, and student surveys, and the idea was that, combined, they were going to discover the right recipe for evaluating teachers, and also unpack the practices that led to better results. And again, those three things really didn’t correlate with each other very well, at all. They just don’t correlate. And if you combine them you don’t really improve the predictive power of using just the VAM alone. You make it slightly more stable but you don’t actually improve the predictive power. And this is after going to extraordinary lengths of training coders of the classroom observations, and rejecting coders who were unreliable, and lots of things that wouldn’t actually occur in real classrooms with real schools. But they go to these incredible lengths and they still can’t get anything to correlate and they can’t improve predictive power, and then they say, “We declare victory. We have scientifically proven that the correct way to evaluate teachers is one-third, one-third, one-third.” The research did not find anything resembling that. It didn’t prove that, but they said it did.

So the real danger of this reliance on test scores, in addition to how it distorts classroom practice, is that it actually distorts the policy intellectual community. We’re not really using this for science. We’re using it as a club to beat people into complying with our policy preferences, even when we don’t actually have the evidence to prove it, and that worries me. Rather than empowering the scientists, I’d rather empower parents, and let the puppy go.

GERARD ROBINSON: Before I turn it over to Tom and Robert, I want to throw out just a couple of observations and have Jay and/or Robert weigh in.

Back in the late 19th century, you had Frederick Taylor and a whole big push of scientific management, and what role that would play in making us smart people, to determine what we could do about human nature, and there were some successes and there were some challenges. We’ve now fast-forwarded. We’re in 2016. There is a big push for scientific management under a different term—maybe it’s scientific reform. What can we do as intellectuals, as those interested in assessment? I think, again, assessments matter. What can we do to get the philanthropic community more in line with how to fund what works as it relates to assessment of science?

JAY GREENE: What can we do? I have never figured out how to convince donors of anything, really.

I mean, I’ve always been of the belief don’t fix old, build new, and so if our donors seem to have wrong priorities, I’d rather try to find new donors. I think that’s important for us to do, as a movement, is to actually bring in some new blood, and that will actually reduce group-think and allow for better critical thinking within the movement about itself, and probably lead to some positive developments.

ROBERT PONDISCIO: I’m going to do an end run around that question. Two things. One I think I hinted at already, maybe, and this is not an answer to the question but it’s a policy fix that I think it’s time to think about seriously, that we need to uncouple testing from accountability. Your puppy image notwithstanding, I accept it, but I don’t ever want to see us get to a point, as educators, where we just walk away from testing. There is just too much value to it. The power of sunshine, as it were, that comes from testing, is indispensable to what we do. Now once you use those mechanisms for accountability, for holding teachers and schools accountable, it just changes the function, and I think it’s time to just be candid about that and just separate those two functions.

My fix for the philanthropic community, total wildcard here—and again, to a hammer everything is a nail, and I’m a curriculum guy—why are we so incurious in K-12 education about what kids do all day? It matters. There is good, tantalizing evidence to suggest that curriculum effects are real. If I’m a philanthropist, maybe I want to spend some time studying that. The best effects that we have, I believe, are in math. We’re starting to kind of tiptoe into this. Frankly it’s not that convincing because a lot of it is more about evaluating curriculum in terms of alignment to standards, which is not without value but it’s less valuable than does it move the needle for kids. I would like to be able to stop talking about this 10 years from now. Can we please start to measure curriculum effects in classroom and say good, better, best?

GERARD ROBINSON: So we’ll turn it over to Tom and then we’ll move over to Robert, or would you like to let Robert go first? Okay.

ROBERT JACKSON: Great Hearts Academies’ focus on cultural literacy, from within a longstanding tradition, goes to the heart of this question of knowledge. It’s interesting as I hear the presenters, and certainly the researchers with whom I’m only familiar, really, in their writings and their scholarship, is [that] I hear them talk about attempting to lay hold of and to understand how we distinguish good schools from bad schools, good policies from bad policies. I’m reminded of the ways, in a very practical sense, our schools are affected by the apparatus of the state, specifically how assessments, be they local or regional, or in the case of national assessments how we are both beholden to the constituency of parents, who clearly are, as we said earlier, voting with their feet, but also then those politicos who are looking at us closely and saying, “Can you show us the goods? Can you demonstrate that your schools, particularly those in the middle-income bracket,” where we have a majority of middle-income students, “are in fact doing what they’re supposed to be doing?” because we’ve seen the movement of choice and charter schools really focus on, and I think rightly focus on, those underserved populations.

So Great Hearts is often smeared with a critique that they’re middle-income schools, are in some way deficient, or not really holding up the moral high ground, to which I would say, given the lottery and the arrangements of the states, in fact, we are trying to reach every student who is available to us or accessible to us, and we want to make that happen. We’re being more intentional about that. But this tradition itself holds that Western values, Western ideals, self-criticism, reflection is, in fact, something that we want to bring to the largest possible population within our states, the states we serve.

So when I hear the research discussion around how to distinguish good schools from bad, I want to know if assessments can take hold of or lay hold of the kind of work that we do, where we encourage seminar activity, particularly in our high schools, around great works, conversations where Plato’s dialogues challenge the nature of piety in the Euthyphro, whether virtue is something that can be taught in the Meno, whether rhetoric and, dare we say, philosophy and certainly political life can, in fact, be addressed in the Gorgias, and so forth.

These are questions that our students wrestle with, and they do so around a table, a seminar table, such as you might find at one of these great books schools, and they are concerned with what this means for themselves, for their communities, and for the American society. Is that the kind of thing we can tap into? Can I invite researchers—I think Jay would probably be interested in this—but can we find metrics that actually lay hold of that?

Can we—and this is very important to me—can we, as we approach this sort of obsessed age, if I may, can we talk about science and mathematics? Can we talk about the nature of mathematical proof and how important it is to have students at a board actually describing how they came to a particular solution, an understanding, is much deeper than the standardized test can necessarily probe. Euclid shows up in our classrooms, not that we remain there, but Euclid shows up as a primary text, which equips our students to think in that fashion.

So I’m really interested, and I guess I’d like to hear my interlocutors here address whether assessments could be more broadly defined to consider knowledge as a tree, a tree whose branches and leaves and bark can be assessed with the kind of tests we have today, at that sort of microscopic or at the micro level, but whose fullness can only be found in the roots and the trunk and the fullness of that knowledge that is from a larger tradition, that’s interconnected.

When we teach the liberal arts, I’m sure some of you have run into the work by Brown, Roediger, and McDaniel, Make It Stick. When we talk about some of the latest cognitive science that is addressing how students really know something, it focuses on this kind of relationship between things, the prior knowledge and the way things connect. Well, that is, in fact, the very heart of a liberal arts education, that mathematics and Euclidean proof inform a kind of logical, rational understanding, that will inform the way they approach a text on political philosophy, or read the rhetoric in a particular argument. That itself will be informed by perhaps appreciation of the arts, music, fine arts, that are expressions of culture. They are cultural artifacts.

Can we see this interconnectedness and can we demonstrate that that knowledge is, in fact, the end we perceive, we want our students to perceive? If so, I’m all for assessments. I don’t want to speak to the accountability stuff. I’m way out of my depth in terms of policy, and I’m not by any stretch a policy guy. But it’s interesting to me, it’s informative, and I need to know, and I would invite your thoughts, and my interlocutors here, to help me understand how assessments could be more broadly defined to capture the whole and not just the parts.

TOM VANDER ARK: This business of raising and developing human beings is complicated. I have three points about that that I’d like to make, and at Jay’s invitation I will start with my awesome 1999 ethnographic adventure. Bill Gates invited me, in June of ’99, right after they invited me to join what was then the Gates Library Foundation. He said, “Why don’t you travel for 6 months and then come back and tell us what to do?” So this is like the most awesome sabbatical that anybody could have, because there was the potential for a really big checkbook in my back pocket and a lot of plane tickets.

I came back and told Bill and Melinda that I had found something that I didn’t know existed, that there was this world of schools that appeared to be changing young people’s lives, that it was engaging them in these really profound ways, in challenging work in powerful communities. I was a nontraditional superintendent, and hadn’t visited that many schools, and so this was really eye-opening for me. I told them that there are these places that—this is quite qualitative and ethnographic; it was observational, but I said, “I’m pretty sure these places are changing lives, life trajectories, and I think we should do more of that.”

Fast forward 17 years. What I continue to learn, as now a grandparent, is that Bill Gates was one of the few autodidacts in history, and he would go away for a weekend with a box of books and would come back an expert in the subject. The Internet has made knowledge much more accessible but there’s not many people like Bill Gates. Most human beings are activated in relationship and learn and develop in community. We haven’t talked much about that today, but those things continue to be things that I seem to relearn.

The second point that I want to make is let’s acknowledge that we’ve created a summative monster in America. It really is completely out of control. It’s ironic that those of us that were interested in better assessment encouraged the addition of writing, then the test got a week long, and it’s part of why America called bullshit on summative in 2014.

What’s now ironic about a week-long summative assessment is that when you visit good schools—and I’ve probably visited more schools than anybody in America now—when you visit good schools they know exactly how Johnny is doing every day, in every subject. They’ve all cobbled together unique patterns of combining very complex formative data, both qualitative and quantitative, and they can give you a very good answer, a very thick answer to how is Johnny doing. They can tell you proficiency levels, they can tell you growth rates. More importantly, and another thing we haven’t discussed at all today, is they can tell you who Johnny is becoming as a human being.

One of my favorite projects this year has been taking people on school tours. A couple of foundations have asked us to do this, so I often bring people to Denver, a rich market of school choice, and all of the schools in Denver provide daily feedback on character strengths, on mindset, on self-regulation, on courage, integrity, collaboration. If you go to DSST, a school that one of our grants helped to start, every week every faculty member and every student is getting qualitative feedback on who they’re becoming as a human being. You go down the street to the Beacon Network and they get feedback in every class, every day, on who those young people are becoming as human beings. If you go to a new school incubated by the district, DSISD, those kids are getting thick, qualitative feedback on who they’re becoming as human beings. I’m more interested in those assessments than I am in summative test scores.

So here’s the good news, bad news. Aspen and CASEL just launched a task force on social, emotional, and academic development. They’re going to spend 2 years trying to help us develop a common lexicon for what to call this stuff. They’re also going to help develop a set of measures for these things. Those of us here have to be cautious not to repeat NCLB in SEL fashion of too quickly stuffing immature measures on social-emotional learning into bad accountability systems and doing a full repeat of NCLB. On the other hand, I know these factors are important.

Listen to my podcast that I just did with Roger Weissberg. It came out about 2 weeks ago. Roger said, “Let’s stick to classroom conversations about social-emotional development and mindset. Let’s equip teachers to have those conversations with parents and kids.” That feels like the most important assessment conversation in America to me right now.

The third point that I want to make is this conversation about assessment has to start with what do we think graduates should know and be able to do. Tony Wagner taught me this as a brand new superintendent. He said, “There’s three questions you should engage your community in: how has the world changed; as a result, what should kids know and be able to do; and what learning experience will cultivate the knowledge, skills, and disposition that will result in the graduates that we want?” Tony and I have now done this in hundreds of communities across the country and helped them develop a new graduate profile that does include the knowledge that the Roberts are most interested in, but it also includes the skills that I’m most interested in. I’m the skills guy on this panel. I’m an engineer and an MBA by training, so I’m sort of less interested in the science and math and I’m more interested in the attack skills that come with engineering.

When I think about the answer to what should graduates know and be able to do, what I think about are the novelty and complexity that are going to characterize young people’s lives. This year I’ve been studying AI and machine intelligence. I’m pretty sure it’s the most important thing that’s happening in terms of factors that will change the lives and livelihoods of young people that we care about. I’ve written about 20 blogs on it, and my blog on Getting Smart this morning is on 13 onramps to novelty and complexity.

So what I’m most interested in assessing are the attack skills—how do young people approach novelty and complexity? What’s their mindset? What skills do they use? What’s the nature of their pattern recognition? Now, pattern recognition is entirely a function of knowledge, but it’s also having been introduced to novelty and complexity through great books. They’ve developed the ability to be immersed in a new situation and to be able to find their way out with a sense of meaning. So I’m interested in what are their attack skills for novelty and complexity. I’m interested in their ability to sprint through a valuable deliverable with a diverse team—so that’s project management in a project-based world. I’m interested in their ability to present and reflect on what they’ve learned.

You can tell I’m interested in performance assessment. I believe those are the most valuable ways that we can draw inference about knowledge and skill and disposition. I think they’re the most valuable ways that young people can reflect on their own knowledge and skill and disposition. The good news, like project-based learning, is that performance assessment is really easy to do. The bad news is it’s really hard to do them well. We all know of a few examples of school networks that launched performance assessments as summative. We know about states that have tried portfolio assessment. Very well-intentioned efforts that have failed quite badly because it’s so hard to do these things well.

So I’m left with the, in some ways, unenviable conclusion that I agree with Jay that I’m not currently smart enough to know how to build the accountability system that will do this really, really well. So I think as we close out this period of standards-based reform, the last 30 years that Jeanne talked about in the opening, I leave that period of time with a lot of humility, and caution, and like the other members of the panel here, I know a good school when I see one, when I visit one, and I’m all in favor of doing more of those and less of bad schools, but I think we all agree that our efforts to scale quality and to prevent mindless schools have been fraught with difficulty.

GERARD ROBINSON: I just have one broad question. We’re the third session. In the first two sessions parents were mentioned as part of the conversation. We’re now looking at the assessment factor. Is there a political, or even an academic benefit to making parents even better consumers of understanding assessment and knowledge in ways that we have not in years past, and how do we do that?

TOM VANDER ARK: I want to talk about that quickly. What we’re just about on the verge of is, what they didn’t talk about is—the good news is assessment is ubiquitous. Now every kid in America, almost, is connected, all day long, to multiple devices, and they’re getting five forms of math feedback every week. They’re in an adaptive software at school, they’re doing a performance assessment from a teacher, they go home and they do Khan Academy. So they’re getting lots of math feedback. What hasn’t been possible today is that that’s not auto-magically combined into a gradebook and surfaced through data visualization that says “here is how you’re doing.” It’s not really a technology problem because it’s not that difficult. It’s been a knowledge problem. The 300 people that Jay made fun of can’t agree on quite how to do that, and the vendors that built these walled gardens haven’t agreed on how to do that.

But the good news is that last Monday and Tuesday there was a big data summit of the best CMOs in the country, and they’re going to work hard to solve this problem. So I think we’re 2 years away from having really sophisticated tools that surface lots of data, not just on reading, writing, and math but also on character strengths and dispositions, to parents, teachers, and kids, to enrich the conversation about how’s Johnny doing. Number two, I think the way to manage those big learner profiles is parent-managed profiles, so that parents have some ability to manage the information in, and selectively can decide who to share those learner profiles with—an after-school provider, a teacher at the next school, SAT tutor. And so like their Facebook profile, they will be able to manage a learner profile and decide with whom they share how much data. So I think that’s the future of assessment is parent-managed learner profiles.

ROBERT PONDISCIO: If you want to see Jay’s kind of perfect paradigm in action today, go on UrbanBaby.com sometime and, at least in New York City, where I live, read parent comments when they are dissecting, like Talmudic scholars, the points of differentiation between New York City’s elite private schools. Here is something you will never hear them discuss—test scores and data. They talk about sports teams, arts programs, who is progressive, who is the academic hothouse. This is what educated consumers, who are deeply attuned to what schools look and feel like, sound like. They don’t sound like us.

So, yeah, more data is great, more parent education is great, and at some point maybe you grow your way out of the data problem. But my point is that well-informed consumers aren’t talking about this stuff at all.

GERARD ROBINSON: Robert, I have to assume that your parents are pretty good consumers of knowledge. Some would say that there is a built-in selection bias, that you would get the “best prepared” parents. Even if that’s the case, because I don’t think it’s a bad thing, having great, well-prepared parents, how do you make, or even empower parents in your network to be consumers of how you define assessment?

ROBERT JACKSON: One of the things I didn’t touch on, but which I am sure you are familiar with is that within the tradition, within a liberal arts understanding, and certainly within grammar school education, [that] the training of a young person involves not just intellectual but moral excellence. The traditional grammar school understood that the ethos of a school is as important, or at least right up there with the quality of its academic offering.

I was delighted to hear you say that we need to not only inform parents and bring them into this conversation—that would have been obvious—but we need to do so in relation to their moral formation. We, in our evaluations of students, have, since the inception of Great Hearts, spelled out, sometimes to the teacher’s chagrin, in great detail this moral formative aspect. We talk about, in the tradition of the Greeks, aporia, and this idea of wonder, and to what extent the child continues to delight in seeking the truth, in seeking understanding, to what extent the child is developing in his or her mastery of a subject area, but also, as a member of a community.

So by communicating to them in these rather extensive evaluative prose segments where a child is at, in language they can understand, parents are brought into and encouraged to really come alongside of the teacher—because the parent is the first teacher, the primary, the most formative teacher—how do we cooperate with them? We’ve tried to make it clear that our work, in the school day, is in some sense ancillary and is supportive of, and we need their participation in the community if they are, in fact, to see their child grow and mature.

I think the other things that we have attempted to do, and need to continue to do, are bringing parents into a greater understanding. Many of them don’t have a liberal arts background, didn’t have, necessarily, an education within the great books, and we want to make these accessible. We don’t want parents feeling as though this is some oddity or eccentricity. It is, in some sense, in terms of the mainstream, but we want to show them that this is, in fact, accessible to all, and that’s the intent. We’re fond of citing Mortimer Adler, who would say the best education for the best, in the old, aristocratic model, is, in fact, the best education for all. This is a truly democratic offering and we want them to be able to access it, little by little. It’s kind of continuing ed, if you will, an adult educational experience for some of our parents.

I think we could do more on this score. People’s lives are full, they have busy schedules, but we want them to know this is something that is for their children but it’s for their families.

ATTENDEE: So I want to just throw out the possibility that the parents really are not against testing. What they’re against is the way government or policy-driven accountability systems use those tests, rather than the parents using them as one piece of information along with all the other pieces of information that they have on their children, in making decisions about the school.

TOM VANDER ARK: I think most of us would agree that, as has been said a lot of times, we’ve created this mania around bad, narrow measures, and that’s led to all sorts of unintended consequences. And I think everybody up here is in favor of a system of accountability. I would just like it to be smarter. I appreciate Jay’s instincts that parent choice is a good enough system of accountability. I would add at least a sort of minimum threshold, but on a broader dashboard of indicators that would include reading, writing, math, some content knowledge, but graduation rates and post-secondary, some measures of life success but safety and well-being, and a broader dashboard of quality indicators.

ROBERT PONDISCIO: At the risk of repeating myself, I can live with any accountability system whatsoever as long as it does not actively encourage bad classroom practice, period, full stop.

ATTENDEE: My name is Inez and I’m from ALEC. I just want to get, it’s kind of on similar lines, I think, but a specific piece of the, like, accountabilities, like accountability from testing, or I would say actually standardization from testing. Both Mr. Ark and Mr. Jackson, you have both alluded to a deeper idea of what education is. You don’t necessarily agree on what that is, but you’ve both alluded to something. One is character development or understanding of the knowledge and the skills to be able to process that knowledge that’s sort of built Western civilization. These are things that are, by nature, really difficult to test in a standardized way. So there’s a reason that when you take a philosophy course in college there is almost never a standardized bubble portion of it. It’s either essay or it’s long-form answer, and there’s just very little way to process that level of depth of answers on more than a classroom scale.

So given that in mind, my question then is does assessment of these kinds of aspects of education demand a certain kind of decentralization and un-standardization of assessments, or de-standardization? Are we thinking too much about ourselves, as researchers, having an easy metric that compares a very large group of people, when, in reality, the more localized results may be much more important and may be actually closer to what parents are looking at when they’re looking at choosing between different schools?

JAY GREENE: But, yes, I think testing is different from accountability. We’re probably not going to let go of testing. But we never really have had accountability and we probably won’t. And the notion, by the way, that you will never be able to get away from accountability with public dollars also ignores that for almost all of the history of education there have been public dollars without any accountability to a central authority. So what people really rebelled against in 2014, when it reached its critical mass, was rule by distant technocrat. There was always lots of testing and information and forms of local accountability, but by 2014, it became clear there was going to be rule by distant technocrat, using that information, and that it would affect high-advantaged schools. And as long as it was targeted at high-disadvantaged schools, yeah, whatever, but when the soccer moms discovered that they wouldn’t be teaching Romeo and Juliet in ninth grade because they were going to an informational text, she flipped her lid, and that’s when Common Core died.

So I think that we’re going to continue to have testing, but we’ve never really had accountability and that movement is probably already fading.

TOM VANDER ARK: We haven’t talked about the shift to credentialing and competency, but this is a mega-trend happening around the world that could be a positive answer to this question. If we help young people curate a profile of microcredentials that signaled the knowledge, skills, and dispositions that they had developed, and then helped them prepare for careers that may require specific areas of credentialing, that would be a more thoughtful way to connect young people into particular pathways. One of those pathways could be I got a badge in philosophy and here’s how I demonstrated it, and you click on it and it opens a dissertation and an artifact that went with it. So that’s sort of a thick transcript that’s connected to artifacts that signal experiences and knowledge, I think would do a better job than a big standardized test.

ROBERT PONDISCIO: Could I just weigh in quickly? You can measure knowledge. Skills deeply mislead us. All the stuff that I said about reading comprehension, I like to say that it is not a skill you teach. It’s a condition you create, and you create it with rich knowledge and vocabulary.

Let’s not gull ourselves into thinking that we are going to teach and assess critical thinking, problem-solving, et cetera. Those are just like reading comprehension. They are not transferrable skills. They are domain-specific. There was a paper written, and I can’t remember the author, some years ago, with the title, “Could Steven Spielberg Manage the Yankees?” and the answer is no, because yes, he is a creative genius but he is a creative genius filmmaker. There is no reason to suspect he would be successful as a baseball manager. Those skills don’t transfer.

And lastly, on the idea of assessing dispositions, no. No. I just don’t want to ever see us get into the business of—this is Jay Greene’s nightmare, I’m guessing. Like we’re going to have a government agency assess our children’s dispositions? Thank you, no.

ATTENDEE: Quick, I kind of want to point out, Robert, your case that we should have testing but not use them for accountability purposes. As a practical matter, it is not going to happen, but even if we get to the choice model that I think all of us want, where everyone goes to school where they want, that matter is not going to happen also because the way people are going to choose is, in part, based on their test scores. So that’s always going to be used as part of that kind of accountability, as far as that’s concerned.

But I do want to bring back kind of how you started your conversation, in a way that I think has gotten lost with this. I have a lot of sympathies with the kind of overall what we want out of kids and it’s not all measured by tests, and you know more about whether the tests are misleading about what they’re measuring or not. But I think some of what gets lost in this conversation is the reason that we have testing in the first place is because we have a lot of really bad schools that aren’t doing anything.

The reason I’m not going to care about the test is because my kid is going to do just fine, and I know my kid is going to do just fine because we do a lot with them, and we purposely moved to a place that has really high-quality schools. And the problem is that there are lots of people who don’t get to make that choice—we have the goal that we want them all to have that but we’re not close to that yet—and on top of that, if we go back to a system where we don’t have summative measures of how things are going, and we trust that we know what a good school is, do we trust the de Blasio administration to walk around and tell us which ones are the good schools and who is doing good? Like do we trust?

I think what gets lost is accountability reform is not really meant for the top end. It’s meant for the bottom end. And I’m not convinced that that is no longer a problem, and I don’t think it’s entirely satisfactory to say that choice is going to solve this because we’re not close to that either.

ROBERT PONDISCIO: So, Jay, what he’s saying is it is a wolf. It’s not a puppy dog.

JAY GREENE: Yeah, but isn’t accountability really double secret probation? I mean, for a little while we were fooled into thinking that something might happen to someone, and then two things happened. One, it fades, and then the other is they gained control of the machinery so that they won’t squeeze the vice on themselves. So when you say de Blasio, the whole point is you build a system of centralized control and eventually there will be a de Blasio. He will be in charge of it, not Bloomberg. That’s the pharaoh who doesn’t know Joseph.

So I agree with you that we have to be concerned about improving things for kids who are the most disadvantaged in schools that are awful, and I agree with you that choice is not going to fix that tomorrow or next year or in 5 years. But I don’t think that accountability testing has or will do that either, and, in fact, I think it can make things worse by putting perverse pressures on schools to narrow their curriculum and actually hurt their students. So they’ll satisfy the metrics but they won’t actually help their kids. They’ll actually hurt their kids. That’s my worry.

GERARD ROBINSON: Here’s an example of the politics of assessment. In one state, if a charter school earned an F in year one, the authorizer could close the school. So you’ve built in an incentive that going to serve some of the hardest-to-serve kids, there’s no incentive to do so. And so one thing we did, without changing legislation, is we changed the rule, and we basically said if you receive an F in year one, okay. F in year two, not so good. But in the third year, if you still have an F, there are some actual accountability factors and sanctions we can put in.

But it simply took looking. We have an assessment but there’s a politics dealing with accountability. We changed the metric and in some ways helped not save schools but at least changed the narrative for lawmakers who said, “We need swift action.” But you don’t get swift action for the traditional schools that get F’s every year.

ADAM PESHEK: Adam Peshek with Excellence in Education. We’ve been talking about how we don’t like the current system but we don’t know what the new system should be, and I’ve talked about this before so some of you have probably heard this. I think we have almost all that we need, in certain ways, because I think we’re talking about two different things. We’re talking about accountability to parents as one, or accountability to the people that are overseeing the schools. Those are two different things. If we think that parents know best, and that’s kind of our guiding principle, and we think that if we empower parents with information that they’re going to vote with their feet, then why aren’t we doing a better job of capturing what parents know?

An example of this is in the Departments of Health. I lived in D.C. for 5 years and never once did I go to the city Department of Health’s rating system to see if it had an A, B, whatever the rating is. I go to Yelp once a week. I go to Yelp all the time, because I intuitively know that a 5-star or 4-star Yelp-rated restaurant probably doesn’t have mice in it, probably isn’t giving people food poisoning, it probably isn’t going to be something where I’m going to put myself at risk. But I think the way that we do education policy now, there’s a little bit of hubris in it, that we can rank-order schools from 1 to 10, on these objective metrics, when, in reality, that’s like saying to an automaker, you are evaluated on your miles per gallon. Now with that evaluation you’re probably going to turn out some cars with much better miles per gallon but you’re probably going to lose a lot of the other innovations that are coming out too, you know, side mirrors that can detect cars coming by, all the other things that we take for granted.

So my question is, isn’t it a light regulatory footprint and a kind of nudging of parents to require parents to rate their satisfaction with their provider, especially in a school system of choice, an ESA program, or a voucher program where parents are voluntarily going to a school, saying that’s part of that accountability feature, at the end of the school year, after you make the payment, whatever it is, you rate on 1 through 5. You can give an even more detailed rating if you want to, but at least a 1 through 5, so we’re capturing what parents know to drive decisions, because how we do it now is parents know best, if they find out it’s a crummy school they leave and no one else knows. Right?

So how can we use rating systems—and I think it probably requires more than just a Great Schools model. I like Great Schools. But I think you need some government intervention to kind of incentivize or get parents to want to participate in this, so that we’re capturing all experiences and not just the extremes.

And another thing, with Yelp, they’re putting now, in cities, they’re putting Department of Health ratings on Yelp, because no one goes to the Department of Health. They realize, they’re smart enough to know that patrons are going to Yelp so they’re putting that data on top of it, where it’s actually going to be seen. So why couldn’t this be the system?

GERARD ROBINSON: Who wants to take it first?

ROBERT PONDISCIO: New York City does school climate surveys. I don’t have a lot of experience with them, and I’m not sure that you need every parent. If we can know who’s going to win for President next Tuesday by surveying 1,000 people, do we need everybody in the school? I’m not sure what other cities do, but I know what New York City does. And what I don’t know is what effect that has, though, which is a better question.

ROBERT JACKSON: It’s interesting. We’re metaphor-rich up here, between wolves and puppies and pharaohs, but I’ve always preferred this one of the Department of Health coming in and saying, “At least you won’t leave with trichinosis if you go to this restaurant.” But we don’t know about the food preparation, the quality relative to, you know, is it organic and so forth, until we get to know that restaurant. And we have begun, as a network, really surveying parents extensively. It’s one of the performance indicators that we take note of, because we value that. I just don’t know how that would look, again, on a policy level, trying to bring it together, except if Tom’s idea of something like a profile emerged that included parents’ response to and experience with the school.

I think they’re very sympathetic, by the way, not just to us but to our schools, generally. I think as Americans, as parents, we love our schools. They’ve got our most treasured possession, so I think we give them a lot of latitude. That is to say if parents start to gripe, that’s a pretty good indication something may be amiss. There may be some value to that. I just don’t know how it would be put into play.

GERARD ROBINSON: In the D.C. voucher model there is actually a built-in component to evaluate parent satisfaction, and a couple of Jay’s colleagues, Tom Stewart and Patrick Wolf, have written a book about that, and there are other states who use tax credit scholarships who have built in parent satisfaction. So very good point.

MICHAEL HORN: What I’d love to know is, Robert, thinking about the curriculum-to-assessment connection, I think I understand what you’re saying but I’d love you to dig in just a little bit more deeply. The analogy I’ve always loved—put aside wolves and puppy dogs—is that if you went back in health care to the advent of looking at blood pressure in patients as a measure of health, originally doctors’ reaction to this measure was to leech patients. You took blood out and it didn’t actually help their health but it sure helped that indicator.

What I hear in your comments around the testing is that actually narrowing the curriculum and drilling on skills in ELA doesn’t help.

ROBERT PONDISCIO: It makes it worse.

MICHAEL HORN: It makes it worse. So I don’t understand why the problem is with the test rather than our reaction to the test.

ROBERT PONDISCIO: Here’s why. Suppose I have a curriculum, which I don’t, but let’s, for the sake of argument, say that fifth grade is the year that I teach the water cycle, the Vikings, ancient Egypt, New Amsterdam, et cetera, and then my kids sit down for their test and the reading passages are about the 1980 hockey team, BMX bicycle racing, the Boston Red Sox, et cetera. To a degree to which American education dimly understands, reading comprehension is not a skill. It is really a function of your background knowledge about those topics, so that result will be completely misleading. It wouldn’t indicate anything about me as a teacher. This is why reading tests are socioeconomic indicators, because the kids who walk in the door with the most prior knowledge about BMX bicycle racing and whatnot do well. So they are not instructionally sensitive. But to make them instructionally sensitive would require having a curriculum, and we can’t do that.

MICHAEL HORN: Tom, I’d just love to hear you go on if we went to a world of badging and competencies, at micro levels that were based on knowledge and then skills within those knowledge domains, how that might change that picture.

TOM VANDER ARK: The good news is that assessment is moving into the background and it’s becoming embedded into powerful learning experiences. So I think this is a place Robert and I would agree, that a microcredential would help when the assessment is an integral part of a learning experience or a learning sequence.

ROBERT PONDISCIO: Sure.

TOM VANDER ARK: Right? Then it’s valid, it’s reliable. So I think that’s the way the world is headed. We both agree on the benefits of competency and I think that will mean multiple forms of assessment being surfaced and integrated in useful ways, as part of a learning sequence.

ROBERT PONDISCIO: By the way, quickly, in a computer-adaptive environment, another topic I know nothing about—it doesn’t mean I don’t have an opinion, though—I can see a way towards having like PARCC tests where, at the district or school level, you could actually control for background knowledge, say, “Hey, this is the year we teach these subjects,” and then you have reading passages that are about that subject. That could work.

TOM VANDER ARK: Here’s my point. When you visit a good school, they know everything about kids, and the notion of taking a week off to do a test is just crazy. The amount of writing that kids are doing at the school you taught in, the school you lead, tells us plenty about how they’re writing. We should just have background ways of collecting and surfacing the information that they’ve gathered about that student writing so that we don’t have to take time off and create an artificial testing environment. I think we’re close to developing ways that would combine data, from your environment and your environment, and do it with good enough comparability that we don’t have to create an artificial testing environment. I think we’ll be able to construct that.

Even if that’s challenging, now with this sort of post hoc data barrage that you can do when you take—we may not have as much data about his kids, because they’re not online as much as I’d like them to be—but kids that are online every day, all day, it’s quite easy to take their data set and compare it with another kid’s data set, and make adequate inferences about quality, and I think that is part of how accountability is going to work in the future.

GERARD ROBINSON: I’ll have both of you ask your questions and then we’ll close it out.

BOB BELLAFIORE: Bob Bellafiore from Stanhope Partners in New York. I have worked with authorizers trying to close schools and schools trying to deal with the threat of closure.

So let me ask you this. Jay, you said the accountability movement is dying, or may be dead. Tom, you said we need more different kind of accountability measurements. So let me ask you this and introduce a new non-animal metaphor. What goes on the palette that authorizers work off of and that schools can recognize so they know what the rules are—because this is a great theoretical discussion and the most often-heard word today is “should,” as my pastor says, to suggest is to volunteer. So what does the palette look like in a practical world, for an authorizer and a school who are just trying to do the right thing, the right way?

ATTENDEE: My question is to Tom. I’m really struck by what you said about schools knowing what’s going on with their kids, and I have two questions about that. One is, how much subjective stuff would you consider in that, and two is, how wide a frame do I need to go? Do I need to go just with parents and the school itself, or do I need to go to the community, or do I need to go to the national level? Where does that step?

TOM VANDER ARK: On the question on authorizing, we ought to have four or five, maybe six different lanes of authorizing. One ought to be a fast path. If you run good schools, you ought to get a free pass to open as many schools as you want. If you want to open an experimental school, there ought to be a sheltered form of short-cycle evaluation that might go from a summer school trial that turns into a pop-up micro-school that turns into a larger school, subject to observation and testing.

I wasted a couple million dollars trying to open schools in New York and New Jersey when it was illegal to do anything different, and then it became illegal for anybody to do a for-profit. So we ought to have an innovative schools authorizing system that encourages, doesn’t make illegal, efforts to try things differently.

On the second question, I’d like a really rich learner profile that includes qualitative and quantitative feedback. As much of that as possible ought to be against a rubric, so that it is at least a set of judgments against an agreed-upon set of desired outcomes, but that would certainly include subjective information.

JAY GREENE: I think you’re looking for the answer in the wrong place. I’d be worried about a school that was focused on trying to please its authorizer and comply with what its authorizer wants. That is, I think, the wrong way to go about it. I think the right way to go about it is to have a vision for a school, execute on that vision, do a good job, and trust that it will kind of work out. It’s a little bit like my advice to my students who go off into tenure-track jobs and I advise them, don’t work for tenure where you are; work for tenure. So if you’re a good school and you’re executing on your vision, then someone will authorize you and you will be fine. Well, look, and if it’s the case that authorizers are arbitrarily closing good schools and they can’t get authorized by someone else, then we should get rid of authorizers, because that’s stupid.

GERARD ROBINSON: Robert, do you want to weigh in on anything?

ROBERT JACKSON: I was curious whether anything like the accrediting model that we find in higher education, with its standards, broadly defined to represent best in class or guild activity, could, in some way, provide some of this accountability around, again, a broadly open rubric that takes into account a school’s financial profile, a school’s culture, the more subjective aspects, a school’s curriculum and coherence thereof. Would that be the kind of thing, again, assuming we had a guild of some weight that could weigh in, that might be an analog? I don’t know if there is anything quite like it in this space as of yet.

GERARD ROBINSON: We’ll end up with our friend to the left.

ROBERT PONDISCIO: I’m ending where I began. I’ve got that wolf by the ears again. Jay, I’m with you again on 99.9 percent of what you’re saying until you get to that line about you’ve got to trust that it’s going to work out. We have too long of a history—Marcus said it quite well—of schools that have not worked out for the kids who can least afford schools that don’t work out. For most of us it’s going to be fine, but for those who can least afford it, it’s not going to be fine, and I don’t have an answer.

GERARD ROBINSON: And as you’ve seen, we’ve had very diverse ideas on what assessment, knowledge, and learning look like. The thing I’d like to leave us with is this: when we talk about students and adults, we talk about human beings, when, in fact, what we are are humans, being. And once we realize the action point, we’ll begin to look at ourselves very differently and assess ourselves very differently.
