CER Note: This document was written by Bob Embry, former head of the Maryland State Board of Education, to Nancy Grasmick, State Superintendent, in December 1996, and has just been obtained by the Center for Education Reform.

MEMORANDUM
TO: Dr. Nancy Grasmick
FM: Robert C. Embry, Jr.
DATE: December 24, 1996
RE: Maryland School Performance Assessment Program

I am proud of the fact that during my years on the State Board, Maryland led the country by enacting one of the first state-wide assessments of school performance. With your arrival as State Superintendent, I have also been gratified by the State’s campaign to persuade school systems of the importance of meeting the State’s standard of excellence. The Board never anticipated the test’s overwhelming acceptance by local school systems. Their commitment to this test is proof that local school districts are indeed willing to be held to strong standards and is a credit to your leadership.

The test is now five-plus years old and you are well aware that my pride has been gradually tempered by serious reservations. Given that I was one of those who approved the MSPAP, I am embarrassed to say that my knowledge of this test was not what it should have been when it was designed. What began as nagging doubts about only certain aspects of this test has expanded considerably, the more I have learned. And because I had pushed so strongly for its implementation, I have been publicly, even privately, loath to admit to its shortcomings. It was not until last spring, when we directly observed the MSPAP administered in Maryland classrooms, that I realized how fundamentally serious some of MSPAP’s deficiencies were.

I have struggled long and hard with the need to handle this matter sensitively and fairly since we first approached you last February, offering to fund an expert review of the MSPAP. I realize that even though the Board established the MSPAP before your arrival, you will be the one person held most accountable for its content.
To the extent possible, I wish I could avoid placing this burden on you. I also know that your commitment to Maryland children will, as you consistently demonstrate, take precedence over any other matter.

The following observations about the MSPAP are based on: 1) observations by our staff who served as monitors in Baltimore City during the May 1996 tests in grades 3, 5, and 8; 2) a reading of the Maryland State Outcomes and Indicators, technical reports and other supporting documentation; 3) studies done elsewhere on state tests similar to the MSPAP; 4) a careful reading of publicly released MSPAP tasks and how they were scored; 5) a cursory look at Maryland performance on the CTBS and NAEP; 6) consultation with testing experts, their own research and other relevant literature; and 7) anecdotal evidence supplied by teachers and students.

In the following pages, I want to explore 12 issues which fall under three general areas: scoring, content, and efficiency. Some are raised here as concerns only, concerns which may be easily answered. Other issues relate more profoundly to policy decisions that form the theoretical underpinning of the MSPAP and should be addressed by the State Board.

I. The Scoring of the MSPAP
II. The Content of the MSPAP
III. The Efficiency of the MSPAP
I. The Scoring of the MSPAP

1. Was it wise to design the MSPAP so as to make it impossible to report individual student scores?

The ability to report individual student scores is a crucial element of promoting a standard of excellence among students and parents. We are also confident that if Maryland parents had to choose, their overwhelming preference would be to know the performance of their child rather than that of their school. The MSPAP was not designed to produce individual student scores; in fact, if reported as currently designed, individual scores would be invalid. There are at least two reasons why individual scores cannot be reported: 1) students must participate in group activities during the MSPAP, which means that their individual scores are not an accurate reflection of their own level of performance, their responses having been influenced by their peers; and 2) students answer too few questions to produce an individual score that accurately represents performance in a subject area.

If a school’s performance on a test could be defined as the sum of the performances of its individual students on that test, then why was a test designed where this is not possible? The MSPAP contorts a simple relationship between student and school performance because of the variables introduced by the practice of brainstorming, peer review, and cooperative grouping.

Group practices such as brainstorming, peer review, and cooperative grouping cannot be reconciled with the reporting of individual scores. All three of these practices, which are prevalent in the MSPAP, are useful complements to direct instruction from the teacher, but their inclusion in the MSPAP diminishes the State’s ability to report individual student scores.

Brainstorming. Students are often asked to brainstorm before a task begins and the teacher jots on the blackboard whatever the class has to say on the subject.
In our observations of the May 1996 administration of the test, many of the student brainstorming comments were incorrect but went up on the board anyway. Is there merit in being wrong? Should students be allowed to think that there are no wrong answers? Why is brainstorming permitted, much less required?

Peer Review. Peer review is constantly required during the MSPAP. Students write a draft, and present it to a peer for review based on questions such as ‘What did you like best about my rough draft?’ While peer review may be a helpful part of the writing process, a student’s performance is unquestionably impacted by his peer’s comments. As long as MSPAP only collects school scores, this practice is harmless (though of questionable value given the time this activity consumes). However, the practice is at odds with valid individual scores.

Cooperative Grouping. Many of the tasks begin with placing students in groups of about 4 to work together. After the group work is finished, students complete the test questions on their own, basing their responses on the work done by the group. The State has never demonstrated that the considerable time spent on such grouping results in any more accurate testing of student learning. Further, the State is sending a strong message to Maryland schools that cooperative grouping is always preferable to teacher-directed instruction. The research on cooperative grouping clearly shows that cooperative grouping is an excellent mode of instruction when used as a means to expand what students have learned from the teacher, not as a replacement for direct instruction.

In our observations, the unfortunate reality of some cooperative grouping was all too apparent. In a grade 5 Science task, four students were each assigned a role in a science experiment. One of the students had to measure a set amount of liquid into a graduated cylinder. She was giggling so hard that she could not do it.
The other three had to watch her try over and over again, time was being wasted, and they were being penalized because this girl could not perform a simple task. It is not hard to deduce from this example that the State could end up scoring this group very low in science, when in fact there may have been children in the group who were quite capable of performing the task. The State might respond that how well students work together was the point of the task, not science content. Why then is the score reported as a science score? Does the State Board, you, or the public want to test science by determining a student’s knowledge of science content or merely test how (according to the State) students ought to practice science? Has science been reduced to a set of social skills? There are not enough questions on the test for assessment of individual student performance. The MSPAP is a performance test which, as we understand it, (in part) means:
By definition it takes longer to write out an answer than it does to select an item on a multiple choice test. Almost all of the tasks on the MSPAP call for written answers, answers which seem to range from 1 sentence to 2-page essays, but which average about 2 to 3 sentences. (A performance-based assessment need not necessarily require this amount of writing.) During a MSPAP exam students will on average perform 2 tasks a day and complete perhaps between 8 and 12 tasks during the week. Six subject areas are being tested using these 8 to 12 mostly interdisciplinary tasks. A student’s individual score in science, for example, would probably reflect the work done on 1, 2, or, at the most, 3 of these tasks. (The score that the State now reports for each subject area reflects the collective student performance on a higher number of tasks, since all students in the same grade do not perform the same tasks.) These few tasks cannot say anything conclusive about an individual student’s knowledge or ability in a particular subject matter.

Too few tasks are completed by individual students on the MSPAP for their responses to reasonably reflect a student’s knowledge of a subject area. Even if we presuppose that all of the questions are objective (they in fact are not), the MSPAP still does not ask enough questions to report accurate individual student scores. On the 1995 test, individual students completed work on an average of 16 items in a subject area, falling well short of the number of questions needed for fairness. Approximately 80 objective questions per subject area are ideally needed for a test to be a valid measure.
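As a rough illustration of why the number of items matters, the sketch below compares the statistical uncertainty of a score built on 16 items with one built on 80. It is not drawn from the MSPAP technical reports. It assumes, purely for illustration, that every item is scored right or wrong, that items are independent and equally difficult, and that the student's true proportion correct is 0.5; the function name `score_standard_error` is hypothetical.

```python
import math


def score_standard_error(true_proportion: float, num_items: int) -> float:
    """Standard error of an observed proportion-correct score.

    Assumes independent right/wrong items of equal difficulty
    (a simplification; MSPAP items are not scored this way).
    """
    return math.sqrt(true_proportion * (1.0 - true_proportion) / num_items)


# Compare the 16 items a student actually completed in a subject area
# with the roughly 80 items the memo cites as ideal.
for num_items in (16, 80):
    se = score_standard_error(0.5, num_items)
    print(f"{num_items} items: about +/- {se * 100:.1f} percentage points")
```

Under these assumptions, the uncertainty around a 16-item score is a little more than twice that of an 80-item score, since the standard error shrinks with the square root of the number of items.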
While we applaud the State for encouraging Maryland students to write more, we question whether the State has achieved an appropriate balance between the need for validity and the need for writing. Could the test contain fewer questions that require writing so that it then has time to ask more objective (multiple choice and short-answer) questions--and still be able to adequately assess student writing abilities? Indeed, a study by the College Entrance Examination Board shows that such a balance is not only possible, but advisable. A test that combines multiple choice questions with essays results in the most accurate and fair of tests for assessing the ability to write--more accurate than a test that included only multiple choice questions and more accurate than a test that included only essays. Moreover, it costs less to grade and can be graded more quickly.

We certainly understand and appreciate the State’s motivation for wanting a test to improve whole-school performance, with the implication that pressure on a school to improve will trickle down from teachers to students. In the absence of reporting individual scores and therefore any consequences for students, the State may be overlooking its most powerful device for improving student performance, at least for older students. John Bishop, an economist at Cornell, has done extensive research on the impact of external exams which have individual scores on student academic attitudes and achievement. He compared provinces in Canada which have such external exams in secondary schools with those that do not. His comparison shows statistically significant higher achievement in math and science in provinces with the external exams (along with a number of other positive benefits relating to parent involvement, higher standards in the school, less hassle over grades from parents, more resources of the school directed at instruction, to name a few).
Bishop concludes that where external assessments with individual scores are not part of the package to raise academic standards, students and their parents benefit little from administrative decisions that opt for these higher standards, more qualified teachers or a heavier student work load. When student learning is not assessed externally, the positive effects of choosing academic rigor are negligible and postponed.

The test is producing disproportionately lower results in schools where students are poor. Testing research has conclusively shown that in tests which carry no consequences for individual students, students will not perform as well as they would on tests which do carry consequences. This phenomenon is compounded in tests which require a lot of writing, such as the MSPAP. It is most prevalent for older children who are in middle and high school and is one of the reasons why the State is designing high school assessments with individual scores. If the phenomenon relating performance with motivation were equally true for all eighth graders in the state, then its downward effect on scores would be proportionate and fair: all eighth grade scores on the MSPAP would reflect lower student performance than would have been possible had the students been highly motivated. However, students who have intrinsically lower motivation to do well will try even less hard than those who have intrinsically higher motivation; students who are poor will on average try less than those in higher socioeconomic brackets. If the State is not going to provide individual student scores, then the question must be raised as to whether the State can reliably compare scores of school systems which are predominantly poor with those which are not. It must certainly be a factor contributing to the abysmally low scores reported by the city of Baltimore.
In 1994, the Governor of California, Pete Wilson, revoked the State’s own new state-wide performance assessments, in part because the assessment had made no provisions for individual scores. One of the original proponents of the assessment, Wilson had pushed hard to implement a test which provided individual pupil scores. But a few years later it was Wilson who also vetoed the needed reauthorization for the assessment. While there was certainly a conservative backlash towards the test, Wilson’s reasoning was not motivated by pure political expediency. In the process of designing and implementing the assessment, the State education department had basically shelved individual scores, in order that a performance-based assessment could be put in place. After reviewing a technical report that was highly critical of the test, Wilson felt he could no longer champion a test he had helped to create.

2. Are the school-wide scores on the MSPAP an accurate measure of a school’s academic performance?

Given its serious consequences for schools, the reliability of the MSPAP needs to be indisputable. However, there are a number of red flags which call into question the MSPAP’s reliability: 1) we can find no evidence other than MSPAP scores to indicate that student performance in Maryland has in fact improved; 2) decisions have been made by other states not to use similar performance tests as the arbiter of school performance; and 3) the highly complicated nature of the tasks themselves makes consistent and reliable implementation nearly impossible.

We find no supporting evidence that student performance has in fact improved. In our view, if a state is going to design and administer its own test, external validation is not merely desirable but essential. An April issue of Education Week reports that 73 percent of Maryland’s teachers think that the rise in MSPAP scores is "not necessarily related to better instruction."
A recent national comparison of student proficiency on various state tests with student proficiency on the NAEP showed that the discrepancies between state and NAEP scores were enormous. Most states (many of which use a test similar to the MSPAP) had determined their students were a lot more proficient than indicated by the NAEP, some by as much as 70 percent. In 1994, Maryland MSPAP scores showed that 40 percent of its 8th grade students met the state proficiency standards in math; the NAEP showed that 24 percent of Maryland students met its standard of proficiency.

NAEP data does nothing to build public trust in the MSPAP and what it purports to say about the achievement of Maryland students. The NAEP data on Maryland students, relative to other students in the nation and in other countries, is discouraging. For example, while MSPAP scores have steadily risen, the NAEP 4th grade reading scores register a slight decline from 1992 to 1994.

A look at how much Maryland students are improving in their basic skills (as indicated on a CTBS random sampling of students) also yields unimpressive results. Of course the State is no longer emphasizing the CTBS "basic-skills" tests, but our understanding was that the MSPAP would be testing how well students can apply their knowledge of basic skills, which would indicate that students should still know the basic skills. In other words, the MSPAP presumably still requires students to know basic skills while expanding the expectations for Maryland students to include higher order thinking and application of skills. Is it not logical to assume that CTBS scores would have risen proportionately with the MSPAP? They have not.
An evaluation of Kentucky’s similar test, where a rise in scores on the state test was also reported, showed that the rise was unsubstantiated by any other proven indicator such as the NAEP or the ACT.

Other states are exercising more caution. Maryland is uniquely successful in its implementation of a statewide assessment. Maryland also appears to be unique in the absence of an outside study by testing experts to educate its policy makers on the controversy over using performance-based tests. A number of states have tried or are trying to implement performance-based tests. Maryland would do well to pay close attention to the reasons why these states have not proceeded with these tests as planned, not because of some political agenda by the far right, but because of consistent cautionary advice from testing experts.

California, whose experience was described earlier, is just one of many states to reconsider its plan for testing. Kentucky uses a test similar to the MSPAP, as well as a student portfolio. A team of testing experts looked closely at the reliability of Kentucky’s test and strongly advised the state to "cease and desist" portfolio measurements and to revisit its decision to use open-ended, performance-based questions exclusively. Our repeated offer to pay for the same or a similar group of testing experts to review Maryland’s test was discouraged by your staff, a decision we find troubling.

North Carolina has recently implemented a similar test but pointedly decided not to attach any "high stakes" to the test for schools or teachers. Their testing chief (who testified before our State Board) says that "they know better than to use these types of tests for high stakes decisions because the tests are simply not reliable."
When the state of Vermont considered using large-scale performance tests of its schools, yet another testing expert, Daniel Koretz, cited five reasons against their use, foremost that "the unreliability of scoring alone was sufficient to preclude most of the intended uses of the score."

Implementing the test is excessively complicated. The design of the MSPAP and the test’s heavy use of "manipulatives" (items which are used to encourage a hands-on approach to student responses in lieu of paper and pencil) require enormous resources and energy from the test writers, scorers, teachers and students. As with any test, the State will always have to adjust and readjust test questions and add new questions. But questions using manipulatives are so complicated that unanticipated implementation problems will continually crop up. In its effort to maintain the validity of the MSPAP, the State assumes not just the role of Hercules but of Sisyphus, condemned to roll the stone up the hill only to have it roll down, again and again. The most important question is this: is the State’s aversion to paper and pencil testing grounded in any educational research that says such testing is an inaccurate measure of student learning? What does the State gain from the use of manipulatives?
Though these problems could be addressed with better instructions, any time you use manipulatives, you run a much greater risk of confusion and wasted time. The State needs to acknowledge that the use of so many variables by definition makes the test less reliable, and it must be able to justify the use of manipulatives as a more effective means for measuring student performance.

3. Should the MSPAP have a standard of excellence for the use of proper grammar, spelling, and punctuation?

The attention and priority given to proper grammar, spelling and punctuation on the MSPAP is minimal at best, a policy decision which public surveys show is not even remotely supported by the general public or businesses seeking employees with basic skills. The MSPAP gives students a separate score for Language in Use (spelling, grammar, and punctuation). The common sense reasoning for this policy is that when a student is being assessed on his abilities in science, for example, his score for science should not be influenced by his use of language. To avoid this influence, the MSPAP employs the Language in Use rule and "where appropriate" gives a student a separate score for language.

This policy clearly makes sense, until one examines how the State scores student use of language. There is a range of three points that a student response can get, none of which rewards the student for minimal or no errors. The following statements are extracted from the scoring guide to Language in Use:
Risk-taking is never defined. It is left up to the scorer to decide if the student was taking risks or was just making a sloppy mistake. More importantly, where is the score for the student who not only takes these risks but produces an error-free response? By the lack of a score for error-free writing, the State is sending a strong message to schools that excellence in language usage will not be rewarded.

Undoubtedly, Maryland schools have picked up on the State’s lack of interest. Schools know that students can receive a satisfactory score in language usage while having produced work that has errors. In our observations at a school, we unhappily noticed that many of the fifth grade children were not writing complete sentences, the spelling was lacking, and the vocabulary minimal. The scoring matrix that was posted on the wall claimed that "proper spelling, grammar, punctuation, and capitalization" were not necessary for a satisfactory score. Since most schools are struggling to make it to this level, one can appreciate why teachers might not spend too much effort at achieving these skills.

Through its casual approach to writing technique, MSDE is clearly endorsing the use of a "process writing" approach to teaching students writing, without citing research to support this highly debatable writing strategy or bringing this policy matter up before you and the State Board. This endorsement is especially distressing given the 30-year history of the failure of this approach to teach our most disadvantaged children how to write.

There is good reason to be skeptical about the instructional strategies urged via the MSPAP and the tendency to jump on the latest reform bandwagon, without the research to support these strategies. In the early years of the test, MSDE correspondence to Maryland schools assured them that the MSPAP would employ "the same effective strategy for reading that the entire state of California had recently adopted."
Several years later, California reported the second lowest NAEP reading scores in the nation, a scandal broke out over a decision to convert to whole language without supporting research, and suddenly the whole state and the nation are embracing phonics. This is not sound instructional practice.

4. Are tests graded objectively? Are scoring guidelines consistent and clear?

Like all performance tests, student responses on the MSPAP need to be scored by a process of "subjective consensus"; that is, a group consensus decides the range of right answers for a particular response (there is usually no single right answer), which guides the subsequent scoring of the tests. The rules for scoring a MSPAP task are by MSDE’s own admission an enigma. The front of the scoring guide for the publicly released tasks bears the following statement: "It is likely... that some of the scores for the responses that follow will not be immediately clear to readers. They all have been selected, however, based on a consensus by a team of Maryland educators, backed by the scoring contractor’s senior staff."

What is the implication of this statement? If the scoring decisions are not clear, does this lack of clarity not raise enormous doubts about the ability of the MSPAP to be objective? If the responses to MSPAP tasks are so varied and replete with nuance that the responses are not obviously right or wrong, then it is logical to conclude that the test must in fact be more subjective than objective. Here are three responses to the same question and how the above group of educators and MSDE senior staff say the question must be scored.
The first response gets the maximum of 2 points even though it is unclear what the writer meant by "top"--top of the water? top of the straw? This imprecision is not noted by the scorer. The second response gets one point even though a pencil on its side is completely impractical as a tool of measure. The third response received 0 points because it was not clear enough, even though it makes a lot more sense than the second response.

Different outcomes are weighted inconsistently. When scoring the ‘writing to persuade’ task, scorers are instructed to focus on 1) development, 2) organization, 3) attention to audience, and 4) language. However, there is no indication of how each one of these four components should be weighted in determining the student’s score. For example, one essay (which received the top score of 3 points) provided good support and was very descriptive, but failed to "identify a clear position...". Another essay stated a position very clearly, but did not use proper letter format and did not support its position as effectively as the first. This second essay received only 2 points, which indicates that descriptiveness is more important than clarity. However, subsequent examples further muddle the question of emphasis, and in the end, the scoring appears to be relatively subjective.

Correctness is not valued. In the MSPAP math tasks, students have to show their work, not just calculate a correct answer. Asking students to "show their work" appears to be a good idea, until you attempt to score their responses. A student can receive the same or more points for giving a wrong answer but a "good" explanation than a student who had the right answer, but gave a "bad" explanation of his process. How does this happen? For example, suppose Mary and John are two fifth graders taking the MSPAP. In the first question of a math task they are asked to compute the area of a room. Mary does the right calculation and gets 1 point.
John does the wrong calculation and gets 0 points. In the second question of the task, Mary and John have to explain the process they used for arriving at their answers. Mary, not much of a writer but a whiz at math, gives a brief, incomplete answer and gets 0 points from the scorer. John, who cannot multiply but who remembers the formula for the area of a room, explains the process clearly and correctly. He gets not 1, but 2 points for his great explanation for a wrong answer. Now Mary has 1 point and John has 2 points. To make matters worse for Mary, because the practice of the MSPAP is to keep asking these process questions to the point of redundancy, there is a third, perhaps even a fourth question, prompting John and Mary with more process-type questions for arriving at the area of a room. By the time they have finished the task, John far exceeds Mary in points. Multiply Mary’s and John’s experience across a school full of children and you have to wonder whether the State is correctly identifying who is satisfactory in math and who is not. The National Center to Improve the Tools for Educators (NCITE) makes our point incisively:
5. Is there a tendency towards political correctness in the scoring?

In Grade 8, "Political Change," there is a question that reads "What evidence is there that the Cherokee Nation may have developed a government similar to our own if left to their own resources?" The scoring guide gives various answers that the students could supply and receive the top number of points. All of the correct answers imply that the Cherokees would have in fact developed democracy if the white man had left them alone. Incredibly, the scoring guide does not take into consideration evidence to the contrary contained in the paragraph that the students had to read. The paragraph the children read clearly states that Sequoyah’s interest in education and democracy began with his exposure to the writing of the white men. The fact that the State ignores this information in the scoring criterion reflects a none-too-subtle attempt at political indoctrination.

II. The Content of the MSPAP

6. Are teachers given adequate direction to effectively prepare a class for the MSPAP?

You have yourself pointed out that the State Learning Outcomes which define what the MSPAP will test need serious work as they are not challenging or specific enough. Is this being done? Will there be adequate public review? Much of the anxiety some teachers feel about the MSPAP stems from the sense that they do not really know how to prepare students for the test, other than to stop their own instruction and practice the performance tasks, drill students on the vocabulary common in the MSPAP, and have students practice grading their own work using the MSPAP scoring matrix.

Why does the State seem to value vagueness when it comes to telling school districts what the MSPAP is testing? Rather than preserving the autonomy of Maryland school districts, vague outcomes befuddle teachers, while also certainly restricting district-level freedom. The vagueness is not echoed in the proposed high school assessments.
If the State feels that it can prescribe content in the high school grades, then it should not be so averse to directing content in the elementary and middle grades. A teacher would be far more comfortable knowing that the MSPAP is tied to specific content that she must teach. The lack of clarity is creating anxiety and poor decisions about preparation for the test, particularly in schools eligible for reconstitution. In its 1996 report, Making Standards Matter, the American Federation of Teachers takes Maryland to task for its K-8 standards.
Preparation for the MSPAP consumes valuable instruction time. In addition to the actual time it takes to administer the tests, we consistently hear anecdotal reports that schools are interrupting regular instruction weeks and sometimes months before the administration of the MSPAP to practice for the test. Some principals acknowledge discomfort in this, but, as one principal readily admitted, "it’s my job and my school that is on the line and I am not prepared to do less." To see this in the extreme, look no further than Baltimore City, where teachers in every grade (not just 3, 5, and 8) must now devote four to eight full weeks of the school year to the Baltimore Quarterly Assessments, a test designed to improve Baltimore’s performance on the MSPAP. In other words, the State, which has been concerned that too little time is spent on instruction and has considered lengthening the school year, has in effect reduced instructional time by diverting it to exercises devoted to improving performance on the State tests.

7. Should the MSPAP test knowledge in addition to testing the application of knowledge?

In public literature on the MSPAP, the State never claims that the test will provide an adequate assessment of everything a student should learn and advises that local school districts will need to continue local testing. But as the MSPAP enters its sixth year, few would argue that any other accountability measure matters remotely as much as this test. The MSPAP’s power over instruction in Maryland schools is so staggering that the State can no longer say that the MSPAP is just one measure of many. With this power comes tremendous responsibility. The State needs to be entirely comfortable that the MSPAP is sending a resounding message about the importance of prior knowledge to acquire new knowledge, not joining those who derisively reduce knowledge to "factoids" or the "rote memorization of mere facts."
We believe that the State has fallen prey to some seriously flawed suppositions about how knowledge is applied, how higher-level thinking skills are obtained, and how one becomes a critical thinker. As we observed in the May 1996 tests and in the publicly released tasks, the MSPAP cuts out a crucial link in the construction of knowledge: knowledge cannot be successfully applied without first being learned. Students are spoon-fed any knowledge needed to answer questions on the MSPAP. We were especially disheartened by our observations on this front. One of MSPAP’s most alarming features is that it permits students in Grades 5 and 8 unlimited use of a dictionary, a calculator, and a thesaurus throughout the test. Students in Grade 3 are permitted to use these tools on a limited basis. For the most part (but not entirely), the MSPAP gives students all of the content that they might need to answer a question, as well as tools such as dictionaries, the use of which in former days would have been a cheating infraction. Very little content or background knowledge is presumed. The existence of any previous knowledge of a topic is by no means a requisite to successfully completing the task. Teachers do not think that teaching content is important to doing well on the MSPAP; in fact, this point relates directly back to our earlier point showing that 73 percent of Maryland teachers do not think increased learning is responsible for improved performance on the MSPAP. A popular conjecture in the education field today is that students only need to know how to access knowledge, not know it, and that this century’s information explosion makes particular knowledge unimportant for students to learn. Whether the State means to or not, it is aligning itself with this conjecture about the information explosion by administering a nearly content-void test, together with the erroneous supposition that knowledge can be applied that is not first learned.
While the pedagogical implications of an information explosion are well worth exploring, it is indeed unfortunate that the State has deemed the mere observation of this explosion sufficient to relegate knowledge to second-class status on the MSPAP. In his recent book, The Schools We Need & Why We Don’t Have Them, E.D. Hirsch comments on the illogic of reasoning that suggests that higher order thinking, application of knowledge, critical thinking, etc., can be taught without the knowledge to accompany it: "Usually it isn’t the logical structure of people’s inferences that chiefly causes uncritical thinking but, rather, the uninformed or misinformed faultiness of their premises." Hirsch reviews 100 years of research on whether instruction in critical thinking translates into improved real-world critical thinking. Neither logic nor research lends any credence to teaching critical thinking skills directly, as the MSPAP encourages teachers to do. Most importantly, Hirsch states, there is considerable evidence that people who are able to think independently about unfamiliar problems and who are broad-gauged problem solvers, critical thinkers, and life-long learners are without exception well-informed people. Further, a consensus body of research from the field of neuroscience indicates that some rote learning, memorization, and acquisition of "facts" is essential and integral to the brain’s ability to process knowledge. Do not misinterpret our point here. No one challenges the need to have students think critically, to analyze, to appropriately apply the lessons learned in one subject to another subject. Even though more knowledgeable students are likely to do better on the MSPAP, substantive knowledge is diminishing in importance in Maryland’s classrooms because MSPAP sends the strong message to students, teachers and schools in Maryland that knowledge of subject matter has little value.
An administrator from Prince George’s County illustrated this point precisely when he cautioned the State against including any item in the Core Learning Goals for the high school assessments that it did not then plan to use in the test: "Forget about us teaching anything you’re not testing. All anybody cares about anymore is if the material is going to be on the test."
How can anyone argue that this test meets world class standards? In Japan students are expected to know the definition of an isosceles triangle in third grade and the circumference of a circle in fifth grade.
The sad reality of the current approach of the MSPAP is not that Maryland students will not be able to independently recall dates from memory, but that they will never have learned the knowledge needed to provide and analyze the "why’s," the "why not’s," the "how come," and the "what shoulds"--the very questions the MSPAP would have students ask and answer. Could, for example, a student with no knowledge of Reconstruction attempt to make any sensible comparison between the reasons for America’s withdrawal from racial equality during Reconstruction and the similar reasons motivating the current movement away from affirmative action? No.
8. Is the test truly age appropriate?
Our observations have left us with a nagging concern that as a result of the MSPAP’s effort to require analytical thinking from students, the questions are not always age appropriate in the earlier grades. In no way are we using the term age appropriate to imply that the State is using age-inappropriate content (in fact we would vigorously argue that the State needs to introduce richer content in earlier grades). Our concerns are with the cognitive development of children and how they learn. Can the State show the psychological research to support the kinds of responses it wants to elicit from third and fifth graders? And is the MSPAP asking the kinds of questions which will elicit the most information about a student’s education? In raising this issue, we are not asking if the State has theories about what third graders should be able to achieve or even what these same third graders should be able to achieve by the time they finish high school. We fear the State has identified those qualities that children should obtain by the time they finish school and is assessing students on the basis of whether or not those qualities are present in their early education years. A statement made by Lisa Delpit, a Professor of Education at Georgia State, illustrates this point.
We worry that the MSPAP has omitted a fundamental part of the equation for developing higher order thinking skills: that students first master fundamental skills and a knowledge base. If MSPAP were truly charting a course for students that would produce good analytical thinkers, then the MSPAP would not appear so identical in its form and choice of questions from grade to grade. Instead it would have Maryland students follow a path from grades 3 to 5 to 8 and on to high school which moves them from the concrete to the more abstract; from understanding to analysis; from mastery of multiplication tables to mathematical thinking where more than one answer is possible; from the structured, rule-bound formation of a sentence, to a paragraph, and finally to the essay. The cart has come before the horse.
9. Does the MSPAP discourage, even penalize, independent thinkers?
Forget Leonardo da Vinci, Thomas Jefferson, Albert Einstein, Christopher Columbus, Galileo, or Winston Churchill. The State appears to have decided that it is only looking for cooperative thinkers. We found perhaps one of the most disturbing examples of this attitude in the Grade 8 task "Planetary Patterns." The student is asked to conclude this task by responding to the prompt "Write a description of how your ideas or the ideas of others were influenced by working in a group." The scoring guide shows that a student who answers "Johnny gave us his idea and when we examined the evidence he convinced us that he was right" gets 1 point, but the student who answers "When I listened to others, I found I confirmed my ideas" gets 0 points. So by this logic, we can conclude that Johnny would have been given 0 points, whereas his fellow students, who got the answer wrong at first but were willing to listen to the right answer, are given a point! What does this convey to students? It’s almost Orwellian.
Why has the State apparently decided that the ability to work on a team is more important than other qualities such as punctuality, honesty, or say a sense of humor? You could argue that all of these things lead to a happier or better work environment for those who work in an office. Never mind that many of the individuals who will contribute much to this world are unlikely to ever work in an office setting. In defense of its emphasis on team work, one hears it asserted that the MSPAP is preparing the workers for the 21st Century. We are struck by the lack of any persuasive evidence that demonstrates that workers for the 21st Century are that much more likely to need to work in teams than in previous eras. American corporations are quite ambivalent about who is more valuable: team players or individuals possessing strong leadership who can shake up the company. Further, given employment instability, team work may not be of as much value to future employees as the ability to self-start. December’s Fortune magazine cites current Department of Labor statistics which predict that today’s college graduates are likely to have 8 to 10 jobs in the course of their lifetimes and as many as three careers. In any case, the resolution of this debate should not be a foregone conclusion in the American classroom.
III. The Efficiency of the MSPAP
10. Is the test efficient?
Notwithstanding whether the MSPAP tests the "right stuff," we question whether the test efficiently measures what it purports to measure. There are two issues here: first, the time it takes to administer the test, and second, the efficiency of the tasks themselves. The MSPAP is time consuming. First, by design, MSPAP takes up at least one week of instruction, for a total of 9-10 hours, significantly more than other tests. The SAT takes 3 hours; the CTBS takes 5 hours; the NAEP takes 1 hour per subject.
Given the time it requires, is the MSPAP more accurate at measuring student performance than these other instruments? We are unable to find any evidence that it is. To the contrary, the other tests have much higher reliability. Many of the questions lack clarity and efficiency. It appears that efficiency and focus are of little value in task design.
Distinguishing between the following three questions is difficult:
The pace of these three questions is painstakingly slow and is typical of many MSPAP questions which spend inordinate time asking a student to predict what can happen, observe what can happen, compare predictions and observations, illustrate observations, evoke a student’s thinking while predicting or observing, and how to do it differently the next time. No other area of knowledge or skills receives even remotely the same level of attention or time. In Grade 3, "Worker’s Web," the ever-popular topic of community workers is addressed. The thought process leading from one question to the next is baffling and its pace again painstakingly slow.
Another example illustrates the low expectation for any student knowledge as well as how long it takes the MSPAP to assess the simplest abilities.
Anyone who looks at the published scores would undoubtedly draw an erroneous conclusion about Maryland students’ strength in social studies. From a purely practical view, one has to wonder about the page length of some of these tasks in proportion to the relatively small amount of information the State is accumulating. "Child Labor" is a 48-page task on social studies and writing in which students are given a 20-page story with illustrations to read and react to. In a Grade 8 task, "Choices in Reading and Writing," the students are given the choice of four stories to read and write about, for no reason other than that the task designers appear to have been worried about forcing students to write about stories that do not interest them. Does the SAT allow students to select the paragraph they want to analyze? No. The first activity in Grade 3’s "Trouble with Flowers" claims to test the students’ ability to "read to perform a task."
11. Is the MSPAP’s use as an instructional tool in conflict with its use as an assessment tool?
As we understand it, the MSPAP is designed to achieve two goals: improve classroom instruction and measure student achievement. There are numerous activities contained in the MSPAP which are not in the test in order to be assessed; rather, they are there to encourage teachers to change their classroom instruction. A problem with this strategy is that the two goals work against each other. The more time the MSPAP spends demonstrating a certain instructional method, the less time it can use to assess student achievement. The MSPAP sets out to encourage many methods of instruction (cooperative learning, brainstorming, peer review, having students explain the process they use to arrive at an answer, application of knowledge). Some can be tested, like having students explain their thought processes, and these methods are appropriately in the MSPAP. Others cannot be tested, are distractions, and should not be in the MSPAP. As a test, the MSPAP is far less reliable and yields much less information about how much students are learning because of the State’s determination to have the MSPAP be an instructional model. Instead of assessing student achievement, the State is assessing instructional approaches. The State could easily determine if MSPAP-specific instructional practices are having an undue influence on student scores. If the test were given to two students of the same ability, one who had been exposed to the MSPAP, and the other who had not, would their scores on the MSPAP be comparable? The results would be telling.
12. Are the policies regarding test security justified, fair and politically wise?
The secrecy that surrounds the MSPAP tests seems hard, if not impossible, to defend. The real issue here is not secrecy, per se, but the selectivity of the secrecy. The State has made a curious determination about which groups threaten test security and which do not.
There are good reasons for tests not to be widely available. That being said, it is troubling that the State is quite willing to let teachers and principals take a good hard look at the test for up to two weeks before its administration, but parents (and outside researchers) are prohibited from examining the test after its administration. In other words, those groups who could abuse the security of the test have ample and ready access to it and those who cannot are denied even minimal access. This policy is troubling for several reasons. First, the MSPAP is a publicly funded document, so it would seem that there should be a heavy presumption in favor of permitting those in the public who ask to see the test to examine it in a secure setting. Second, refusal to disclose makes it appear that the State has something to hide and therefore serves to increase the suspicion of those who are skeptical of the test based on the information they do have. Third and most troubling, there does not appear to be any practical reason for the secrecy. The persons who most care about the performance on these tests and have the greatest ability to influence the outcomes are the teachers and principals. Since it is teachers, and not students, who are affected by the scores, it stands to reason that teachers, not the students or parents, care most about test performance and would be most likely to violate the integrity of the test. What does the State think is going to happen if parents see the test? Since some of the questions are not retired after a year, a parent who sees the test could conceivably coach a child, but to what end? It does not help the parent or child if the child does well. Furthermore, the MSPAP questions are not multiple choice or short answer. They call for written answers of some length, so knowledge of the questions beforehand is of little value.
Lastly and most compelling, other comparable tests can be viewed in their entirety after the test is given. The most striking example is the CTBS which gives the same precise questions each year. The SAT retires three of seven versions of its test each year and these tests are publicly accessible immediately after the exam. Any credentialed person can get access to NAEP questions as long as they agree to sign a nondisclosure statement. In Kentucky, parents are welcome to look at the test simply by going to their child’s school and signing a nondisclosure agreement. The same is true in Kansas: parents can see their child’s test before, during, or after its administration. Both of these states have many multiple choice questions, so there would be more chance that cheating could take place than is true in Maryland, yet both of these states remain open to parents’ scrutiny. As a cautionary note, parents’ concern over test secrecy was one of the central issues which helped to do in the California state test. Unless Maryland can provide legitimate reasons why this test should not be available for public scrutiny, it could face a similar problem. Given the early rumblings heard in the Maryland State legislature, this concern seems to have merit. It is also worth noting that the reason for the teachers and principals having access to each year’s test two weeks before its administration is the test’s reliance on manipulatives. This is another indication that the problems caused by the use of manipulatives on the test outweigh the benefits.
CONCLUSION
Neither the Superintendent nor the Board could have anticipated many of the features of the MSPAP which have found their way into the test, some of which are evident only when one examines how this test is scored. While the structure of the MSPAP was determined before the current Board took office, a first-hand look at how this test has evolved would be more than deserving of the Board’s time.
The MSPAP clearly exalts some qualities over others, which in 1990 the Board pointedly tried to avoid. I refer to the strong emphasis on working together in teams over the notion of producing independent thinkers. America was built on independent thinkers. Why has the ability to work well with others become more important than training independent, strong-minded, knowledgeable thinkers? Perhaps no issue about the MSPAP concerns us more than the clear lack of substantive knowledge required for the test. The State Board might find it instructive if the MSDE staff were to list for its review the knowledge (as opposed to skills) a student would have had to possess to score at a satisfactory level on the different subjects of the MSPAP. To argue that the MSPAP is a test which respects the value of knowledge is specious, on the simple grounds that there is no school in the state (that we have found) that prepares students for this test by teaching knowledge. To the contrary, Maryland schools which do value and teach knowledge interrupt this type of instruction weeks in advance to prepare for the MSPAP. There are many other actions that may be appropriate for you and the Board to consider:
These issues are complex and overwhelming, especially in light of the work that has gone before to garner acceptance of the MSPAP by local school districts. Given what we were able to learn with relatively little access to anything but public documents, the likelihood is good that these issues will ultimately manifest themselves in the Maryland legislature or in some other more public arena, as they did in California. Even more unfortunate will be our promotion of a test that we discover in ten years does little to produce a more educated citizenry. I strongly believe that the State has much to gain by producing its own State report card for school improvement, specifically tied to continuous improvement of the MSPAP. As a demonstration of good faith, the State could point to a comprehensive review of the MSPAP as its own self-imposed measure of accountability. Local school districts would perceive the State as a responsible partner in school improvement and reasonable changes that are carefully explained and made at an acceptable pace will simply be construed as sound practice.
_____________________________________________________________________________
Endnotes:
Bishop, J. "Nerd Harassment, Incentives, School Priorities and Learning," Working Paper #96-10. Center for Advanced Human Resource Studies, 1996. |