Free Will and Education Reform

George Henry Hall: The Pomegranate

The question of free will bursts open into question upon question. What does it mean to have free will? To what degree do we exercise it? How can we know? For all the swarms of ideas on the subject, there seems to be agreement—among philosophers, theologians, poets, psychologists, and others—that whatever freedom we might have, we do not control other people or the outcomes of our actions (and if we could, it would be unwise). What a refreshing thought—and what a far cry from today’s education reform, which insists on our ability to control others’ results!

Literature from ancient Greek drama to contemporary psychology warns about illusions of control. In Aeschylus’s Agamemnon (in Robert Fagles’ translation), the Chorus sings, “And neither by singeing flesh / nor tipping cups of wine / nor shedding burning tears can you / enchant away the rigid Fury.” Rabbi Hanina states in the Gemara of Berachot (33b) of the Talmud, “Everything is in the power of heaven except the fear of heaven.” (There are numerous interpretations of this statement.) In recent centuries, literary, philosophical, psychological, religious, and sociological writings have emphasized the futility (or danger) of trying to control others.

Yet much of education reform assumes we can and should control others, in particular their measurable achievement. This assumption is profoundly wrong. To rate teachers on their students’ test performance is to distort the educational endeavor. Teachers influence students (and their influence is great); they do not cause students to do well or poorly. (It’s one thing to analyze the results; it’s another to convert them by formula into a rating.)

“Very well,” someone might respond, “so you’ve admitted that teachers influence students. Are you saying this influence doesn’t matter?” Of course it matters; it gives meaning to the work and helps teachers heed the alarm clock in the mornings. Still, whenever the student steps out to do something—take a test, give a presentation, or read further on the subject—this is the student’s action, not the teacher’s. The student has the credit and the dignity (or should).

“In that case, teachers might as well throw up their hands,” another might say. “If they aren’t held accountable for results, why should they bother trying?”

When you think you might influence (but not control) your students, there is all the more reason to try. You get to share in something that is not your own, something that goes beyond you. When a student does well, you have the honor of contributing to it in some way; when a student does poorly or runs into difficulties, you have the sorrow and the self-questioning. Honor and sorrow and self-questioning and responsibility inspire me a great deal more than the publication of teachers’ “value-added ratings” in the newspaper.

It is not just that they inspire me more; it’s that they serve as better guides. I don’t know, and have no way of knowing, how great my influence will be or what form it will take (beyond concrete and immediate learning). That is all the more reason to put thought and effort into my lessons: I am participating in something partly knowable, partly mysterious, but in any case larger than myself. If I had wanted a predictable effect on things, I would have become a chocolatier, a producer of delight and cavities. Even then, my results would not have been uniform.

Yes, of course I want concrete learning to come out of my lessons; of course I want to see evidence of it. Even so, I do not make it happen, nor do I set its limits. Even less do I control what comes out of that learning.

Many economists would disagree. A 2011 study (by Raj Chetty, John N. Friedman, and Jonah E. Rockoff) concludes that teachers affect not only students’ performance on tests, but also their college attendance and future earnings. Granted, they say “affect,” not “cause,” but then they extrapolate: “Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase the present value of students’ lifetime income by more than $250,000 for the average classroom in our sample.”

I think of D. H. Lawrence’s “Pomegranate”: “Do you mean to tell me you will see no fissure?”

I respect these scholars and acknowledge the care that went into the study. Still, its projections assume minimal variation among students, little that could interfere with their earnings, and little room for them to choose their directions in life. Presumably, if teachers could “increase” students’ lifetime income by more than $250,000 (a projection based on limited data), then we could boost the economy just by replacing the low-ranking teachers. We could replace our way to a better world.

But what if the students’ lifetime income didn’t increase as expected? What if these students faced layoffs, job changes, and life difficulties, or chose professions that didn’t pay especially well? What could one replace then, for better outcomes? Perhaps one could give each of their choices a value-added rating (in terms of how much income it produced) and demand that they make lucrative life choices. Someone would have to chase after them and make sure they did so.

What if illness and war and death got in the way? Well, one would have to replace those students who got sick or died, or who grieved the death of others. No room for mortality (or aging) in the picture, especially if it interferes with earnings.

We are left, then, with those select few who don’t age, fall ill, or die—and who, without fail, take actions that bring them more money.

We are down to no one—but there, in that world of none, we have attained prosperity!

Happy are those who do not inhabit that world.

Teacher Ratings and Rubric Reverence

Some seven years ago, when I was taking education courses as a New York City Teaching Fellow, we had to hand in “double-entry journals”—that is, two-column pages with a quotation or situation on the left and our response on the right. On one occasion, I needed far more room for my response than for the quotations, so I adjusted the format: instead of using columns, I simply provided the quotations and my comments below each one.

The instructor chided me in front of the class. She said that this was a master’s program and that I should learn to produce master’s-level work. (She wasn’t aware that I already had a Ph.D. from Yale.) If the instructions specified a double-entry journal, well, then I was supposed to provide a double-entry journal. She had no quibbles with my commentary itself, which she found insightful. She just took issue with my flouting of the instructions. I have no grudges against the instructor, who meant well and knew her stuff. But it was an eye-opener.

Up to this point, I had not encountered such rigidity regarding instructions. In high school, college, and graduate school, we were expected to use certain formats for term papers, publishable work, and dissertations. But on everyday assignments, it was substance and clarity that mattered most. The teacher or professor even appreciated it when I departed from the usual format for a good reason. I did so judiciously and rarely.

The double-entry-journal incident was part of my induction into New York City public schools. There, the rubric (which usually emphasized appearance and format) ruled supreme; if you did everything just so, you could get a good score, while if you diverged from the instructions but had a compelling idea, you could be penalized. I saw rubrics applied to student work, teachers’ lessons, bulletin boards, classroom layout, group activities, and standardized tests. I will comment on the last of these—rubrics on standardized tests—and their bearing on the recent publication of New York City teachers’ value-added ratings (their rankings based on student test score growth).

A New York Daily News editorial asserts that teachers with consistently high value-added ratings are clearly doing something right. (This is the argument put forth by many value-added proponents.) But that’s not necessarily so; all we really know is that their students are making test score gains.

In New York State, on the written portion of the English Language Arts examinations, it matters little what the students actually say or how well they argue it. What matters is that they address the question in the prompt and follow the instructions to the letter. A student may make erroneous or illogical statements and still receive a high score; a student may make subtle observations and lose points for failing to do everything exactly as specified.

Here’s an essay prompt from the 2009 grade 8 ELA exam. (For an example at the high school level, see my blog “A Critical Look at the Critical Lens Essay.”)

Bill Watterson in “Drawing Calvin and Hobbes” and Roald Dahl in “Lucky Break” discuss their approaches to their work. Write an essay in which you describe the similarities and differences between the work habits of Watterson and Dahl. Explain how their work habits contribute to their success. Use details from both passages to support your answer. In your essay, be sure to include

  • a description of the similarities between the work habits of Watterson and Dahl
  • a description of the differences between the work habits of Watterson and Dahl
  • an explanation of how their work habits contribute to their success
  • details from both passages to support your answer 

To get a good score, a student would only have to write one paragraph about similarities, one paragraph about differences, and one paragraph about how their work habits led to their success. By contrast, a student who began by considering definitions of “success” (as G. K. Chesterton does) would not fare so well, even though that might be the more thoughtful essay. Likewise, a student who questioned the direct link between work habits and success (as Mark Twain does) would be at a disadvantage. Students are better off if they write a predictable essay, even a bland one, that meets the criteria. Their teachers are better off, too; every point counts when it comes to value-added scores.

I have scored ELA exams. Human judgment has little place in those scoring rooms. To maintain consistency, everyone is supposed to follow the rubric, and, if there’s any doubt, the state’s own interpretation of the rubric. It comes down, in the end, to following instructions rather than exercising judgment. On the one hand, this is fair and justified. If teachers were to use their own judgment when scoring, two essays of similar quality could receive wildly different scores. On the other, it means that there’s no way to acknowledge the student who struggles with the question because the question is tricky or problematic—that is, the student who pushes beyond the obvious response.

Now let’s consider the consequences in the classroom. Teachers A and B teach at a relatively high-performing school. Teacher A tells students that to write well, you should have something to say and should take care with words. Her students read G. K. Chesterton, Ralph Waldo Emerson, Mark Twain, Jonathan Swift, and others. They discuss these essays, look at their structures, respond to favorite passages in them, and write essays inspired by them. Teacher B, within the same school, has a different approach. She brings in reading passages like those on the tests. She teaches students how to read essay prompts and produce the expected responses. She has them do this every day. Now, arguably, one can teach students to write thoughtfully and follow directions precisely. But the latter has the greater test score payoff.

So, Teacher B’s students make more test score gains than Teacher A’s students. Teacher B gets rated “high”; Teacher A, “below average.” (This is a plausible scenario in an unusually high- or low-performing school, where a slight difference in points can account for a large difference in ratings.) Then the ratings appear in the New York Times and elsewhere. Many readers will assume, even with caveats galore, that Teacher B does better work than Teacher A. Teacher A then finds herself under pressure to do what Teacher B is doing. That means ensuring that her students follow directions.

How do you get teachers to teach in this manner? Train them in education school. Impress upon them the sacrosanctity of instructions. Teach them that if the assignment is a double-entry journal, then that is what they must produce, period.

What David Brooks Doesn’t Get

In his New York Times op-ed “Testing the Teachers” (April 19), David Brooks warns that “an atmosphere of grand fragility” hangs over America’s colleges. The grandeur, he says, comes from the colleges’ increased application rates, new facilities, and international reputation; the fragility, from increased tuition combined with uncertain results. What must we do? Hold colleges accountable for results—through value-added testing. That’ll show who’s teaching and who isn’t!

Brooks is wrong. Accountability systems would drag down our colleges. The best would be made mediocre, and the worst would rise to mediocrity at most.

Having put forth the idea, Brooks waxes dreamy about it. “There has to be some way to reward schools that actually do provide learning and punish schools that don’t,” he muses. “There has to be a better way to get data so schools themselves can figure out how they’re doing in comparison with their peers.”

What Brooks doesn’t understand is the difference between accountability and responsibility. It is the latter, not the former, that will help and sustain colleges.

Responsibility is an internal sense of duty; accountability, an external show. The professor who puts full thought into lesson preparation, corrects student work, holds office hours, challenges students in class, and takes them, day by day, into the subject—this professor has a deep sense of responsibility but may or may not “produce” test score gains. A professor who focuses on showing results to outsiders (an accountable professor) may be less immersed in the subject, less concerned about navigating tricky points—but may raise test scores. If schools must foster the latter sort of teaching, they will glide into a monotone.

But why should accountability and responsibility be at odds? They are not always opposed, but there’s ongoing friction between them. To honor one’s best thinking and conscience is not the same as to do what others want and recognize. The best instruction does not absolutely and consistently produce test score results.

First, course content may not match the content of standardized tests (and it would be dreary if it did). Second, if students take especially difficult courses, they may go an entire semester without showing visible progress. A grade of “C” may be honorable in such cases. Third, each subject has its language, structure, and logic; these are not always easy to convey to those outside the field. In their presentation “Assessment on Our Own Terms,” delivered at the 2007 Annual Meeting of the National Association of Schools of Music, Mark Wait and Samuel Hope draw attention to the difficulty of translating “musical logic” into “speech logic.” Fourth, the higher the level of study, the more complex the assessment becomes. (That’s not to say that assessing kindergarteners is a straightforward matter.)

This leads to another flaw in Brooks’s suggestion. He assumes that it is the colleges’ duty to “produce” visible signs of learning. But even today, with the tuition hikes, many students go to college to be challenged, to explore many subjects, to dedicate themselves to a major, and to work on something of beauty. Getting top grades isn’t necessarily their first priority. Some would rather take more courses, or more difficult courses, at the risk of lower grades than take easy courses and get all A’s. Some find themselves immersed in a particular course or subject and let the other ones slide a bit. Some follow an idea or a project only to discover that they are on the wrong track. This is their prerogative, and they must take the consequences.

True, not all students are so serious; many skip class repeatedly, go to party after party, and fret over relationships. If they slip too far, a good hard “F” can shake them up. Deans and advisors should watch for students in danger of failing, but students must learn to make choices and take responsibility for them. It does not help students—especially college and graduate students—to make someone else responsible for their performance.

Now, of course I am assuming a liberal arts college or school of art (or music or drama), and a high-level one at that. I am not referring here to colleges where most of the students need remedial courses. Nor am I talking about vocational and technical schools, whose mission is to prepare students for a concrete profession or trade. These are colleges with specific, standardized goals—and they should make good on their promises, provided the students do their part.

But it is not nostalgic, romantic, or naive to insist that college also be about something else: about pursuing interests, enjoying a life of the mind, making and learning from mistakes, being around intensely knowledgeable and interesting people, studying a subject at a high level, and yes, allowing for imbalances between receiving and giving. Education is a gift in a troubling sense, a sense that recalls Robert Frost’s lines about a star, “It asks a little of us here. / It asks of us a certain height.” This is no trivial demand. Students, receiving a fine education, do not immediately show the height required. Sometimes this takes years, even decades. Sometimes we think back on something learned long ago and see how it honed our thinking and our lives. That’s a result worth defending to the end. We must not treat such learning as a lie.