The Big Five: Theory or Phenomenon?

[Image: four-leaf and five-leaf clovers]

In an earlier post, I suggested that the Big Five model, even as a taxonomy, contains assumptions about how personality works. Having read Sanjay Srivastava’s illuminating article “The Five-Factor Model Describes the Structure of Social Perceptions” (Psychological Inquiry, vol. 21, issue 1 [2010]), I revise my argument as follows:

The Big Five model, a taxonomy of social perception, presumes that patterns in people’s perception of others can inform our understanding of social constructs of personality. In particular, it postulates implicitly that when you have groups of correlated traits, with maximum variance between the groups, you can meaningfully label the groups and regard them as major factors of personality.

That sounds reasonable enough on the Big Five’s part–but before addressing it, I should distinguish among three concepts. (Thanks to Dr. Srivastava for distinguishing helpfully between the first two.)

First, there is the Five-Factor Theory formulated by Robert (Jeff) McCrae and Paul Costa. It offers a theoretical basis for this overall approach to personality. It contains sixteen postulates, only one of which brings up the Big Five in particular.

Next, there is the Big Five model itself–which, according to Srivastava, is best understood as a taxonomy of social perception, not of personality per se. It sets the stage for investigation of the sources, processes, and consequences of social perception. On p. 7 of the article above, he writes:

It is an interesting and worthy enterprise to study the characteristics of persons who are reliably described as extraverted, agreeable, etc.; but if you want to really understand the Five-Factor Model, you need to frame your questions in terms of perception–and in order to avoid the dead ends of previous eras, you need to study perception in a way that accounts for the entire chain of causation from the neuropsychic bases of behavior in targets to the inferential processes by which perceivers perceive (as proposed by Funder, 1995).

Finally, we have various Big Five personality tests, which people take out of sheer curiosity, as part of an experiment, or for some external purpose such as employment. It is in these tests that much of the mischief arises (in my view)–because if the Big Five constitute a taxonomy of social perception, then a test result says more about how others tend to perceive people who appear to share your traits than about who you are. The distinction is essential, and it isn’t made often enough. The tests also presume that a person’s relationship to the Big Five can meaningfully be described on a sliding scale. This, too, merits questioning.

But let’s go back to the Big Five model. It makes sense to view it as a taxonomy of social perception. In Srivastava’s words (on p. 9), “traits are what people want to know when they get to know a person.” But clearly there are problems with grouping such traits together, even when such grouping is suggested by the data. The larger categories may obscure the distinctions between the sub-traits. (And that’s why I see the Big Five model as a hypothesis or theory: It postulates that such grouping is meaningful and informative.) Drawing on Jack Block’s critique of the various models in the Big Five framework, Srivastava writes on pp. 13-14:

As Block notes, it is difficult to come up with single words or even short phrases that adequately capture the breadth of meaning of the five factors. The single-word trait terms encoded in language are probably closest to the level of abstraction that perceivers operate at most of the time (cf. John, Hampson, & Goldberg, 1991, for a more nuanced view). At lower levels of the hierarchy–aspects, facets, and especially individual trait concepts–we will need to develop increasingly differentiated theories to account for the social concerns that these dimensions encapsulate.

Yes, this is a problem, and it exists even before we get to tests. Martha Smith once commented on Andrew Gelman’s blog (in response to one of my comments), “In other words, [the researchers] did not start with definitions of traits; this was exploratory research that gave them candidates for traits. The real definition of the traits was ‘whatever this linear combination measures.’ However, the labels they attached to these factors became ‘reified’ — that is, taken to be The Real Thing Measured, even though the labels were fuzzy terms subject to varying interpretations.”
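Smith’s point can be made concrete with a small simulation. Here is a minimal sketch (invented data, not any real trait inventory) of how exploratory factor analysis hands the researcher unnamed linear combinations, which only afterward receive labels:

```python
# A minimal sketch of factor-analytic "trait discovery" on invented data.
# Nothing here comes from a real inventory; the point is that the model
# returns unnamed linear combinations, and the researcher supplies labels.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_people = 500

# Two hypothetical latent dimensions drive six observed "trait" ratings.
latent = rng.normal(size=(n_people, 2))
loadings = np.array([[0.9, 0.0],
                     [0.8, 0.1],
                     [0.7, 0.0],
                     [0.0, 0.9],
                     [0.1, 0.8],
                     [0.0, 0.7]])
ratings = latent @ loadings.T + rng.normal(scale=0.5, size=(n_people, 6))

fa = FactorAnalysis(n_components=2).fit(ratings)

# The output is a matrix of numbers; "Agreeableness" or any other name
# gets attached afterward, and can then be reified.
print(np.round(fa.components_, 2))
```

Each recovered factor is exactly what Smith describes: “whatever this linear combination measures,” until someone names it.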

An associated problem is that the Big Five is a taxonomy of general tendencies in social perception; thus it does not account for exceptions and outliers, which could be every bit as informative as the tendencies, if not more so.

This needs to be shouted from the rooftops: Big Five tests–and other personality tests–do not tell you how extraverted, agreeable, conscientious, etc., you are. They tell you to what degree your self-identified traits match traits that people tend to correlate in their observations of others–and that researchers have therefore grouped in larger categories.

Now let us get to specifics. One of my qualms with personality tests is that they encourage self-revelation along the lines of “The test says I’m introverted, but I always thought I was extraverted, because I….” etc. etc. This doesn’t seem necessary or helpful. Let’s instead look at a hypothetical situation.

Someone takes a Big Five test and scores low on Agreeableness–but would be described by friends as gentle, considerate, and kind. Of course there’s a discrepancy between how you see yourself and how others see you–but there’s also a problem of complexity. You may have many possibilities in your character; different ones come out at different times. If you come upon a statement like “I can be cold and uncaring,” you might ask yourself, “What does ‘can’ mean? How do I answer something like that? Is this asking how often I act or think in an uncaring way? Or how intense my lack of caring can be when it occurs? Is it asking about my outward affect, or about my thoughts?”

Or at a group level, what does 60% Agreeable mean? Does one person’s 60% resemble another’s, or did they score at 60% for different reasons?
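A quick illustration with invented numbers: two hypothetical answer patterns on a ten-item scale that a percentage score cannot tell apart.

```python
# Two invented answer patterns on a hypothetical ten-item Agreeableness
# scale (1-5 Likert). Both normalize to "60%" despite looking nothing alike.
import numpy as np

person_a = np.array([4, 3, 4, 3, 4, 3, 4, 3, 3, 3])  # uniformly mild
person_b = np.array([5, 1, 5, 1, 5, 5, 5, 1, 5, 1])  # sharply mixed

for answers in (person_a, person_b):
    pct = (answers.mean() - 1) / (5 - 1) * 100
    print(f"{pct:.0f}% agreeable")  # prints "60% agreeable" for both
```

Same score, different people; the number alone does not say which.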

Taking a taxonomy of social perception and turning it into an assessment of individual personality–even, shall we say, social perception of individual personality–involves a few iffy leaps of reasoning. People treat those tests with much more certainty than they actually merit. But even without the tests, the taxonomy alone leaves one with questions and uncertainties. I am glad that there are researchers who look into the uncertainties and help us understand what they are.

Image credit: Wikimedia Commons.

Note: I made a few minor edits to this piece after posting it. In addition, I added a missing end quotation mark in the paragraph that begins “But let’s go back….”

A Lesson from the Power Pose Debacle

Amy Cuddy’s TED talk on power posing has thirty-seven million views. Its main idea is simple: if you adopt an expansive, authoritative pose, your actual power will increase. For evidence, Cuddy refers to a study she conducted in 2010 with Dana Carney and Andy Yap. Holes and flaws in the study have since been revealed, but Cuddy continues to defend it. Doubt fuels scientific inquiry, but in an era of TED-style certainty and snappiness, it gets short shrift. It is time to tap the reserves.

Recently TED and Cuddy appended a note to the summary of the talk: “Some of the findings presented in this talk have been referenced in an ongoing debate among social scientists about robustness and reproducibility.” In other (and clearer) words: The power pose study has not held up under scrutiny. At least two replications failed; Andrew Gelman, Uri Simonsohn, and others have critiqued it robustly; and Carney, the lead researcher, detailed the study’s flaws—and disavowed all belief in the effect of power poses—in a statement posted on her website. Jesse Singal (New York Magazine) and Tom Bartlett (The Chronicle of Higher Education) have weighed in with analyses of the controversy.

Very well, one might shrug aloud, but what should we, irregular members of the regular public, do? Should we distrust every TED talk? Or should we wait until the experts weigh in? Neither approach is satisfactory. When faced with fantastic scientific claims, one can wield good skepticism and follow one’s doubts and questions.

Before learning of any of this uproar, I found Cuddy’s talk unstable. Instead of making a coherent argument, it bounces between informal observations, personal experiences, and scientific references. In addition, it seems to make an error early on. Two minutes into her talk, Cuddy states that “Nalini Ambady, a researcher at Tufts University, shows that when people watch 30-second soundless clips of real physician-patient interactions, their judgments of the physician’s niceness predict whether or not that physician will be sued.” Which study is this? I have perused the Ambady Lab website, conducted searches, and consulted bibliographies—and I see no sign that the study exists. (If I find that the study does exist, I will post a correction here. Ambady died in 2013, so I cannot ask her directly. I have written to the lab but do not know whether anyone is checking the email.)

In separate studies, Ambady examined surgeons’ tone of voice (by analyzing subjects’ ratings of sound clips in which the actual words were muffled) and teachers’ body language (by analyzing subjects’ ratings of soundless video clips). As far as I know, she did not conduct a study with soundless videos of physician-patient interactions. Even her overview articles do not mention such research. Nor did her study of surgeons’ tone of voice make inferences about the likelihood of future lawsuits. It only related tone of voice to existing lawsuit histories.

Anyone can make a mistake. On the TED stage, delivering your talk from memory before an enormous audience, you have a million opportunities to be fallible. This is understandable and forgivable. It is possible that Cuddy conflated the study of surgeons’ tone of voice with the study of teachers’ body language. Why make a fuss over this? Well, if a newspaper article were to make such an error, and anyone were to point it out, the editors would issue a correction. No correction appears on the TED website. Moreover, many people have quoted Cuddy’s mention of that study without looking into it. It has been taken as fact.

Why did I sense that something was off? First, I doubted that subjects’ responses to a physician’s body language predicted whether the doctor would be sued in the future. A lawsuit takes money, time, and energy; I would not sue even the gruffest surgeon unless I had good reason. In other words, the doctor’s personality would have only a secondary or tertiary influence on my decision to sue. On the other hand, it is plausible that doctors with existing lawsuit histories might appear less personable than others—if only because it’s stressful to be sued. Insofar as existing lawsuit histories predict future lawsuits, there might be a weak relation between a physician’s body language and his or her likelihood of being sued in the future. I suspect, though, that the data would be noisy (in a soundless kind of way).

Second, I doubted that there was any study involving videos of physician-patient interactions. Logistical and legal difficulties would stand in the way. With sound recordings—especially ones in which the words are muffled—you can preserve anonymity and privacy; with videos you cannot. As it turns out, I was flat-out wrong; video recording in doctors’ offices has become commonplace, not only for research but for doctors’ own self-assessment.

It matters whether or not this study exists—not only because it has been taken as fact, but because it feeds public gullibility. If you believe that a doctor’s body language actually predicts future lawsuits, then you might also believe that power pose effects are real. You might believe that “the vast majority of teachers reports believing that the ideal student is an extrovert as opposed to an introvert” (Susan Cain) or that “the whole purpose of public education throughout the world is to produce university professors” (Ken Robinson). The whole point of a TED talk is to put forth a big idea–but an idea’s size has little to do with its quality.

What to do? Questioning Cuddy’s statement, and statements like it, takes no special expertise, only willingness to follow a doubt. If TED were to open itself to doubt, uncertainty, and error—posting corrections, acknowledging errors, and inviting discussion—it could become a genuine intellectual forum. To help bring this about, people must do more than assume a doubting stance. Poses are just poses. Insight requires motion—from questions to investigations to hypotheses to more questions. This is what makes science interesting and strong. Science, with all its branches and disciplines, offers not a two-minute “life hack,” but rather the hike of a lifetime, full of doubt and vigor.

Update: TED has changed the title of Cuddy’s talk from “Your Body Language Shapes Who You Are” to “Your Body Language May Shape Who You Are.” In addition, the talk’s page on the TED website has a “Criticism & updates” section, last updated in August 2017. Both are steps in the right direction.

Note: I made some revisions to this piece long after posting it.

Free Will and Education Reform

[Image: George Henry Hall, The Pomegranate]

The question of free will bursts into question upon question. What does it mean to have free will? To what degree do we exercise it? How can we know? For all the swarms of ideas on the subject, there seems to be agreement—among philosophers, theologians, poets, psychologists, and others—that whatever freedom we might have, we do not control other people or the outcomes of our actions (and if we could, it would be unwise). What a refreshing thought—and what a far cry from today’s education reform, which insists on our ability to control others’ results!

Literature from ancient Greek drama to contemporary psychology warns about illusions of control. In Aeschylus’s Agamemnon (in Robert Fagles’ translation), the Chorus sings, “And neither by singeing flesh / nor tipping cups of wine / nor shedding burning tears can you / enchant away the rigid Fury.” Rabbi Hanina states in the Gemara of Berachot (33b) of the Talmud, “Everything is in the power of heaven except the fear of heaven.” (There are numerous interpretations of this statement.) In recent centuries, literary, philosophical, psychological, religious, and sociological writings have emphasized the futility (or danger) of trying to control others.

Yet much of education reform assumes we can and should control others–in particular, their measurable achievement. This assumption is profoundly wrong. To rate teachers on their students’ test performance is to distort the educational endeavor. Teachers influence students (and their influence is great); they do not cause students to do well or poorly. (It’s one thing to analyze the results; it’s another to convert them by formula into a rating.)

“Very well,” someone might respond, “so you’ve admitted that teachers influence students. Are you saying this influence doesn’t matter?” Of course it matters; it gives meaning to the work and helps teachers heed the alarm clock in the mornings. Still, whenever the student steps out to do something—take a test, give a presentation, or read further on the subject—this is the student’s action, not the teacher’s. The student has the credit and the dignity (or should).

“In that case, teachers might as well throw up their hands,” another might say. “If they aren’t held accountable for results, why should they bother trying?”

When you think you might influence (but not control) your students, there is all the more reason to try. You get to share in something that is not your own, something that goes beyond you. When a student does well, you have the honor of contributing to it in some way; when a student does poorly or runs into difficulties, you have the sorrow and the self-questioning. Honor and sorrow and self-questioning and responsibility inspire me a great deal more than the publication of teachers’ “value-added ratings” in the newspaper.

It is not just that they inspire me more; it’s that they serve as better guides. I don’t know, and have no way of knowing, how great my influence will be or what form it will take (beyond concrete and immediate learning). That is all the more reason to put thought and effort into my lessons: I am participating in something partly knowable, partly mysterious, but in any case larger than myself. If I had wanted a predictable effect on things, I would have become a chocolatier, a producer of delight and cavities. Even then, my results would not have been uniform.

Yes, of course I want concrete learning to come out of my lessons; of course I want to see evidence of it. Even so, I do not make it happen, nor do I set its limits. Even less do I control what comes out of that learning.

Many economists would disagree. A 2011 study (by Raj Chetty, John N. Friedman, and Jonah E. Rockoff) concludes that teachers affect not only students’ performance on tests, but also their college attendance and future earnings. Granted, they say “affect,” not “cause,” but then they extrapolate: “Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase the present value of students’ lifetime income by more than $250,000 for the average classroom in our sample.”
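For readers unfamiliar with the term: “present value” is the standard discounting calculation, sketched generically below (the symbols are illustrative; this is not the study’s exact specification):

```latex
\mathrm{PV} = \sum_{t=0}^{T} \frac{E_t}{(1+r)^t}
```

Here E_t is projected earnings in year t and r is a chosen discount rate; the $250,000 figure aggregates such projected gains over an average classroom. Every term in that sum is a projection, which is where the extrapolation lies.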

I think of D. H. Lawrence’s “Pomegranate”: “Do you mean to tell me you will see no fissure?”

I respect these scholars and acknowledge the care that went into the study. Still, its projections assume minimal variation among students, little that could interfere with their earnings, and little room for them to choose their directions in life. Presumably, if teachers could “increase” students’ lifetime income by more than $250,000 (a projection based on limited data), then we could boost the economy just by replacing the low-ranking teachers. We could replace our way to a better world.

But what if the students’ lifetime income didn’t increase as expected? What if these students faced layoffs, job changes, and life difficulties, or chose professions that didn’t pay especially well? What could one replace then, for better outcomes? Perhaps one could give each of their choices a value-added rating (in terms of how much income it produced) and demand that they make lucrative life choices. Someone would have to chase after them and make sure they did so.

What if illness and war and death got in the way? Well, one would have to replace those students who got sick or died, or who grieved the death of others. No room for mortality (or aging) in the picture, especially if it interferes with earnings.

We are left, then, with those select few who don’t age, fall ill, or die—and who, without fail, take actions that bring them more money.

We are down to no one—but there, in that world of none, we have attained prosperity!

Happy are those who do not inhabit that world.

Research Has Shown—Just What, Exactly?

In popular writing on psychology, science, and education, we often encounter the phrase “research has shown.” Beware of it. Even its milder cousin, “research suggests,” may sneak up and put magic juice in your eyes, so that, upon opening them, you fall in love with the first findings you see (until you catch on to the trick).

Research rarely “shows” much, for starters—especially research on that “giddy thing” known as humanity.* Users of the phrase “research has shown” often commit one or more of these distortions: (a) disregarding the flaws of the research; (b) misinterpreting it; (c) exaggerating its implications; or (d) cloaking it in vague language. Sometimes they do this without intending to distort, but the distortions remain.

Let’s take an example that shows all these distortions. Certain teaching methodologies emphasize a combination of gesture, speech, and listening. While such a combination makes sense, it is taken to extremes by Whole Brain Teaching, a rapid call-and-response pedagogical method that employs teacher-class and student-student dialogue in alternation. At the teacher’s command, students turn to their partners and “teach” a concept, speaking about it and making gestures that the partner mimics exactly. Watch the lesson on Aristotle’s “Four Causes,” and you may end up dizzy and bewildered; why would anyone choose to teach Aristotle in such a loud and frenzied manner?

The research page of the Whole Brain Teaching website had little research to offer a few months ago. Now it points to a few sources, including a Scientific American article that, according to the WBT website, describes “research supporting Whole Brain Teaching’s view that gestures are central to learning.” Here’s an instance of vague language (distortion d). Few would deny that gestures are helpful in teaching and learning. This does not mean that we should embrace compulsory, frenetic gesturing in the classroom, or that research supports it.

What does the Scientific American article say, in fact? There’s too much to take apart here, but this passage caught my eye: “Previous research has shown”—eek, research has shown!— “that students who are asked to gesture while talking about math problems are better at learning how to do them. This is true whether the students are told what gestures to make, or whether the gestures are spontaneous.” This looks like an instance of exaggerating the implications of research (distortion c); let’s take a look.

The word “told” in that passage links to the article “Making Children Gesture Brings Out Implicit Knowledge and Leads to Learning” by Sara C. Broaders, Susan Wagner Cook, Zachary Mitchell, and Susan Goldin-Meadow, published in the Journal of Experimental Psychology: General, vol. 136, no. 4 (2007), pp. 539–550. The abstract states that children become better at solving math problems when told to make gestures (relevant to the problems) during the process. Specifically, “children who were unable to solve the math problems often added new and correct problem-solving strategies, expressed only in gesture, to their repertoires.” Apparently, this progress persisted: “when these children were given instruction on the math problems later, they were more likely to succeed on the problems than children told not to gesture.” So, wait a second here. They didn’t have a control group? Let’s look at the article itself.

The experimenters conducted two studies. The first one involved 106 children in late third and early fourth grade, whom the experimenters tested individually. For the baseline set, children were asked to solve six problems of the type 6 + 3 + 7 = ___ + 7, without being given any instructions on gesturing. Children who solved any of the problems correctly were eliminated from the study at the outset. (Doesn’t this bias the study considerably? Shouldn’t this be mentioned in the abstract?)
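For clarity, here is the structure of those problems and what counts as a correct solution (a minimal sketch using the example above):

```python
# Problems take the form a + b + c = ___ + c, e.g., 6 + 3 + 7 = ___ + 7.
# Grasping "mathematical equivalence" means seeing that the blank must
# make both sides equal: blank = (a + b + c) - c, i.e., a + b.
a, b, c = 6, 3, 7
blank = (a + b + c) - c
assert a + b + c == blank + c
print(blank)  # 9
```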

From there, the students were assigned to groups for the “manipulation phase” of the study. Thirty-three students were told to gesture; 35 were told to keep their hands still; and 38 were told to explain how they solved the problems. The students who were told to gesture added significantly more “strategies” to their repertoires during the manipulation than did the students in the other two groups; however, nearly all of these strategies were expressed in gesture only and not in speech. Across the groups, students added a mean of 0.34 strategies to their repertoires, of which a mean of 0.25 were correct (the strategies, that is, not the solutions).

It is not clear how many students actually gave correct answers to the problems during the manipulation phase. The study does not provide this information.

The second study involved 70 students in late third and early fourth grade; none had participated in the first study. After conducting the baseline experiment (where no students solved the problems correctly), the researchers divided the students into two groups for the manipulation phase. Children in one group were told to gesture; children in the other group were told not to gesture. The researchers chose these two groups because they were “maximally distinct in terms of strategies added.” (How did they know this in advance? This is not clear.)

Again, the students who had been told to gesture added more strategies to their repertoire; those told not to gesture added none. The researchers state later, in the “discussion” section of the paper: “Note that producing a correct strategy in gesture did not mean that the child solved the problems correctly. In fact, the children who expressed correct problem-solving strategies uniquely in gesture were, at that moment, not solving the problems correctly. But producing a correct strategy in gesture did seem to make the children more receptive to the later math lesson.”

After the children had solved and explained the problems in the manipulation phase, they were given a lesson on mathematical equivalence. (There was no such lesson in the first study.) The experimenter used a consistent gesture (moving a flat palm under the left side of the equation and then under the right side) for each of the problems presented. Then the students were given a post-test.

On the post-test, the students told not to gesture solved a mean of 2.2 problems correctly (out of six); those told to gesture solved a mean of 3.5 correctly. (I am estimating these figures from the bar graph.)
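Translating those estimated means into proportions, using only the figures above:

```python
# Post-test means as estimated from the paper's bar graph (see above):
# out of six problems, children told not to gesture solved ~2.2 correctly;
# children told to gesture solved ~3.5.
total = 6
for label, mean in (("told not to gesture", 2.2), ("told to gesture", 3.5)):
    print(f"{label}: {mean}/{total} = {mean / total:.0%} correct")
# told not to gesture: 2.2/6 = 37% correct
# told to gesture: 3.5/6 = 58% correct
```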

Why would anyone be impressed by the results? For some reason the researchers did not mention actual performance in the first study. In the second, it isn’t surprising that the students told not to gesture would fare worse on the test. A prohibition against gesturing could be highly distracting, as people tend to gesture naturally in one way or another. Again, there was no control group in the second study. Moreover, neither the overall mean performance on the test nor the performance difference between the groups is particularly impressive, given that the problems all followed the same pattern and should have been easy for students who grasped the concept, provided they had their basic arithmetic down.

The researchers do not draw adequate attention to the two studies’ caveats or consider how these caveats might influence the conclusion (distortions a and b). In the “discussion” section of the paper, they state with confidence that “Children told to gesture were more likely to learn from instruction than were children told not to gesture.”

This is just one of myriad examples of research not showing what it claims to show or what others claim it shows. I have read research studies that gloss over their own gaps and weaknesses; I have read popular articles that exaggerate the implications of this research; and I have seen practitioners cite those popular articles in support of their particular methods. When I hear the phrase “research has shown,” I immediately suspect that it isn’t so.

*From Shakespeare’s Much Ado About Nothing; thanks to Jamie Lorentzen for reminding me of the phrase.