In This Grand Primordial Mess

[Photo: my desk]

Messy people (including me) may be on the up-and-up. Behold, to the left, a desk, my desk. This is about as unmessy as it gets. At least once a week, the piles at least triple. They flow onto each other. They threaten to converge and topple. So I bring them down a little and start again. That has been my life since adulthood. In childhood and adolescence, it was much worse; my mess didn’t even organize itself into piles. But I enjoyed it in some way and did not want to become neat. Others tried to get me to organize myself; although I did, a little, over time, I also kept a good deal of messiness, since it allowed me to focus on other things.

So I was delighted to see Jesse Singal’s article on mess. Apparently there are more mess-defenders in the world than I thought. I learned about a new book, Messy: The Power of Disorder to Transform Our Lives, by Tim Harford. Unfortunately, though, the title gave me IS (Instant Skepticism). It sounds like another “Great Secret to Creativity” book. I hope it’s not that. There’s lots to be said for a degree of messiness, but I don’t for a messy second believe that becoming messy will make you more creative or successful. (It may be that the title only flops askew over the book’s actual contents; I will wait to see.)

When and how can messiness be good? Well, first of all, it’s just the way some of us are. My students have described me as organized, but that’s probably because I have learned over time how to handle my mess. Even so, I don’t organize myself more than I have to. It takes too much time, and I have my mind on other things. I work better if I don’t have to worry all the time about putting things in their proper places. As long as I know where to find them, and as long as I keep them in good condition, I’m fine.

I need some messiness; I need the freedom to pile book on top of book while I am looking into an idea and writing out an argument. Also, I like the look and feel of mess (up to a point); it reminds me of things I and others have been doing, and it keeps an array of materials at hand. This cannot and should not be pre-engineered; it’s just the way I work.

It may well be true that all creativity involves some messiness. This does not mean that you arrive at creativity by generating mess. Mess comes in different forms; there are people who maintain an impeccably neat exterior but allow themselves a pile of loose ends in the mind. There are those whose mess occurs in blogging, or in speaking, or in musical tastes. It’s unlikely that any “messy regime” will help anyone produce a work of brilliance.

On the other hand, it is nice to see some people questioning the despotism of neatness. Talk about hegemony. Some of us (including me) have had points taken off, throughout our lives, because we didn’t write as neatly as others, organize our notebooks clearly, take legible notes in class, or put everything away immediately after using it. For the sake of justice alone, I am happy to join in praise of limited mess.

Speaking of mess: I was delighted to come upon some videos of a 1978 concert by the Roches. I first heard them in 1982 (thanks to a friend who insisted I come hear them). I had forgotten just how beautifully messy (yet in time and in tune and inspired) they were. Here they are performing the wonderful “We.”

Oh, the title of this blog: Once upon a time, in 1989, someone’s beautiful mess, and the occasion of a tornado, inspired a sonnet from me. Here it is.

Tornado, July 10, 1989

The winds began to imitate your prance,
a rolling soda can became the lyre,
the sirens sang the lyrics, mixing fire
with something like your name. The dance grew dense,
a cat shot an accusatory glance,
and time was canceled. Wood, debris, and wire
were pulled like windowshades to curb desire,
since pagan hail had trampled down the fence.

Thinking survival hardly worth the cost,
I risked electrocution or success,
clambering over what was once a street,
with hopes that in this grand primordial mess
finding you in your element, I’d greet
what never had been had, and still was lost.

Lectures, Teams, and the Pursuit of Truth

One of these days, soon, I’ll post something about teaching. Since I’m not teaching this year, I have had a chance to pull together some thoughts about it.

In the meantime, here are a few comments I posted elsewhere. First, I discovered, to my great surprise, that Andrew Gelman seeks to “change everything at once” about statistics instruction—that is, make the instruction student-centered (with as little lecturing as possible), have interactive software that tests and matches students’ levels, measure students’ progress, and redesign the syllabus. While each of these ideas has merit and a proper place, the “change everything” approach seems unnecessary. Why not look for a good combination of old and new? Why abandon the lecture (and Gelman’s wonderful lectures in particular)?

But I listened to the keynote address (that the blog post announced) and heard a much subtler story. Instead of trumpeting the “change everything” mantra into our poor buzzword-ringing heads, Gelman asked questions and examined complexities and difficulties. Only in the area of the syllabus did he seem sure of an approach. In the other areas, he was uncertain but looking for answers. I found the uncertainty refreshing but kept on wondering, “Why assume that you need to change everything? Isn’t there something worth keeping right here, in this very keynote address about uncertainties?”

Actually, the comment I posted says less than what I have said here, so I won’t repeat it. I have made similar points elsewhere (about the value of lectures, for instance).

Next, I responded to Drake Baer’s piece (in New York Magazine’s Science of Us section), “Feeling Like You’re on a Team at Work Is So Deeply Good for You.” Apparently a research team (ironic, eh?) led by Niklas Steffens at the University of Queensland found that, in Baer’s words, “the more you connect with the group you work with—regardless of the industry you’re in—the better off you’ll be.”

In my comment, I pointed out that such associations do not have to take the form of a team—that there are other structures and collegial relations. The differences do matter; they affect the relation of the individual to the group. Not everything is a team. Again, no need to repeat. I haven’t yet read the meta-study, but I intend to do so.

Finally, I responded to Jesse Singal’s superb analysis of psychology’s “methodological terrorism” debate. Singal points to an underlying conflict between Susan Fiske’s wish to protect certain individuals and others’ call for frank, unbureaucratic discussion and criticism. To pursue truth, one must at times disregard etiquette. (Tal Yarkoni, whom Singal quotes, puts it vividly.) There’s much more to Singal’s article; it’s one of the most enlightening new pieces I have read online all year. (In this case, by “year” I mean 2016, not the past twelve days since Rosh Hashanah.)

That’s all for now. Next up: a piece on teaching (probably in a week or so). If my TEDx talk gets uploaded in the meantime (it should be up any day now), I’ll post a link to it.

What Does “Predict” Mean in Research?

In her TED talk, Amy Cuddy says,

Nalini Ambady, a researcher at Tufts University, shows that when people watch 30-second soundless clips of real physician-patient interactions, their judgments of the physician’s niceness predict whether or not that physician will be sued.

This is quoted all over the place, yet I have been unable to track down the study. Maybe it is unpublished or in press, or maybe it is published under a title that doesn’t mention physicians or videos.

In the meantime, I wonder whether Cuddy might have conflated two separate studies by Ambady: one of surgeons’ tone of voice (2002), and another of soundless clips of teachers (1993).

But my greater concern is with the word “predict.” As Cuddy puts it, the “judgments of the physician’s niceness” actually predict whether or not that same physician will be sued in the future. To determine this, a researcher would have to follow the physicians over the long term and compare their subsequent lawsuit patterns (if any) to the initial ratings.

That would be terrifically difficult to accomplish. First, doctors with malpractice litigation histories are a small percentage of the whole, so the sample size would be tiny. Second, how long do you wait for a doctor to be sued? Two years? Five? Ten? Indefinitely?

Instead, I suspect the study followed a procedure similar to that of “Surgeons’ Tone of Voice.” (If I find out I am wrong, I will post a correction.) That is, the ratings of the videos were related to doctors’ existing lawsuit history. If there was any prediction, it was retrospective; as the authors state in the surgeon study, “Controlling for content, ratings of higher dominance and lower concern/anxiety in their voice tones significantly identified surgeons with previous claims compared with those who had no claims” (emphasis mine).

Why does this matter? There’s a big difference between predicting past and future events. People are easily dazzled by the idea that a thirty-second video clip (or sound clip, or whatever it may be) can predict a future lawsuit. The idea that it might predict a person’s existing history is perhaps interesting (if it holds up under scrutiny) but less dazzling.
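
To make the distinction concrete, here is a minimal sketch in Python (the records and field names are hypothetical illustrations, not drawn from any of these studies). A retrospective analysis needs only each physician’s existing claim history; a prospective one needs an outcome that can be recorded only by following the physicians forward in time.

```python
# Hypothetical records: a niceness rating from the clips, plus claim history.
doctors = [
    {"rating": 3.2, "prior_claims": True},
    {"rating": 4.7, "prior_claims": False},
    {"rating": 3.5, "prior_claims": True},
    {"rating": 4.4, "prior_claims": False},
]

def mean_rating(records, with_claims):
    """Average rating for doctors with or without existing claims."""
    vals = [d["rating"] for d in records if d["prior_claims"] == with_claims]
    return sum(vals) / len(vals)

# Retrospective question: do low ratings *identify* doctors already sued?
print(mean_rating(doctors, True), mean_rating(doctors, False))

# A prospective claim would require a field like "sued_within_ten_years,"
# obtainable only by waiting, which is exactly the difficulty described above.
```

The arithmetic is trivial; the point is what data each question requires.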

When it comes to doctors, I imagine those with a lawsuit history might be a little grumpier than the others. I can also see how tone of voice could affect patients’ sense of trust and comfort. But I know of no study that demonstrates that ratings of soundless video clips of physicians predict whether they will one day be sued.

To avoid turning scientific research into a magic show, use the word “predict” carefully and precisely. Also, give a little more detail when referring to research, so that those interested can look up the study in question.

Update: I just learned that Ambady died in 2013. See the comments below. Thanks to Shravan Vasishth for the information.

Another update: Thanks to Martha Smith for explaining various kinds of “prediction” in statistics. (See here and here.)

Time and Happiness Again

What do people want: more money or more time? Who is happier: those who want money, or those who want time? Do these questions mean the same things to different people? Do they mean the same thing to the same person at different times? Do we know what we’re doing when we rate our own happiness?

A few weeks ago I commented on a study by Hal E. Hershfield, Cassie Mogilner, and Uri Barnea, “People Who Choose Time Over Money Are Happier” (Social Psychological and Personality Science, vol. 7, no. 7 [2016], pp. 697–706; see also the authors’ NYT article). I saw possible problems with it but did not have time to read it closely. My criticism was a bit caustic and uninformed; I ended up disliking and deleting the post. I regret the tone but not the critical impulse.

Now looking at the actual study again, I find it both stronger and weaker than I previously thought.

It is stronger in its versatility. The authors considered many possibilities; they were continually revising and refining their hypotheses and tests.

But that’s also a problem. The paper’s seven studies go in somewhat different directions; in my reading, they don’t point together to a conclusion.

Here they are:

Study 1a: 1,301 participants (1,226 in the final sample) were recruited through Mechanical Turk and asked about their preference for time or money. They were also asked to rate their happiness and life satisfaction. The order of these questions was counterbalanced across participants (I missed this point the first time around).

More people chose money than time, but those who chose time reported greater happiness than those who chose money. The difference does not seem great to me, regardless of statistical significance (M = 4.65, SD = 1.32 vs. M = 4.18, SD = 1.38), but I may be wrong here.
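
For what it’s worth, those figures permit a rough standardized effect size. Here is a minimal sketch in Python (my own back-of-the-envelope arithmetic, using an equal-n pooled-SD approximation, since the group sizes aren’t reproduced above):

```python
import math

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d: standardized mean difference, equal-n pooled-SD approximation."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# Study 1a: happiness of time-choosers vs. money-choosers
print(round(cohens_d(4.65, 1.32, 4.18, 1.38), 2))  # ~0.35
```

If I have the arithmetic right, that comes out to roughly 0.35, a small-to-medium effect by the usual conventions, which squares with my impression that the gap, though real, is not dramatic.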

Study 1b: The authors do not describe this in detail, but they claim to have replicated the results of 1a while controlling for materialism. Participants (N = 1,021) were again recruited through Mechanical Turk.

Study 2: This time, 535 participants were recruited in the train station of a major East Coast city and offered a granola bar to complete the survey; 429 actually did complete it. They reported substantially higher income than the participants in 1a and 1b; also, a majority (55%) chose time over money, unlike the MTurk participants, who tended to choose money over time. (Did the train station setting affect this in any way, I wonder?) Those who chose time were again happier, by their own rating, than those who chose money (M = 5.28, SD = 0.93 vs. M = 4.91, SD = 1.10).

Study 3a: This time, the researchers sought to find out why people preferred what they did. So they recruited participants through MTurk, asked them which they preferred (time or money), asked them to explain why, and then asked them to rate their happiness. Here, unlike before, the order of the questions was fixed. The researchers saw a split between using the resource to cover needs and using it to cover wants, as well as a split between using the resource for others and using it for oneself. Something curious appears here: participants indicated whether they wanted more time in their days or in their lives. While the desire for more time (generally) correlated with happiness, the desire for more time in one’s day did not, nor did the desire for more time in one’s life. I wonder what this means.

Study 3b: This time, 1,000 participants were recruited through Qualtrics for a nationally representative sample; 943 ended up participating. As in most of the previous studies, the majority indicated a preference for more money over more time, but those who chose time rated themselves as happier. In addition, the ones who indicated that they would spend the resource on wants were happier, by their own rating, than those who said they would spend it on needs; those who said they would spend it on others were happier than those who said they would spend it on themselves. There were some additional findings. (One interesting detail: The Qualtrics participants were on average 15–20 years older than the MTurk and train station participants; also, a much lower percentage were employed.)

Study 4a: This was the first of two manipulation checks. Participants were recruited through MTurk and assigned randomly to one of three conditions: a “wanting time” condition, in which they were instructed to write about why they wanted more time; a “wanting money” condition (likewise with a writing task); and a control condition, for which they had to write down 10 facts. Then they were asked to rate their happiness. Finally, they were to indicate which they would rather have, more time or more money.

Those in the “want time” condition (randomly assigned) tended to indicate a preference for more time; those in the “want money” condition, for more money. The difference in happiness was marginal across the groups, but those in the “want time” condition were slightly happier by their own rating than those in the “want money” condition.

Study 4b: This was the last of the studies and the second manipulation check. This time, participants (again recruited through MTurk) were assigned randomly to a happy condition (instructed to write about why they were happy), an unhappy condition (instructed to write about why they were unhappy), and a control condition (without a writing task). They were then asked to rate their happiness. Finally, they were asked questions about their resource preference. Those in the happy condition reported greater happiness (and a greater preference for time) than those in the unhappy condition.

There are some details I have left out for brevity’s sake: for instance, the researchers included some questions about subjective and objective income and controlled for these. But this is the gist.

Now for some thoughts:

First of all, these seem like pre-study experiments rather than complete studies, in that they deal with different populations, questions, and methodologies. It is good that the researchers were refining their questions and analyses along the way, but in the process they may have come up with explanations that they did not rigorously test. For instance, the relation between an emphasis on wants (rather than needs) and happiness seems hypothetical, even if it makes intuitive sense. There’s a flipside: people can drive themselves into a tizzy by thinking about things they want but don’t have.

Second—and this concerns me more—studies 4a and 4b suggest that participants’ preferences and happiness ratings can be manipulated by something as simple as a writing task. It’s possible that most people want more money and more time; what they think they want at a given moment may have a lot to do with what’s going on around them.

Also, I suspect that the MTurk participants, especially those completing surveys for the money, might be a financially stressed bunch. That could influence the findings considerably.

In addition, money and time are not easily separable. That is my greatest qualm. I wonder how many participants thought: “Well, I’d like to have both, but I think the money would allow me to buy more time, so I’ll choose money.”

Who, then, would choose time? Maybe people who have something important in their lives. People may desire money for all sorts of things—leisure, power, luxury, relief from debt, etc.—but those who wish for more time probably have something in the works that they enjoy or value. That in itself could explain why they rate their happiness a little higher than the others do.

But then, how accurate is my assessment of my happiness? How accurate is it ever? It can fluctuate throughout the day; moreover, it can grow (or shrink) in retrospect. Forsan et haec olim meminisse iuvabit (Virgil, Aeneid); in the translation of Robert Fagles, “A joy it will be one day, perhaps, to remember even this.”

Research Has Shown … Just What, Exactly? (Reprise)

A few years ago, I wrote a piece with this title, minus the “(Reprise).” (And here’s a piece from 2011.)

It seems apt today (literally today) in light of Dana Carney’s statement, released late last night, that she no longer believes “power pose” effects are real. She explains her reasons in detail. I learned about this from a comment on Andrew Gelman’s blog; an hour and a half later, an article by Jesse Singal appeared in Science of Us (New York Magazine).

Dana Carney was one of three authors of the 2010 study, popularized in Amy Cuddy’s TED Talk, that supposedly found that “power poses” can change your hormones and life. (Andy Yap was the third.)

The “power pose” study has been under criticism for some time; a replication failed, and an analysis of both the study and the replication turned up still more problems. (For history and insight, see Andrew Gelman and Kaiser Fung’s Slate article.) Of the three researchers involved, Carney is the first to acknowledge the problems with the original study and to state that the effects are not real.

Carney not only acknowledges her errors but explains them clearly. The statement is an act of courage and edification. This is how research should work; people should not hold fast to flawed methods and flimsy conclusions but should instead illuminate the flaws.


Update: Jesse Singal wrote an important follow-up article, “There’s an Interesting House-of-Cards Element to the Fall of Power Poses.” He discusses, among other things, the ripple effect (or house-of-cards effect) of flawed studies.


Research Has Shown—Just What, Exactly?

In popular writing on psychology, science, and education, we often encounter the phrase “research has shown.” Beware of it. Even its milder cousin, “research suggests,” may sneak up and put magic juice in your eyes, so that, upon opening them, you fall in love with the first findings you see (until you catch on to the trick).

Research rarely “shows” much, for starters—especially research on that “giddy thing” known as humanity.* Users of the phrase “research has shown” often commit any of these distortions: (a) disregarding the flaws of the research; (b) misinterpreting it; (c) exaggerating its implications; or (d) cloaking it in vague language. Sometimes they do this without intent of distorting, but the distortions remain.

Let’s take an example that shows all these distortions. Certain teaching methodologies emphasize a combination of gesture, speech, and listening. While such a combination makes sense, it is taken to extremes by Whole Brain Teaching, a rapid call-and-response pedagogical method that employs teacher-class and student-student dialogue in alternation. At the teacher’s command, students turn to their partners and “teach” a concept, speaking about it and making gestures that the partner mimics exactly. Watch the lesson on Aristotle’s “Four Causes,” and you may end up dizzy and bewildered; why would anyone choose to teach Aristotle in such a loud and frenzied manner?

The research page of the Whole Brain Teaching website had little research to offer a few months ago. Now it points to a few sources, including a Scientific American article that, according to the WBT website, describes “research supporting Whole Brain Teaching’s view that gestures are central to learning.” Here’s an instance of vague language (distortion d). Few would deny that gestures are helpful in teaching and learning. This does not mean that we should embrace compulsory, frenetic gesturing in the classroom, or that research supports it.

What does the Scientific American article say, in fact? There’s too much to take apart here, but this passage caught my eye: “Previous research has shown”—eek, research has shown!—“that students who are asked to gesture while talking about math problems are better at learning how to do them. This is true whether the students are told what gestures to make, or whether the gestures are spontaneous.” This looks like an instance of exaggerating the implications of research (distortion c); let’s take a look.

The word “told” in that passage links to the article “Making Children Gesture Brings Out Implicit Knowledge and Leads to Learning” by Sara C. Broaders, Susan Wagner Cook, Zachary Mitchell, and Susan Goldin-Meadow, published in the Journal of Experimental Psychology: General, vol. 136, no. 4 (2007), pp. 539–550. The abstract states that children become better at solving math problems when told to make gestures (relevant to the problems) during the process. Specifically, “children who were unable to solve the math problems often added new and correct problem-solving strategies, expressed only in gesture, to their repertoires.” Apparently, this progress persisted: “when these children were given instruction on the math problems later, they were more likely to succeed on the problems than children told not to gesture.” So, wait a second here. They didn’t have a control group? Let’s look at the article itself.

The experimenters conducted two studies. The first one involved 106 children in late third and early fourth grade, whom the experimenters tested individually. For the baseline set, children were asked to solve six problems of the type 6 + 3 + 7 = ___ + 7, without being given any instructions on gesturing. Children who solved any of the problems correctly were eliminated from the study at the outset. (Doesn’t this bias the study considerably? Shouldn’t this be mentioned in the abstract?)

From there, the students were assigned to groups for the “manipulation phase” of the study. Thirty-three students were told to gesture; 35 were told to keep their hands still; and 38 were told to explain how they solved the problems. The students who were told to gesture added significantly more “strategies” to their repertoires during the manipulation phase than did the students in the other two groups; however, nearly all of these strategies were expressed in gesture only and not in speech. Across the groups, students added a mean of 0.34 strategies to their repertoires, of which a mean of 0.25 were correct (the strategies, that is, not the solutions).

It is not clear how many students actually gave correct answers to the problems during the manipulation phase. The study does not provide this information.

The second study involved 70 students in late third and early fourth grade; none had participated in the first study. After conducting the baseline experiment (where no students solved the problems correctly), the researchers divided the students into two groups for the manipulation phase. Children in one group were told to gesture; children in the other group were told not to gesture. The researchers chose these two groups because they were “maximally distinct in terms of strategies added.” (How did they know this in advance? This is not clear.)

Again, the students who had been told to gesture added more strategies to their repertoire; those told not to gesture added none. The researchers state later, in the “discussion” section of the paper: “Note that producing a correct strategy in gesture did not mean that the child solved the problems correctly. In fact, the children who expressed correct problem-solving strategies uniquely in gesture were, at that moment, not solving the problems correctly. But producing a correct strategy in gesture did seem to make the children more receptive to the later math lesson.”

After the children had solved and explained the problems in the manipulation phase, they were given a lesson on mathematical equivalence. (There was no such lesson in the first study.) The experimenter used a consistent gesture (moving a flat palm under the left side of the equation and then under the right side) for each of the problems presented. Then the students were given a post-test.

On the post-test, the students told not to gesture solved a mean of 2.2 problems correctly (out of six); those told to gesture solved a mean of 3.5 correctly. (I am estimating these figures from the bar graph.)
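
In percentage terms (a quick conversion of my own bar-graph estimates, so the figures are approximate):

```python
PROBLEMS = 6  # length of the post-test

for label, mean_correct in [("told not to gesture", 2.2), ("told to gesture", 3.5)]:
    print(f"{label}: {mean_correct / PROBLEMS:.0%} correct on average")
# told not to gesture: 37% correct on average
# told to gesture: 58% correct on average
```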

Why would anyone be impressed by the results? For some reason the researchers did not mention actual performance in the first study. In the second, it isn’t surprising that the students told not to gesture would fare worse on the test. A prohibition against gesturing could be highly distracting, as people tend to gesture naturally in one way or another. Again, there was no control group in the second study. Moreover, neither the overall mean performance on the test nor the performance difference between the groups is particularly impressive, given that the problems all followed the same pattern and should have been easy for students who grasped the concept, provided they had their basic arithmetic down.

The researchers do not draw adequate attention to the two studies’ caveats or consider how these caveats might influence the conclusion (distortions a and b). In the “discussion” section of the paper, they state with confidence that “Children told to gesture were more likely to learn from instruction than were children told not to gesture.”

This is just one of myriad examples of research not showing what it claims to show or what others claim it shows. I have seen research studies that gloss over their own gaps and weaknesses; popular articles that exaggerate the implications of this research; and practitioners who cite the popular articles in support of their particular method. When I hear the phrase “research has shown,” I immediately suspect that it isn’t so.

*From Shakespeare’s Much Ado About Nothing; thanks to Jamie Lorentzen for reminding me of the phrase.