Saturday, August 04, 2012

What's up with social psychology?

... or to be more precise, what's up with experimental social psychology?

A number of high profile cases of suspected (and in some cases admitted) fraud have been highlighted recently in psychology - my own discipline - but they have nearly all arisen in experimental social psychology. If you are unaware, the best known cases are those of Diederik Stapel, Dirk Smeesters and now Lawrence Sanna. Another high profile case, Marc Hauser, is in a somewhat related field (though it would be a stretch to call it experimental social psychology). The not so recent case of Karen Ruggiero could also be included.

A separate problem - distinct from deliberate fraud - is the controversy over specific studies whose effects appear hard to replicate. The discussion here revolves around Bargh's priming study and Bem's ESP study. There is ample discussion of this elsewhere, but the main point is that the standard practices in experimental social psychology may encourage publication of spurious effects.

Fraud and other kinds of academic misconduct are rare and far from confined to psychology - see Retraction Watch (though the scale of Stapel's fraud may have raised psychology's profile on its own). However, the spotlight is focusing heavily on social psychology at the moment. My initial view was that experimental social psychology was coming up purely by coincidence, but the recent cases have made me wonder. In the rest of this post I'm going to sketch out some thoughts on what might be going on.

(1) Coincidence. There remains quite good evidence for the whole thing being coincidence. Psychology and social psychology are popular fields with lots of researchers, so there will (sadly) be a few frauds. Deliberate fraud is a rare event and (to quote the late, great Robert Abelson) "probability is lumpy". Discrete random events only appear evenly or smoothly distributed in the long run (averaging over many events); in a given small sample of fixed n, rare events tend to cluster. So if you look at 10 or 100 fraud cases there are bound to be clusters within certain disciplines.
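To make the lumpiness point concrete, here is a minimal sketch (with entirely made-up numbers, not real fraud data) of how rare events cluster by chance:

    import random

    # Made-up illustration: scatter a handful of rare fraud cases across many
    # equally fraud-prone disciplines and see how often one discipline ends up
    # with more than one case purely by chance.
    random.seed(1)
    n_disciplines = 50
    n_cases = 10
    n_sims = 10_000

    clustered = 0
    for _ in range(n_sims):
        counts = [0] * n_disciplines
        for _ in range(n_cases):
            counts[random.randrange(n_disciplines)] += 1
        if max(counts) >= 2:  # at least two cases land in the same discipline
            clustered += 1

    print(clustered / n_sims)  # roughly 0.6 with these made-up numbers

Even though each discipline here expects only 0.2 cases on average, the chance that some discipline picks up a cluster of two or more is better than even.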

(2) Deep discipline-specific flaws. Is there something fundamentally wrong with experimental social psychology research itself? Perhaps. The Bem and Bargh cases point to problems such as lack of replication, pressure to publish, over-emphasis on p values, intolerance of messy data and desire for surprising or counter-intuitive effects. The problem with most of these arguments is that they are not discipline-specific and they are often cited as factors leading to fraud in other disciplines. On the other hand it may be that one or more of these factors are particularly pronounced in experimental social psychology (and I'll come back to this point later).

(3) Enhanced scrutiny. There are three strong reasons to suspect that enhanced scrutiny contributes to the recent cases. First, the reports of fraud or other problems are not independent. A case - particularly a big one such as the Diederik Stapel case - necessarily draws further scrutiny to particular journals, groups of researchers and perhaps the whole of a field or discipline. Second, several of the cases were uncovered by the same person: Uri Simonsohn. As Simonsohn works broadly in the area of experimental social psychology, it isn't that surprising that he applied his fraud detection tools to suspicious studies in his own field. Third, findings in experimental social psychology compete for explanatory power more closely with folk psychology explanations than in most other fields. To put it another way, just about everyone can assess the plausibility of a study that looks at whether exposing people to old age related material in the lab makes them walk faster when they leave the lab, or whether eating meat makes you more aggressive. Moreover, they often have strong opinions on these kinds of findings.

At the moment, the enhanced scrutiny explanation looks like a strong contender to me. I wouldn't rule out coincidence, but I think we can expect to see a few more dodgy studies unearthed. I think we can also expect to see the label of social psychology expanded to include suspect research in related areas (such as Marc Hauser's work).

Nevertheless, I do think there is one area in which experimental social psychology may be particularly vulnerable to fraud or questionable research practices. High status journals often seek interesting (aka surprising) effects and large effect sizes in the papers they publish. Such findings are more likely to be false (e.g., see here). This is part of a general problem with statistical significance, which acts as a filter (see Andrew Gelman's blog for lots of discussion on this). A single small experiment can usually only detect relatively 'big' effects - hence it overestimates the size of the effects it does detect. When you add an implicit requirement for 'big' effects you are biasing your journal or discipline towards spurious and fraudulent results. Thus far experimental social psychology isn't so different from other fields where small studies are common (e.g., much of medicine, health, neuroscience, biology, and education). The problem may be that effects are inherently smaller in experimental social psychology than in other areas of psychology.
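As a rough illustration of this significance filter (a sketch with invented numbers, not a re-analysis of any actual study), simulating many small two-group experiments with a modest true effect and keeping only the significant ones shows how badly the 'published' effect sizes overshoot. It assumes numpy and scipy are available:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    true_d = 0.2       # modest true standardized effect
    n_per_group = 20   # a typical small experiment
    n_sims = 20_000

    significant_estimates = []
    for _ in range(n_sims):
        control = rng.normal(0.0, 1.0, n_per_group)
        treatment = rng.normal(true_d, 1.0, n_per_group)
        t, p = stats.ttest_ind(treatment, control)
        if p < .05:
            # observed standardized effect (Cohen's d with pooled SD)
            pooled_sd = np.sqrt((treatment.var(ddof=1) + control.var(ddof=1)) / 2)
            significant_estimates.append((treatment.mean() - control.mean()) / pooled_sd)

    print(len(significant_estimates) / n_sims)  # power: only around 10% here
    print(np.mean(significant_estimates))       # mean 'published' d: several times the true 0.2

The filter guarantees that whatever makes it into print looks like a 'big' effect, whether or not the underlying effect is.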

I've put the label 'big' in scare quotes because what we're really talking about is the detectability or discriminability of an effect (the standardized effect size) - which is its size relative to the noise or error in the data. Experiments with social stimuli are inherently noisy because there are so many variables to control for and because it is often difficult to use big manipulations (as they tend to be pretty obvious to participants). Of course, many of the effects may truly be tiny. For example, the age priming effect seems plausible to me, but I can't believe it would be a large absolute effect in terms of walking speed (easily swamped by other factors or exaggerated by them) - thus my guess is that the original Bargh study over-estimated the effect size (as most early studies tend to).
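A toy calculation (again with made-up numbers) shows why the same raw difference can look like a respectable or a tiny standardized effect depending on how noisy the experiment is:

    # Made-up numbers: the same raw difference in, say, walking time looks much
    # smaller as a standardized effect once uncontrolled variation is added.
    raw_difference = 0.5     # raw units (e.g., seconds)
    sd_well_controlled = 1.0
    sd_noisy_social = 2.5    # extra variation from uncontrolled social factors

    print(raw_difference / sd_well_controlled)  # d = 0.5
    print(raw_difference / sd_noisy_social)     # d = 0.2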

I think that social psychology and psychology will learn from these cases and the increased scrutiny that seems to be around. I hope we will improve our statistical work, place greater value on replication and reduce the ridiculous pressure to publish ground-breaking, surprising, counter-intuitive work with high frequency. Ground-breaking work will get published, but you can't really tell what research will have real scientific impact until years later (at least two or three years and often much longer, in my view). I hope that psychologists (particularly editors and reviewers) will be more tolerant of messy data (see here) and of conclusions that aren't perfectly watertight. Many fraudulent studies are detected because of data that are far too clean (real data tend to be messy).
