The companion web site for Serious Stats is now live:
http://www.palgrave.com/psychology/baguley/
The web site includes:
- a free sample chapter (Chapter 15: Contrasts)
- data sets
- R scripts
- 5 online supplements (for meta-analysis, multiple imputation, replication probabilities, pseudo-R squared and loglinear models)
Also don't forget the Serious stats blog to accompany the book.
Friday, March 23, 2012
Monday, March 19, 2012
Graphing between-subject confidence intervals for ANOVA
This is a quick follow-up to my earlier post that discussed how to graph CIs for within-subjects (repeated measures) ANOVA designs. My forthcoming book Serious stats describes how to do this for between-subjects designs (a much simpler problem). The blog that accompanies the book now has a post summarizing the main options and explaining how to plot difference-adjusted CIs (95% CIs constructed so that non-overlapping intervals correspond to a statistically significant difference between means at p < .05). In addition, the post includes R functions to calculate and plot difference-adjusted CIs (though the calculations are not difficult to reproduce by hand).
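For readers who want to try the hand calculation, here is a minimal sketch. It is not the set of functions from the book blog. It assumes equal group sizes and a pooled error term from a one-way between-subjects ANOVA, and it shrinks the usual margin of error by sqrt(2)/2 so that two intervals just touch when the difference between their means is just significant. The function name diff_adjusted_ci and the simulated data are made up for illustration.

```r
# Difference-adjusted CIs for a one-way between-subjects design.
# Assumes equal n per group and homogeneous variances (pooled error term).
diff_adjusted_ci <- function(y, group, conf = 0.95) {
  group <- factor(group)
  n <- tapply(y, group, length)
  m <- tapply(y, group, mean)
  df_error <- length(y) - nlevels(group)
  # Pooled SD from the within-group sums of squares
  ss_within <- sum(tapply(y, group, function(v) sum((v - mean(v))^2)))
  pooled_sd <- sqrt(ss_within / df_error)
  se <- pooled_sd / sqrt(n)
  t_crit <- qt(1 - (1 - conf) / 2, df_error)
  # sqrt(2)/2 rescales the margin of error from a single mean to a difference
  moe <- (sqrt(2) / 2) * t_crit * se
  data.frame(group = levels(group),
             mean  = as.numeric(m),
             lower = as.numeric(m - moe),
             upper = as.numeric(m + moe))
}

# Illustrative data: two groups of 20
set.seed(1)
y <- c(rnorm(20, 10, 2), rnorm(20, 11, 2))
g <- rep(c("a", "b"), each = 20)
ci <- diff_adjusted_ci(y, g)
ci
```

With two equal-sized groups, the intervals fail to overlap exactly when the equal-variance t test gives p < .05. With unequal n or more than two groups the correspondence is only approximate.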
UPDATE: I've now added functions for two-tiered CIs for between-subjects designs on the book blog. More generally my functions for the book, CIs for ANOVA and a few other things are all available here. I plan to update these functions regularly to add functionality and deal with any undocumented features.
Thursday, March 15, 2012
p curves revisited
I finally found some time to take a closer look at p curves. I haven't had a chance to follow up my simulations (and probably won't for a few weeks if not months), but I have had time to think through the ideas the p curve approach raises, based on some of the comments I've received and a brief exchange with Uri Simonsohn (who has answered a few of my questions).
First, I got a couple of things at least partly wrong.
i) how p curves work
ii) the potential for correlated p values
How p curves work
I made the (I think) reasonable assumption that p curve analysis involved focusing on a bump just under the p = .05 threshold. Other work (Wicherts et al., 2011) has shown that there is indeed some distortion around this value. My crude simulation suggested that p curves could maybe be used to detect this kind of bump - but that the method was noisy and required large N.
All good so far except my assumption was completely wrong. This isn't what Simonsohn and colleagues are proposing at all. They are focusing on the whole of the distribution between p = 0 and p = .05. This is a very different kind of analysis because it uses all the available p value information about 'p hacking' (if you accept the highly plausible premise that p hacking is concentrated on statistically significant p values).
Null effects will therefore produce a flat p curve (because the distribution of p under the null is uniform). Simonsohn argues that non-null effects should produce downward sloping p curves. He and his colleagues have simulated p curves under various ranges of effect size to confirm this - and there is also an analytic proof for the normal case (Hung et al., 1997).* I also (inadvertently) confirmed this in my original simulations - which show the downward sloping trend (but note that I include p values up to p = .10 in my plots).
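The contrast between the flat and the downward sloping curve is easy to see in a small simulation. The sketch below is only an illustration: the helper names sim_p and p_curve, the effect size of 0.5, the sample size of 25 per group and the five bins are all assumptions, not part of Simonsohn and colleagues' method.

```r
set.seed(2012)

# p values from n_sims two-group t tests with standardized effect size delta
sim_p <- function(delta, n = 25, n_sims = 10000) {
  replicate(n_sims, t.test(rnorm(n), rnorm(n, mean = delta))$p.value)
}

# Count the statistically significant p values in five bins from 0 to .05
p_curve <- function(p) {
  table(cut(p[p < .05], breaks = seq(0, .05, by = .01)))
}

curve_null   <- p_curve(sim_p(delta = 0))    # no effect
curve_effect <- p_curve(sim_p(delta = 0.5))  # fixed non-null effect

rbind(null = curve_null, effect = curve_effect)
```

The null row should be roughly flat across the bins, while the effect row should pile up in the smallest bin and fall away towards .05.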
However, mixing p hacked studies into a flat curve will produce an upward sloping curve - the feature that Simonsohn and his colleagues are focusing on. I haven't simulated this directly - but it seems sensible because p hacking is (in essence) a flavour of optional stopping (adding data or iterating analyses until you squeeze a statistically significant effect out). Certainly, an upward sloping curve would be a signal of something weird going on.
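A rough way to check this intuition is to simulate optional stopping under a true null. The sketch below is one assumed flavour of p hacking only (test, add five cases per group, test again, stop at p < .05 or at 50 per group). The function name hacked_p and all of the settings are hypothetical choices made for illustration.

```r
set.seed(2012)

# One null study with optional stopping: test after every n_step extra
# cases per group and stop as soon as p < .05 (or when n_max is reached)
hacked_p <- function(n_start = 20, n_step = 5, n_max = 50) {
  x <- rnorm(n_start)
  y <- rnorm(n_start)
  repeat {
    p <- t.test(x, y)$p.value
    if (p < .05 || length(x) >= n_max) return(p)
    x <- c(x, rnorm(n_step))
    y <- c(y, rnorm(n_step))
  }
}

p_hacked <- replicate(5000, hacked_p())

# p curve for the 'significant' studies only
hacked_curve <- table(cut(p_hacked[p_hacked < .05],
                          breaks = seq(0, .05, by = .01)))
hacked_curve

# Proportion of null studies that end up 'significant'
false_positive_rate <- mean(p_hacked < .05)
false_positive_rate
```

If the reasoning above holds, the counts should rise towards the bin just under .05 and the false positive rate should be well above the nominal .05.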
This approach uses more information than my mistaken 'p bump' approach and so should be much more stable.
* It is far from unreasonable to treat the distribution of effects as approximately normal - as is common in meta-analysis (and see also Gillett, 1994), but I don't think the pattern depends strongly on this assumption.
Correlated p values
It is well known that p values are inherently extremely noisy 'statistics' - they jump around all over the place for identical replications. Geoff Cumming and colleagues have published some good work on this (e.g., Cumming & Fidler, 2009). Thus p values for the same effect in different studies, or for different effects of similar size, will not in general be correlated. However, the noise that causes this jumping around will be crystallized if you use the same data to re-calculate the p value. This could cause correlated p values where data are re-used or where variables are very highly correlated. For example, this could happen if you add a covariate that is a modest predictor of Y and uncorrelated with X, and report p values with and without the covariate. It could also happen if you report essentially the same analysis twice with a very similar variable (e.g., X correlated with children's age or X correlated with years of schooling).
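The covariate example can be illustrated with a quick simulation. This is a sketch under assumed values (n = 50, slopes of 0.3 for both X and the covariate Z, 1000 simulated studies); the function name one_study is hypothetical. It compares the p value for X with and without the covariate in the same data, and against the p value from an independent replication.

```r
set.seed(2012)
n <- 50
n_sims <- 1000

one_study <- function() {
  x <- rnorm(n)
  z <- rnorm(n)                       # covariate, uncorrelated with x
  y <- 0.3 * x + 0.3 * z + rnorm(n)   # z is a modest predictor of y
  p_without <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]
  p_with    <- summary(lm(y ~ x + z))$coefficients["x", "Pr(>|t|)"]
  # Independent replication of the analysis without the covariate
  x2 <- rnorm(n)
  y2 <- 0.3 * x2 + 0.3 * rnorm(n) + rnorm(n)
  p_rep <- summary(lm(y2 ~ x2))$coefficients["x2", "Pr(>|t|)"]
  c(without = p_without, with = p_with, replication = p_rep)
}

p <- t(replicate(n_sims, one_study()))

# Rank correlations between p values
r_same_data   <- cor(p[, "without"], p[, "with"], method = "spearman")
r_replication <- cor(p[, "without"], p[, "replication"], method = "spearman")
c(same_data = r_same_data, replication = r_replication)
```

The same-data correlation should be large and the replication correlation close to zero, which is the sense in which re-using data 'crystallizes' the noise.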
There are two main solutions here: a) just filter out p values that re-use data or use highly-correlated data, or b) model the correlations in some way by accounting for within-study clustering - as you might in a multilevel model and some forms of meta-analysis (itself a form of multilevel model).
In summary, I think the p curve approach looks very interesting, and I'd certainly like to see more work on it (and hope to see the full version published some time soon).
References
Cumming, G., & Fidler, F. (2009). Confidence Intervals. Zeitschrift für Psychologie / Journal of Psychology, 217(1), 15-26.
Gillett, R. (1994). Post hoc power analysis. Journal of Applied Psychology, 79(5), 783-785. doi:10.1037//0021-9010.79.5.783
Hung, H. M., O’Neill, R. T., Bauer, P., & Köhne, K. (1997). The behavior of the P-value when the alternative hypothesis is true. Biometrics, 53(1), 11-22.
Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE, 6(11), e26828. doi:10.1371/journal.pone.0026828
Wednesday, March 14, 2012
R code for p curves
I have finally got around to posting the R code for my p curve simulation. Those familiar with R will realize how crude it is (I've been caught up with other urgent stuff and had no time to explore further).
You are welcome to play with (and improve!) the code. Changing delta will alter the (at present) fixed effect size. It would be more realistic to vary this (and the sample sizes). A good starting point for the effect size distribution (in the population) might be a normal distribution with say a mean of zero and a variance of 1 (see Gillett, 1994).
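# Crude p curve simulation: two-group t tests with a fixed effect size
delta <- 0.5             # standardized effect size (fixed, for now)
m1 <- 10                 # mean of group 1
sd <- 2                  # common standard deviation
m2 <- m1 + sd * delta    # mean of group 2
n1 <- n2 <- 25           # per-group sample sizes
n.sims <- 500            # simulated studies per histogram

# 15 histograms of simulated p values (only p from 0 to .10 is shown)
par(mfrow = c(5, 3))
for (i in 1:15) {
  p.data <- replicate(n.sims, t.test(rnorm(n1, m1, sd), rnorm(n2, m2, sd))$p.value)
  hist(p.data, xlim = c(0, 0.1), breaks = 99, col = 'gray')
}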
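The suggestion of varying the effect size and the sample sizes could be sketched along the following lines. Treat it as a starting point only: the effect size for each simulated study is drawn from a normal distribution with mean 0 and variance 1, as suggested above, but the range of sample sizes (10 to 50 per group), the number of simulations and the function name sim_one are assumptions made for illustration.

```r
set.seed(2012)
n.sims <- 2000
m1 <- 10
sd <- 2

# One simulated study with its own population effect size and sample size
sim_one <- function() {
  delta <- rnorm(1, mean = 0, sd = 1)   # effect size for this study
  n <- sample(10:50, 1)                 # per-group n (assumed range)
  t.test(rnorm(n, m1, sd), rnorm(n, m1 + sd * delta, sd))$p.value
}

p.data <- replicate(n.sims, sim_one())

# p curve: counts of significant p values in five bins from 0 to .05
p.curve <- table(cut(p.data[p.data < .05], breaks = seq(0, .05, by = .01)))
p.curve

# Proportion of studies reaching p < .05
prop.sig <- mean(p.data < .05)
prop.sig
```

To see the result in the same style as the plots above, pass p.data to hist() with xlim = c(0, 0.1).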
References
Gillett, R. (1994). Post hoc power analysis. Journal of Applied Psychology, 79(5), 783-785.