I finally found some time to take a closer look at p curves. I haven't had a chance to follow up my simulations (and probably won't for a few weeks if not months), but I have had time to think through the ideas the p curve approach raises, based on some of the comments I've received and a brief exchange with Uri Simonsohn (who has answered a few of my questions).

First, I got a couple of things at least partly wrong.

i) how p curves work
ii) the potential for correlated p values

How p curves work

I made the (I think) reasonable assumption that p curve analysis involved focusing on a bump just under the p = .05 threshold. Other work (Wicherts et al., 2011) has shown that there is indeed some distortion around this value. My crude simulation suggested that p curves might be used to detect this kind of bump - but that the method was noisy and required a large N.

All good so far, except that my assumption was completely wrong. This isn't what Simonsohn and colleagues are proposing at all. They are focusing on the whole of the distribution between p = 0 and p = .05. This is a very different kind of analysis because it uses all the available p value information about 'p hacking' (if you accept the highly plausible premise that p hacking is concentrated on statistically significant p values).

Null effects will therefore produce a flat p curve (because the distribution of p under the null is uniform). Simonsohn argues that non-null effects should produce downward sloping p curves. He and his colleagues have simulated p curves across a range of effect sizes to confirm this - and there is also an analytic proof for the normal case (Hung et al., 1997).* I also (inadvertently) confirmed this in my original simulations, which show the downward sloping trend (but note that I include p values up to p = .10 in my plots).
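As a quick illustration (my own rough sketch in R, not Simonsohn and colleagues' code), the following simulation draws p values from two-sample t-tests and bins the statistically significant ones. Under the null (d = 0) the bins are roughly flat; under a true effect (here d = 0.5, an arbitrary choice for illustration) they pile up near p = 0:

set.seed(1)
p_curve <- function(d, n = 30, nsims = 10000) {
  # simulate nsims two-sample t-tests with true standardised effect d
  p <- replicate(nsims, t.test(rnorm(n, d), rnorm(n))$p.value)
  # bin the statistically significant p values into five equal intervals
  table(cut(p[p < .05], breaks = seq(0, .05, by = .01)))
}
p_curve(d = 0)    # null: roughly equal counts per bin (flat curve)
p_curve(d = 0.5)  # true effect: counts concentrate near p = 0 (downward slope)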

However, mixing p hacked studies into a flat curve will produce an upward sloping curve - the feature that Simonsohn and his colleagues are focusing on. I haven't simulated this directly - but it seems sensible because p hacking is (in essence) a flavour of optional stopping (adding data or iterating analyses until you squeeze a statistically significant effect out). Certainly, an upward sloping curve would be a signal of something weird going on.
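Again, just as a hedged sketch (the starting n, cap and step size below are arbitrary choices of mine), one flavour of optional stopping under the null can be simulated by topping up the data until significance is reached:

set.seed(1)
p_hacked <- function(n_start = 20, n_max = 50, step = 5) {
  x <- rnorm(n_start); y <- rnorm(n_start)
  p <- t.test(x, y)$p.value
  # keep adding observations until p < .05 or the sample size cap is hit
  while (p >= .05 && length(x) < n_max) {
    x <- c(x, rnorm(step)); y <- c(y, rnorm(step))
    p <- t.test(x, y)$p.value
  }
  p
}
p <- replicate(10000, p_hacked())
table(cut(p[p < .05], breaks = seq(0, .05, by = .01)))  # counts rise towards p = .05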

This approach uses more information than my mistaken 'p bump' approach and so should be much more stable.

* It is far from unreasonable to treat the distribution of effects as approximately normal - as is common in meta-analysis (and see also Gillett, 1994), but I don't think the pattern depends strongly on this assumption.

Correlated p values

It is well known that p values are inherently extremely noisy 'statistics' - they jump around all over the place across identical replications. Geoff Cumming and colleagues have published some good work on this (e.g., Cumming & Fidler, 2009). Thus the same effect in different studies, or different effects of similar sizes, will in general not tend to have correlated p values. However, the noise that causes this jumping around will be crystallized if you use the same data to re-calculate the p value. This could cause correlated p values where data are re-used or where variables are very highly correlated. For example, this could happen if you add a covariate that is a modest predictor of Y and uncorrelated with X, and report p values with and without the covariate. It could also happen if you report essentially the same analysis twice with a very similar variable (e.g., X correlated with children's age or X correlated with years of schooling).
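A small simulation of the covariate example (my own sketch; the effect sizes are arbitrary assumptions) shows how strong this correlation can be:

set.seed(1)
sim_pair <- function(n = 50) {
  x <- rnorm(n)
  cv <- rnorm(n)                     # covariate: predicts y, uncorrelated with x
  y <- 0.3 * x + 0.3 * cv + rnorm(n)
  p1 <- summary(lm(y ~ x))$coefficients["x", "Pr(>|t|)"]       # without covariate
  p2 <- summary(lm(y ~ x + cv))$coefficients["x", "Pr(>|t|)"]  # with covariate
  c(p1, p2)
}
p <- t(replicate(5000, sim_pair()))
cor(p[, 1], p[, 2])  # very high - the two p values re-use the same noise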

There are two main solutions here: a) just filter out p values that re-use data or use highly-correlated data, or b) model the correlations in some way by accounting for within-study clustering - as you might in a multilevel model and some forms of meta-analysis (itself a form of multilevel model).

In summary, I think the p curve approach looks very interesting, and I'd certainly like to see more work on it (and hope to see the full version published some time soon).

References

Cumming, G., & Fidler, F. (2009). Confidence Intervals. Zeitschrift für Psychologie / Journal of Psychology, 217(1), 15-26.
Gillett, R. (1994). Post hoc power analysis. Journal of Applied Psychology, 79(5), 783-785. doi:10.1037//0021-9010.79.5.783
Hung, H. M., O’Neill, R. T., Bauer, P., & Köhne, K. (1997). The behavior of the P-value when the alternative hypothesis is true. Biometrics, 53(1), 11-22.
Wicherts, J. M., Bakker, M., & Molenaar, D. (2011). Willingness to share research data is related to the strength of the evidence and the quality of reporting of statistical results. PLoS ONE, 6(11), e26828. doi:10.1371/journal.pone.0026828


I have been thinking of writing a paper about MANOVA (and in particular why it should be avoided) for some time, but never got round to it. However, I recently discovered an excellent article by Francis Huang that pretty much sums up most of what I'd cover. In this blog post I'll just run through the main issues and refer you to Francis' paper for a more in-depth critique, or to the section on MANOVA in Serious Stats (Baguley, 2012).
I wrote a brief introduction to logistic regression aimed at psychology students. You can take a look at the pdf here:  

A more comprehensive introduction in terms of the generalised linear model can be found in my book:

Baguley, T. (2012). Serious stats: a guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.
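For a flavour of the generalised linear model framing in R (a minimal sketch of my own, not an excerpt from the pdf or the book), logistic regression is just a glm() with a binomial family and logit link:

# predict transmission type from weight in the built-in mtcars data
mod <- glm(am ~ wt, family = binomial(link = "logit"), data = mtcars)
summary(mod)                     # coefficients are on the log-odds scale
predict(mod, type = "response")  # fitted probabilities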

I wrote a short blog post (with R code) on how to calculate corrected CIs for rho and tau using the Fisher z transformation.
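The gist for rho, as a minimal sketch (assuming the Fieller et al. variance approximations, roughly 1.06/(n - 3) for Spearman's rho and 0.437/(n - 4) for Kendall's tau; see the post for the full R code):

ci_rho <- function(rho, n, conf = .95) {
  z <- atanh(rho)                 # Fisher z transformation
  se <- sqrt(1.06 / (n - 3))      # corrected SE for Spearman's rho (assumed)
  crit <- qnorm(1 - (1 - conf) / 2)
  tanh(z + c(-1, 1) * crit * se)  # back-transform to the rho scale
}
ci_rho(0.6, n = 40)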

I have written a short article on Type II versus Type III SS in ANOVA-like models on my Serious Stats blog:

https://seriousstats.wordpress.com/2020/05/13/type-ii-and-type-iii-sums-of-squares-what-should-i-choose/
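To see the distinction in practice (an illustration of my own, not taken from the post), the car package computes both types; for Type III tests to be meaningful you also need sum-to-zero contrasts:

library(car)
options(contrasts = c("contr.sum", "contr.poly"))  # needed for sensible Type III SS
dat <- transform(mtcars, cyl = factor(cyl), am = factor(am))
mod <- lm(mpg ~ cyl * am, data = dat)
Anova(mod, type = "II")   # Type II sums of squares
Anova(mod, type = "III")  # Type III sums of squares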

I have just published a short blog on the Egon Pearson correction for the chi-square test. This includes links to an R function to run the corrected test (and also provides residual analyses for contingency tables).

The blog is here and the R function here.
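The core of the correction is simple (a hedged sketch of my own, not the linked function, which also handles the residual analyses): multiply the usual chi-square statistic by (N - 1)/N before looking up the p value:

pearson_n1 <- function(tab) {
  X2 <- unname(chisq.test(tab, correct = FALSE)$statistic)
  N <- sum(tab)
  X2c <- X2 * (N - 1) / N  # Egon Pearson (N - 1) correction
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  c(X2.corrected = X2c, p = pchisq(X2c, df, lower.tail = FALSE))
}
pearson_n1(matrix(c(12, 5, 7, 15), nrow = 2))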

Bayesian Data Analysis in the Social Sciences Curriculum

Supported by the ESRC’s Advanced Training Initiative

Venue: Bowden Room, Nottingham Conference Centre, Burton Street, Nottingham, NG1 4BU

Booking information online

Provisional schedule:

Organizers:

Thom Baguley   twitter: @seriousstats

Mark Andrews  twitter: @xmjandrews

The third and (possibly) final round of our introductory workshops was overbooked in April, but we have managed to arrange some additional dates in June.

There are still places left on these. More details at: http://www.priorexposure.org.uk/

As with the last round we are planning a free R workshop beforehand (recommended if you need a refresher or have never used R before).

In my Serious Stats blog I have a new post on providing CIs for a difference between independent R square coefficients.

You can find the post there or go directly to the function hosted on RPubs. I have been experimenting with knitr but can't yet get the HTML from R Markdown to work with my Blogger or WordPress blogs.