One of the great things about writing a statistics book was finding an excuse to read about dozens of topics that I knew a little about but hadn't got around to studying in depth. Even so, there were a number of topics I ended up missing out on completely (apparently once the book gets to over 900 pages or so they make you leave stuff out). One of those topics is partial least squares (PLS).
I knew a bit about the technique (though, it turns out, even less than I thought). I recently came across an excellent paper on partial least squares by Mikko Rönkkö, Cameron McIntosh and John Antonakis. The main thrust of the paper is simple: partial least squares is widely used outside psychology, and it has been suggested that it should be more widely used within psychology. Rönkkö et al., however, argue that this is probably a bad idea. A very bad idea. Their case rests on two main arguments. First, that partial least squares is equivalent to a regression model using indicator variables to create weighted composite predictors. Second, that the benefits of partial least squares have been greatly overstated. In particular, the claim that PLS can deal with measurement error seems simply to be false (as merely creating composites from indicator variables can't do this). Worryingly, some implementations of PLS seem to have dangerous properties (notably one with a 100% false positive rate), and PLS generally seems to inflate Type I error for small effects. The latter property may give the impression of attenuating measurement error (but merely provides a bias that may sometimes counteract the attenuation arising from measurement error).
Rönkkö et al.'s paper is, I think, a model of clarity, and it implies that PLS is going to be of limited value to psychologists. I found the paper particularly interesting because I have mostly seen PLS advocated as a way of dealing with multicollinearity. This makes sense, as multicollinearity can reasonably be handled by replacing predictors with composites. The main drawback of PLS, however, is that the composites are derived automatically by the PLS algorithm. This sort of 'black box' solution produces good prediction but can overcapitalise on quirks in the sample and thus may not generalise (especially for small samples). More importantly, the composites may well be uninterpretable. For most psychological applications I'd rather use an interpretable but 'non-optimal' composite (e.g., a simple average of highly correlated predictors) than go down this route.
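As a concrete illustration of that last point, an equal-weight composite takes only a few lines to build. This is just a sketch with made-up data (plain Python here, though my own code is usually R), not anything from the Rönkkö et al. paper:

```python
# Sketch: replace two highly correlated predictors with an interpretable
# composite -- the average of their standardised scores. Data are invented.
from statistics import mean, stdev

# Hypothetical scores on two highly correlated measures of the same construct
x1 = [12, 15, 11, 18, 14, 16, 13, 17]
x2 = [30, 35, 28, 40, 33, 37, 31, 39]

def standardise(xs):
    """Convert raw scores to z scores."""
    m, s = mean(xs), stdev(xs)
    return [(x - m) / s for x in xs]

z1, z2 = standardise(x1), standardise(x2)

# Equal-weight composite: the weights are transparent (0.5 and 0.5),
# unlike the sample-dependent weights a PLS algorithm would derive
composite = [(a + b) / 2 for a, b in zip(z1, z2)]
```

The point of the design choice is interpretability: a reader knows exactly what the composite is, whereas data-driven weights can change from sample to sample.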
For the same reason I'd generally rather not use MANOVA (which finds an optimum linear combination of DVs in your sample). Of the common analytic methods, MANOVA is one of the least well understood in psychology (and I have rarely seen a published application of MANOVA that wouldn't be enhanced by a different, often simpler, technique).
Aug 17
I Will Not Ever, NEVER Run a MANOVA
I had been meaning to write a paper about MANOVA (and in particular why it should be avoided) for some time, but never got round to it. However, I recently discovered an excellent article by Francis Huang that pretty much sums up most of what I'd cover. In this blog post I'll just run through the main issues and refer you to Francis' paper for a more in-depth critique, or to the section on MANOVA in Serious Stats (Baguley, 2012).
Jan 19
A brief introduction to logistic regression
I wrote a brief introduction to logistic regression aimed at psychology students. You can take a look at the pdf here:
A more comprehensive introduction in terms of the generalised linear model can be found in my book:
Baguley, T. (2012). Serious stats: a guide to advanced statistics for the behavioral sciences. Palgrave Macmillan.
May 18
Serious Stats: Obtaining CIs for Spearman's rho or Kendall's tau
I wrote a short blog post (with R code) on how to calculate corrected CIs for Spearman's rho and Kendall's tau using the Fisher z transformation.
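The gist of the approach is: Fisher z-transform the coefficient, use a corrected standard error (the Fieller, Hartley and Pearson values, with variance 1.06/(n − 3) for rho and 0.437/(n − 4) for tau), and back-transform the interval limits. A minimal Python sketch of that recipe (function names are mine; check the blog's R code for the authoritative version):

```python
# Sketch: approximate CIs for rank correlations via the Fisher z transform
# with the Fieller et al. corrected standard errors.
from math import atanh, tanh, sqrt
from statistics import NormalDist

def spearman_ci(rho, n, conf=0.95):
    """CI for Spearman's rho using the corrected variance 1.06/(n - 3)."""
    z = atanh(rho)                              # Fisher z transform
    se = sqrt(1.06 / (n - 3))                   # corrected standard error
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return tanh(z - crit * se), tanh(z + crit * se)  # back-transform

def kendall_ci(tau, n, conf=0.95):
    """CI for Kendall's tau using the corrected variance 0.437/(n - 4)."""
    z = atanh(tau)
    se = sqrt(0.437 / (n - 4))
    crit = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return tanh(z - crit * se), tanh(z + crit * se)
```

For example, `spearman_ci(0.6, 30)` gives an interval that is asymmetric around 0.6, as it should be after back-transformation.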
May 13
Serious stats: Type II versus Type III Sums of Squares
I have written a short article on Type II versus Type III SS in ANOVA-like models on my Serious Stats blog:
https://seriousstats.wordpress.com/2020/05/13/type-ii-and-type-iii-sums-of-squares-what-should-i-choose/
Sep 5
Egon Pearson correction for Chi-Square
I have just published a short blog on the Egon Pearson correction for the chi-square test. This includes links to an R function to run the corrected test (and also provides residual analyses for contingency tables).
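The correction itself is tiny: the usual Pearson chi-square statistic is multiplied by (N − 1)/N. A minimal sketch with hypothetical counts (the linked R function does more, including the residual analyses):

```python
# Sketch: Egon Pearson 'N - 1' correction to the Pearson chi-square test.
def pearson_chisq(table):
    """Pearson chi-square statistic and N for a two-way table (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    x2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n   # expected count under independence
            x2 += (obs - exp) ** 2 / exp
    return x2, n

table = [[12, 8], [5, 15]]        # hypothetical 2x2 contingency table
x2, n = pearson_chisq(table)
x2_ep = x2 * (n - 1) / n          # Egon Pearson corrected statistic
```

The corrected statistic is always slightly smaller than the uncorrected one, and the two converge as N grows.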
The blog is here and the R function here.
Sep 15
Provisional programme: ESRC funded conference: Bayesian Data Analysis in the Social Sciences Curriculum (Nottingham, UK 29th Sept 2017)
Bayesian Data Analysis in the Social Sciences Curriculum
Supported by the ESRC’s Advanced Training Initiative
Venue: Bowden Room Nottingham Conference Centre
Burton Street, Nottingham, NG1 4BU
Booking information online
Provisional schedule:
Organizers:
Thom Baguley twitter: @seriousstats
Mark Andrews twitter: @xmjandrews
Jun 13
STOP PRESS Introductory Bayesian data analysis workshops for social scientists (June 2017 Nottingham UK)
The third and (possibly) final round of our introductory workshops was overbooked in April, but we have managed to arrange some additional dates in June.
There are still places left on these. More details at: http://www.priorexposure.org.uk/
As with the last round we are planning a free R workshop beforehand (recommended if you need a refresher or have never used R before).
May
25
Serious Stats blog: CI for differences in independent R square coefficients
In my Serious Stats blog I have a new post on providing CIs for a difference between independent R square coefficients.
You can find the post there, or go direct to the function hosted on RPubs. I have been experimenting with knitr but can't yet get the HTML from R Markdown to work with my Blogger or WordPress blogs.