A while ago I co-wrote a chapter for an introductory psychology textbook, Essential Psychology: A Concise Introduction. This is a book edited and written by members of the department where I work. My contribution was the chapter on human memory (cunningly titled Memory).

I produced several plots for the chapter (some of which got cut due to severe space restrictions). One that stayed in was a serial position curve. For this plot I used data from Postman and Phillips (1965).

I feel particularly proud of this plot because I was just beginning to use and learn R at the time (as opposed to dabbling) and because I had had a really hard time getting hold of the data. I first tried Google, but had no joy (for some reason I thought someone would have put the raw data online, as it is a classic study, though maybe I just missed it). Then I searched for alternative data sets (as around that period there were quite a few similar studies). I was probably being too picky, but whatever the reason I had no luck.

It would have been trivial to make up fake data, but that didn't feel right. What I eventually did (and wish I'd done straight away) was print out the original figure and measure all the points by hand. I then entered these values into a spreadsheet and tweaked and remeasured until all the summary statistics matched those in the original paper to about one decimal place. This was a lot quicker than I had thought. I cheated slightly because I only needed data from the 20-word conditions (so I could leave out the 10- and 30-word conditions).

(I'm pretty sure I could have used computer software to capture the raw data from an image file, but I'd have had to find the software, learn how to use it and do all the checking anyway. For a single figure I'm reasonably sure measuring by hand would be faster.)

In re-plotting it I noticed a few things that I hadn't paid much attention to before. The main one was that the authors report frequency of recall for 18 participants with 6 lists each. This means all scores are out of 108 (18 x 6), and I suspect lots of casual readers would (like me) assume they were percentages. For re-plotting I rescaled the data as percentages.
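The rescaling is just a division by the maximum possible score. A minimal sketch, assuming counts out of 108; the counts below are made-up values chosen for easy arithmetic, not the Postman and Phillips data, and the variable names are my own:

```r
# Each score is a frequency out of 18 participants x 6 lists = 108.
n_participants <- 18
n_lists <- 6
max_recalls <- n_participants * n_lists

# Hypothetical recall frequencies for illustration only (NOT the 1965 data).
recall_counts <- c(54, 27, 81, 108)

# Convert frequencies to percentages of the maximum possible score.
recall_percent <- 100 * recall_counts / max_recalls
print(recall_percent)
```

With real data the same one-line conversion would be applied to each condition's column of frequencies.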

The plot itself just uses basic R functions. I'm writing about it because: i) I think it is a fairly clear illustration of how basic plot functions in R can produce what I think is a rather nice figure (the published version has been edited by the publisher, adding colour and making the style match figures in other chapters), and ii) people may find it useful for teaching purposes. So please feel free to use and adapt the R code for non-commercial (e.g., teaching) use.

First load the data from this .csv file (you will need to specify the path or change the working directory if the file is saved elsewhere).

pp65 <- read.csv("pp65.csv")
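Before plotting it may be worth checking that the file loaded as expected. The sketch below assumes the column names SP, C0, C15 and C30 (inferred from the plotting code in this post); check_columns is a hypothetical helper of my own, not part of R:

```r
# Column names assumed from the plotting code (SP = serial position,
# C0/C15/C30 = the three delay conditions).
expected_cols <- c("SP", "C0", "C15", "C30")

# Hypothetical helper: stop with a clear message if any column is missing.
check_columns <- function(d, expected = expected_cols) {
  missing_cols <- setdiff(expected, names(d))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  invisible(TRUE)
}

if (file.exists("pp65.csv")) {
  pp65 <- read.csv("pp65.csv")
  check_columns(pp65)
  str(pp65)  # quick look at column types and the first few values
} else {
  message("pp65.csv not found in ", getwd(),
          "; use setwd() or give read.csv() the full path")
}
```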

Then paste the following:

# Set up an empty plot (pch=NA draws no symbols) with axis labels and titles
plot(pp65$SP, pch=NA, ylim=c(0,80), xlab= "Serial position", ylab= "Mean percentage recall", main = "Postman & Phillips (1965)", sub = '(20 word conditions only)')

# Add points and a connecting line for each delay condition
points(pp65$C0, pch=19, col='black', cex=.7)
lines(pp65$C0, lty=3)
points(pp65$C15, pch=24, col='black', cex=.7)
lines(pp65$C15, lty=2)
points(pp65$C30, pch=22, col='black', cex=.7)
lines(pp65$C30, lty=5)
# Add a legend (top left corner at x = 3, y = 80) matching line types to conditions
legend(3, 80, legend=c("No delay","15 second delay","30 second delay"), lty=c(3,2,5))
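If you want to adapt the figure, one possible variant loops over the conditions, shows the plotting symbols in the legend as well as the line types, and writes the result to a PDF. This is only a sketch: the data frame below is filled with placeholder numbers generated by a formula (NOT the 1965 results), and demo, draw_curves and the output file name are my own hypothetical names. It also plots against SP explicitly rather than against the row index.

```r
# Placeholder data with the same column names as pp65.csv. The values come
# from arbitrary formulas purely for illustration; they are not real data.
sp <- 1:20
demo <- data.frame(
  SP  = sp,
  C0  = 20 + 0.15 * (sp - 8)^2,
  C15 = 25 + 0.05 * (sp - 10)^2,
  C30 = 25 + 0.04 * (sp - 10)^2
)

conds  <- c("C0", "C15", "C30")
pchs   <- c(19, 24, 22)   # same symbols as the original code
ltys   <- c(3, 2, 5)      # same line types as the original code
labels <- c("No delay", "15 second delay", "30 second delay")

draw_curves <- function(d) {
  # type = "n" sets up the axes without drawing anything
  plot(d$SP, d[[conds[1]]], type = "n", ylim = c(0, 80),
       xlab = "Serial position", ylab = "Mean percentage recall")
  for (i in seq_along(conds)) {
    lines(d$SP, d[[conds[i]]], lty = ltys[i])
    points(d$SP, d[[conds[i]]], pch = pchs[i], col = "black", cex = 0.7)
  }
  # Legend shows both the line type and the symbol for each condition
  legend("topleft", legend = labels, lty = ltys, pch = pchs, pt.cex = 0.7)
}

# Write the figure to a PDF in the session's temporary directory
out_file <- file.path(tempdir(), "serial_position_demo.pdf")
pdf(out_file, width = 7, height = 5)
draw_curves(demo)
dev.off()
```

To use it with the real data, call draw_curves(pp65) after loading the .csv file.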

If you are new to R you can find out more about these plotting functions by using R help: ?par, ?plot, ?points and so on ...


Baguley, T., & Edmonds, A. J. (2010). Memory. In P. Banyard, M. N. O. Davies, C. Norman, & B. Winder (Eds.), Essential Psychology: A Concise Introduction (pp. 65-82). London: Sage.

Postman, L., & Phillips, L. W. (1965). Short-term temporal changes in free recall. Quarterly Journal of Experimental Psychology, 17, 132-138.



