In a break from my usual obsessions and interests, here is a guest blog post by Ian Walker. I'm posting it because I think it is rather cool and hope it will be of interest to some of my regular readers. Ian is perhaps best known (in the blogosphere) for his work on transport psychology - particularly cycling - but he is also an expert on psychological statistics.

Some time ago, I had some data that lent themselves to a three-dimensional surface plot. The problem was, the plot was quite asymmetrical, and finding the right viewing angle to see it effectively on a computer screen was extremely difficult. I spent ages tweaking angles and every possible view seemed to involve an unacceptable compromise.

Of course, displaying fundamentally three-dimensional items in two dimensions is an ancient problem, as any cartographer will tell you. That night, as I lay thinking in bed, a solution presented itself. I had recently been reading about the work of a fellow University of Bath researcher, Adrian Bowyer, and his RepRap project, to produce an open-source three-dimensional printer. The solution was obvious: I had to find a way to print R data on one of these printers!

I managed to meet up with Adrian back in May 2012, and he explained to me the structure of the STL (stereolithography) files commonly used for three-dimensional printing. These describe an object as a large series of triangles. I decided I'd have a go at writing R code to produce valid STL files.
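
To give a flavour of what such a file looks like, here is a minimal sketch (not part of r2stl itself, and the function name is just for illustration) that writes a single-triangle ASCII STL file from R. Each facet records a normal vector and three vertices:

# Minimal sketch: write a one-triangle ASCII STL file (illustration only)
write_triangle_stl <- function(path = "one-triangle.stl") {
  lines <- c(
    "solid example",
    "  facet normal 0 0 1",
    "    outer loop",
    "      vertex 0 0 0",
    "      vertex 1 0 0",
    "      vertex 0 1 0",
    "    endloop",
    "  endfacet",
    "endsolid example"
  )
  writeLines(lines, path)
}
write_triangle_stl()

A real surface plot is simply many thousands of these facets, which is why generating them programmatically makes sense.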

I'm normally a terrible hacker when it comes to programming; I usually storm in and try to make things work as quickly as possible then fix all the mistakes later. This time, I was much more methodical. As a little lesson to us all, the methodical approach worked: I had the core code producing valid STL files in under 3 hours.

Unfortunately, it then took until September 2012 before I could get hold of somebody with a 3D printer who'd let me test my code. A few days ago the first prototype was produced, as you can see in this photograph:

[Photograph: the first 3D-printed prototype of an R surface plot]


So now I'd like to share the code under a Creative Commons BY-NC-SA licence, in case anybody else finds it useful. You can download the code here, in a file called r2stl.r. One day, when I learn how, I might try to turn this into a package, but for now you can just load the file with R's source() command. The file contains a single function, r2stl(); once you have sourced the file, you can call r2stl() to generate your STL files. The full command is:

r2stl(x, y, z, filename='3d-R-object.stl', object.name='r2stl-object', z.expand=FALSE, min.height=0.008, show.persp=FALSE, strict.stl=FALSE)





  • x, y and z should be vectors of numbers, exactly as with R's normal persp() plot. x and y represent a flat grid and z represents heights above this grid

  • filename is pretty obvious, I hope

  • object.name The STL file format requires the object that is being described to have a name specified inside the file. It's unlikely anybody will ever see this, so there's probably no point changing it from the default

  • z.expand By default, r2stl() normalizes each axis so it runs from 0 to 1 (this is an attempt to give you an object that is agnostic with regard to how large it will eventually be printed). Normally, the code then rescales the z axis back down so its proportions relative to x and y are what they were originally. If, for some reason, you want your 3D plot to touch all six faces of the imaginary cube that surrounds it, set this parameter to TRUE

  • min.height Your printed model would fall apart if some parts of it had z values of zero, as this would mean zero material is laid down in those parts of the plot. This parameter therefore provides a minimum height for the printed material. The default of 0.008 ensures that, when printed, no part of your object is thinner than around 0.5 mm, assuming that it is printed inside a 60 mm x 60 mm x 60 mm cube. Recall that the z axis gets scaled from 0 to 1. If you are printing a 60mm-tall object then a z-value of 1 represents 60mm. The formula is min.height=min.mm/overall.mm, so if we want a minimum printed thickness of 0.5mm and the overall height of your object will be 60mm, 0.5/60 = 0.008, which is the default. If you want the same minimum printed thickness of 0.5mm but want to print your object to 100mm, this parameter would be set to 0.5/100 = 0.005

  • show.persp Do you want to see a persp() plot of this object on your screen as the STL is being generated? Default is FALSE

  • strict.stl To make files smaller, this code cheats and simply describes the entire rectangular base of your object as two huge triangles. This seems to work fine for printing, but isn't strictly proper STL format. Set this to TRUE if you want the base of your object described as a large number of triangles and don't mind larger files. An example call using some of these non-default settings is sketched just after this list
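
For illustration, here is a hypothetical call that changes several of the defaults discussed above (the file name and values are just examples; x, y and z are as in the Demo section below):

r2stl(x, y, z, filename="my-surface.stl", object.name="my-surface",
      z.expand=TRUE, min.height=0.005, show.persp=TRUE, strict.stl=TRUE)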




To view and test your STL files before you print them, you can use various programs. I have had good experiences with the free, open-source Meshlab, which even has iPhone and Android versions so you can let people interact with your data even when you're in the pub. Even if all you ever do is show people your 3D plots using Meshlab, I believe r2stl() still offers a useful service, as it makes viewing data far more interactive than static persp() plots. To actually get your hands on a printer, you might try your local school - apparently lots of schools have got rapid prototypers these days.


Demo


source('r2stl.r')

# Let's do the classic persp() demo plot, as shown in the photograph above
x <- seq(-10, 10, length = 100)
y <- x
f <- function(x, y) { r <- sqrt(x^2 + y^2); 10 * sin(r)/r }
z <- outer(x, y, f)
z[is.na(z)] <- 1
r2stl(x, y, z, filename = "lovelyfunction.stl", show.persp = TRUE)

# Now let's look at R's volcano data
z <- volcano
x <- 1:dim(volcano)[1]
y <- 1:dim(volcano)[2]
r2stl(x, y, z, filename = "volcano.stl", show.persp = TRUE)



I hope you find this code useful. If you have any questions or suggestions, please get in touch.


September 2012 - Ian Walker, Department of Psychology, University of Bath.


  1. I have been thinking about writing a paper on MANOVA (and in particular why it should be avoided) for some time, but never got round to it. However, I recently discovered an excellent article by Francis Huang that pretty much sums up most of what I'd cover. In this blog post I'll just run through the main issues and refer you to Francis's paper for a more in-depth critique, or to the section on MANOVA in Serious Stats (Baguley, 2012).

    I have three main issues with MANOVA:

    1) It doesn't do what people think it does

    2) It doesn't offer Type I error protection for subsequent univariate tests (even though many text books say it does)

    3) There are generally better approaches available if you really are interested in multivariate research questions

    Let's start with the first point. People think MANOVA analyses multiple outcome variables (DVs). This isn't really correct. It creates a composite DV by combining the outcome variables in an atheoretical way, and the analysis then proceeds on that composite. The composite is in a sense 'optimal' because the weights are selected to maximise the variance explained by the set of predictors in the model. However, this optimisation will capitalise on chance. Furthermore, it will be unique to your sample – invalidating (or at least complicating) comparisons between studies. It will also be hard to interpret. This has knock-on implications for things like standardised effect sizes, as effect size metrics for MANOVA generally relate to the composite DV rather than the original outcome variables. For further discussion see Grayson (2004).
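
    A minimal sketch of the distinction, using R's built-in iris data purely for illustration:

    # MANOVA tests a data-driven composite of the DVs, not the DVs separately
    fit <- manova(cbind(Sepal.Length, Petal.Length) ~ Species, data = iris)
    summary(fit, test = "Pillai")   # omnibus test on the composite
    summary.aov(fit)                # the univariate ANOVAs are a separate question

    # The weights defining the composite are sample-specific; a linear
    # discriminant analysis of the same variables makes this visible
    MASS::lda(Species ~ Sepal.Length + Petal.Length, data = iris)$scaling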

    In relation to the second point, the issue is one that is fairly well known in other contexts. In ANOVA one can use an omnibus test of a factor to decide whether to proceed with post hoc pairwise comparisons. This is the logic behind the Fisher LSD test, and it is well known that this test doesn't protect Type I error very well if more than 3 means are being compared – specifically, it protects against the complete null hypothesis but not the partial null hypothesis (see Serious Stats pp. 495-501). For adequate Type I error protection it would be better to use something like the Holm or Hochberg correction (the latter having greater statistical power if the univariate test statistics are correlated – which they generally are if MANOVA is being considered). That said, if you just want a test of the omnibus null hypothesis – that there are no effects on any of the DVs – MANOVA may be a convenient way to summarise a large set of univariate tests that are non-significant.
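
    Given the univariate p-values, these corrections are one-liners in R (the p-values below are made up):

    p <- c(0.012, 0.030, 0.041, 0.220)   # hypothetical univariate p-values
    p.adjust(p, method = "holm")
    p.adjust(p, method = "hochberg")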

    Last but not least, there exist multivariate regression (and other) approaches that are more appropriate for multivariate research questions (see also Huang, 2019). However, I've rarely seen MANOVA used for multivariate research questions. In fact, I've rarely if ever seen a MANOVA reported that actually aided interpretation of the data.
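
    For example, putting a matrix of outcomes on the left-hand side of lm() gives a basic multivariate regression in R (a sketch with built-in data rather than a substantive example):

    # Multivariate regression: two outcomes modelled jointly
    mfit <- lm(cbind(mpg, qsec) ~ wt + hp, data = mtcars)
    summary(mfit)   # separate coefficient tables per outcome
    anova(mfit)     # multivariate tests across the outcomes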

    References

    Baguley, T. (2012). Serious stats: A guide to advanced statistics for the behavioral sciences. Palgrave Macmillan. (see pages 647-650)
    Grayson, D. (2004). Some Myths and Legends in Quantitative Psychology. Understanding Statistics, 3(2), 101–134. https://doi.org/10.1207/s15328031us0302_3
    Huang, F. L. (2020). MANOVA: A procedure whose time has passed? Gifted Child Quarterly, 64(1), 56–60. https://doi.org/10.1177/0016986219887200
    Huberty, C. J., & Morris, J. D. (1989). Multivariate analysis versus multiple univariate analyses. Psychological Bulletin, 105(2), 302–308. https://doi.org/10.1037/0033-2909.105.2.302

  2. I wrote a brief introduction to logistic regression aimed at psychology students. You can take a look at the pdf here:  



    A more comprehensive introduction in terms of the generalised linear model can be found in my book:
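
    For anyone who just wants the minimal R version, a logistic regression is a call to glm() with a binomial family (toy simulated data, purely for illustration):

    # Logistic regression as a generalised linear model (toy data)
    set.seed(1)
    d <- data.frame(x = rnorm(100))
    d$y <- rbinom(100, 1, plogis(-0.5 + 1.2 * d$x))
    m <- glm(y ~ x, data = d, family = binomial)
    summary(m)
    exp(coef(m))   # coefficients as odds ratios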



  3. I wrote a short blog post (with R code) on how to calculate corrected CIs for rho and tau using the Fisher z transformation.
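
    As a rough sketch of the general approach: transform, use an adjusted standard error, then back-transform. The variance constants below are common large-sample approximations and are an assumption here – they may not match exactly what the linked post uses.

    ci_rho <- function(rho, n, conf = 0.95) {
      z  <- atanh(rho)
      se <- sqrt(1.06 / (n - 3))    # approximate SE for Spearman's rho (assumption)
      q  <- qnorm(1 - (1 - conf) / 2)
      tanh(z + c(-1, 1) * q * se)
    }

    ci_tau <- function(tau, n, conf = 0.95) {
      z  <- atanh(tau)
      se <- sqrt(0.437 / (n - 4))   # approximate SE for Kendall's tau (assumption)
      q  <- qnorm(1 - (1 - conf) / 2)
      tanh(z + c(-1, 1) * q * se)
    }

    ci_rho(0.45, 40)
    ci_tau(0.30, 40)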




  4. I have written a short article on Type II versus Type III SS in ANOVA-like models on my Serious Stats blog:


    https://seriousstats.wordpress.com/2020/05/13/type-ii-and-type-iii-sums-of-squares-what-should-i-choose/
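
    As a minimal sketch of obtaining both in R (assuming the car package is installed, and remembering that Type III only makes sense with sum-to-zero contrasts):

    library(car)
    options(contrasts = c("contr.sum", "contr.poly"))   # needed for sensible Type III
    m <- lm(mpg ~ factor(cyl) * factor(am), data = mtcars)
    Anova(m, type = 2)
    Anova(m, type = 3)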



  5. I have just published a short blog on the Egon Pearson correction for the chi-square test. This includes links to an R function to run the corrected test (and also provides residual analyses for contingency tables).

    The blog is here and the R function here.
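
    The core adjustment is small enough to sketch here: multiply the usual Pearson statistic by (N - 1)/N. The example table below is made up; the linked function adds the residual analyses.

    tab <- matrix(c(12, 5, 7, 16), nrow = 2)            # hypothetical 2 x 2 table
    N   <- sum(tab)
    X2  <- unname(chisq.test(tab, correct = FALSE)$statistic)
    X2.adj <- X2 * (N - 1) / N                          # 'N - 1' (Egon Pearson) correction
    df  <- (nrow(tab) - 1) * (ncol(tab) - 1)
    p   <- pchisq(X2.adj, df, lower.tail = FALSE)
    c(X2 = X2, X2.adj = X2.adj, p = p)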





  6. Bayesian Data Analysis in the Social Sciences Curriculum

    Supported by the ESRC’s Advanced Training Initiative


    Venue: Bowden Room, Nottingham Conference Centre, Burton Street, Nottingham, NG1 4BU


    Provisional schedule:

    9.30   Registration (and coffee!)
    9.50   Introduction And Welcome (Thom Baguley)
    10.00  Teaching Bayesian Data Analysis To Social Scientists (Mark Andrews & Thom Baguley)
    10.50  Principles For Teaching And Using Bayes Factors (Zoltan Dienes)
    11.40  Coffee
    12.00  Bayes Factors Show Equivalence Between Two Contrasting Approaches To Developing School Pupils’ Mathematical Fluency (Colin Foster)
    12.20  Towards A Bayesian Approach In Criminology: A Case Study Of Risk Assessment In Youth Justice (Helen Hodges)
    12.40  Lunch
    1.40   Does Pain Affect Performance On The Attentional Networking Task? (Jayne Pickering, Matthew Inglis & Nina Attridge)
    2.00   First Steps Towards A Bayesian Model Of Video Game Avatar Influence (Oliver Clark)
    2.20   Coffee
    2.40   The Fallacy Of Placing Confidence In Confidence Intervals (Richard Morey)
    3.30   Learning Bayes As A Frequentist: A Personal Tragedy In Three Parts (Daniel Lakens)
    4.20   Close and farewell



    Organizers:

    Thom Baguley (twitter: @seriousstats)
    Mark Andrews (twitter: @xmjandrews)


  7. The third and (possibly) final round of our introductory workshops was overbooked in April, but we have managed to arrange some additional dates in June.

    There are still places left on these. More details at: http://www.priorexposure.org.uk/

    As with the last round, we are planning a free R workshop beforehand (recommended if you need a refresher or have never used R before). Unfortunately, we can't offer bursaries for these additional workshops (as this wasn't part of the original ESRC funding).

    They are primarily (but not exclusively) aimed at UK social science PhD students (so not just Psychology or Neuroscience, but very much also Sociology, Criminology, Politics and other social science disciplines). We hope the workshops will also appeal to early career researchers and others doing quantitative social science research (but with little or no Bayesian experience).

    The registration cost for each workshop is £20 (for postgrads) and £30 (for others).


  8. In my Serious Stats blog I have a new post on providing CIs for a difference between independent R-squared coefficients.

    You can find the post there or go directly to the function hosted on RPubs. I have been experimenting with knitr but can't yet get the HTML from R Markdown to work with my Blogger or WordPress blogs.
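
    As a sketch of one alternative route (not the method used in the post), a percentile bootstrap of the difference is easy to write down; boot_r2_diff and the example data split below are hypothetical:

    # Percentile bootstrap CI for the difference between R-squared values
    # from two independent samples (sketch only)
    boot_r2_diff <- function(formula1, data1, formula2, data2,
                             B = 2000, conf = 0.95) {
      r2 <- function(formula, data) summary(lm(formula, data = data))$r.squared
      diffs <- replicate(B, {
        b1 <- data1[sample(nrow(data1), replace = TRUE), ]
        b2 <- data2[sample(nrow(data2), replace = TRUE), ]
        r2(formula1, b1) - r2(formula2, b2)
      })
      quantile(diffs, c((1 - conf) / 2, 1 - (1 - conf) / 2))
    }

    # e.g. boot_r2_diff(mpg ~ wt, mtcars[1:16, ], mpg ~ wt, mtcars[17:32, ])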
