Mixed Effect Models

The Structure of Multilevel Data

The term “multilevel data” refers to data organized in a hierarchical structure, where units of analysis are grouped into clusters. For example, in a cross-sectional study, multilevel data could be made up of individual measurements of students from different schools, where students are nested within schools. In a longitudinal study, multilevel data could be made up of multiple time point measurements of individuals, where time points are nested within individuals.
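As a rough illustration of the nesting described above, the following Python sketch stores student scores in "long" format, one row per student with a column naming the school, and then groups the rows by school. The school labels, student IDs, scores, and the helper name group_by_cluster are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical long-format data: one row per student (level 1),
# with a column identifying the school (level 2) the student belongs to.
rows = [
    {"school": "A", "student": 1, "score": 71},
    {"school": "A", "student": 2, "score": 64},
    {"school": "B", "student": 3, "score": 80},
    {"school": "B", "student": 4, "score": 77},
    {"school": "B", "student": 5, "score": 69},
]

def group_by_cluster(records, cluster_key):
    """Collect level-1 records under their level-2 cluster."""
    clusters = defaultdict(list)
    for record in records:
        clusters[record[cluster_key]].append(record)
    return dict(clusters)

schools = group_by_cluster(rows, "school")
for name, students in schools.items():
    mean_score = sum(s["score"] for s in students) / len(students)
    print(name, len(students), round(mean_score, 2))
```

The same layout would apply to longitudinal data, with the individual as the cluster column and one row per time point.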

I’ve heard something frightening from practicing statisticians who frequently use mixed effects models. Sometimes when I ask them whether they produced a [semi]variogram to check the correlation structure they reply “what’s that?” -Frank Harrell

Occasionally we are asked to help students or faculty implement a mixed-effect model in SPSS. Our training and expertise are primarily in R, so it can be challenging to transfer and apply our knowledge to SPSS. In this article we document for posterity how to fit some basic mixed-effect models in R using the lme4 and nlme packages, and how to replicate the results in SPSS.

In this article we work with R 4.2.0, lme4 version 1.1-29, nlme version 3.1-157, and SPSS version 28.0.1.1.

If you have ever used the R package lme4 to perform mixed-effect modeling, you may have noticed the “Correlation of Fixed Effects” section at the bottom of the summary output. This article explains what this section means and how you might interpret it.
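As a minimal sketch of the arithmetic involved, assuming that this section is the estimated variance-covariance matrix of the fixed-effect estimates rescaled to a correlation matrix (what cov2cor() does in R), the Python code below performs that rescaling by hand. The 2 x 2 matrix is invented for illustration and does not come from any fitted model.

```python
import math

def cov_to_cor(vcov):
    """Convert a covariance matrix (list of lists) to a correlation matrix."""
    n = len(vcov)
    sd = [math.sqrt(vcov[i][i]) for i in range(n)]
    return [[vcov[i][j] / (sd[i] * sd[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical variance-covariance matrix of two fixed-effect estimates
# (intercept and slope); the numbers are made up.
vcov = [[4.0, -1.2],
        [-1.2, 1.0]]

cor = cov_to_cor(vcov)
print(round(cor[0][1], 2))  # -1.2 / (2.0 * 1.0) = -0.6
```

Under this reading, the off-diagonal entry describes how the two coefficient estimates would tend to move together across repeated samples, not how the predictors themselves are correlated.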

Generalized estimating equations, or GEE, is a method for modeling longitudinal or clustered data. It is usually used with non-normal data such as binary or count data. The name refers to a set of equations that are solved to obtain parameter estimates (i.e., model coefficients). If interested, see Agresti (2002) for the computational details. In this article we simply aim to get you started with implementing and interpreting GEE using the R statistical computing environment.
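For orientation, the estimating equations are commonly written in the form below. The notation is an assumption introduced here, not taken from this article: K clusters indexed by i, response vector y_i with mean mu_i(beta), a diagonal matrix A_i of variance functions, a working correlation matrix R_i(alpha), and a scale parameter phi.

```latex
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2}
```

Solving this system for beta gives the model coefficients; the choice of R_i(alpha) encodes the assumed within-cluster correlation.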

Binomial generalized linear mixed models, or binomial GLMMs, are useful for modeling binary outcomes for repeated or clustered measures. For example, let’s say we design a study that tracks what college students eat over the course of 2 weeks, and we’re interested in whether or not they eat vegetables each day. For each student, we’ll have 14 binary events: eat vegetables or not.
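To make the data structure concrete, the Python sketch below simulates this kind of study under stated assumptions: each student gets a random intercept on the log-odds scale, and each of the 14 days is an independent Bernoulli draw given that intercept. The function name, the number of students, and all parameter values are hypothetical; the code only generates data and does not fit a GLMM.

```python
import math
import random

def simulate_vegetable_data(n_students=5, n_days=14, intercept=0.0,
                            sd_student=1.0, seed=1):
    """Simulate daily binary outcomes with a student-level random intercept."""
    rng = random.Random(seed)
    data = []
    for student in range(1, n_students + 1):
        # Student-specific shift on the log-odds scale (the random intercept)
        u = rng.gauss(0.0, sd_student)
        p = 1.0 / (1.0 + math.exp(-(intercept + u)))  # inverse logit
        for day in range(1, n_days + 1):
            ate_veg = 1 if rng.random() < p else 0
            data.append({"student": student, "day": day, "veg": ate_veg})
    return data

data = simulate_vegetable_data()
print(len(data))  # 5 students x 14 days = 70 rows
```

Because all 14 outcomes for a student share the same intercept, they are correlated within that student, which is the dependence a binomial GLMM is meant to account for.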