Getting Started with Generalized Estimating Equations

Generalized Estimating Equations, or GEE, is a method for modeling longitudinal or clustered data. It is usually used with non-normal data such as binary or count data. The name refers to a set of equations that are solved to obtain parameter estimates (ie, model coefficients). If interested, see Agresti (2002) for the computational details. In this article we simply aim to get you started with implementing and interpreting GEE using the R Statistical Computing Environment.

We often model longitudinal or clustered data with mixed-effect or multilevel models. So how is GEE different?

  1. The main difference is that it’s a marginal model. It seeks to model a population average. Mixed-effect/Multilevel models are subject-specific, or conditional, models. They allow us to estimate different parameters for each subject or cluster. In other words, the parameter estimates are conditional on the subject/cluster. This in turn provides insight into the variability between subjects or clusters. We can also obtain a population-level model from a mixed-effect model, but it’s basically an average of the subject-specific models.

  2. GEE is intended for simple clustering or repeated measures. It cannot easily accommodate more complex designs such as nested or crossed groups; for example, nested repeated measures within a subject or group. This is something better suited for a mixed-effect model.

  3. GEE computations are usually easier than mixed-effect model computations. GEE does not use the likelihood methods that mixed-effect models employ, which means GEE can sometimes estimate more complex models.

  4. Because GEE doesn’t use likelihood methods, the estimated “model” is incomplete and not suitable for simulation.

  5. GEE allows us to specify a correlation structure for different responses within a subject or group. For example, we can specify that the correlation of measurements taken closer together is higher than those taken farther apart. This is not something that’s currently possible in the popular lme4 package.

Let’s work through an example to compare and contrast GEEs and mixed-effect models. The data we’ll use comes from Table 11.2 of Agresti (2002) and concerns a longitudinal study comparing two drugs (“new” versus “standard”) for treating depression. Below we read in the data from a CSV file, set “standard” as the reference level for the drug variable, and look at the first three rows. We see subject 1 (id = 1) had a “mild” diagnosis and was treated with the “standard” drug at times 0, 1, and 2. At each time point, the subject’s depression response was a “1”, which in this case means “Normal”. A response of 0 means “Abnormal.”

URL <- ""
dat <- read.csv(URL, stringsAsFactors = TRUE)
dat$id <- factor(dat$id)
dat$drug <- relevel(dat$drug, ref = "standard")
head(dat, n = 3)

##   diagnose     drug id time depression
## 1     mild standard  1    0          1
## 2     mild standard  1    1          1
## 3     mild standard  1    2          1

Altogether we have data on 340 subjects.


## [1] 340

Let’s look at the proportion of “Normal” responses (depression == 1) over time within the diagnosis and drug categories. Below we create a table of means by diagnosis, drug, and time. Recall that the mean of zeroes and ones if the proportion of ones. Then we “flatten” the table with ftable and round the results to two digits using the pipe operator courtesy of the magrittr package. This reproduces Table 11.3 in Agresti (2002). Notice the proportion of “Normal” increases over time for all four combinations of diagnosis and treatment, and that the increase is more dramatic for the “new” drug.

# install.packages("magrittr")
with(dat, tapply(depression, list(diagnose, drug, time), mean)) %>% 
  ftable() %>% 
##                     0    1    2
## mild   standard  0.51 0.59 0.68
##        new       0.53 0.79 0.97
## severe standard  0.21 0.28 0.46
##        new       0.18 0.50 0.83

Now let’s use GEE to estimate a marginal model for the effect of diagnosis, drug and time on the depression response. We’ll also allow an interaction for drug and time. (This means we’re allowing the response trajectory to vary depending on drug.) We’ll use the gee function in the gee package to carry out the modeling. If you don’t have the gee package, uncomment and run the first line of code below.

Notice the formula syntax is the same we would use with a mixed-effect model in R. We also have the familiar data and family arguments to specify the data frame and the error distribution, respectively. The id argument specifies the grouping factor. In this case it’s the “id” column in the data frame. (It’s important to note that gee assumes the data frame is sorted according to the id variable.) The last argument, corstr, allows us to specify the correlation structure. Below we have specified “independence” which means we’re not allowing responses within subjects to be correlated. Some other possible options include “exchangeable”, “AR-M”, and “unstructured”, which we’ll discuss shortly.

# install.packages("gee")
dep_gee <- gee(depression ~ diagnose + drug*time,
               data = dat, 
               id = id, 
               family = binomial,
               corstr = "independence")

## Call:
## gee(formula = depression ~ diagnose + drug * time, id = id, data = dat, 
##     family = binomial, corstr = "independence")
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -0.94844242 -0.40683252  0.05155758  0.38830952  0.80242231 
## Coefficients:
##                   Estimate Naive S.E.    Naive z Robust S.E.   Robust z
## (Intercept)    -0.02798843  0.1627083 -0.1720160   0.1741865 -0.1606808
## diagnosesevere -1.31391092  0.1453432 -9.0400569   0.1459845 -9.0003423
## drugnew        -0.05960381  0.2205812 -0.2702126   0.2285385 -0.2608042
## time            0.48241209  0.1139224  4.2345663   0.1199350  4.0222784
## drugnew:time    1.01744498  0.1874132  5.4288855   0.1876938  5.4207709
## Estimated Scale Parameter:  0.9854113
## Number of Iterations:  1
## Working Correlation
##      [,1] [,2] [,3]
## [1,]    1    0    0
## [2,]    0    1    0
## [3,]    0    0    1

In the Coefficients section we see the estimated marginal model. The coefficients are on the logit scale. We interpret these coefficients the same way we would any other binomial logistic regression model. The time coefficient is 0.48. If we exponentiate we get an odds ratio of 1.62. This says the odds of responding as Normal on the standard drug increase by about 62% for each unit increase in time. The drug:time coefficient is 1.02. We add this to the time coefficient to get the effect of time for the new drug: 0.48 + 1.02 = 1.5. Exponentiating gives us about 4.5. This says the odds of responding as Normal on the new drug increase by 4.5 times for every unit increase in time. We feel pretty confident in the direction of these effects given how much bigger the coefficients are than their associated standard errors.

Speaking of standard errors, notice we get two of them: Naive and Robust. We’ll almost always want to use the Robust estimates because the variances of coefficient estimates tend to be too small when responses within subjects are correlated (Bilder and Loughin, 2015). In this case, there isn’t much difference between the Naive and Robust estimates, suggesting the independence assumption of the correlation structure seems realistic (Hothorn and Everitt, 2014).

At the bottom we see the “Working Correlation” structure. Since we specified corstr = "independence", it is simply the identity matrix with zeroes in the off-diagonal positions. This indicates we assume no correlation between responses within subjects. The matrix is 3 x 3 since we have 3 observations per subject.

Finally, the “Estimated Scale Parameter” is reported as 0.9854. This is sometimes called the dispersion parameter. This is similar to what is reported when running glm with family = quasibinomial in order to model over-dispersion. This gets calculated by default. If we don’t want to calculate it (ie, set it equal to 1) we can set scale.fix = TRUE.

Now let’s try a model with an exchangeable correlation structure. This says all pairs of responses within a subject are equally correlated. To do this we set corstr = "exchangeable".

dep_gee2 <- gee(depression ~ diagnose + drug*time,
               data = dat, 
               id = id, 
               family = binomial,
               corstr = "exchangeable")

## Call:
## gee(formula = depression ~ diagnose + drug * time, id = id, data = dat, 
##     family = binomial, corstr = "exchangeable")
## Summary of Residuals:
##         Min          1Q      Median          3Q         Max 
## -0.94843397 -0.40683122  0.05156603  0.38832332  0.80238627 
## Coefficients:
##                   Estimate Naive S.E.    Naive z Robust S.E.   Robust z
## (Intercept)    -0.02809866  0.1625499 -0.1728617   0.1741791 -0.1613205
## diagnosesevere -1.31391033  0.1448627 -9.0700418   0.1459630 -9.0016667
## drugnew        -0.05926689  0.2205340 -0.2687427   0.2285569 -0.2593091
## time            0.48246420  0.1141154  4.2278625   0.1199383  4.0226037
## drugnew:time    1.01719312  0.1877051  5.4191018   0.1877014  5.4192084
## Estimated Scale Parameter:  0.985392
## Number of Iterations:  2
## Working Correlation
##              [,1]         [,2]         [,3]
## [1,]  1.000000000 -0.003432732 -0.003432732
## [2,] -0.003432732  1.000000000 -0.003432732
## [3,] -0.003432732 -0.003432732  1.000000000

The “Working Correlation” matrix shows an estimate of -0.003433 as the common correlation between pairs of observations within a subject. This is tiny and it appears we could get way with treating this data as 1020 independent observations instead of 340 subjects with 3 observations each. We also notice the coefficient estimates are almost identical to the previous model with the independent correlation structure.

Another possibility for correlation is an autoregressive structure. This allows correlations of measurements taken closer together to be higher than those taken farther apart. The most common autoregressive structure is the AR-1. In our case this would mean measures 1 and 2 have a correlation of \(\rho^1\), while measures 1 and 3 have a correlation of \(\rho^2\). Like the exchangeable structure, we only need to estimate one parameter. To fit an AR-1 correlation structure we need to set corstr = "AR-M" and Mv = 1. This time we’ll extract the Working Correlation from the fitted model object instead of printing the entire summary.

dep_gee3 <- gee(depression ~ diagnose + drug*time,
               data = dat, 
               id = id, 
               family = binomial,
               corstr = "AR-M", Mv = 1)


##              [,1]        [,2]         [,3]
## [1,] 1.000000e+00 0.008477443 7.186704e-05
## [2,] 8.477443e-03 1.000000000 8.477443e-03
## [3,] 7.186704e-05 0.008477443 1.000000e+00

Once again the Working Correlation shows tiny values. The correlation parameter is estimated as 0.008477. That’s the estimated correlation between measures 1-2, and 2-3. The estimated correlation between measures 1-3 is \(0.008477^2\).

One other correlation structure of interest is the “unstructured” matrix. This allows all correlations to freely vary. If you have t time points (or t number of observations in a cluster), you will have \(t(t-1)/2\) correlation parameters to estimate. For example, just 5 time points results in \(5(4)/2 = 10\) parameters. It’s probably best to avoid unstructured matrices unless you have strong reasons to doubt a simpler structure will do.

How to choose which correlation structure to use? The good news is GEE estimates are valid even if you misspecify the correlation structure (Agresti, 2002). Of course this assumes the model is correct, but then again no model is exactly correct. Agresti suggests using the exchangeable structure as a start and then checking how the coefficient estimates and standard errors change with other correlation structures. If the changes are minimal, go with the simpler correlation structure.

How does the GEE model with the exchangeable correlation structure compare to a Generalized Mixed-Effect model? Let’s fit the model and compare the results. For this we’ll use the glmer function in the lme4 package. Notice the formula syntax stays the same except for the specification of the random effects. Below we allow the intercept to be conditional on subject. This means we’ll get 340 subject-specific models with different intercepts but identical coefficients for everything else.

# install.packages("lme4)
dep_glmer <- glmer(depression ~ diagnose + drug*time + (1|id), 
               data = dat, family = binomial)
summary(dep_glmer, corr = FALSE)

## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: binomial  ( logit )
## Formula: depression ~ diagnose + drug * time + (1 | id)
##    Data: dat
##      AIC      BIC   logLik deviance df.resid 
##   1173.9   1203.5   -581.0   1161.9     1014 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -4.2849 -0.8268  0.2326  0.7964  2.0181 
## Random effects:
##  Groups Name        Variance Std.Dev.
##  id     (Intercept) 0.00323  0.05683 
## Number of obs: 1020, groups:  id, 340
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -0.02796    0.16406  -0.170    0.865    
## diagnosesevere -1.31488    0.15261  -8.616  < 2e-16 ***
## drugnew        -0.05968    0.22239  -0.268    0.788    
## time            0.48273    0.11566   4.174 3.00e-05 ***
## drugnew:time    1.01817    0.19150   5.317 1.06e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The model coefficients listed under “Fixed effects” are almost identical to the GEE coefficients. Instead of a working correlation matrix we get a “Random effects” section that summarizes the variability between subjects. Technically we’re assuming each subject’s random effect is drawn from a Normal distribution with mean 0 and some fixed standard deviation. The model estimates this standard deviation to be about 0.057 (under “Random effects”). To see the 340 different subject-specific models, enter coef(mod_glmer) in the R console. You’ll notice each subject receives their own intercept. (Actually, in this case, it’s each unique combination of predictor variables that receives a different intercept.)

Effect plots help us visualize models and see how predictors affect the response variable at various combinations of values. Let’s create effect plots for “dep_gee2” (GEE model with exchangeable correlation) and “dep_glmer” and see how they compare. To do this we’ll use the ggemmeans function from the ggeffects package. The ggemmeans function calls the emmeans function from the package of the same name. The emmeans function calculates marginal effects and works for both GEE and mixed-effect models.

Here’s the effect plot for the mixed-effect model with diagnose set to “severe”:

# install.packages("ggeffects")
plot(ggemmeans(dep_glmer, terms = c("time", "drug"), 
               condition = c(diagnose = "severe"))) + 
  ggplot2::ggtitle("GLMER Effect plot")

We see the probability of a “Normal” response to the new “drug” increases from less than 25% at time = 0 to over 75% at time = 3. We might consider this plot a population-level estimate since it uses just the fixed effect coefficients.

Next we create an effect plot for the GEE model with exchangeable correlation.

plot(ggemmeans(dep_gee2, terms = c("time", "drug"),
               condition = c(diagnose = "severe"))) + 
  ggplot2::ggtitle("GEE Effect plot")

The GEE plot is virtually identical to the GLMER plot! For this particular data set it doesn’t appear to make a difference which model we use.

Hopefully you now have a better understanding of how to run and interpret GEE models. There is much more to learn about GEE models and this article does not pretend to be comprehensive. If you would like to go further, you may wish to start with the “geepack” R package which includes a few more GEE functions, a brief vignette, and an introductory article in the Journal of Statistical Software. According to the geepack vignette, “GEEs work best when you have relatively many relatively small clusters in your data.”


  • Agresti, A. (2002). Categorical Data Analysis. Wiley, 2nd Edition.
  • Bilder, C. and Loughin, T. (2015). Analysis of Categorical Data with R. CRC Press.
  • Carey, V. (2019) gee: Generalized Estimation Equation Solver. R package version 4.13-20.
  • Harrell, F. (2015). Regression Modeling Strategies. Springer, 2nd Edition.
  • Højsgaard, S., Halekoh, U. & Yan J. (2006) The R Package geepack for Generalized Estimating Equations. Journal of Statistical Software, 15, 2, pp1–11.
  • Hothorn, T. and Everitt, B. (2014). A Handbook of Statistical Analyses using R. CRC Press, 3rd Edition.
  • Lenth, R. (2021). emmeans: Estimated Marginal Means, aka Least-Squares Means. R package version 1.5.5-1.
  • Liang, K.Y. and Zeger, S.L. (1986) Longitudinal data analysis using generalized linear models. Biometrika, 73 13-22.
  • Lüdecke D (2018). ggeffects: Tidy Data Frames of Marginal Effects from Regression Models. Journal of Open Source Software, 3(26), 772. doi: 10.21105/joss.00772 (URL:
  • R Core Team (2021). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL

For questions or clarifications regarding this article, contact the UVA Library StatLab:

View the entire collection of UVA Library StatLab articles.

Clay Ford
Statistical Research Consultant
University of Virginia Library
April 22, 2021