AUC

When it comes to summarizing the association between two numeric variables, we can use Pearson or Spearman correlation. When accompanied with a scatterplot, they allow us to quantify association on a scale from -1 to 1. But what if we have two ordered categorical variables with just a few levels? How can we summarize their association? One approach is to calculate Somers’ Delta, or Somers’ D for short.

Let’s say we fit a logistic regression model for the purpose of predicting the probability of low infant birth weight, which is an infant weighing less than 2.5 kg. Below we fit such a model using the birthwt data set that comes with the MASS package in R. (This is an example model and not to be used as medical advice.)

We first subset the data to select four variables:

This article assumes basic familiarity with the use and interpretation of logistic regression, odds and probabilities, and true/false positives/negatives. The examples are coded in R. ROC curves and AUC have important limitations, and I encourage reading through the section at the end of the article to get a sense of when and why the tools can be of limited use.