
Logistic Regression Four Ways with Python

What is Logistic Regression? Logistic regression is a predictive modeling technique that estimates the probability of an event occurring based on a given dataset. The dataset contains independent variables, or predictors, and a corresponding dependent variable, or response. To model the probability of a particular response, logistic regression assumes that the log-odds for the […]
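The log-odds formulation the excerpt begins to describe can be sketched in a few lines of Python. The coefficients b0 and b1 below are made-up values for illustration, not taken from the article:

```python
import math

def sigmoid(z):
    # inverse logit: maps a log-odds value to a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    # log-odds of a probability p
    return math.log(p / (1.0 - p))

# the model assumes logit(p) = b0 + b1 * x, i.e. linearity on the log-odds scale
b0, b1 = -1.0, 2.0
x = 0.5
p = sigmoid(b0 + b1 * x)  # p == 0.5 here, since b0 + b1 * x == 0
```

Because `sigmoid` and `logit` are inverses, the model is linear on the log-odds scale even though the probability itself follows an S-shaped curve in x.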

Getting Started with Bootstrap Model Validation

Let’s say we fit a logistic regression model to predict the probability of low infant birth weight, defined as an infant weighing less than 2.5 kg. Below we fit such a model using the “birthwt” data set that comes with the MASS package in R. (This is an example model and not […]
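A minimal Python sketch of the bootstrap-validation idea, under assumptions: simulated data stands in for the birthwt data (the article works in R), and a deliberately simple cutpoint classifier stands in for the logistic model. The optimism-corrected bootstrap fits the model on each resample and compares its score on the resample with its score on the original data:

```python
import numpy as np

rng = np.random.default_rng(1)

# simulated binary outcome with one predictor (assumed, not the real birthwt data)
n = 200
x = rng.normal(size=n)
y = (x + rng.normal(size=n) > 0).astype(int)

def fit(xs, ys):
    # toy "model": the cutpoint on x that maximizes training accuracy
    cands = np.quantile(xs, np.linspace(0.05, 0.95, 19))
    accs = [np.mean((xs > t) == ys) for t in cands]
    return cands[int(np.argmax(accs))]

def score(t, xs, ys):
    # classification accuracy of cutpoint t
    return np.mean((xs > t) == ys)

apparent = score(fit(x, y), x, y)  # performance on the same data used to fit

# optimism = average (score on bootstrap sample - score on original data)
B = 200
optimism = 0.0
for _ in range(B):
    idx = rng.integers(0, n, n)  # resample rows with replacement
    t = fit(x[idx], y[idx])
    optimism += score(t, x[idx], y[idx]) - score(t, x, y)
optimism /= B

corrected = apparent - optimism  # optimism-corrected performance estimate
```

The apparent accuracy overstates how the model would do on new data; subtracting the bootstrap estimate of optimism gives a more honest figure without holding out any data.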

Detecting Influential Points in Regression with DFBETA(S)

In regression modeling, influential points are observations that, individually, exert large effects on a model’s results—the parameter estimates (\(\hat{\beta_0}, \hat{\beta_1}, …, \hat{\beta_j}\)) and, consequently, the model’s predictions (\(\hat{y_1}, \hat{y_2}, …, \hat{y_n}\)). Influential points aren’t necessarily troublesome, but observations flagged as highly influential warrant follow-up. A large value on an influence measure can signal anything from […]
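The DFBETA idea can be sketched with a leave-one-out loop: refit the model with each observation deleted and record how the coefficients change. This uses simulated data with one planted outlier; note that DFBETAS (with the S) additionally scales each change by the coefficient's standard error, which this sketch omits:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 8.0  # plant one influential observation

X = np.column_stack([np.ones(n), x])  # design matrix: intercept and slope
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# DFBETA for observation i: the change in each coefficient when i is deleted
dfbeta = np.empty((n, 2))
for i in range(n):
    keep = np.arange(n) != i
    beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
    dfbeta[i] = beta_full - beta_i

most_influential = int(np.argmax(np.abs(dfbeta).sum(axis=1)))
```

With the planted outlier at index 0, that observation produces by far the largest coefficient shifts, which is exactly the behavior the influence measure is designed to flag.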

ROC Curves and AUC for Models Used for Binary Classification

This article assumes basic familiarity with the use and interpretation of logistic regression, odds and probabilities, and true/false positives/negatives. The examples are coded in R. ROC curves and AUC have important limitations, and I encourage reading through the section at the end of the article to get a sense of when and why the tools […]
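AUC has a useful probabilistic reading: it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counting one half. A small Python sketch with made-up scores and labels (the article's own examples are in R):

```python
import numpy as np

# made-up classifier scores and true labels (1 = positive)
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2])
labels = np.array([0, 0, 1, 1, 1, 0])

pos = scores[labels == 1]
neg = scores[labels == 0]

# AUC = P(random positive scores above random negative), ties counted as 1/2
greater = (pos[:, None] > neg[None, :]).sum()
ties = (pos[:, None] == neg[None, :]).sum()
auc = (greater + 0.5 * ties) / (pos.size * neg.size)  # 8/9 for these scores
```

Of the nine positive/negative pairs here, eight are ranked correctly, so the AUC is 8/9; an AUC of 0.5 would mean the scores rank pairs no better than chance.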

Comparing the accuracy of two binary diagnostic tests in a paired study design

There are many medical tests for detecting the presence of a disease or condition. Some examples include tests for lesions, cancer, pregnancy, or COVID-19. While these tests are usually accurate, they’re not perfect. In addition, some tests are designed to detect the same condition but use a different method. Recent examples are PCR and […]
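One standard approach for comparing two binary tests in a paired design is McNemar's test, which uses only the discordant pairs. A minimal sketch with assumed counts b and c (illustrative numbers, not data from the article):

```python
import math

# hypothetical paired results: b = pairs test A got right and test B got wrong,
# c = pairs test B got right and test A got wrong (concordant pairs drop out)
b, c = 25, 10

chi2 = (b - c) ** 2 / (b + c)             # McNemar statistic, no continuity correction
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

The intuition: if the two tests were equally accurate, a discordant pair would be equally likely to favor either test, so b and c should be about equal; a large imbalance yields a large statistic and a small p-value.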

Correlation of Fixed Effects in lme4

If you have ever used the R package lme4 to perform mixed-effect modeling you may have noticed the “Correlation of Fixed Effects” section at the bottom of the summary output. This article intends to shed some light on what this section means and how you might interpret it. To begin, let’s simulate some data. Below […]
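The "Correlation of Fixed Effects" is the correlation matrix of the coefficient estimates, obtained by rescaling their estimated covariance matrix. A Python sketch using ordinary least squares for simplicity (lme4 reports the analogous quantity for a mixed model's fixed effects; the data here are simulated, and the uncentered predictor is chosen deliberately, since it is what drives the large intercept/slope correlation):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(loc=5.0, size=n)  # uncentered predictor: mean far from zero
y = 2.0 + 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
sigma2 = resid @ resid / (n - 2)          # residual variance estimate
vcov = sigma2 * np.linalg.inv(X.T @ X)    # covariance of the coefficient estimates

d = np.sqrt(np.diag(vcov))
corr = vcov / np.outer(d, d)              # the "correlation of fixed effects"
```

With the predictor centered around 5, the intercept and slope estimates are strongly negatively correlated (near -0.98 here); centering x would shrink that correlation toward zero, which is one common reason the lme4 output looks alarming when it usually isn't.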

Getting Started with the Kruskal-Wallis Test

What is it? One of the best-known statistical tests for analyzing differences between group means is ANOVA (analysis of variance). While ANOVA is a great tool, it assumes that the data in question follows a normal distribution. What if your data doesn’t follow a normal distribution or if your […]
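The Kruskal-Wallis statistic sidesteps the normality assumption by replacing the raw values with their ranks in the pooled sample and comparing each group's mean rank to the overall mean rank. A pure-NumPy sketch on small made-up samples (all values distinct, so no tie correction is needed; the article's own examples are in R):

```python
import numpy as np

# three small made-up samples
groups = [
    np.array([2.9, 3.0, 2.5, 2.6, 3.2]),
    np.array([3.8, 2.7, 4.0, 2.4]),
    np.array([2.8, 3.4, 3.7, 2.2, 2.0]),
]

allv = np.concatenate(groups)
N = allv.size
ranks = allv.argsort().argsort() + 1.0  # ranks 1..N of the pooled values

# H sums each group's squared deviation of mean rank from the grand mean (N + 1) / 2
H = 0.0
start = 0
for g in groups:
    r = ranks[start:start + g.size]
    H += g.size * (r.mean() - (N + 1) / 2) ** 2
    start += g.size
H *= 12 / (N * (N + 1))
```

Under the null hypothesis that all groups come from the same distribution, H is approximately chi-square with (number of groups - 1) degrees of freedom; the small H here (about 0.77) reflects mean ranks close to the grand mean.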

A Beginner’s Guide to Marginal Effects

What are average marginal effects? If we unpack the phrase, it looks like we have effects that are marginal to something, all of which we average. So let’s look at each piece of this phrase and see if we can help you get a better handle on this topic. To begin, we simulate some toy […]
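For a logistic model, the marginal effect of a continuous predictor at a single observation is the derivative of the predicted probability, b1 * p * (1 - p); averaging those over the sample gives the average marginal effect. A sketch with assumed coefficients and a toy sample, skipping the model-fitting step:

```python
import math

# assumed coefficients for logit(p) = b0 + b1 * x (illustrative, no fitting)
b0, b1 = -1.0, 0.8
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]  # toy sample of predictor values

def p(x):
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# marginal effect of x at one observation: dp/dx = b1 * p * (1 - p)
mes = [b1 * p(x) * (1.0 - p(x)) for x in xs]
ame = sum(mes) / len(mes)  # average marginal effect over the sample
```

Notice that the effect differs across observations, because the logistic curve is steepest near p = 0.5 and flat in the tails; the average collapses those varying effects into one probability-scale summary.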

The Intuition Behind Confidence Intervals

Say it with me: An X% confidence interval captures the population parameter in X% of repeated samples. In the course of our statistical educations, many of us had that line (or some variant of it) crammed, wedged, stuffed, and shoved into our skulls until definitional precision was leaking out of noses and pooling on our […]
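That definition can be checked by simulation: draw many samples from a known population, build an interval from each, and count how often the interval captures the true parameter. A sketch using a 95% z-interval with the population standard deviation treated as known, for simplicity (all population values below are assumed):

```python
import math
import random
import statistics

random.seed(42)

mu, sigma = 10.0, 2.0   # assumed known population mean and standard deviation
n, z = 50, 1.96         # sample size and 95% normal quantile
trials = 2000

covered = 0
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = statistics.fmean(sample)
    half = z * sigma / math.sqrt(n)  # known-sigma interval half-width
    if m - half <= mu <= m + half:
        covered += 1

coverage = covered / trials  # lands near 0.95
```

Each individual interval either contains mu or it doesn't; the 95% refers to the long-run fraction of intervals, across repeated samples, that do.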