StatLab Articles

List Comprehensions in Python

List comprehensions are a topic many new Python users struggle with. This article explains, in a digestible manner, how list comprehensions work and what benefits they offer. Single for loop list comprehension: The following code uses a traditional for loop to change each string in a list from upper […]
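Here is a minimal Python sketch of the idea. The list of words and the `.capitalize()` transformation are hypothetical stand-ins, not the article's own example. The point is only that the loop and the comprehension build the same list.

```python
# Hypothetical list of strings (not the article's own example).
words = ["apple", "banana", "cherry"]

# Traditional for loop: start with an empty list and append one result per item.
capitalized_loop = []
for word in words:
    capitalized_loop.append(word.capitalize())

# Single-for-loop list comprehension: [expression for item in iterable]
capitalized_comp = [word.capitalize() for word in words]

print(capitalized_loop)  # ['Apple', 'Banana', 'Cherry']
print(capitalized_comp)  # ['Apple', 'Banana', 'Cherry']
```

The comprehension folds the empty-list setup, the loop header, and the `append` call into a single expression.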

Getting Started with the Kruskal-Wallis Test

What is it? One of the most well-known statistical tests for analyzing differences between group means is the ANOVA (analysis of variance) test. While ANOVA is a great tool, it assumes that the data within each group follow a normal distribution. What if your data doesn’t follow a normal distribution or if your […]
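To make the test concrete, here is a small sketch of how the Kruskal-Wallis H statistic is computed from ranks, using only the standard library. The function names and the three sample groups are illustrative assumptions. In practice, `scipy.stats.kruskal` performs the same calculation and also returns a p-value.

```python
import math
from itertools import chain


def average_ranks(values):
    """Rank values 1..N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the usual correction for ties (hypothetical helper)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)  # rank all observations together, ignoring group labels

    # Sum of (group rank sum)^2 / group size, using the pooled ranking.
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)

    # Divide by the tie correction factor (equals 1 when there are no ties).
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    return h / correction


a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]  # hypothetical groups
h = kruskal_wallis_h(a, b, c)
# H is compared to a chi-square distribution with k - 1 degrees of freedom.
# With k = 3 groups (2 df), the chi-square survival function is exp(-H / 2).
p_value = math.exp(-h / 2)
print(h, p_value)
```

Because the test works on ranks rather than raw values, it does not require the normality assumption that ANOVA relies on.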

A Beginner’s Guide to Marginal Effects

What are average marginal effects? (If you’re reading this, chances are you just asked this question.) If we unpack the phrase, it looks like we have effects that are marginal to something, all of which we average. So let’s look at each piece of this phrase and see if we can help you get a […]
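As a rough illustration of "compute a marginal effect for each observation, then average," the sketch below assumes a logistic regression with made-up coefficients `b0` and `b1` and a handful of made-up predictor values. For a logistic model, the derivative of the predicted probability with respect to x is b1 * p * (1 - p). Because p differs across observations, so does the marginal effect. The average marginal effect is the mean of those per-observation derivatives.

```python
import math


def logistic(z):
    return 1 / (1 + math.exp(-z))


# Hypothetical fitted coefficients for P(y = 1) = logistic(b0 + b1 * x).
b0, b1 = -1.0, 0.5
# Hypothetical observed values of the predictor x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]


def marginal_effect(x):
    """Derivative of the predicted probability with respect to x at one observation."""
    p = logistic(b0 + b1 * x)
    return b1 * p * (1 - p)


# Average marginal effect: one marginal effect per observation, then the mean.
ame = sum(marginal_effect(x) for x in xs) / len(xs)

# Sanity check with a finite-difference approximation of the same derivatives.
step = 1e-6
ame_numeric = sum(
    (logistic(b0 + b1 * (x + step)) - logistic(b0 + b1 * x)) / step for x in xs
) / len(xs)
print(round(ame, 4), round(ame_numeric, 4))
```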

The Intuition Behind Confidence Intervals

Say it with me: An X% confidence interval captures the population parameter in X% of repeated samples. In the course of our statistical educations, many of us had that line (or some variant of it) crammed, wedged, stuffed, and shoved into our skulls until definitional precision was leaking out of noses and pooling on our […]
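The repeated-samples definition can be checked by simulation. This sketch assumes a normal population with a known standard deviation, so a simple z-interval applies. The population values, sample size, and number of repetitions are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(1)

mu, sigma = 10.0, 2.0  # hypothetical population mean and SD (sigma treated as known)
n = 25                 # size of each repeated sample
n_reps = 10_000        # number of repeated samples
z = statistics.NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval
half_width = z * sigma / n ** 0.5

hits = 0
for _ in range(n_reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.fmean(sample)
    # Did this sample's interval capture the true population mean?
    if xbar - half_width <= mu <= xbar + half_width:
        hits += 1

coverage = hits / n_reps
print(coverage)  # should land close to 0.95
```

Any single interval either contains `mu` or it does not. The 95% describes the long-run hit rate of the procedure across repeated samples.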

Post Hoc Power Calculations are Not Useful

It is well documented that post hoc power calculations are not useful (Goodman and Berlin 1994, Hoenig and Heisey 2001, Althouse 2020). Also known as observed power or retrospective power, post hoc power purports to estimate the power of a test given an observed effect size. The idea is to show that a “non-significant” hypothesis […]
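One way to see the problem, a point made by Hoenig and Heisey (2001), is that for a given test, observed power is simply a transformation of the p-value. The sketch below assumes a two-sided z-test and plugs in the observed standardized effect as if it were the true one. The function name `observed_power` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()


def observed_power(p_value, alpha=0.05):
    """Post hoc power of a two-sided z-test, treating the observed effect as the true effect."""
    z_obs = std_normal.inv_cdf(1 - p_value / 2)  # |z| implied by the two-sided p-value
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # critical value, about 1.96 for alpha = 0.05
    # Probability of landing in either rejection region if the true standardized effect is z_obs.
    return std_normal.cdf(z_obs - z_crit) + std_normal.cdf(-z_obs - z_crit)


for p in (0.01, 0.05, 0.20, 0.50):
    print(p, round(observed_power(p), 3))
```

Power falls steadily as the p-value rises, and a result right at p = 0.05 corresponds to an observed power of about 50%. So a "non-significant" result always comes with observed power of roughly 50% or less, and the calculation adds no information beyond the p-value itself.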

Understanding Ordered Factors in a Linear Model

Consider the following data from the text Design and Analysis of Experiments, 7th ed. (Montgomery, Table 3.1). It has two variables: power and rate. Power is a discrete setting on a tool used to etch circuits into a silicon wafer. There are four levels to choose from. Rate is the distance etched, measured in Angstroms […]
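For context, R codes an ordered factor with orthogonal polynomial contrasts by default (`contr.poly`), so a four-level factor like power enters a linear model as linear, quadratic, and cubic terms. The sketch below reconstructs those contrast columns in Python by Gram-Schmidt orthonormalization of 1, x, x^2, x^3 over the level scores 1 to 4. It assumes equally spaced levels, and the helper name `poly_contrasts` is hypothetical.

```python
import math


def poly_contrasts(n_levels):
    """Orthonormal polynomial contrasts for equally spaced levels 1..n_levels (hypothetical helper)."""
    x = list(range(1, n_levels + 1))
    basis = []  # orthonormal columns found so far; the constant column comes first
    for degree in range(n_levels):
        v = [xi ** degree for xi in x]
        # Gram-Schmidt: remove the projection onto each earlier column.
        for b in basis:
            dot = sum(vi * bi for vi, bi in zip(v, b))
            v = [vi - dot * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant column; keep linear, quadratic, cubic, ...


for name, col in zip(["linear", "quadratic", "cubic"], poly_contrasts(4)):
    print(name, [round(c, 4) for c in col])
```

The printed columns should agree, up to rounding, with R's `contr.poly(4)`. Each coefficient on these columns measures a linear, quadratic, or cubic trend across the ordered levels rather than a difference from a baseline level.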

Ask Better Code Questions (and Get Better Answers) With Reprex

Note: This article was written about version 2.0.0 of the reprex package. In the forums and Q&A sections of websites like Stack Overflow, GitHub, and community.rstudio.com, there is a volunteer force of data science detectives, code consultants, and error-fighting emissaries ready to offer assistance to programmers who find themselves staring down unhappy code that’s resisting […]

Getting Started with Generalized Estimating Equations

Generalized Estimating Equations, or GEE, is a method for modeling longitudinal or clustered data. It is usually used with non-normal data such as binary or count data. The name refers to a set of equations that are solved to obtain parameter estimates (i.e., model coefficients). Interested readers can see Agresti (2002) for the computational details. In […]

Getting Started with Binomial Generalized Linear Mixed Models

Binomial Generalized Linear Mixed Models, or binomial GLMMs, are useful for modeling binary outcomes for repeated or clustered measures. For example, let’s say we design a study that tracks what college students eat over the course of 2 weeks, and we’re interested in whether or not they eat vegetables each day. For each student we’ll […]
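The data structure in that example can be sketched by simulating from a random-intercept logistic model. Each student gets their own baseline log-odds of eating vegetables, and each day's yes/no outcome is drawn from that student's probability. The number of students, the intercept `beta0`, and the random-intercept SD `sigma_u` are all hypothetical values chosen for illustration.

```python
import math
import random

random.seed(42)

n_students = 50  # hypothetical number of students
n_days = 14      # 2 weeks of daily observations per student
beta0 = 0.0      # hypothetical fixed intercept on the log-odds scale
sigma_u = 1.0    # hypothetical SD of the student-level random intercepts

rows = []  # one row per student-day: (student_id, day, ate_vegetables)
for student in range(n_students):
    u = random.gauss(0, sigma_u)          # this student's random intercept
    p = 1 / (1 + math.exp(-(beta0 + u)))  # student-specific probability of eating vegetables
    for day in range(1, n_days + 1):
        ate = 1 if random.random() < p else 0
        rows.append((student, day, ate))

# Per-student proportions vary more than independent coin flips would,
# because all of a student's days share the same intercept u.
by_student = {}
for student, _, ate in rows:
    by_student.setdefault(student, []).append(ate)
props = [sum(v) / len(v) for v in by_student.values()]
print(min(props), max(props))
```

Rows from the same student are correlated through the shared intercept `u`, which is what a binomial GLMM accounts for. In R, such a model is commonly fit with `lme4::glmer(..., family = binomial)`.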