# Clay Ford

## Power and Sample Size Analysis using Simulation

The power of a test is the probability of correctly rejecting a null hypothesis, that is, rejecting it when it is false. For example, let’s say we suspect a coin is not fair and lands heads 65% of the time. The null hypothesis is that the coin is not biased to land heads. The alternative hypothesis is that the coin is biased to land heads. […]
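As a rough sketch of the idea, the Python below estimates power by simulation for the coin example. The sample size of 50 flips, the 0.05 significance level, the one-sided exact binomial test, and the function names are all assumptions made for illustration; they are not taken from the post.

```python
import random
from math import comb

def critical_value(n, p_null=0.5, alpha=0.05):
    """Smallest head count c with P(X >= c | p_null) <= alpha (one-sided test)."""
    tail = 0.0
    # Accumulate the upper tail from n downward until it exceeds alpha.
    for c in range(n, -1, -1):
        tail += comb(n, c) * p_null**c * (1 - p_null) ** (n - c)
        if tail > alpha:
            return c + 1
    return 0

def simulated_power(n, p_true, n_sims=10_000, alpha=0.05, seed=1):
    """Share of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    c = critical_value(n, alpha=alpha)
    rejections = 0
    for _ in range(n_sims):
        # Flip a coin with P(heads) = p_true n times and count the heads.
        heads = sum(rng.random() < p_true for _ in range(n))
        rejections += heads >= c
    return rejections / n_sims

# Assumed design: 50 flips of a coin that truly lands heads 65% of the time.
print(simulated_power(n=50, p_true=0.65))
```

Rerunning `simulated_power` over a range of `n` values gives a simple way to look for the smallest sample size that reaches a target power.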

## Post Hoc Power Calculations are Not Useful

It is well documented that post hoc power calculations are not useful (Goodman and Berlin 1994, Hoenig and Heisey 2001, Althouse 2020). Also known as observed power or retrospective power, post hoc power purports to estimate the power of a test given an observed effect size. The idea is to show that a “non-significant” hypothesis […]
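One way to see the problem, sketched here for a two-sided z-test (an assumed setting chosen for simplicity), is that observed power is determined entirely by the p-value, a point made by Hoenig and Heisey (2001). The function name and the 0.05 significance level are illustrative assumptions.

```python
from statistics import NormalDist

def observed_power(p_value, alpha=0.05):
    """Post hoc power of a two-sided z-test, treating the observed effect as the true effect."""
    std_normal = NormalDist()
    z_obs = std_normal.inv_cdf(1 - p_value / 2)  # |z| implied by the p-value
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # rejection threshold
    # Probability that a new z statistic centered at z_obs falls in either rejection region.
    return std_normal.cdf(z_obs - z_crit) + std_normal.cdf(-z_obs - z_crit)

for p in (0.01, 0.05, 0.20, 0.50):
    print(p, round(observed_power(p), 3))
```

Because the mapping is one-to-one, a “non-significant” p-value always corresponds to low observed power in this setting, so the calculation adds no information beyond the p-value itself.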

## Understanding Ordered Factors in a Linear Model

Consider the following data from the text *Design and Analysis of Experiments*, 7th ed. (Montgomery, Table 3.1). It has two variables: power and rate. Power is a discrete setting on a tool used to etch circuits into a silicon wafer. There are four levels to choose from. Rate is the distance etched, measured in angstroms […]

## Getting Started with Generalized Estimating Equations

Generalized Estimating Equations, or GEE, is a method for modeling longitudinal or clustered data. It is usually used with non-normal data such as binary or count data. The name refers to a set of equations that are solved to obtain parameter estimates (i.e., model coefficients). If interested, see Agresti (2002) for the computational details. In […]

## Getting Started with Binomial Generalized Linear Mixed Models

Binomial Generalized Linear Mixed Models, or binomial GLMMs, are useful for modeling binary outcomes for repeated or clustered measures. For example, let’s say we design a study that tracks what college students eat over the course of 2 weeks, and we’re interested in whether or not they eat vegetables each day. For each student we’ll […]

## Understanding Multiple Comparisons and Simultaneous Inference

When it comes to confidence intervals and hypothesis testing there are two important limitations to keep in mind. The significance level, $$\alpha$$, or the confidence interval coverage, $$1 - \alpha$$, (1) only apply to one test or estimate, not to a series of tests or estimates, and (2) are only appropriate if the estimate or test was not […]
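To illustrate the point about a series of tests, here is a small sketch of how the chance of at least one false rejection grows with the number of tests. It assumes the tests are independent and every null hypothesis is true; the function name is hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false rejection across independent tests of true nulls."""
    # Each test avoids a false rejection with probability 1 - alpha.
    return 1 - (1 - alpha) ** n_tests

for m in (1, 5, 10, 20):
    print(m, round(familywise_error_rate(m), 3))
```

With a single test the rate equals $$\alpha$$, but it climbs quickly as tests accumulate, which is why a per-test significance level says little about a whole family of tests.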

## Understanding Robust Standard Errors

What are robust standard errors? How do we calculate them? Why use them? Why not use them all the time if they’re so robust? Those are the kinds of questions this post intends to address. To begin, let’s start with the relatively easy part: getting robust standard errors for basic linear models in Stata and […]

## Getting Started with Multinomial Logit Models

Multinomial logit models allow us to model membership in a group based on known variables. For example, operating system preference of a university’s students could be classified as “Windows”, “Mac”, or “Linux”. Perhaps we would like to better understand why students choose one OS versus another. We might want to build a statistical model that […]
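As a hedged sketch of how such a model turns predictors into group membership probabilities, the code below uses baseline-category logits with “Windows” as the reference level. The coefficients, the single numeric predictor `x`, and the function name are hypothetical and chosen only for illustration.

```python
from math import exp

# Hypothetical (intercept, slope) pairs, each relative to the baseline "Windows".
COEFS = {"Mac": (-0.5, 0.8), "Linux": (-2.0, 1.5)}

def predicted_probs(x, coefs=COEFS, baseline="Windows"):
    """Category probabilities from baseline-category logits at predictor value x."""
    # The baseline has a linear predictor of 0, so its score is exp(0) = 1.
    scores = {baseline: 1.0}
    for category, (intercept, slope) in coefs.items():
        scores[category] = exp(intercept + slope * x)
    total = sum(scores.values())
    # Normalizing the scores makes the probabilities sum to 1.
    return {category: score / total for category, score in scores.items()}

print(predicted_probs(x=0))
print(predicted_probs(x=1))
```

Each slope describes how the log odds of choosing that category over the baseline change with `x`, so the probabilities for all three categories shift together.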

## Understanding Empirical Cumulative Distribution Functions

What are empirical cumulative distribution functions and what can we do with them? To answer the first question, let’s first step back and make sure we understand “distributions”, or more specifically, “probability distributions”.

### A Basic Probability Distribution

Imagine a simple event, say flipping a coin 3 times. Here are all the possible outcomes, where H […]
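For a concrete picture, an empirical cumulative distribution function can be sketched in a few lines: for any value, it returns the proportion of observations at or below that value. The simulated sample (1,000 repetitions of three fair coin flips), the seed, and the helper name `make_ecdf` are assumptions for illustration.

```python
import random
from bisect import bisect_right

def make_ecdf(sample):
    """Return a function F where F(x) is the proportion of observations <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many sorted observations are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

# Simulate the number of heads in 3 flips of a fair coin, 1,000 times.
rng = random.Random(42)
heads_counts = [sum(rng.random() < 0.5 for _ in range(3)) for _ in range(1000)]

F = make_ecdf(heads_counts)
for x in range(4):
    print(x, F(x))
```

The resulting step function rises from 0 to 1 and jumps at each observed value, with the size of each jump equal to that value's share of the sample.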

## Getting Started with Rate Models

Let’s say we’re interested in modeling the number of auto accidents that occur at various intersections within a city. Upon collecting data after a certain period of time perhaps we notice two intersections have the same number of accidents, say 25. Is it correct to conclude these two intersections are similar in their propensity for […]
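One reason equal counts can mislead is that the intersections may differ in exposure. The sketch below assumes exposure is measured as the number of vehicles passing through during the period; the traffic volumes, the per-1,000 scaling, and the function name are hypothetical.

```python
def rate_per(count, exposure, per=1_000):
    """Events per `per` units of exposure."""
    return count / exposure * per

# Hypothetical data: (accidents, vehicles passing through) for two intersections.
intersections = {"A": (25, 50_000), "B": (25, 500_000)}

for name, (accidents, vehicles) in intersections.items():
    print(name, rate_per(accidents, vehicles))
```

Under these assumed volumes the two intersections share a count of 25, yet the accident rate at A is ten times the rate at B.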