statistical methods

Theil-Sen Regression: Programming and Understanding an Outlier-Resistant Alternative to Least Squares

Least squares is so frequently the method by which linear regressions are estimated that in many write-ups of analyses, explicit mention of the method is omitted. Authors save the ink or pixels otherwise consumed by “least squares” and let it simply be inferred. This is an understandable elision: You could make good money repeatedly betting […]

Getting Started with Analysis of Covariance

The Analysis of Covariance, or ANCOVA, is a regression model that includes both categorical and numeric predictors, often just one of each. It is commonly used to analyze a follow-up numeric response after exposure to various treatments, controlling for a baseline measure of that same response. For example, given two subjects with the same baseline […]

Bootstrap Estimates of Confidence Intervals

What is Bootstrapping? Bootstrapping is a statistical procedure that utilizes resampling (with replacement) of a sample to infer properties of a wider population. More often than not, we want to understand the properties of a population but we only have access to a small sample of that population. Sometimes, we are unable to gather more […]

Getting Started with Simple Slopes Analysis

A Simple Slopes Analysis is a follow-up procedure to regression modeling that helps us investigate and interpret “significant” interactions. The analysis is often employed for interactions between two numeric predictors, but it can be applied to other types of interactions as well. To motivate why we might be interested in this type of analysis, consider […]

The Shortcomings of Standardized Regression Coefficients

Analysts and researchers occasionally want to compare the magnitudes of different predictive or causal effects estimated via regression. But comparison is a tricky endeavor when predictor variables are measured on different scales: If y is predicted from x and z, with x measured in kilograms and z measured in years, what does the relative size […]

Simulating Multinomial Logistic Regression Data

In this article we demonstrate how to simulate data suitable for a multinomial logistic regression model using R. One reason to do this is to gain a better understanding of how multinomial logistic regression models work. Another is to simulate data for the purposes of estimating power and sample size for a planned experiment that […]

Understanding Precision-Based Sample Size Calculations

When designing an experiment it’s good practice to estimate the number of subjects or observations we’ll need. If we recruit or collect too few, our analysis may be too uncertain or misleading. If we collect too many, we potentially waste time and expense on diminishing returns. The optimal sample size provides enough information to allow […]

Continuity Corrections: Imperfect Responses to Slight Problems

R users who have run base R’s prop.test() function to perform a null hypothesis test of a proportion—as when assessing whether a coin is weighted toward heads or whether more than 50% of the wines a vineyard sold in a given month were reds—may have noticed curious language in the output: The default test is […]

Understanding Semivariograms

I’ve heard something frightening from practicing statisticians who frequently use mixed effects models. Sometimes when I ask them whether they produced a [semi]variogram to check the correlation structure they reply “what’s that?” –Frank Harrell When it comes to statistical modeling, semivariograms help us visualize and assess correlation in residuals. We can use them for two […]

Nonparametric and Parametric Power: Comparing the Wilcoxon Test and the t-test

From 2004 to 2008, a series of four brief, disagreeing papers in the journal Medical Education took up the question of whether and when it’s appropriate to analyze data from Likert scales (i.e., integers reflecting degrees of agreement with statements) with parametric or nonparametric statistical methods. Although no overly convincing consensus emerged, at least in […]