StatLab Articles

How to Use Docker for Study Reproducibility with R Markdown

What is Docker? Docker is a software product that allows for the efficient building, packaging, and deployment of applications. It uses containers, which are isolated environments that bundle software and its dependencies. These containers can run an application with all the same software, dependencies, settings, and more as were on the original machine on any […]

Theil-Sen Regression: Programming and Understanding an Outlier-Resistant Alternative to Least Squares

Least squares is so frequently the method by which linear regressions are estimated that in many write-ups of analyses, explicit mention of the method is omitted. Authors save the ink or pixels otherwise consumed by “least squares” and let it simply be inferred. This is an understandable elision: You could make good money repeatedly betting […]

Getting Started with Analysis of Covariance

The Analysis of Covariance, or ANCOVA, is a regression model that includes both categorical and numeric predictors, often just one of each. It is commonly used to analyze a follow-up numeric response after exposure to various treatments, controlling for a baseline measure of that same response. For example, given two subjects with the same baseline […]

Bootstrap Estimates of Confidence Intervals

What is Bootstrapping? Bootstrapping is a statistical procedure that utilizes resampling (with replacement) of a sample to infer properties of a wider population. More often than not, we want to understand the properties of a population but we only have access to a small sample of that population. Sometimes, we are unable to gather more […]

Getting Started with Simple Slopes Analysis

A Simple Slopes Analysis is a follow-up procedure to regression modeling that helps us investigate and interpret “significant” interactions. The analysis is often employed for interactions between two numeric predictors, but it can be applied to other types of interactions as well. To motivate why we might be interested in this type of analysis, consider […]

The Shortcomings of Standardized Regression Coefficients

Analysts and researchers occasionally want to compare the magnitudes of different predictive or causal effects estimated via regression. But comparison is a tricky endeavor when predictor variables are measured on different scales: If y is predicted from x and z, with x measured in kilograms and z measured in years, what does the relative size […]

Simulating Multinomial Logistic Regression Data

In this article we demonstrate how to simulate data suitable for a multinomial logistic regression model using R. One reason to do this is to gain a better understanding of how multinomial logistic regression models work. Another is to simulate data for the purposes of estimating power and sample size for a planned experiment that […]

Understanding Precision-Based Sample Size Calculations

When designing an experiment it’s good practice to estimate the number of subjects or observations we’ll need. If we recruit or collect too few, our analysis may be too uncertain or misleading. If we collect too many, we potentially waste time and expense on diminishing returns. The optimal sample size provides enough information to allow […]

Continuity Corrections: Imperfect Responses to Slight Problems

R users who have run base R’s prop.test() function to perform a null hypothesis test of a proportion—as when assessing whether a coin is weighted toward heads or whether more than 50% of the wines a vineyard sold in a given month were reds—may have noticed curious language in the output: The default test is […]

Understanding Semivariograms

I’ve heard something frightening from practicing statisticians who frequently use mixed effects models. Sometimes when I ask them whether they produced a [semi]variogram to check the correlation structure they reply “what’s that?” –Frank Harrell When it comes to statistical modeling, semivariograms help us visualize and assess correlation in residuals. We can use them for two […]