The Analysis of Covariance, or ANCOVA, is a regression model that includes both categorical and numeric predictors, often just one of each. It is commonly used to analyze a follow-up numeric response after exposure to various treatments, controlling for a baseline measure of that same response. For example, given two subjects with the same baseline […]
StatLab is Hiring Graduate Student Associates
The UVA StatLab is seeking Graduate Student Associates for the 2023-2024 academic year. The StatLab Associates will assist UVA students who need help with statistics, data wrangling, and visualization; write StatLab articles for our website; and work on data science projects as they arise. Associates will be paid $22/hour up to 10 hours a week […]
Bootstrap Estimates of Confidence Intervals
What is Bootstrapping? Bootstrapping is a statistical procedure that resamples a sample, with replacement, to infer properties of a wider population. More often than not, we want to understand the properties of a population but have access to only a small sample of it. Sometimes, we are unable to gather more […]
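As a rough illustration of the resampling idea described in this teaser, here is a minimal percentile-bootstrap sketch using only the Python standard library. The function name `bootstrap_ci`, its parameters, and the sample values are all made up for illustration; the full article may well use a different method or language.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for stat(sample).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Draw n_resamples resamples of size n WITH replacement and
    # compute the statistic on each one.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 empirical quantiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample data (assumption, not from the article).
sample = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1, 4.9, 6.2, 4.4]
lower, upper = bootstrap_ci(sample)
print(f"Approximate 95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The percentile interval is only one of several bootstrap intervals; it is used here because it is the simplest to read.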
Getting Started with Simple Slopes Analysis
A Simple Slopes Analysis is a follow-up procedure to regression modeling that helps us investigate and interpret “significant” interactions. The analysis is often employed for interactions between two numeric predictors, but it can be applied to other types of interactions as well. To motivate why we might be interested in this type of analysis, consider […]
Simulating Multinomial Logistic Regression Data
In this article we demonstrate how to simulate data suitable for a multinomial logistic regression model using R. One reason to do this is to gain a better understanding of how multinomial logistic regression models work. Another is to simulate data for the purposes of estimating power and sample size for a planned experiment that […]
Understanding Precision-Based Sample Size Calculations
When designing an experiment it’s good practice to estimate the number of subjects or observations we’ll need. If we recruit or collect too few, our analysis may be too uncertain or misleading. If we collect too many, we potentially waste time and expense on diminishing returns. The optimal sample size provides enough information to allow […]
Understanding Semivariograms
I’ve heard something frightening from practicing statisticians who frequently use mixed effects models. Sometimes when I ask them whether they produced a [semi]variogram to check the correlation structure they reply “what’s that?” –Frank Harrell
When it comes to statistical modeling, semivariograms help us visualize and assess correlation in residuals. We can use them for two […]
Getting Started with Gamma Regression
In this article we plan to get you up and running with gamma regression. But before we dive into that, let’s review the familiar Normal distribution. This will provide some scaffolding to help us transition to the gamma distribution. As you probably know, a Normal distribution is described by its mean and standard deviation. These […]
Understanding Deviance Residuals
If you have ever performed binary logistic regression in R using the glm() function, you may have noticed a summary of “Deviance Residuals” at the top of the summary output. In this article we talk about how these residuals are calculated and what we can use them for. We also talk about other types of […]
Logistic Regression Four Ways with Python
What is Logistic Regression? Logistic regression is a predictive analysis that models the probability of an event occurring based on a given dataset. This dataset contains both independent variables, or predictors, and their corresponding dependent variable, or response. To model the probability of a particular response variable, logistic regression assumes that the log-odds for the […]
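The teaser breaks off at the log-odds assumption. In the standard formulation, the log-odds of the event is a linear function of the predictors. The sketch below shows how a log-odds value maps to a probability and back. The coefficient values, the predictor value, and the function names are hypothetical and do not come from the article.

```python
import math


def log_odds(p):
    """Log-odds (logit) of a probability p, with 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_log_odds(z):
    """Inverse logit: map a log-odds value z back to a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical coefficients for a one-predictor model (assumption).
intercept, slope = -1.5, 0.8
x = 2.0

z = intercept + slope * x  # linear predictor on the log-odds scale
p = inv_log_odds(z)        # implied probability of the event
print(f"log-odds = {z:.2f}, probability = {p:.3f}")
```

Working on the log-odds scale keeps the linear predictor unbounded while the implied probability always stays between 0 and 1.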