# statistical methods

## Getting started with Multivariate Multiple Regression

Multivariate Multiple Regression is the method of modeling multiple responses, or dependent variables, with a single set of predictor variables. For example, we might want to model both math and reading SAT scores as a function of gender, race, parent income, and so forth. This allows us to evaluate the relationship of, say, gender with […]

## The Wilcoxon Rank Sum Test

The Wilcoxon Rank Sum Test is often described as the non-parametric version of the two-sample t-test. You sometimes see it in analysis flowcharts after a question such as “is your data normal?” A “no” branch off this question will recommend a Wilcoxon test if you’re comparing two groups of continuous measures. So what is this […]

## Pairwise comparisons of proportions

Pairwise comparison means comparing all pairs of something. If I have three items A, B and C, that means comparing A to B, A to C, and B to C. Given n items, I can determine the number of possible pairs using the binomial coefficient: $$\frac{n!}{2!(n - 2)!} = \binom{n}{2}$$ Using the R […]
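As a quick check of the counting formula above, here is a minimal sketch that counts and enumerates the pairs for the three items A, B and C. It uses Python's standard library for illustration rather than the R code the post goes on to use; the variable names are only illustrative.

```python
from itertools import combinations
from math import comb

# The three items from the example above
items = ["A", "B", "C"]

# Number of possible pairs: n! / (2! (n - 2)!) = C(n, 2)
n_pairs = comb(len(items), 2)

# Enumerate every unordered pair exactly once
pairs = list(combinations(items, 2))

print(n_pairs)
print(pairs)
```

The count grows quickly with n, which is why the number of comparisons matters when many groups are compared pairwise.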

## Getting Started with Factor Analysis

Take a look at the following correlation matrix for Olympic decathlon data calculated from 280 scores from 1960 through 2004 (Johnson and Wichern, p. 499):

| | 100m | LJ | SP | HJ | 400m | 100mH | DS | PV | JV | 1500m |
|---|---|---|---|---|---|---|---|---|---|---|
| 100m | 1.0000 | 0.6386 | 0.4752 | 0.3227 | 0.5520 | 0.3262 | 0.3509 | 0.4008 | 0.1821 | -0.0352 |
| LJ | 0.6386 | 1.0000 | 0.4953 | 0.5668 | 0.4706 | 0.3520 | 0.3998 | 0.5167 | […] | |

## An Introduction to Loglinear Models

Loglinear models model cell counts in contingency tables. They’re a little different from other modeling methods in that they don’t distinguish between response and explanatory variables. All variables in a loglinear model are essentially “responses”. To learn more about loglinear models, we’ll explore the following data from Agresti (1996, Table 6.3). It summarizes responses from […]

## Getting Started with Hurdle Models

Hurdle Models are a class of models for count data that help handle excess zeros and overdispersion. To motivate their use, let’s look at some data in R. The following data come with the AER package. It is a sample of 4,406 individuals, aged 66 and over, who were covered by Medicare in 1988. One […]

## Hierarchical Linear Regression

This post is NOT about Hierarchical Linear Modeling (HLM; multilevel modeling). Hierarchical regression is the comparison of nested regression models. When do I want to perform hierarchical regression analysis? Hierarchical regression is a way to show whether the variables you are interested in explain a statistically significant amount of variance in your Dependent Variable (DV) after […]

## Getting started with Negative Binomial Regression Modeling

When it comes to modeling counts (i.e., whole numbers greater than or equal to 0), we often start with Poisson regression. This is a generalized linear model where a response is assumed to have a Poisson distribution conditional on a weighted sum of predictors. For example, we might model the number of documented concussions to […]
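The Poisson regression setup described above can be written compactly. This is a sketch that assumes the usual log link, which the excerpt does not state explicitly; the symbols $y_i$, $x_{ij}$, $\mu_i$ and $\beta_j$ are notation introduced here for illustration.

$$y_i \mid x_{i1}, \dots, x_{ip} \sim \text{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip}$$

Under this model the conditional mean and variance are both equal to $\mu_i$, which is the restriction that a negative binomial model relaxes.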

## Introduction to Mediation Analysis

This post intends to introduce the basics of mediation analysis and does not explain statistical details. For details, please refer to the articles at the end of this post. What is mediation? Let’s say previous studies have suggested that higher grades predict higher happiness: X (grades) → Y (happiness). (This research example is made up for […]

## Understanding 2-way Interactions

When doing linear modeling or ANOVA, it’s useful to examine whether or not the effect of one variable depends on the level of one or more other variables. If it does, then we have what is called an “interaction”. This means variables combine or interact to affect the response. The simplest type of interaction is the […]