# Statistical Methods

## Introduction to Mediation Analysis

This post intends to introduce the basics of mediation analysis and does not explain statistical details. For details, please refer to the articles at the end of this post. What is mediation? Let’s say previous studies have suggested that higher grades predict higher happiness: X (grades) → Y (happiness). (This research example is made up […]

## Understanding 2-way Interactions

When doing linear modeling or ANOVA, it’s useful to examine whether the effect of one variable depends on the level of one or more other variables. If it does, then we have what is called an “interaction”. This means variables combine or interact to affect the response. The simplest type of interaction is the […]

## Comparing Proportions with Relative Risk and Odds Ratios

The classic two-by-two table displays counts of what may be called “successes” and “failures” versus some two-level grouping variable, such as gender (male and female) or treatment (placebo and active drug). An example of one such table is given in the book An Introduction to Categorical Data Analysis (Agresti, 1996, p. 20). The table classifies […]
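As a rough illustration of the two measures named in the title, here is a minimal Python sketch that computes a relative risk and an odds ratio from a two-by-two table. The counts are made up for this example and are not the figures from Agresti's table; the function names and argument order are assumptions of this sketch.

```python
def relative_risk(a, b, c, d):
    """Ratio of the 'success' proportions in two groups.

    a, b: successes and failures in group 1
    c, d: successes and failures in group 2
    """
    p1 = a / (a + b)
    p2 = c / (c + d)
    return p1 / p2


def odds_ratio(a, b, c, d):
    """Ratio of the odds of 'success' in group 1 to the odds in group 2."""
    return (a * d) / (b * c)


# Hypothetical counts (not from the book): 20 of 100 succeed in group 1,
# 10 of 100 succeed in group 2.
rr = relative_risk(20, 80, 10, 90)
oratio = odds_ratio(20, 80, 10, 90)
print(f"relative risk = {rr:.2f}, odds ratio = {oratio:.2f}")
```

With these invented counts the two measures differ (2.00 versus 2.25), which is typical when the outcome is not rare.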

## Using and Interpreting Cronbach’s Alpha

I. What is Cronbach’s alpha? Cronbach’s alpha is a measure used to assess the reliability, or internal consistency, of a set of scale or test items. In other words, the reliability of any given measurement refers to the extent to which it is a consistent measure of a concept, and Cronbach’s alpha is one way […]
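For readers who want to see the arithmetic, the following is a minimal sketch of the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The response data and the function name `cronbach_alpha` are hypothetical and serve only as illustration.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*rows))                           # one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items) # sample variances (n - 1)
    total_var = variance([sum(r) for r in rows])       # variance of summed scores
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: 5 respondents answering 3 items on a 1-5 scale.
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.3f}")
```

Items that move together across respondents push alpha toward 1; items that are unrelated push it toward 0.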

## Is R-squared Useless?

On Thursday, October 16, 2015, a disbelieving student posted on Reddit: “My stats professor just went on a rant about how R-squared values are essentially useless, is there any truth to this?” It attracted a fair amount of attention, at least compared to other posts about statistics on Reddit. It turns out the student’s stats […]

## Fitting and Interpreting a Proportional Odds Model

Take a look at the following table. It is a cross tabulation of data taken from the 1991 General Social Survey that relates political party affiliation to political ideology (Agresti, An Introduction to Categorical Data Analysis, 1996). Table: Political Ideology by Party Affiliation, from the 1991 General Social Survey. Columns: Very Liberal, Slightly Liberal, Moderate, Slightly Conservative, Very Conservative […]

## Understanding Diagnostic Plots for Linear Regression Analysis

You ran a linear regression analysis and the stats software spit out a bunch of numbers. The results were significant (or not). You might think that you’re done with the analysis. No, not yet. After running a regression analysis, you should check whether the model works well for the data. We can check if a model works […]

## Getting Started with Quantile Regression

When we think of regression, we usually think of linear regression, the tried and true method for estimating the mean of some variable conditional on the levels or values of independent variables. In other words, we’re pretty sure the mean of our variable of interest differs depending on other variables. For example, the mean weight […]

## Should I always transform my variables to make them normal?

When I first learned data analysis, I always checked each variable for normality and made sure it was normally distributed before running any analysis, such as a t-test, ANOVA, or linear regression. I thought that normally distributed variables were the key assumption for proceeding with an analysis. That’s why stats textbooks show you how to draw histograms […]

## Simulating Endogeneity

First off, what is endogeneity, and why would we want to simulate it? Endogeneity occurs when a statistical model has an independent variable that is correlated with the error term. The reason we would want to simulate it is to understand what exactly that definition means! Let’s first simulate ideal data for simple linear regression […]
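To make the definition concrete, here is a small Python sketch, not the post's own simulation, that generates data for a simple linear regression twice: once with the predictor independent of the error term and once with part of the error "leaking" into the predictor. The true intercept (1), true slope (2), leak size (0.8), sample size, and seed are all arbitrary choices for this illustration.

```python
import random


def ols_slope(x, y):
    """Least-squares slope for simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_slope(n, leak, seed):
    """Simulate y = 1 + 2*x + e and return the fitted slope.

    leak = 0 gives ideal data; leak != 0 makes x correlated with e.
    """
    rng = random.Random(seed)
    e = [rng.gauss(0, 1) for _ in range(n)]            # error term
    x = [rng.gauss(0, 1) + leak * err for err in e]    # predictor
    y = [1 + 2 * xi + err for xi, err in zip(x, e)]    # response
    return ols_slope(x, y)


slope_ideal = simulate_slope(20000, leak=0.0, seed=1)
slope_endog = simulate_slope(20000, leak=0.8, seed=1)
print(f"ideal data:      slope = {slope_ideal:.3f}")
print(f"endogenous data: slope = {slope_endog:.3f}")
```

In the ideal case the fitted slope lands near the true value of 2. In the endogenous case it is biased upward: under this setup the large-sample value is 2 + 0.8 / (1 + 0.8²), roughly 2.49, because the slope estimate absorbs the covariance between the predictor and the error.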