The classic two-by-two table displays counts of what may be called “successes” and “failures” versus some two-level grouping variable, such as gender (male and female) or treatment (placebo and active drug). An example of one such table is given in the book An Introduction to Categorical Data Analysis (Agresti, 1996, p. 20). The table classifies […]

# statistical methods

## Using and Interpreting Cronbach’s Alpha

I. What is Cronbach’s alpha? Cronbach’s alpha is a measure used to assess the reliability, or internal consistency, of a set of scale or test items. In other words, the reliability of any given measurement refers to the extent to which it is a consistent measure of a concept, and Cronbach’s alpha is one way […]
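
As a rough illustration of the idea of internal consistency, the sketch below computes alpha with the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the toy scores are assumptions made for this example and do not come from the post itself.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: list of k lists; each inner list holds one item's scores
    for the same respondents, in the same order (hypothetical layout).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(scores) for scores in items)
    # Sample variance of each respondent's total score across items.
    totals = [sum(row) for row in zip(*items)]
    total_var = variance(totals)
    return k / (k - 1) * (1 - item_var_sum / total_var)

# Made-up data: three items answered identically by four respondents,
# so the items are perfectly consistent with one another.
toy_items = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
alpha = cronbach_alpha(toy_items)
```

With perfectly consistent items like these, alpha reaches its upper limit of 1. Real scale data will generally give lower values.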

## Is R-squared Useless?

On Thursday, October 16, 2015, a disbelieving student posted on Reddit: “My stats professor just went on a rant about how R-squared values are essentially useless, is there any truth to this?” It attracted a fair amount of attention, at least compared to other posts about statistics on Reddit. It turns out the student’s stats […]

## Fitting and Interpreting a Proportional Odds Model

Take a look at the following table. It is a cross tabulation of data taken from the 1991 General Social Survey that relates political party affiliation to political ideology (Agresti, An Introduction to Categorical Data Analysis, 1996). Table: Political Ideology by Party Affiliation, from the 1991 General Social Survey. Columns: Very Liberal, Slightly Liberal, Moderate, Slightly Conservative, Very Conservative […]

## Understanding Diagnostic Plots for Linear Regression Analysis

You ran a linear regression analysis and the stats software spit out a bunch of numbers. The results were significant (or not). You might think that you’re done with the analysis. No, not yet. After running a regression analysis, you should check whether the model works well for the data. We can check if a model works […]

## Getting Started with Quantile Regression

When we think of regression, we usually think of linear regression, the tried and true method for estimating the mean of some variable conditional on the levels or values of independent variables. In other words, we’re pretty sure the mean of our variable of interest differs depending on other variables. For example, the mean weight […]

## Should I always transform my variables to make them normal?

When I first learned data analysis, I always checked each variable for normality and made sure it was normally distributed before running any analysis, such as a t-test, ANOVA, or linear regression. I thought that normally distributed variables were the key assumption for proceeding with an analysis. That’s why stats textbooks show you how to draw histograms […]

## Simulating Endogeneity

First off, what is endogeneity, and why would we want to simulate it? Endogeneity occurs when a statistical model has an independent variable that is correlated with the error term. The reason we would want to simulate it is to understand what exactly that definition means! Let’s first simulate ideal data for simple linear regression […]
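
To make the definition concrete, here is a minimal sketch (not the post's own code) that generates data for a simple linear regression twice: once with a predictor that is independent of the error term, and once with a predictor that partly depends on it. The names `simulate_slope` and `leak`, the true slope of 2, and the sample size are all assumptions chosen for illustration.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on x for a simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx

def simulate_slope(leak, n=20000, beta=2.0, seed=1):
    """Simulate y = 1 + beta * x + e and return the fitted slope.

    leak (hypothetical knob): how much of the error e is mixed into x.
    leak = 0 gives ideal data; leak != 0 makes x correlated with e.
    """
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        e = rng.gauss(0, 1)                # error term
        xi = rng.gauss(0, 1) + leak * e    # predictor, possibly endogenous
        x.append(xi)
        y.append(1.0 + beta * xi + e)
    return ols_slope(x, y)

ideal_slope = simulate_slope(leak=0.0)       # should land near the true 2.0
endogenous_slope = simulate_slope(leak=1.0)  # should drift toward about 2.5
```

In the second case the covariance between the predictor and the error is 1 and the predictor's variance is 2, so the fitted slope is pulled upward by roughly 1/2. That systematic shift is the bias that endogeneity introduces, and collecting more data does not remove it.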

## Understanding Q-Q Plots

The Q-Q plot, or quantile-quantile plot, is a graphical tool to help us assess if a set of data plausibly came from some theoretical distribution such as a Normal or exponential. For example, if we run a statistical analysis that assumes our dependent variable is Normally distributed, we can use a Normal Q-Q plot to […]
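
The points drawn in a Normal Q-Q plot can be sketched in a few lines: sort the sample, then pair each sorted value with the standard Normal quantile at a matching probability. The helper name `normal_qq_points` and the plotting positions (i - 0.5) / n are assumptions for this example; statistical software may use slightly different positions.

```python
from statistics import NormalDist

def normal_qq_points(sample):
    """Return (theoretical quantile, sample quantile) pairs for a Normal Q-Q plot."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i - 0.5) / n are one common choice (an assumption here).
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))

# Made-up sample, purely for illustration.
points = normal_qq_points([3.1, 1.2, 2.5, 4.8, 2.9])
```

If the sample plausibly came from a Normal distribution, plotting these pairs should give points that fall close to a straight line.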

## A Rule of Thumb for Unequal Variances

One of the assumptions of the Analysis of Variance (ANOVA) is constant variance. That is, the spread of residuals is roughly equal per treatment level. A common way to assess this assumption is plotting residuals versus fitted values. Recall that residuals are the observed values of your response of interest minus the predicted value of […]
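
As a small illustration of what those residuals look like in a one-way ANOVA, where the fitted value for each observation is its group mean, the sketch below computes residuals per treatment level and the spread of each set. The function name `residuals_by_group` and the data are hypothetical.

```python
from statistics import mean, stdev

def residuals_by_group(groups):
    """Observed value minus group mean, for each treatment level."""
    result = {}
    for name, values in groups.items():
        group_mean = mean(values)  # fitted value in a one-way ANOVA
        result[name] = [v - group_mean for v in values]
    return result

# Made-up responses for two treatment levels.
groups = {"A": [4.0, 5.0, 6.0], "B": [1.0, 5.0, 9.0]}
residuals = residuals_by_group(groups)
# Standard deviation of the residuals within each level.
spread = {name: stdev(res) for name, res in residuals.items()}
```

Here the residuals for level B are four times as spread out as those for level A, which is the kind of difference a residuals-versus-fitted plot is meant to reveal.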