Consider the following data from the text Design and Analysis of Experiments, 7th ed. (Montgomery, Table 3.1). It has two variables: power and rate. Power is a discrete setting on a tool used to etch circuits into a silicon wafer. There are four levels to choose from. Rate is the distance etched, measured in Angstroms […]

# linear regression

## Understanding Robust Standard Errors

What are robust standard errors? How do we calculate them? Why use them? Why not use them all the time if they’re so robust? Those are the kinds of questions this post intends to address. To begin, let’s start with the relatively easy part: getting robust standard errors for basic linear models in Stata and […]

## Modeling Non-Constant Variance

One of the basic assumptions of linear modeling is constant, or homogeneous, variance. What does that mean exactly? Let’s simulate some data that satisfies this condition to illustrate the concept. Below we create a sorted vector of numbers ranging from 1 to 10 called x, and then create a vector of numbers called y that […]
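
As a rough illustration of what constant variance looks like in simulated data, here is a minimal Python sketch. It is not the post's own code: the sample size, the intercept, the slope, and the error standard deviation (`n`, `intercept`, `slope`, `sigma`) are all hypothetical values chosen for illustration. The only detail taken from the excerpt is that x is sorted and ranges from 1 to 10.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

n = 1000  # hypothetical sample size
# x: sorted values spread evenly from 1 to 10
x = [1 + 9 * i / (n - 1) for i in range(n)]

# Hypothetical coefficients and error SD, chosen only for illustration
intercept, slope, sigma = 1.2, 2.1, 1.5

# Constant variance: every error is drawn with the same standard deviation,
# regardless of the value of x
errors = [random.gauss(0, sigma) for _ in x]
y = [intercept + slope * xi + e for xi, e in zip(x, errors)]

# Compare the spread of the errors for the lower and upper halves of x
sd_low = statistics.stdev(errors[: n // 2])
sd_high = statistics.stdev(errors[n // 2 :])
print(round(sd_low, 2), round(sd_high, 2))
```

If the variance is constant, the two printed standard deviations should both be close to `sigma`. Under non-constant variance they would differ systematically as x grows.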

## Getting Started with Multiple Imputation in R

Whenever we are dealing with a dataset, we almost always run into a problem that may decrease our confidence in the results that we are getting – missing data! Examples of missing data can be found in surveys – where respondents intentionally refrained from answering a question, didn’t answer a question because it is not […]

## Interpreting Log Transformations in a Linear Model

Log transformations are often recommended for skewed data, such as monetary measures or certain biological and demographic measures. Log transforming data usually has the effect of spreading out clumps of data and bringing together spread-out data. For example, below is a histogram of the areas of all 50 US states. It is skewed to the […]
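
To make the "spreading out clumps and bringing together spread-out data" effect concrete, here is a small Python sketch. The values are hypothetical and are not the state-area data mentioned in the excerpt. It compares the gaps between neighboring values before and after a natural-log transform.

```python
import math

# Hypothetical right-skewed values: a clump of small numbers plus two large ones
values = [1, 2, 4, 8, 1024, 65536]

# Natural-log transform of each value
logged = [math.log(v) for v in values]

# Gaps between neighboring values on each scale
raw_gaps = [b - a for a, b in zip(values, values[1:])]
log_gaps = [b - a for a, b in zip(logged, logged[1:])]

print("raw gaps:", raw_gaps)
print("log gaps:", [round(g, 2) for g in log_gaps])
```

On the raw scale, the largest gap is tens of thousands of times the smallest. On the log scale the gaps are within a single order of magnitude of one another, so the clump at the low end is spread out and the extreme values are pulled in.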

## Hierarchical Linear Regression

This post is NOT about Hierarchical Linear Modeling (HLM; multilevel modeling). Hierarchical regression is a comparison of nested regression models. When do I want to perform a hierarchical regression analysis? Hierarchical regression is a way to show whether variables of interest explain a statistically significant amount of variance in your dependent variable (DV) after […]
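
One common way to compare two nested regression models is an F-test on the change in R-squared. The Python sketch below assumes that approach, and it may not be the exact procedure the post uses. The function name `r2_change_f` and all of the input numbers are hypothetical.

```python
def r2_change_f(r2_reduced, r2_full, n, k_reduced, k_full):
    """F statistic for the R-squared change between two nested OLS models.

    r2_reduced, r2_full: R-squared of the smaller and larger model
    n: number of observations
    k_reduced, k_full: number of predictors (excluding the intercept)
    Returns (F, df1, df2).
    """
    df1 = k_full - k_reduced   # number of predictors added
    df2 = n - k_full - 1       # residual degrees of freedom of the full model
    f_stat = ((r2_full - r2_reduced) / df1) / ((1 - r2_full) / df2)
    return f_stat, df1, df2


# Hypothetical example: adding one predictor raises R-squared from 0.20 to 0.30
f_stat, df1, df2 = r2_change_f(0.20, 0.30, n=103, k_reduced=2, k_full=3)
print(round(f_stat, 2), df1, df2)
```

The resulting statistic would then be compared against an F distribution with `df1` and `df2` degrees of freedom to judge whether the added predictors explain a significant amount of additional variance.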

## Understanding Diagnostic Plots for Linear Regression Analysis

You ran a linear regression analysis and the stats software spit out a bunch of numbers. The results were significant (or not). You might think that you’re done with the analysis. No, not yet. After running a regression analysis, you should check whether the model works well for the data. We can check if a model works […]

## Should I always transform my variables to make them normal?

When I first learned data analysis, I always checked each variable for normality and made sure it was normally distributed before running any analysis, such as a t-test, ANOVA, or linear regression. I thought that normally distributed variables were the key assumption to satisfy before proceeding to analysis. That’s why stats textbooks show you how to draw histograms […]