Clay Ford

An Introduction to Loglinear Models

Loglinear models model cell counts in contingency tables. They’re a little different from other modeling methods in that they don’t distinguish between response and explanatory variables. All variables in a loglinear model are essentially “responses”. To learn more about loglinear models, we’ll explore the following data from Agresti (1996, Table 6.3). It summarizes responses from […]

Setting up Color Palettes in R

Plotting with color in R is kind of like painting a room in your house: you have to pick some colors. R has some default colors ready to go, but it’s only natural to want to play around and try some different combinations. In this post we’ll look at some ways you can define new […]

Getting Started with Hurdle Models

Hurdle Models are a class of models for count data that help handle excess zeros and overdispersion. To motivate their use, let’s look at some data in R. The following data come with the AER package. It is a sample of 4,406 individuals, aged 66 and over, who were covered by Medicare in 1988. One […]

Getting started with Negative Binomial Regression Modeling

When it comes to modeling counts (ie, whole numbers greater than or equal to 0), we often start with Poisson regression. This is a generalized linear model where a response is assumed to have a Poisson distribution conditional on a weighted sum of predictors. For example, we might model the number of documented concussions to […]

Visualizing the Effects of Logistic Regression

Logistic regression is a popular and effective way of modeling a binary response. For example, we might wonder what influences a person to volunteer, or not volunteer, for psychological research. Some do, some don’t. Are there independent variables that would help explain or distinguish between those who volunteer and those who don’t? Logistic regression gives […]

Reading PDF files into R for text mining

8-DEC-2016 UPDATE: Added section on using the pdftools package for reading PDF files into R. Let’s say we’re interested in text mining the opinions of The Supreme Court of the United States from the 2014 term. The opinions are published as PDF files at the following web page http://www.supremecourt.gov/opinions/slipopinion/14. We would probably want to look […]

Understanding 2-way Interactions

When doing linear modeling or ANOVA it’s useful to examine whether or not the effect of one variable depends on the level of one or more variables. If it does then we have what is called an “interaction”. This means variables combine or interact to affect the response. The simplest type of interaction is the […]

Comparing Proportions with Relative Risk and Odds Ratios

The classic two-by-two table displays counts of what may be called “successes” and “failures” versus some two-level grouping variable, such as gender (male and female) or treatment (placebo and active drug). An example of one such table is given in the book An Introduction to Categorical Data Analysis (Agresti, 1996, p. 20). The table classifies […]

Is R-squared Useless?

On Thursday, October 16, 2015, a disbelieving student posted on Reddit My stats professor just went on a rant about how R-squared values are essentially useless, is there any truth to this? It attracted a fair amount of attention, at least compared to other posts about statistics on Reddit. It turns out the student’s stats […]

Fitting and Interpreting a Proportional Odds Model

Take a look at the following table. It is a cross tabulation of data taken from the 1991 General Social Survey that relates political party affiliation to political ideology. (Agresti, An Introduction to Categorical Data Analysis, 1996) Political Ideology by Party Affiliation, from the 1991 General Social Survey Very Liberal SlightlyLiberal Moderate SlightlyConservative Very Conservative […]