I’m still looking at the rhetoric from the presidential debates, this time focusing on the first general election debate between Hillary Clinton and Donald Trump. I turned the data frame from last week into a corpus, did some pre-processing with the tm package (removing capitalization, punctuation, and stopwords, stemming the words, and then completing stems with […]

# R

## Using Census Data API with R

Datasets provided by the US Census Bureau, such as the Decennial Census and the American Community Survey (ACS), are widely used by researchers and many others. You can certainly find and download census data from the Census Bureau website, from our licensed data source Social Explorer, or from other free sources such as IPUMS-USA, then load the data […]

## Debate Prep!

I’m teaching a Text as Data short course (using R) right now, and as a card-carrying political scientist, I couldn’t resist using the ongoing campaign as an example (this was, in part, a way of handling my own anxiety about last Monday’s debate; this is what I was doing while watching). So here goes… […]

## A tidyr Tutorial

The tidyr package by Hadley Wickham centers on two functions: gather and spread. If you have struggled to understand what exactly these two functions do, this tutorial is for you. To begin we need to wrap our heads around the idea of “key-value pairs”. The help pages for gather and spread use this terminology to […]

## Getting Started with Factor Analysis

Take a look at the following correlation matrix for Olympic decathlon data calculated from 280 scores from 1960 through 2004 (Johnson and Wichern, p. 499):

| | 100m | LJ | SP | HJ | 400m | 100mH | DS | PV | JV | 1500m |
|---|---|---|---|---|---|---|---|---|---|---|
| 100m | 1.0000 | 0.6386 | 0.4752 | 0.3227 | 0.5520 | 0.3262 | 0.3509 | 0.4008 | 0.1821 | -0.0352 |
| LJ | 0.6386 | 1.0000 | 0.4953 | 0.5668 | 0.4706 | 0.3520 | 0.3998 | 0.5167 | […] | |

## An Introduction to Loglinear Models

Loglinear models model cell counts in contingency tables. They’re a little different from other modeling methods in that they don’t distinguish between response and explanatory variables. All variables in a loglinear model are essentially “responses”. To learn more about loglinear models, we’ll explore the following data from Agresti (1996, Table 6.3). It summarizes responses from […]

## Setting up Color Palettes in R

Plotting with color in R is kind of like painting a room in your house: you have to pick some colors. R has some default colors ready to go, but it’s only natural to want to play around and try some different combinations. In this post we’ll look at some ways you can define new […]

## Hierarchical Linear Regression

This post is NOT about Hierarchical Linear Modeling (HLM; multilevel modeling). Hierarchical regression is a comparison of nested regression models. When do I want to perform hierarchical regression analysis? Hierarchical regression is a way to show whether variables of interest explain a statistically significant amount of variance in your Dependent Variable (DV) after […]

## Getting started with Negative Binomial Regression Modeling

When it comes to modeling counts (i.e., whole numbers greater than or equal to 0), we often start with Poisson regression. This is a generalized linear model where a response is assumed to have a Poisson distribution conditional on a weighted sum of predictors. For example, we might model the number of documented concussions to […]

## Visualizing the Effects of Logistic Regression

Logistic regression is a popular and effective way of modeling a binary response. For example, we might wonder what influences a person to volunteer, or not volunteer, for psychological research. Some do, some don’t. Are there independent variables that would help explain or distinguish between those who volunteer and those who don’t? Logistic regression gives […]