R

Data Scientist as Cartographer: An Introduction to Making Interactive Maps in R with Leaflet

Note: This version of the article contains static images of maps generated with Leaflet. To view a version with interactive maps, click here. A striking feature of many maps from early in the history of cartography is their linearity. Being primarily for travel (and given the technological limitations on how faithfully geographies could be understood […]

Understanding Robust Standard Errors

What are robust standard errors? How do we calculate them? Why use them? Why not use them all the time if they’re so robust? Those are the kinds of questions this post intends to address. To begin, let’s start with the relatively easy part: getting robust standard errors for basic linear models in Stata and […]

Getting Started with Multinomial Logit Models

Multinomial logit models allow us to model membership in a group based on known variables. For example, operating system preference of a university’s students could be classified as “Windows”, “Mac”, or “Linux”. Perhaps we would like to better understand why students choose one OS versus another. We might want to build a statistical model that […]

Understanding Empirical Cumulative Distribution Functions

What are empirical cumulative distribution functions and what can we do with them? To answer the first question, let’s first step back and make sure we understand “distributions”, or more specifically, “probability distributions”. A Basic Probability Distribution Imagine a simple event, say flipping a coin 3 times. Here are all the possible outcomes, where H […]

Getting Started with Rate Models

Let’s say we’re interested in modeling the number of auto accidents that occur at various intersections within a city. Upon collecting data after a certain period of time perhaps we notice two intersections have the same number of accidents, say 25. Is it correct to conclude these two intersections are similar in their propensity for […]

Getting Started with Regular Expressions

Regular expressions (or regex) are tools for matching patterns in character strings. These can be useful for finding words or letter patterns in text, parsing filenames for specific information, and interpreting input formatted in a variety of ways (e.g., phone numbers). The syntax of regular expressions is generally recognized across operating systems and programming languages. […]

Getting Started with Shiny

What is Shiny? Shiny is an R package that facilitates the creation of interactive web apps using R code, which can be hosted locally, on the shinyapps server, or on your own server. Shiny apps can range from extremely simple to incredibly sophisticated. They can be written purely with R code or supplemented with HTML, […]

Modeling Non-Constant Variance

One of the basic assumptions of linear modeling is constant, or homogeneous, variance. What does that mean exactly? Let’s simulate some data that satisfies this condition to illustrate the concept. Below we create a sorted vector of numbers ranging from 1 to 10 called x, and then create a vector of numbers called y that […]

Creating a SQLite database for use with R

When you import or load data into R, the data are stored in Random Access Memory (RAM). This is the memory that is deleted when you close R or shut off your computer. It’s very fast but temporary. If you save your data, it is saved to your hard drive. But when you open R […]

Simulating Data for Count Models

A count model is a linear model where the dependent variable is a count. For example, the number of times a car breaks down, the number of rats in a litter, the number of times a young student gets out of his seat, etc. Counts are either 0 or a postive whole number, which means […]