Jacob Goldstein-Greenwood

Nonparametric and Parametric Power: Comparing the Wilcoxon Test and the t-test

From 2004 to 2008, a series of four brief, disagreeing papers in the journal Medical Education took up the question of whether and when it’s appropriate to analyze data from Likert scales (i.e., integers reflecting degrees of agreement with statements) with parametric or nonparametric statistical methods. Although no overly convincing consensus emerged, at least in […]

Detecting Influential Points in Regression with DFBETA(S)

In regression modeling, influential points are observations that, individually, exert large effects on a model’s results: the parameter estimates (\(\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_j\)) and, consequently, the model’s predictions (\(\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_i\)). Influential points aren’t necessarily troublesome, but observations flagged as highly influential warrant follow-up. A large value on an influence measure can signal anything from […]

ROC Curves and AUC for Models Used for Binary Classification

This article assumes basic familiarity with the use and interpretation of logistic regression, odds and probabilities, and true/false positives/negatives. The examples are coded in R. ROC curves and AUC have important limitations, and I encourage reading through the section at the end of the article to get a sense of when and why the tools […]

The Intuition Behind Confidence Intervals

Say it with me: An X% confidence interval captures the population parameter in X% of repeated samples. In the course of our statistical educations, many of us had that line (or some variant of it) crammed, wedged, stuffed, and shoved into our skulls until definitional precision was leaking out of noses and pooling on our […]

Ask Better Code Questions (and Get Better Answers) With Reprex

Note: This article was written about version 2.0.0 of the reprex package. In the forums and Q&A sections of websites like Stack Overflow, GitHub, and community.rstudio.com, there is a volunteer force of data science detectives, code consultants, and error-fighting emissaries ready to offer assistance to programmers who find themselves staring down unhappy code that’s resisting […]

A Brief on Brier Scores

Not all predictions are created equal, even if, in categorical terms, the predictions suggest the same outcome: “X will (or won’t) happen.” Say that I estimate that there’s a 60% chance that 100 million COVID-19 vaccines will be administered in the US during the first 100 days of Biden’s presidency, but my friend estimates that […]

Data Scientist as Cartographer: An Introduction to Making Interactive Maps in R with Leaflet

Note: This version of the article contains static images of maps generated with Leaflet. A striking feature of many maps from early in the history of cartography is their linearity. Being primarily for travel (and given the technological limitations on how faithfully geographies could be understood […]