Nonparametric and Parametric Power: Comparing the Wilcoxon Test and the t-test

From 2004 to 2008, a series of four brief, disagreeing papers in the journal Medical Education took up the question of whether and when it’s appropriate to analyze data from Likert scales (i.e., integers reflecting degrees of agreement with statements) with parametric or nonparametric statistical methods. Although no overly convincing consensus emerged, at least in […]

Detecting Influential Points in Regression with DFBETA(S)

In regression modeling, influential points are observations that, individually, exert large effects on a model’s results—the parameter estimates (\(\hat{\beta_0}, \hat{\beta_1}, …, \hat{\beta_j}\)) and, consequently, the model’s predictions (\(\hat{y_1}, \hat{y_2}, …, \hat{y_i}\)). Influential points aren’t necessarily troublesome, but observations flagged as highly influential warrant follow-up. A large value on an influence measure can signal anything from […]

ROC Curves and AUC for Models Used for Binary Classification

This article assumes basic familiarity with the use and interpretation of logistic regression, odds and probabilities, and true/false positives/negatives. The examples are coded in R. ROC curves and AUC have important limitations, and I encourage reading through the section at the end of the article to get a sense of when and why the tools […]