R

Analysis of Ours to Shape Comments, Part 5

In the penultimate post of this series, we’ll use some unsupervised learning approaches to uncover comment clusters and latent themes in the comments submitted to President Ryan’s Ours to Shape website. The full code to recreate the analysis in the blog posts is available on GitHub. Cluster analysis is about discovering groups in […]

Analysis of Ours to Shape Comments, Part 4

We’re still analyzing the comments submitted to President Ryan’s Ours to Shape website. In the fourth installment of this series (we’re almost done, I promise), we’ll look at the sentiment – aka positive-negative tone, polarity, affect – of those comments. We don’t have a pre-labeled set of […]

Analysis of Ours to Shape Comments, Part 3

To recap, we’re exploring the comments submitted to President Ryan’s Ours to Shape website (as of December 7, 2018). In the first post we looked at the distribution of comments across Ryan’s three categories – community, discovery, and service – and across the contributors’ primary connection to the university. We extracted features like length […]

Analysis of Ours to Shape Comments, Part 2

In the last post, we began exploring the comments submitted to the Ours to Shape website. We looked at the distribution across categories and contributors, the length and readability of the comments, and a few key words in context. While I did more exploration of the data than reported, the first post gives a […]

Analysis of Ours to Shape Comments, Part 1

As part of a series of workshops on quantitative analysis of text this fall, I started examining the comments submitted to President Ryan’s Ours to Shape website. The site invites people to share their ideas and insights for UVA going forward, particularly in the domains of service, discovery, and community. The website was only […]

Assessing Type S and Type M Errors

The paper “Beyond Power Calculations: Assessing Type S (Sign) and Type M (Magnitude) Errors” by Andrew Gelman and John Carlin introduces the idea of performing design calculations to help prevent researchers from being misled by statistically significant results in studies with small samples and/or noisy measurements. The main idea is that researchers often overestimate effect […]

Interpreting Log Transformations in a Linear Model

Log transformations are often recommended for skewed data, such as monetary measures or certain biological and demographic measures. Log transforming data usually has the effect of spreading out clumps of data and bringing together spread-out data. For example, below is a histogram of the areas of all 50 US states. It is skewed to the […]

Getting Started with Matching Methods

A frequent research question is whether or not some “treatment” causes an effect. For example, does taking aspirin daily reduce the chance of a heart attack? Does more sleep lead to better academic performance for teenagers? Does smoking increase the risk of chronic obstructive pulmonary disease (COPD)? To truly answer such questions, we need a […]

Getting Started with Moderated Mediation

In a previous post we demonstrated how to perform a basic mediation analysis. In this post we look at performing a moderated mediation analysis. The basic idea is that a mediator may depend on another variable called a “moderator”. For example, in our mediation analysis post we hypothesized that self-esteem was a mediator of student […]