text mining

Analysis of Ours to Shape Comments, Part 5

Introduction In the penultimate post of this series, we’ll use some unsupervised learning approaches to uncover comment clusters and latent themes among the comments to President Ryan’s Ours to Shape website. The full code to recreate the analysis in the blog posts is available on GitHub. Cluster Analysis Cluster analysis is about discovering groups in […]

Analysis of Ours to Shape Comments, Part 4

Introduction We’re still analyzing the comments submitted to President Ryan’s Ours to Shape website. In the fourth installment of this series (we’re almost done, I promise), we’ll look at the sentiment – aka positive-negative tone, polarity, affect – of those comments. We don’t have a pre-labeled set of […]

Analysis of Ours to Shape Comments, Part 3

Introduction To recap, we’re exploring the comments submitted to President Ryan’s Ours to Shape website (as of December 7, 2018). In the first post we looked at the distribution of comments across Ryan’s three categories – community, discovery, and service – and across the contributors’ primary connection to the university. We extracted features like length […]

Analysis of Ours to Shape Comments, Part 2

Introduction In the last post, we began exploring the comments submitted to the Ours to Shape website. We looked at the distribution across categories and contributors, the length and readability of the comments, and a few key words in context. While I did more exploration of the data than reported, the first post gives a […]

Analysis of Ours to Shape Comments, Part 1

Introduction As part of a series of workshops on quantitative analysis of text this fall, I started examining the comments submitted to President Ryan’s Ours to Shape website. The site invites people to share their ideas and insights for UVA going forward, particularly in the domains of service, discovery, and community. The website was only […]

Look People are Going to Think… (Debate Rhetoric Redux)

I’m still looking at the rhetoric from the presidential debates, this time focusing on the first general election debate between Hillary Clinton and Donald Trump. I turned the data frame from last week into a corpus, did some pre-processing with the tm package (removing capitalization, punctuation, and stopwords, stemming the words, and then completing stems with […]

Debate Prep!

I’m teaching a Text as Data short course (using R) right now, and as a card-carrying political scientist, I couldn’t resist using the ongoing campaign as an example (this was, in part, a way of handling my own anxiety about last Monday’s debate; this is what I was doing while watching). So here goes… […]

Reading PDF files into R for text mining

Let’s say we’re interested in text mining the opinions of The Supreme Court of the United States from the 2014 term. The opinions are published as PDF files at the following web page http://www.supremecourt.gov/opinions/slipopinion/14. We would probably want to look at all 76 opinions, but for the purposes of this introductory tutorial we’ll just look […]