text mining

Analysis of Ours to Shape Comments, Part 1

Introduction As part of a series of workshops on quantitative analysis of text this fall, I started examining the comments submitted to President Ryan’s Ours to Shape website. The site invites people to share their ideas and insights for UVA going forward, particularly in the domains of service, discovery, and community. The website was only […]

A Beginner’s Guide to Text Analysis with quanteda

A lot of introductory tutorials to quanteda assume that the reader has some base of knowledge about the program’s functionality or how it might be used. Other tutorials assume that the user is an expert in R and on what goes on under the hood when you’re coding. This introductory guide will assume none of […]

Look People are Going to Think… (Debate Rhetoric Redux)

I’m still looking at the rhetoric from the presidential debates, this time focusing on the first general election debate between Hillary Clinton and Donald Trump. I turned the data frame from last week into a corpus, did some pre-processing with the tm package (remove capitalization, punctuation, stopwords, stemming the words and then completing stems with […]

Debate Prep!

I’m teaching a Text as Data short course (using R) right now, and as a card-carrying political scientist, I couldn’t resist using the ongoing campaign as an example (this was, in party, a way of handling my own anxiety about last Monday’s debate — this is what I was doing while watching). So here goes… […]

Reading PDF files into R for text mining

8-DEC-2016 UPDATE: Added section on using the pdftools package for reading PDF files into R. Let’s say we’re interested in text mining the opinions of The Supreme Court of the United States from the 2014 term. The opinions are published as PDF files at the following web page http://www.supremecourt.gov/opinions/slipopinion/14. We would probably want to look […]