Workshops

We offer, coordinate, and highlight workshops and training on data analysis and statistics, computation and software, and Library resources and methods. Workshops are free and open to anyone in the UVA community. Email us recommendations for workshops you’d like to see: researchdataservices@virginia.edu.

View/register for StatLab workshops || View Library workshops || Find materials from past workshops
CADRE Training & Workshops || BioConnector Workshops || ARCS Workshops

StatLab Workshops: Spring 2017

Click the date to register! (Registration is not required, but we usually send out an email ahead of time with links to resources you’ll need for the workshop, and we don’t want you to miss out!)

Workshop Topic (Instructor) Day Time Location
Intro to R (Clay Ford) Tuesday, 1/24 10:00-11:30 Brown 133
Designed for the absolute beginner, this workshop provides a gentle introduction to R and RStudio. R is a free, open-source software environment and programming language designed specifically for statistical analysis. Since the release of version 1.0 in 2000, R has rapidly increased in popularity thanks to its power, price (free!), and supportive community. RStudio is a free integrated development environment (IDE) that makes using and learning R much easier. In this workshop we’ll get you started using R with RStudio, show you how to import data, do some basic data manipulation, create a few graphics, perform some basic statistical analyses, and point you toward resources for learning more and going further with R!

Download workshop materials


Intro to Python (2 hours) (Pete Alonzi) Wednesday, 1/25 1:30-3:30 Brown 133; Tuesday, 1/31 10:00-12:00 Brown 133
This workshop covers the fundamentals of Python, beginning with setting it up on your system. No prior experience is required. Just bring your laptop. We will start with installation and then move to interpreted coding, focusing on the built-in data types. This will be a hands-on experience with exercises throughout and plenty of time to get your hands dirty.
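As a small taste of what working with the built-in data types looks like, here is a minimal sketch. The variable names and values are invented for illustration and are not taken from the workshop materials.

```python
# Illustrative values only; none of this comes from the workshop materials.
count = 3                                  # int
price = 2.5                                # float
name = "Rivanna"                           # str
scores = [88, 92, 79]                      # list: ordered and mutable
point = (1.0, 2.0)                         # tuple: ordered and immutable
rooms = {"Brown": 133, "Alderman": 421}    # dict: maps keys to values
unique_days = {"Tue", "Wed", "Tue"}        # set: duplicates collapse

total = count * price                  # int * float gives a float
average = sum(scores) / len(scores)    # mean of the three original scores
scores.append(95)                      # lists can grow in place
shout = name.upper()                   # strings have built-in methods
```

Typing these lines one at a time at the interactive prompt and inspecting each result is a reasonable way to explore how the types behave.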

Download workshop materials


Web Scraping in R with rvest (Clay Ford) Thursday, 2/2 10:00-11:30 Brown 133
Sometimes data we find on the internet isn’t formatted for downloading and easy importing into our statistical program of choice. It’s simply displayed on a static web page as a table (if we’re lucky) or scattered about the page in various locations. Getting this data requires “web scraping”: pulling out the specific parts of a web page that we want to keep and wrangling them into a structure suitable for further analysis. A recently developed R package called rvest makes this process easier. In this workshop we’ll introduce how to use rvest for scraping web pages by way of several examples. We’ll also present a general strategy for web scraping and demonstrate some basic programming approaches to scraping multi-page web sites. Previous experience with R will be helpful.

Download workshop materials


Introduction to Unix (Ricky Patterson) Tuesday, 2/7 10:00-11:30 Brown 133
This workshop will introduce new users to the command line interface and Unix shell commands. It will be useful both for users interested in using Unix on a local machine (including Linux and Mac OS X) and for users who want to make use of remote resources such as the Rivanna cluster. Users will learn how to create and navigate directories, and to create, copy, move, and search files. We will also cover setting and changing file permissions, creating symbolic links, redirection of output, and job control, with a brief discussion of shell scripts.

Users will need to bring their own laptop in order to fully participate in the workshop.

Download workshop materials


Web Scraping in Python (Eric Rochester) Wednesday, 2/8 10:00-11:30 Alderman 421
Sometimes data we find on the internet isn’t formatted for downloading and easy importing into our statistical program of choice. It’s simply displayed on a static web page as a table (if we’re lucky) or scattered about the page in various locations. Getting this data requires “web scraping”: pulling out the specific parts of a web page that we want to keep and wrangling them into a structure suitable for further analysis. The general-purpose programming language Python has a number of libraries that work together to make this process relatively painless. We’ll talk about the process involved in web scraping, some of the things to keep in mind, and how to use these tools in concert to get the data you need in the format you need. Some knowledge of Python would be helpful.
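To make the idea concrete, here is a minimal sketch of pulling a table out of a static page and wrangling it into records. It is an illustration under stated assumptions, not the workshop's approach: the description does not name the libraries used, so this sketch sticks to Python's standard-library `html.parser`, and the HTML fragment, the `TableParser` class, and the `scrape_table` helper are all hypothetical.

```python
from html.parser import HTMLParser

# A made-up page fragment standing in for a downloaded static web page.
HTML = """
<table>
  <tr><th>Workshop</th><th>Room</th></tr>
  <tr><td>Intro to R</td><td>Brown 133</td></tr>
  <tr><td>Web Scraping in Python</td><td>Alderman 421</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # row currently being read, if any
        self._in_cell = False   # True while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Text outside of cells (e.g. whitespace between tags) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


records = scrape_table(HTML)
```

Here `records` is a list with one dictionary per table row, a structure that is easy to write to CSV or load into an analysis tool. Scraping a live site would add a download step and some care about the site's terms of use, which this sketch leaves out.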

Workshop slides


Sentiment Analysis in R (Michele Claibourn) Thursday, 2/9 10:00-11:30 Brown 133
Sentiment analysis attempts to computationally identify and categorize the tone, opinion, affect, or polarity of textual communication. This workshop will illustrate lexicon-based sentiment analysis techniques (using pre-defined dictionaries) in R with the quanteda and tidytext packages, working through several examples and several types of sentiment. Previous experience with R will be helpful but not required.

Download workshop materials


Character Manipulation in R (Clay Ford) Tuesday, 2/14 10:00-11:30 Brown 133
Extract text between HTML tags. Pad zip codes with leading 0s. Split a name into First and Last name fields. Pull only responses from a certain person from a transcript. These are all examples of character manipulation, something that is often done when cleaning data. In this workshop we’ll introduce a variety of helpful R functions and packages for working with character data. We’ll also introduce regular expressions, a special language for defining text patterns. At the end of this workshop you’ll have a powerful arsenal of tools for manipulating character strings in R. Previous experience with R will be helpful but not required.
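The workshop itself teaches R functions and packages. Purely as an illustration of the four tasks listed above, here is a minimal sketch in Python using regular expressions and built-in string methods. All sample strings are invented.

```python
import re

# All sample strings below are invented for illustration.

# 1. Extract text between HTML tags.
html = "<b>Brown 133</b>"
inner = re.search(r"<b>(.*?)</b>", html).group(1)

# 2. Pad zip codes with leading 0s (to five digits).
zips = ["2138", "22903", "601"]
padded = [z.zfill(5) for z in zips]

# 3. Split a name into first and last name fields.
first, last = "Ada Lovelace".split(" ", 1)

# 4. Pull only one speaker's responses from a transcript.
transcript = """\
ANNA: Did you register?
BEN: Not yet.
ANNA: The workshop is free.
BEN: Then I will."""
ben_lines = re.findall(r"^BEN: (.*)$", transcript, flags=re.MULTILINE)
```

The pattern in step 4 relies on `re.MULTILINE` so that `^` and `$` match at the start and end of every line rather than only at the ends of the whole string.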

Download workshop materials


Text Processing and Topic Modeling in Python (Jon Ashley) Wednesday, 2/15 10:00-11:30 Brown 133
This workshop is an introduction to Natural Language Processing and some of the basic processes that can be applied to a corpus of texts. We will cover preparation of texts, tokenization, part-of-speech tagging, and topic modeling. Although there are a variety of NLP packages available in Python, we will be using “spaCy”. Prior experience with Python is helpful.

View Jupyter Notebook


Topic Modeling in R (Michele Claibourn) Thursday, 2/16 10:00-11:30 Brown 133
Topic modeling is a popular tool for modeling document collections and has been applied in a variety of domains, from medical science to digital humanities. This workshop will introduce topic modeling in R: processing text for a topic model, estimating topic models, evaluating and interpreting the results, and visualizing them. Previous experience with R will be immensely helpful but not required.

Download workshop materials


Introduction to R Markdown (Clay Ford) Tuesday, 2/21 10:00-11:30 Brown 133
R Markdown is an authoring format that makes it easy to write reports and create presentations with R. You simply combine R code with text written in markdown and then export the results as an HTML, PDF, or Word file. You don’t need to save individual graphs and insert them into your document. You don’t need to copy and paste calculations. You don’t need to learn a new programming language. With a single keystroke you can generate a professional-looking document that contains all your R code, statistical results, plots, and exposition. The best part is that it’s free and easy to do with RStudio. In this workshop we’ll get you up and running with R Markdown and have you creating reports and presentations in no time. Previous experience with R will be helpful but not required.


Introduction to ShareLaTeX for collaborative LaTeX (Ricky Patterson) Wednesday, 2/22 10:00-11:30 Brown 133
LaTeX is a powerful (and free) document typesetting program, widely used in a number of academic disciplines for compiling professional research papers, articles, dissertations, presentations, letters, and books. It is especially useful for the creation and integration of mathematical formulae, tables, and bibliographies into documents. Running an installation of LaTeX on your own computer can make it difficult to work on a document collaboratively. The UVA Library has recently provided access for all UVA users to an online collaborative LaTeX editor, ShareLaTeX. Come learn how to take full advantage of this powerful tool.

Participants will need to bring their own laptop for this workshop.


Introduction to Dedoose (Nancy Kechner) Thursday, 2/23 10:00-11:30 Library Data Commons@Curry
New to qualitative research? Imagine being able to blend your video, audio, and text data with your spreadsheet information in an online tool to get the most out of all of your information! Dedoose is an easy-to-learn, feature-rich, and affordable web app that can help you visualize a variety of information from your work and share it with the research community. Come and see Dedoose in action to decide whether you want to add qualitative analysis to your research toolbox.


Text Classification in Python (Pete Alonzi) Monday, 2/27 2:30-4:00 Brown 133
Text classification is a broad field covering an array of topics, for example determining whether a newspaper article is from the sports page or not. In this workshop we will go through fundamental steps like tokenization and then proceed to completing tasks such as creating a spam filter. These techniques use machine learning, but no prior machine learning knowledge is necessary for this workshop. We will primarily be working with the package NLTK. Fundamental knowledge of Python is helpful but not mandatory for this workshop. Please bring your laptop.
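To show how tokenization feeds a spam filter, here is a minimal sketch of a naive Bayes classifier with add-one smoothing. It is an assumption-laden illustration, not the workshop's code: the workshop uses NLTK, while this sketch uses only the standard library, and the training messages and the `tokenize`, `train`, and `classify` helpers are all made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


# Tiny made-up training set; a real filter needs far more examples.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("workshop starts at ten in brown", "ham"),
    ("please bring your laptop to the workshop", "ham"),
    ("meeting notes from the library", "ham"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, doc_counts


def classify(text, word_counts, doc_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocab = set().union(*word_counts.values())
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Start from the log prior: the share of documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one smoothing keeps unseen words from zeroing out a label.
        denom = sum(counts.values()) + len(vocab)
        for token in tokenize(text):
            score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


word_counts, doc_counts = train(TRAIN)
```

With this toy data, `classify("free prize now", word_counts, doc_counts)` returns `"spam"` and `classify("bring your laptop to the library", word_counts, doc_counts)` returns `"ham"`. Working in log-probabilities avoids numerical underflow when many small probabilities are multiplied together.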


Survival Analysis in Stata (Alex Jakubow) Wednesday, 3/1 10:00-11:30 Brown 133
Social scientific and biomedical researchers are frequently interested in understanding the time to the occurrence of some event, such as the death of a participant in a clinical trial or the end of a governing coalition. This workshop introduces key methodological concepts in survival analysis and contrasts fully parametric, semi-parametric, and non-parametric modeling frameworks. Examples from Stata illustrate how to prepare a dataset for survival analysis, interpret regression results, and conduct important diagnostic tests. This workshop assumes participants are comfortable with multivariate regression and familiar with the analysis of limited/categorical dependent variables. Prior experience with Stata is helpful but not required.


Text Classification in R (Michele Claibourn) Thursday, 3/2 10:00-11:30 Brown 133
Text classification encompasses a variety of models for categorizing documents into existing groups — spam filters are the prototypical example. It is a form of machine learning applied to textual data. This workshop will work through methods of performing text classification in R using multiple examples and packages. Previous experience with R will be immensely helpful but not required.


Intro to Git/GitHub (Pete Alonzi) Tuesday, 3/14 10:00-11:30 Brown 133
Git is version control software: it tracks changes to your files over time and, used properly, helps you manage your development work. Until recently the software was a burden to operate, but the development of GitHub.com has changed that. In this workshop we will explore the use of Git through GitHub, working with both the web interface and the desktop client. Please bring your laptop. GitHub requires a user account, so please set one up at github.com before you arrive.


Visualization with Python+Bokeh (Eric Rochester) Wednesday, 3/15 10:00-11:30 Alderman 421
We will explore how to take your data from bits in memory to beautiful images on the screen. We’ll use the package Bokeh, which allows you to create interactive visualizations that you can easily publish on the Internet. This will be a hands-on experience with exercises throughout and plenty of time to get your hands dirty.


Advanced Visualization with R (Pete Nagraj) Thursday, 3/16 10:00-11:30 Brown 133
This workshop will cover fundamental concepts for creating effective data visualizations and will introduce tools and techniques for visualizing data using R. We will review principles for visually displaying quantitative information, such as using series of small multiples, avoiding “chartjunk,” and maximizing the data-ink ratio. After briefly covering data visualization using base R graphics, we will introduce the ggplot2 package for advanced high-dimensional visualization. We will cover the grammar of graphics (geoms, aesthetics, stats, and faceting) and how to use ggplot2 to create plots layer by layer. Upon completing this lesson, learners will be able to use ggplot2 to explore a high-dimensional dataset by faceting and scaling scatter plots in small multiples.