We offer, coordinate, and highlight workshops and training on data analysis and statistics, computation and software, as well as on Library resources and methods. Anyone in the UVA community may attend. It’s free! Feel free to email us recommendations for workshops you’d like to see: email@example.com.
StatLab Workshops: Spring 2017
Click the date to register! (Registration is not required, but we usually send out an email ahead of time with links to resources you’ll need for the workshop, and we don’t want you to miss out!)
|Workshop Topic (Instructor)||Day||Time||Location|
|Intro to R (Clay Ford)||Tuesday, 1/24||10:00-11:30||Brown 133|
|Designed for the absolute beginner, this workshop provides a gentle introduction to R and RStudio. R is a free, open-source software environment and programming language designed specifically for statistical analysis. Since its introduction in 2000, R has rapidly increased in popularity thanks to its power, price (free!), and supportive community. RStudio is a free integrated development environment (IDE) that makes using and learning R much easier. In this workshop we’ll get you started using R with RStudio, show you how to import data, do some basic data manipulation, create a few graphics, perform some basic statistical analyses, and point you in the direction to learn more and go further with R!
|Intro to Python (2 hours) (Pete Alonzi)||Wednesday, 1/25
|This workshop covers the fundamentals of Python, beginning with setting it up on your system. No prior experience is required. Just bring your laptop. We will start with installation and then move to interpreted coding, focusing on the built-in data types. This will be a hands-on experience with exercises throughout and plenty of time to get your hands dirty.
|Web Scraping in R with rvest (Clay Ford)||Thursday, 2/2||10:00-11:30||Brown 133|
|Sometimes data we find on the internet isn’t formatted for downloading and easy importing into our statistical program of choice. It’s simply displayed on a static web page as a table (if we’re lucky) or scattered about the page in various locations. Getting this data requires “web scraping”. This means pulling out the specific parts of a web page that we want to keep and wrangling them into a structure suitable for further analysis. A recently developed R package called rvest makes this process easier. In this workshop we’ll introduce how to use rvest for scraping web pages by way of several examples. We’ll also present a general strategy for web scraping and demonstrate some basic programming approaches to scraping multi-page web sites. Previous experience with R will be helpful.
Materials for presentation made during Endangered Data Week (April 17-21, 2017)
|Introduction to Unix (Ricky Patterson)||Tuesday, 2/7||10:00-11:30||Brown 133|
|This workshop will introduce new users to the command line interface and Unix shell commands. It will be useful both for users interested in using Unix on a local machine (including Linux and Mac OS X) and for users who want to make use of remote resources such as the Rivanna cluster. Users will learn how to create and navigate directories, and to create, copy, move, and search files. We will also cover setting and changing file permissions, creating symbolic links, redirection of output, and job control, with a brief discussion of shell scripts.
Users will need to bring their own laptop in order to fully participate in the workshop.
|Web Scraping in Python (Eric Rochester)||Wednesday, 2/8||10:00-11:30||Alderman 421|
|Sometimes data we find on the internet isn’t formatted for downloading and easy importing into our statistical program of choice. It’s simply displayed on a static web page as a table (if we’re lucky) or scattered about the page in various locations. Getting this data requires “web scraping”. This means pulling out the specific parts of a web page that we want to keep and wrangling them into a structure suitable for further analysis. The general-purpose programming language Python has a number of libraries that work together to make this process relatively painless. We’ll talk about the process involved in web scraping, some of the things to keep in mind, and how to use these tools in concert to get the data you need in the format you need. Some knowledge of Python would be helpful.
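As a rough illustration of the idea described above, the sketch below pulls a table out of a static HTML page and reshapes it into a list of records. It is not taken from the workshop materials: it uses only Python's standard-library `html.parser` rather than whichever scraping libraries the workshop covers, and the `TableScraper` class and the `PAGE` content are hypothetical names and data invented for this example.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Keep text only while we are inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical page content; a real scraper would download the page first.
PAGE = """
<html><body>
<h1>Workshops</h1>
<table>
  <tr><th>Topic</th><th>Day</th></tr>
  <tr><td>Intro to R</td><td>Tuesday, 1/24</td></tr>
  <tr><td>Intro to Python</td><td>Wednesday, 1/25</td></tr>
</table>
</body></html>
"""

scraper = TableScraper()
scraper.feed(PAGE)

# Treat the first row as the header and turn the rest into dictionaries.
header, *records = scraper.rows
data = [dict(zip(header, record)) for record in records]
print(data)
```

The last step is the "wrangling" part: once each row is a dictionary keyed by column name, the data can be written to a CSV file or loaded into an analysis tool.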
|Sentiment Analysis in R (Michele Claibourn)||Thursday, 2/9||10:00-11:30||Brown 133|
|Sentiment analysis attempts to computationally identify and categorize the tone, opinion, affect, or polarity of textual communication. This workshop will illustrate lexicon-based sentiment analysis techniques (using pre-defined dictionaries) in R using multiple packages (quanteda and tidytext) through multiple examples and for multiple types of sentiment. Previous experience with R will be helpful but not required.
|Character Manipulation in R (Clay Ford)||Tuesday, 2/14||10:00-11:30||Brown 133|
|Extract text between HTML tags. Pad zip codes with leading 0s. Split a name into First and Last name fields. Pull only responses from a certain person from a transcript. These are all examples of character manipulation, something that is often done when cleaning data. In this workshop we’ll introduce a variety of helpful R functions and packages for working with character data. We’ll also introduce regular expressions, a special language for defining text patterns. At the end of this workshop you’ll have a powerful arsenal of tools for manipulating character strings in R. Previous experience with R will be helpful but not required.
|Text Processing and Topic Modeling in Python (Jon Ashley)||Wednesday, 2/15||10:00-11:30||Brown 133|
|This workshop is an introduction to Natural Language Processing and some of the basic processes that can be applied to a corpus of texts. We will cover preparation of texts, tokenization, part-of-speech tagging, and topic modeling. Although there are a variety of NLP packages available in Python, we will be using “spaCy”. Prior experience with Python is helpful.
|Topic Modeling in R (Michele Claibourn)||Thursday, 2/16||10:00-11:30||Brown 133|
|Topic modeling is a popular tool for modeling document collections and has been applied in a variety of domains, from medical science to digital humanities. This workshop will introduce topic modeling in R, from processing text and estimating topic models to evaluating, interpreting, and visualizing the results. Previous experience with R will be immensely helpful but not required.
|Introduction to R Markdown (Clay Ford)||Tuesday, 2/21||10:00-11:30||Brown 133|
|R Markdown is an authoring format that makes it easy to write reports and create presentations with R. You simply combine R code with text written in Markdown and then export the results as an HTML, PDF, or Word file. You don’t need to save individual graphs and insert them into your document. You don’t need to copy and paste calculations. You don’t need to learn a new programming language. With a single keystroke you can generate a professional-looking document that contains all your R code, statistical results, plots, and exposition. The best part is that it’s free and easy to do with RStudio. In this workshop we’ll get you up and running with R Markdown and have you creating reports and presentations in no time. Previous experience with R will be helpful but not required.
|Introduction to ShareLaTeX for collaborative LaTeX (Ricky Patterson)||Wednesday, 2/22||10:00-11:30||Brown 133|
|LaTeX is a powerful (and free) document typesetting program, widely used in a number of academic disciplines for compiling professional research papers, articles, dissertations, presentations, letters, and books. It is especially useful for the creation and integration of mathematical formulae, tables and bibliographies into documents. Running an installation of LaTeX on your own computer can make it difficult to work on a document collaboratively. The UVa Library has recently provided access for all UVa users to an on-line collaborative LaTeX editor, ShareLaTeX. Come learn how to take full advantage of this powerful tool.
Participants will need to bring their own laptop for this workshop.
|Introduction to Dedoose (Nancy Kechner)||Thursday, 2/23||10:00-11:30||Library Data Commons@Curry|
|New to qualitative research? Imagine being able to blend your video, audio, and text data with your spreadsheet information in an on-line tool to get the most out of all of your information! Dedoose is an easy-to-learn, feature-rich, and affordable web app that can help you visualize a variety of information from your work that you can share with the research community. Come and see Dedoose in action to decide whether you want to add qualitative analysis to your research toolbox.
|Text Classification in Python (Pete Alonzi)||Monday, 2/27||2:30-4:00||Brown 133|
|Text classification is a broad field covering an array of topics, for example determining whether a newspaper article is from the sports page or not. In this workshop we will go through fundamental steps like tokenization and then proceed to completing tasks such as creating a spam filter. These techniques use machine learning, but no prior machine learning knowledge is necessary for this workshop. We will primarily be working with the package NLTK. Fundamental knowledge of Python is helpful but not mandatory for this workshop. Please bring your laptop.
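To give a flavor of the two steps mentioned above, tokenization and a spam filter, here is a minimal sketch of a naive Bayes classifier. It is an illustration only, not the workshop's code: it uses the Python standard library instead of NLTK, the function names `tokenize`, `train`, and `classify` are hypothetical, and the training messages are invented toy data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def train(labeled_messages):
    """Count word frequencies per label for a naive Bayes classifier."""
    word_counts = {}          # label -> Counter of tokens seen with that label
    label_counts = Counter()  # label -> number of training messages
    for text, label in labeled_messages:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    tokens = tokenize(text)
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Start from the log prior for the label.
        score = math.log(label_counts[label] / total_messages)
        for token in tokens:
            # Add-one (Laplace) smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[token] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented toy training data, for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("claim your free prize", "spam"),
    ("meeting moved to tuesday", "ham"),
    ("are you coming to the workshop", "ham"),
    ("please bring your laptop to the workshop", "ham"),
]

model = train(TRAINING)
print(classify("free prize inside", model))
print(classify("bring your laptop on tuesday", model))
```

With so little training data the classifier is only a toy; a practical filter would need far more labeled messages and would usually be evaluated on messages held out from training.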
|Survival Analysis in Stata (Alex Jakubow)||Wednesday, 3/1||10:00-11:30||Brown 133|
|Social scientific and biomedical researchers are frequently interested in understanding the time to the occurrence of some event—such as the death of a participant in a clinical trial or the end of a governing coalition. This workshop introduces key methodological concepts in survival analysis and contrasts fully-, semi-, and non-parametrized modeling frameworks. Examples from Stata illustrate how to prepare a dataset for survival analysis, interpret regression results, and conduct important diagnostic tests. This workshop assumes participants are comfortable with multivariate regression and familiar with the analysis of limited/categorical dependent variables. Prior experience with Stata is helpful but not required.
|Text Classification in R (Michele Claibourn)||Thursday, 3/2||10:00-11:30||Brown 133|
|Text classification encompasses a variety of models for categorizing documents into existing groups — spam filters are the prototypical example. It is a form of machine learning applied to textual data. This workshop will work through methods of performing text classification in R using multiple examples and packages. Previous experience with R will be immensely helpful but not required.
|Intro to Git/GitHub (Pete Alonzi)||RESCHEDULED: Tuesday, 3/21||10:00-11:30||Brown 133|
|Git is a version control program. Proper use will help you manage your development work. Until recently the software was a burden to operate, but the development of GitHub.com has changed that. In this workshop we will explore the use of Git through the GitHub framework. We will work with the web interface and the desktop client. Please bring your laptops. The use of GitHub requires a user account, so please set one up at github.com prior to arrival.
|Advanced Visualization with R (Pete Nagraj)||Thursday, 3/16||10:00-11:30||Brown 133|
|This workshop will cover fundamental concepts for creating effective data visualization and will introduce tools and techniques for visualizing data using R. We will review fundamental concepts for visually displaying quantitative information, such as using series of small multiples, avoiding “chart-junk,” and maximizing the data-ink ratio. After briefly covering data visualization using base R graphics, we will introduce the ggplot2 package for advanced high-dimensional visualization. We will cover the grammar of graphics (geoms, aesthetics, stats, and faceting), and using ggplot2 to create plots layer-by-layer. Upon completing this lesson, learners will be able to use ggplot2 to explore a high-dimensional dataset by faceting and scaling scatter plots in small multiples.