StatLab Articles

Working with dates and time in R using the lubridate package

Sometimes we have data with dates and/or times that we want to manipulate or summarize. A common example in the health sciences is time-in-study. A subject may enter a study on Feb 12, 2008 and exit on November 4, 2009. How many days was the person in the study? (Don’t forget 2008 was a leap […]
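As a rough illustration of the time-in-study question above, here is a minimal sketch in base R. It uses `as.Date()` and `difftime()` rather than lubridate, an assumption made to keep the example free of package dependencies, and the variable names are hypothetical.

```r
# Base R sketch (not the lubridate approach the article covers).
# Variable names are hypothetical, chosen for illustration only.
entry_date <- as.Date("2008-02-12")  # subject enters the study
exit_date  <- as.Date("2009-11-04")  # subject exits the study

# Difference in whole days; Date arithmetic accounts for Feb 29, 2008.
days_in_study <- as.numeric(difftime(exit_date, entry_date, units = "days"))
days_in_study
# 631
```

The count spans 366 days from Feb 12, 2008 to Feb 12, 2009 (the leap day falls in this interval) plus 265 days from Feb 12, 2009 to Nov 4, 2009.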

The Wilcoxon Rank Sum Test

The Wilcoxon Rank Sum Test is often described as the non-parametric version of the two-sample t-test. You sometimes see it in analysis flowcharts after a question such as “is your data normal?” A “no” branch off this question will recommend a Wilcoxon test if you’re comparing two groups of continuous measures. So what is this […]

Pairwise comparisons of proportions

Pairwise comparison means comparing all pairs of something. If I have three items A, B and C, that means comparing A to B, A to C, and B to C. Given n items, I can determine the number of possible pairs using the binomial coefficient: $$ \frac{n!}{2!(n - 2)!} = \binom{n}{2}$$ Using the R […]
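The pair count described above can be checked with a short sketch in base R. The use of `choose()` and `combn()` here is an assumption for illustration, not necessarily what the full article does, and the object names are hypothetical.

```r
# Hypothetical three-item example matching the A, B, C case in the text.
items <- c("A", "B", "C")

# Number of possible pairs: n! / (2! (n - 2)!)
n_pairs <- choose(length(items), 2)

# Enumerate the pairs; combn() returns one pair per column.
pairs <- combn(items, 2)
pair_labels <- apply(pairs, 2, paste, collapse = " vs ")

n_pairs
# 3
pair_labels
# "A vs B" "A vs C" "B vs C"
```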

Stata Basics: foreach and forvalues

There are times when we need to do repetitive tasks during data preparation, analysis or presentation: for instance, computing a set of variables in the same manner, renaming or creating a series of variables, or repeatedly recoding the values of a number of variables. In this post, I show a few simple […]

Stata Basics: Reshape Data

In this post, I use a few examples to illustrate the two common data forms, wide form and long form, and how to convert datasets between the two; here we call this “reshaping” data. Reshaping is often needed when you work with datasets that contain variables with some kind of sequence, say, time-series data. […]

Stata Basics: Combine Data (Append and Merge)

When I first started working with data, which was in a statistics class, we mostly used clean and complete datasets as examples. Later on, I realized that this is not always the case when doing research or data analysis for other purposes; in reality, we often need to put two or more datasets together to be able […]

Stata Basics: Subset Data

Sometimes only parts of a dataset mean something to you. In this post, we show you how to subset a dataset in Stata, by variables or by observations. We use the census.dta dataset installed with Stata as the sample data.

Subset by variables

* Load the data
> sysuse census.dta
(1980 Census data by state) […]

Stata Basics: Create, Recode and Label Variables

This post demonstrates how to create new variables, recode existing variables, and label variables and their values. We use variables from the census.dta dataset that comes with Stata as examples.

-generate-: create variables

Here we use the -generate- command to create a new variable representing the population younger than 18 years old. We do so by […]

Stata Basics: Data Import, Use and Export

In Stata, the very first step in analyzing a dataset, or doing anything with it, is to open the dataset in Stata so that it knows which file you are going to work with. Yes, you can simply double-click a Stata data file ending in .dta to open it, or you can do […]

Using Data.gov APIs in R

Data.gov catalogs government data and makes them available on the web; you can find data on a variety of topics such as agriculture, business, climate, education, energy, finance, public safety and many more. It is a good starting point for finding data if you don’t already know which particular data source to begin your search with, […]