Stata Basics: Data Import, Use and Export

In Stata, the very first step of analyzing a dataset, or do anything about it, should be opening the dataset in Stata so that it knows which file you are going to work with. Yes, you can simply double click on a Stata data file ends in .dta to open it, or you can do something fancier to achieve the same goal – like write some codes. Okay, there is at least one more reason than being fancier that makes me prefer to write syntax than clicking through things in Stata – I like to have everything I did recorded so that I can easily reproduce the same work or use the scripts again when working on similar tasks next time. In this post, I introduce ways of reading in, using and saving Stata and other formats of data files.

-sysuse-: reading in datasets come with Stata

Several example datasets are installed with Stata. This command reads in one of them, census.dta, to memory. You should be able to see the data in your Stata Data Browser after running this following line.

> sysuse census.dta
(1980 Census data by state)
 

-describe-: the information of the dataset in memory

> describe 

Contains data from /Applications/Stata/ado/base/c/census.dta
  obs:            50                          1980 Census data by state
 vars:            13                          6 Apr 2014 15:43
 size:         2,900                          
-------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------------
state           str14   %-14s                 State
state2          str2    %-2s                  Two-letter state abbreviation
region          int     %-8.0g     cenreg     Census region
pop             long    %12.0gc               Population
poplt5          long    %12.0gc               Pop, < 5 year
pop5_17         long    %12.0gc               Pop, 5 to 17 years
pop18p          long    %12.0gc               Pop, 18 and older
pop65p          long    %12.0gc               Pop, 65 and older
popurban        long    %12.0gc               Urban population
medage          float   %9.2f                 Median age
death           long    %12.0gc               Number of deaths
marriage        long    %12.0gc               Number of marriages
divorce         long    %12.0gc               Number of divorces
-------------------------------------------------------------------------------------
Sorted by: 
 

Tip: run “set more off” to tell Stata to pause for -more- messages

-summarize-: summary statistics

> summarize

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       state |          0
      state2 |          0
      region |         50        2.66    1.061574          1          4
         pop |         50     4518149     4715038     401851   2.37e+07
      poplt5 |         50    326277.8    331585.1      35998    1708400
-------------+---------------------------------------------------------
     pop5_17 |         50    945951.6    959372.8      91796    4680558
      pop18p |         50     3245920     3430531     271106   1.73e+07
      pop65p |         50    509502.8    538932.4      11547    2414250
    popurban |         50     3328253     4090178     172735   2.16e+07
      medage |         50       29.54    1.693445       24.2       34.7
-------------+---------------------------------------------------------
       death |         50    39474.26    41742.35       1604     186428
    marriage |         50     47701.4    45130.42       4437     210864
     divorce |         50    23679.44    25094.01       2142     133541

-clear-: wipe out the data in memory

> clear
 

-use-: read in Stata datasets

Most of the time, we use datasets that are either stored on our machine or on the web. Simply use this -use- command to read in the data file to memory.

* read in data files on the web
> use http://www.stata-press.com/data/r14/apple.dta
(Apple trees)

> describe 

Contains data from http://www.stata-press.com/data/r14/apple.dta
  obs:            10                          Apple trees
 vars:             2                          16 Jan 2014 11:23
 size:           100                          
-------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------
treatment       int     %8.0g                 Fertilizer
weight          double  %10.0g                Average weight in grams
-------------------------------------------------------------------------
Sorted by: 

-cd-: change directory

Now let’s save this dataset on the web to your machine. You can use the -cd- command to tell Stata where to save this file.

* see the current directory
> pwd
/Users/Username/Desktop/StataBasics

* Change directory (plug in the path on your machine)
> cd YOUR PATH

* Your directory/path may look like this -
* Stata for Windows: 
* cd C:Users\username\data
* Stata for Mac: 
* cd /Users/username/data

-save-: save files

> save apple
file apple.dta saved

* use the replace option to overwrite an existing file 
> save apple, replace
file apple.dta saved

-dir-: display file names

* see what's in your working directory
> dir
* you should see apple.dta listed in your directory

-insheet- and -outsheet-: import and export .csv files

Oftentimes we work with Stata and other softwares for a same project, in that case we need to import data files that are not in a Stata format or export Stata data files to other formats. Here is an example of how to save datasets as .csv files and read them into Stata.

* -outsheet-: save as .csv files
> outsheet using apple.csv, comma 

* -insheet-: read in .csv files
> insheet using "apple.csv", clear
(2 vars, 10 obs)

 

Yun Tai
CLIR Postdoctoral Fellow
University of Virginia Library