Organizing Files: File Naming and Version Control

What should I focus on when organizing data?

You need to make some fundamental decisions when you start your research, and data organization should be among them. The choices that you make will vary with the type of research that you do, but everyone must address the same issues.

  • File Version Control
  • Directory Structure/File Naming Conventions
  • File Naming Conventions for Specific Disciplines
  • File Structure
  • Use Same Structure for Backups

How should I name my files?

File naming is often taken for granted. Best practice is to use descriptive names that reflect the content of the file. Be consistent: use the same format for all of the files in a project, including data set files and zip or tar files. Some suggested attributes to include:

  • unique identifier or project name/acronym
  • PI
  • location/spatial coordinates
  • year of study
  • data type
  • version number
  • file type

Use no more than 32 characters. Use only numbers, letters, and underscores. Do not use special characters, dashes, spaces, or multiple dots (periods). Avoid using common terms (‘data’, ‘sample’, ‘final’, or ‘revision’). Use consistent case: all lower case, all UPPER CASE, or Sentence case. Dates should be in a standard format such as YYYYMMDD, which allows file names to sort chronologically.
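The rules above can be sketched as a small script that builds a name from a few of the suggested attributes and checks it. This is a minimal illustration, not a standard tool: the function names, the chosen attribute order, the all-lower-case choice, and the decision to count the extension toward the 32-character limit are all assumptions made for the example.

```python
import re
from datetime import date

# Assumptions for this sketch: the 32-character limit includes the extension,
# and exactly one dot is allowed (the one before the file type).
MAX_LENGTH = 32
ALLOWED = re.compile(r"[A-Za-z0-9_]+\.[A-Za-z0-9]+")


def build_file_name(project, data_type, study_date, version, extension):
    """Join the attributes with underscores, using YYYYMMDD and a v01-style version."""
    stem = "_".join([
        project,
        data_type,
        study_date.strftime("%Y%m%d"),  # sorts chronologically as text
        f"v{version:02d}",              # leading zero leaves room for growth
    ])
    return f"{stem}.{extension}".lower()


def check_file_name(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("longer than 32 characters")
    if not ALLOWED.fullmatch(name):
        problems.append("use only letters, numbers, underscores, and one dot")
    return problems


# Hypothetical project acronym and data type, for illustration only.
name = build_file_name("proj", "soiltemp", date(2024, 1, 15), 1, "csv")
print(name)                                     # proj_soiltemp_20240115_v01.csv
print(check_file_name(name))                    # []
print(check_file_name("my data-final.v2.csv"))  # one problem reported
```

Running a check like this over a project folder before sharing it is one way to catch names that drifted from the convention.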

Sequential numbering should allow for growth, and include leading zeros.  Do you have 100 files?  Numbering should run from 001 to 100.
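As a rough sketch of the numbering advice, the padding width can be derived from the largest number you expect to reach. The function name and the `expected_max` parameter are hypothetical, introduced only for this example.

```python
def sequence_labels(count, expected_max=None):
    """Return zero-padded labels 1..count, wide enough for expected_max."""
    # Pad to the width of the largest number the series may ever reach.
    largest = max(count, expected_max or count)
    width = len(str(largest))
    return [str(n).zfill(width) for n in range(1, count + 1)]


labels = sequence_labels(100)
print(labels[0], labels[-1])                   # 001 100

# Only 5 files today, but up to 1000 expected later:
print(sequence_labels(5, expected_max=1000))   # ['0001', '0002', '0003', '0004', '0005']
```

Padding matters because file listings sort names as text: without leading zeros, `10` sorts before `2`.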

Already have a lot of data collected and want or need to rename the files?  Consider using one of these tools:

How should I keep track of changes?

Version control is the way to track revisions of a data set or a process. If your research involves more than one person, it is essential. Record every change to a file, no matter how small. Track the changes through your file naming convention and log files, or with version control software.

You can track versions manually by including a version indicator in the file name, such as v01, v02, or v1.4. The standard convention is to use whole numbers for major revisions, and decimals for minor ones.
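The major/minor convention can be illustrated with a short helper that increments a version string. This is a sketch under assumptions: the function name is hypothetical, the part after the dot is treated as a counter (so 1.9 is followed by 1.10), and a major revision resets the minor part to 0.

```python
def bump_version(version, major=False):
    """Increment a 'major.minor' version string such as '1.4'."""
    major_part, _, minor_part = version.partition(".")
    major_number = int(major_part)
    minor_number = int(minor_part) if minor_part else 0
    if major:
        # Major revision: next whole number, minor counter starts over.
        return f"{major_number + 1}.0"
    # Minor revision: keep the whole number, advance the decimal part.
    return f"{major_number}.{minor_number + 1}"


print(bump_version("1.4"))              # 1.5
print(bump_version("1.4", major=True))  # 2.0
print(bump_version("2"))                # 2.1
```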

Several software programs are designed for version tracking, including Mercurial, TortoiseSVN, Apache Subversion, Git, and SmartSVN.

File sharing software can also be used to track versions.  UVaBox has options to track both major and minor versions of files. Google Docs records version changes as well.

As you think through how to manage this step, keep the following issues in mind.

  • record every change to a file, no matter how small
  • keep track of changes to files
  • use file naming conventions
  • headers inside the file
  • log files
  • version control software (Git, Mercurial, Subversion)
  • file sharing software (UVaBox, Google Docs)
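For the log file option in the list above, a change log can be as simple as one appended line per revision. The sketch below is only one possible layout: the function name, the tab-separated fields, and the use of a SHA-256 checksum to identify the file contents are assumptions made for this example, not a prescribed format.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def log_change(data_file, log_file, version, note):
    """Append one tab-separated entry describing a change to data_file."""
    data_path = Path(data_file)
    # A checksum shows whether the contents really changed between entries.
    digest = hashlib.sha256(data_path.read_bytes()).hexdigest()
    timestamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    entry = "\t".join([timestamp, data_path.name, version, digest, note])
    # Mode "a" appends, so earlier entries are never overwritten.
    with open(log_file, "a", encoding="utf-8") as log:
        log.write(entry + "\n")
    return entry
```

A call might look like `log_change("proj_soiltemp_20240115_v02.csv", "changes_log.txt", "v02", "removed duplicate rows")`, where both file names are hypothetical.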