Data Management

Choosing file formats

The file formats you choose for storing your data will depend on the needs of your research project. Some questions to keep in mind when choosing a format include:

  • What program is needed to open these files? Is it commonly available? Will it limit my ability to access files later?
  • Will this format be accessible to other users? Will I need to convert my data to another format if asked to share it?

If possible, choose standardized, open formats over proprietary ones. Open formats tend to be more robust over time because their specifications are publicly documented, so your ability to read the files does not depend on a single company continuing to maintain its software. Some commonly recommended formats include:

  • Text: plain text (ASCII, UTF-8), XML, PDF/A, HTML
  • Images: TIFF, JPEG 2000, PNG, GIF
  • Tabular data: XML, CSV
  • Video: MOV, MPEG, AVI, MXF
  • Audio: WAVE, AIFF, MP3
  • Statistics: ASCII, DTA, POR, SAS, SAV
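As an illustration of keeping tabular data in one of the open formats above, the following Python sketch writes a small table to a CSV file encoded as UTF-8 and reads it back using only the standard library csv module. The function names save_as_utf8_csv and load_csv, the file name samples.csv, and the example data are all hypothetical.

```python
import csv


def save_as_utf8_csv(path, header, rows):
    """Write a table to a plain-text CSV file encoded as UTF-8."""
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)


def load_csv(path):
    """Read the CSV back; any spreadsheet or programming language can do the same."""
    with open(path, encoding="utf-8", newline="") as f:
        return list(csv.reader(f))


if __name__ == "__main__":
    # Hypothetical example data; accented site names survive because the file is UTF-8.
    header = ["site", "sample_date", "temperature_c"]
    rows = [["Montréal", "2024-03-01", 4.5], ["Québec", "2024-03-02", 3.9]]
    save_as_utf8_csv("samples.csv", header, rows)
    print(load_csv("samples.csv"))
```

Note that CSV stores every value as text: the temperatures come back as the strings "4.5" and "3.9", so units and data types are best recorded in accompanying documentation.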

For more tips on choosing appropriate file formats, see Stanford University Libraries' Best Practices for File Formats.

Naming your files

How you name your files will have a big impact on your ability to identify and retrieve your files later. Your names should be consistent and descriptive so it's clear what each file contains. Some basic guidelines to follow are:

  • Use file names that indicate what is in the file and what version it is.
  • Include creation dates in yyyy-mm-dd format. This ensures that files sort chronologically when they are sorted by name.
  • Avoid spaces and special characters (e.g., *%$) in file names; use the underscore (_) as a delimiter instead. Some software does not handle spaces or special characters in file names correctly.
  • Keep names as short as possible. This makes files easier to browse at a glance.
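The guidelines above are easier to follow consistently if names are generated by a small helper rather than typed by hand. The Python sketch below is one possible convention, not a standard: the functions make_file_name and clean_part, the field order (date first, then project, description and version), and the vNN version style are all assumptions for illustration. Putting the yyyy-mm-dd date first means an alphabetical listing of a folder is also a chronological one.

```python
import re
from datetime import date


def clean_part(text):
    """Replace spaces and special characters with underscores."""
    # Letters, digits and hyphens are kept; every other run of characters
    # (spaces, *, %, $, punctuation, accented letters) becomes a single "_".
    cleaned = re.sub(r"[^A-Za-z0-9-]+", "_", text.strip())
    return cleaned.strip("_")


def make_file_name(created, project, description, version, extension):
    """Build a name such as 2024-03-01_project_description_v01.csv (hypothetical convention)."""
    parts = [
        created.isoformat(),      # yyyy-mm-dd, so names sort chronologically
        clean_part(project),
        clean_part(description),
        f"v{version:02d}",        # zero-padded so v10 sorts after v09
    ]
    return "_".join(parts) + "." + extension.lstrip(".")


if __name__ == "__main__":
    print(make_file_name(date(2024, 3, 1), "Lake Ontario survey!", "water temps", 2, "csv"))
    # prints 2024-03-01_Lake_Ontario_survey_water_temps_v02.csv
    later = make_file_name(date(2024, 1, 10), "survey", "temps", 1, "csv")
    earlier = make_file_name(date(2023, 12, 15), "survey", "temps", 1, "csv")
    print(sorted([later, earlier]))  # the 2023-12-15 file sorts first
```

Because ISO dates sort correctly as plain text, the December 2023 file is listed before the January 2024 file; with mm-dd-yyyy dates, 01-10-2024 would sort ahead of 12-15-2023.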

See the article Folder and File Naming Convention – 10 Rules for Best Practice for more tips on creating file names.