This blog post was written by the Monitoring and Evaluation Technical Assistance (META) Project and is included as an archived post on the Switchboard blog.
Too many of us have experienced a common M&E frustration: puzzling over an Excel spreadsheet that may contain all of the information we need, but that takes multiple time-consuming steps to clean and analyze. Unclear column names, multiple indicators mixed in a single row, inconsistent formatting across sheets…there are many reasons we may end up devoting far more time to “cleaning” data than we do on the analysis itself. And of course, if the data can’t be easily analyzed and used, there is little point to collecting it at all!
Having tidy data from the start can free up time down the line for what’s really important: learning from your data and using it to improve programs.
Statistician Hadley Wickam defines tidying data as “structuring datasets to facilitate analysis.” His 2014 paper Tidy Data discusses the concept in detail and MEASURE Evaluation’s Tidy Data and How to Get It offers a great summary. Tidy data has a few key components:
- Use organized columns, rows and sheets. For example, ensure that each variable forms a single column, such as “client name,” “date of birth,” “status,” “service received,” etc.
- Use consistent spelling and formatting: don’t mix numbers and text in the same column, and if you are using text, ensure consistent spelling through drop-down menus with controlled list choices.
- Put summary tables and data dashboards (the automatically updating charts that allow you to visualize your data over time) on separate sheets from their raw source data.
Overall, tidy data prioritizes the data’s ultimate uses, aiming for clarity and effective analysis. Untidy data, on the other hand, may result from using unclear column names, formatting your data inconsistently, mixing multiple variables in the same column, or spreading a single data set across multiple sheets, just to name a few. Data may start out untidy, especially if reporting systems are not designed with analysis in mind, or it may become untidy over time.
What might tidy data look like in the context of resettlement programs?
This series of images shows a sample Excel workbook for a fictional immigration legal services program. The first is a sheet detailing services data. Note that the naming conventions and formatting are consistent, and that no summary tables are present on this sheet. The data under the “Activity” and “Detail” columns is selected using dropdown menus with controlled list choices, preventing the analysis issues that can result from one user typing “N400,” for example, while others type “N-400 Application.”
This next image shows what a summary table might look like for this data. The totals under the “Yr1 Actual” and “Yr2 Actual” columns automatically populate using formulas that draw from the raw data on the Services Detail Sheet.
Finally, this image shows a Dashboard sheet. These charts are linked to raw data on a separate sheet that automatically updates through formulas, allowing you to visually monitor your program progress over time. The examples here are only a few of the things this immigration program could track—another might be progress of actuals against targets, to visually depict the information in the summary table above.