Don’t take your data at face value. That is the key message of this tutorial, which focuses on how scholars can diagnose and act upon the accuracy of data. In this lesson, you will learn the principles and practice of data cleaning, as well as how OpenRefine can be used to perform four essential tasks that will help you to clean your data:
1. Remove duplicate records
2. Separate multiple values contained in the same field
3. Analyse the distribution of values throughout a data set
4. Group together different representations of the same reality
These steps are illustrated with the help of a series of exercises based on a collection of metadata from the Powerhouse Museum, demonstrating how (semi-)automated methods can help you correct the errors in your data.
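To make the four tasks concrete, here is a minimal Python sketch of the same ideas. It is not how OpenRefine works internally, and the tutorial itself performs these steps through OpenRefine's interface. The records, the field names (`id`, `categories`), and the `|` separator are all assumptions invented for illustration.

```python
from collections import Counter

# Hypothetical records: field names and values are invented for illustration.
records = [
    {"id": "1", "categories": "Coins|Medals"},
    {"id": "2", "categories": "Numismatics|Coins"},
    {"id": "2", "categories": "Numismatics|Coins"},  # duplicate record
    {"id": "3", "categories": "coins |Medals"},      # inconsistent spelling
]

# 1. Remove duplicate records, keeping the first occurrence of each id.
seen = set()
unique = []
for rec in records:
    if rec["id"] not in seen:
        seen.add(rec["id"])
        unique.append(rec)

# 2. Separate multiple values held in the same field (assumed "|" separator).
split_values = [v for rec in unique for v in rec["categories"].split("|")]

# 3. Analyse the distribution of values across the data set.
distribution = Counter(split_values)

# 4. Group different representations of the same value, using a simple
#    normalisation key (whitespace trimmed, lower-cased).
clusters = {}
for value in split_values:
    clusters.setdefault(value.strip().lower(), set()).add(value)
variants = {k: sorted(vs) for k, vs in clusters.items() if len(vs) > 1}

print(len(unique))
print(distribution.most_common())
print(variants)
```

The normalisation key in step 4 is deliberately crude; it only catches differences in case and surrounding whitespace, so real data would likely need a richer key.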
- Subject:
  - Applied Science
  - Computer Science
- Material Type:
  - Diagram/Illustration
- Provider:
  - Center for History and New Media
- Author:
  - Ruben Verborgh and Max De Wilde
  - Seth van Hooland
- Date Added:
  - 06/16/2015