Skip to content

Motivation

The motivation, or deliberations, that led to the collection and storage of a data set can impact its appropriateness for model development. For example, the outcome of a deliberation might have been to:

  • Focus on, collect, and store, the data of a particular strata of a population.
  • Retain aggregated records only.

The former will limit the applicability of any models that depend on the data set, using the latter is ill-advised due to lost information and ecological fallacy risks. Additionally, it is important to beware of funding/sponsorship bias.1, 2, 3

Hence

  1. Why was the data set created?
    • Was there a specific task in mind?
    • Was there a specific gap that needed to be filled?
  2. Who created the data set?
    • In brief, which team, organisation, research group, etc., created the dataset? Was it created for another entity?
  3. Who funded the creation of the dataset?