Motivation¶
The motivation, or deliberations, that led to the collection and storage of a data set can impact its appropriateness for model development. For example, the outcome of a deliberation might have been to:
- Focus on, collect, and store, the data of a particular strata of a population.
- Retain aggregated records only.
The former will limit the applicability of any models that depend on the data set, using the latter is ill-advised due to lost information and ecological fallacy risks. Additionally, it is important to beware of funding/sponsorship bias.1, 2, 3
Hence
- Why was the data set created?
- Was there a specific task in mind?
- Was there a specific gap that needed to be filled?
- Who created the data set?
- In brief, which team, organisation, research group, etc., created the dataset? Was it created for another entity?
- Who funded the creation of the dataset?
-
External Validity, Research Methods Knowledge Base ↩
-
What is epistemically wrong with research affected by sponsorship bias? The evidential account. ↩