Reliability, Maintainability, Resilience, Availability

The Systems Engineering Body of Knowledge notes that reliability, maintainability, and availability are three of the core attributes that affect the utility of a system and its economic life-cycle cost.[1] In the context of machine learning products, reliability, maintainability, availability, and resilience specifications are critical to items such as

  • continuous service,
  • drift monitoring and continuous model performance improvement,

and the costs thereof. The tasks herein will require the aid of a reliability engineer or someone in a similar role. The aim is to

  • Define the reliability, maintainability, availability, and resilience expectations via metric constraints.
  • Outline built-in designs/solutions that minimise the probability of constraint breaches.
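
As a rough illustration of what "expectations via metric constraints" could look like in practice, the sketch below expresses two targets as bounds and checks observed values against them. Everything named here is an assumption made for illustration: the `MetricConstraint` structure, the `breached` helper, and the target values (99.5% availability, a 4-hour mean time to repair) do not come from the source. The availability formula used is the standard steady-state one, MTBF / (MTBF + MTTR).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricConstraint:
    """A lower or upper bound on one observed metric (hypothetical structure)."""
    name: str
    bound: float
    kind: str  # "min": observed must be >= bound; "max": observed must be <= bound

    def is_breached(self, observed: float) -> bool:
        if self.kind == "min":
            return observed < self.bound
        if self.kind == "max":
            return observed > self.bound
        raise ValueError(f"unknown constraint kind: {self.kind!r}")


def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative targets only; real values come from the product's requirements.
CONSTRAINTS = [
    MetricConstraint("availability", 0.995, "min"),
    MetricConstraint("mttr_hours", 4.0, "max"),
]


def breached(observed: dict[str, float]) -> list[str]:
    """Return the names of the constraints that the observed metrics violate."""
    return [c.name for c in CONSTRAINTS if c.is_breached(observed[c.name])]
```

For example, a service with a mean time between failures of 500 hours and a mean time to repair of 2 hours has an availability of 500 / 502, which satisfies both illustrative targets, so `breached({"availability": steady_state_availability(500.0, 2.0), "mttr_hours": 2.0})` returns an empty list. Keeping the constraints as data rather than hard-coded checks makes it easier to review them with a reliability engineer and to reuse them in monitoring.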