Reliability, Maintainability, Resilience, Availability
The systems engineering body of knowledge notes that reliability, maintainability, and availability are three core attributes that affect the utility of a system and its economic life-cycle cost.1 In the context of machine learning products, reliability, maintainability, availability, and resilience specifications are critical to items such as
- continuous service,
- drift monitoring and continuous model performance improvement,
and the costs thereof. The tasks herein require the aid of a reliability engineer or a similar specialist. The aims are to
- Define the reliability, maintainability, availability, and resilience expectations as metric constraints.
- Outline built-in designs or solutions that minimise the probability of constraint breaches.
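As an illustration of what a metric constraint might look like, the sketch below expresses an availability expectation as a minimum threshold and checks a system against it. It assumes the common steady-state (inherent) availability formula, MTBF / (MTBF + MTTR); the names `AvailabilityConstraint`, `inherent_availability`, and `breaches`, as well as the example figures, are hypothetical and not taken from this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AvailabilityConstraint:
    """Hypothetical metric constraint: a minimum acceptable availability."""

    minimum: float  # e.g. 0.999 for "three nines"


def inherent_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability from mean time between failures (MTBF)
    and mean time to repair (MTTR), both in hours."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def breaches(constraint: AvailabilityConstraint,
             mtbf_hours: float,
             mttr_hours: float) -> bool:
    """Return True when the computed availability falls below the constraint."""
    return inherent_availability(mtbf_hours, mttr_hours) < constraint.minimum


if __name__ == "__main__":
    # Assumed example figures, for illustration only.
    constraint = AvailabilityConstraint(minimum=0.999)
    print(breaches(constraint, mtbf_hours=500.0, mttr_hours=1.0))
    print(breaches(constraint, mtbf_hours=2000.0, mttr_hours=1.0))
```

Reliability, maintainability, and resilience expectations could be captured in the same way, each with its own metric and threshold agreed with the reliability engineer.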