Reliability
In the systems engineering body of knowledge, the basic definition of reliability is
“ … the probability of a product performing its intended function under stated conditions without failure for a given time period”.
In the context of machine learning systems, the reliability specification of each intended function, i.e., each functional requirement, must specify the following:
- The function in question and its operating environment.
- The time scales over which the reliability metrics are evaluated.
- The definition of failure, which is needed both for evaluating the reliability metrics and for designing in-built failure mitigation.
- The reliability metrics.
This specification aids (a) the design of in-built solutions that mitigate the factors affecting reliability, and (b) the system monitoring strategy and its operation. See the reliability, maintainability, and availability page of the systems engineering body of knowledge for the expectations that apply here.
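As an illustration only, the four elements of a reliability specification could be recorded and evaluated as in the sketch below. Every name in it (`ReliabilitySpec`, `empirical_reliability`, `meets_target`), the latency-based failure rule, and the numbers are hypothetical assumptions rather than part of any standard or library. The metric used is one simple choice: the fraction of observation windows that contain no failure, which approximates the probability of failure-free operation over the stated time period.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ReliabilitySpec:
    """Hypothetical container for one function's reliability specification."""
    function: str                        # the functional requirement in question
    environment: str                     # its operating environment
    window_hours: float                  # time scale for evaluating the metric
    is_failure: Callable[[float], bool]  # definition of failure for one observation
    target: float                        # required fraction of failure-free windows


def empirical_reliability(spec: ReliabilitySpec,
                          windows: Sequence[Sequence[float]]) -> float:
    """Return the fraction of observation windows that contain no failure."""
    if not windows:
        raise ValueError("at least one observation window is required")
    failure_free = sum(
        1 for window in windows
        if not any(spec.is_failure(obs) for obs in window)
    )
    return failure_free / len(windows)


def meets_target(spec: ReliabilitySpec,
                 windows: Sequence[Sequence[float]]) -> bool:
    """Check the observed reliability against the specified target."""
    return empirical_reliability(spec, windows) >= spec.target


# Assumed example: a prediction service fails when a response takes over 500 ms.
spec = ReliabilitySpec(
    function="score prediction",
    environment="online inference service",
    window_hours=24.0,
    is_failure=lambda latency_ms: latency_ms > 500.0,
    target=0.75,
)

# Four made-up daily windows of latency observations (ms); one contains a failure.
windows = [[120.0, 130.0], [110.0, 620.0], [90.0, 95.0], [100.0, 105.0]]

print(empirical_reliability(spec, windows))
print(meets_target(spec, windows))
```

Keeping the failure definition as a separate callable means a monitoring job and an in-built mitigation step can share the same rule, which is the kind of reuse the specification is meant to support.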