Notes
Managing Risks
Risk definitions
Identify all the risks to the SLO. Sources:
past failures (e.g. postmortems) are a good source of data
a systematic analysis of the possible hazards
Prioritize risks using the following equation:
Risk = probability * impact
probability => time to failure (how often the failure occurs)
impact => service downtime (how long the service is down each time it occurs)
With these conventions, risk can be measured in
Bad Minutes / Year
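A minimal Python sketch of the calculation above. It assumes time to failure is given as the expected number of days between occurrences and impact as minutes of downtime per occurrence, so incidents per year = 365 / time to failure and Bad Minutes / Year = incidents per year * downtime minutes. The Risk class, the prioritize function, and the example risks and numbers are hypothetical, for illustration only.

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365.0


@dataclass
class Risk:
    name: str
    ttf_days: float          # assumption: expected days between occurrences
    downtime_minutes: float  # assumption: service downtime per occurrence, in minutes

    def incidents_per_year(self) -> float:
        # Probability expressed as a frequency: expected occurrences per year.
        return DAYS_PER_YEAR / self.ttf_days

    def bad_minutes_per_year(self) -> float:
        # Risk = probability * impact, expressed in Bad Minutes / Year.
        return self.incidents_per_year() * self.downtime_minutes


def prioritize(risks):
    """Return the risks sorted from highest to lowest Bad Minutes / Year."""
    return sorted(risks, key=lambda r: r.bad_minutes_per_year(), reverse=True)


if __name__ == "__main__":
    # Hypothetical risk catalog; the numbers are made up.
    catalog = [
        Risk("bad config push", ttf_days=30, downtime_minutes=20),
        Risk("datacenter power loss", ttf_days=730, downtime_minutes=240),
        Risk("dependency overload", ttf_days=90, downtime_minutes=45),
    ]
    for r in prioritize(catalog):
        print(f"{r.name}: {r.bad_minutes_per_year():.1f} bad min/yr")
```

With these made-up numbers the frequent but short config failure (about 243.3 bad min/yr) outranks the rare but long power loss (120.0 bad min/yr), which is the point of multiplying probability by impact. One natural follow-up is to compare the total against the SLO's error budget; a 99.9% SLO allows about 525.6 minutes of downtime per year.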
Reducing risks
Educate the staff
Run “Wheel of Misfortune” exercises (role-playing a past incident)
Train the staff by injecting real bugs into the production system.
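One way to sketch fault injection for such training, assuming a Python service: a decorator that makes a call fail with a configurable probability while an on/off switch is enabled. This injects a synthetic exception rather than a real bug, and the names inject_faults and InjectedFault are hypothetical. In practice the switch should act as a kill switch so the exercise can be stopped immediately.

```python
import random
from functools import wraps


class InjectedFault(Exception):
    """Raised on purpose to simulate a failure during a training exercise."""


def inject_faults(rate, enabled=lambda: True, rng=random):
    """Make the wrapped call raise InjectedFault with probability `rate`
    while `enabled()` returns True (hypothetical helper)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            # rng.random() is uniform in [0, 1), so rate=0 never fails and rate=1 always fails.
            if enabled() and rng.random() < rate:
                raise InjectedFault(f"injected fault in {func.__name__}")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical usage: roughly 5% of lookups fail while the exercise runs.
@inject_faults(rate=0.05)
def lookup_user(user_id):
    return {"id": user_id}
```

Keeping the failure rate and the switch outside the business logic lets the exercise be scoped and turned off without a redeploy.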