notes

Managing Risks

Risk definitions

  1. Identify all the risks to the SLO. Sources:
    • past failures, which are a good source of data
    • an analysis of all the possible hazards
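  2. Prioritize the risks with the following equation:
     Risk = probability * impact

    • probability => how often the failure occurs, estimated from time to failure
      (a shorter time to failure means more failures per year)
    • impact => service downtime per failure
  3. With this convention, risk can be measured in Bad Minutes / Year:
     failures per year * minutes of downtime per failure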
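
A minimal sketch of the calculation above, assuming time to failure is given in days and downtime in minutes. The Risk class, the prioritize function, and all the sample risks and numbers are hypothetical and only illustrate the arithmetic:

```python
from dataclasses import dataclass

DAYS_PER_YEAR = 365


@dataclass(frozen=True)
class Risk:
    name: str
    time_to_failure_days: float  # average days between failures (probability)
    downtime_minutes: float      # service downtime per failure (impact)

    @property
    def failures_per_year(self) -> float:
        # A shorter time to failure means more failures per year.
        return DAYS_PER_YEAR / self.time_to_failure_days

    @property
    def bad_minutes_per_year(self) -> float:
        # Risk = probability * impact
        return self.failures_per_year * self.downtime_minutes


def prioritize(risks):
    """Return the risks sorted from most to fewest bad minutes per year."""
    return sorted(risks, key=lambda r: r.bad_minutes_per_year, reverse=True)


# Hypothetical example data, not taken from any real service.
risks = [
    Risk("datacenter outage", time_to_failure_days=1825, downtime_minutes=480),
    Risk("bad config push", time_to_failure_days=73, downtime_minutes=30),
    Risk("database failover", time_to_failure_days=365, downtime_minutes=120),
]

for risk in prioritize(risks):
    print(f"{risk.name}: {risk.bad_minutes_per_year:.0f} bad minutes / year")
```

In this made-up data set the frequent but short failure ranks first, even though the rare outage has the longest downtime per failure, because the ranking uses the product of frequency and downtime.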

Reducing risks

  1. Educate the staff.
  2. Run “wheel of misfortune” exercises (role-playing a past incident).
  3. Train the staff by injecting real bugs into the production system.