On-Call
Balance Interrupt Work and Project Work
- Be project-driven, not interrupt-driven
- The on-call engineer makes sure that the issue gets fixed
- Everyone else's work stays uninterrupted
- If nobody is on call, everyone is on call
- The rotation schedule prevents burnout.
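As an illustration of the rotation idea above, here is a minimal sketch of a weekly round-robin schedule. The engineer names, the one-week shift length, and the helper names `build_rotation` and `on_call_for` are assumptions made for the example, not something these notes prescribe.

```python
from datetime import date, timedelta
from itertools import cycle


def build_rotation(engineers, start, weeks):
    """Return (shift_start, engineer) pairs: one weekly shift each, round-robin."""
    if not engineers:
        raise ValueError("a rotation needs at least one engineer")
    schedule = []
    for week, engineer in zip(range(weeks), cycle(engineers)):
        schedule.append((start + timedelta(weeks=week), engineer))
    return schedule


def on_call_for(schedule, day):
    """Return the engineer whose weekly shift covers `day`, or None if uncovered."""
    for shift_start, engineer in schedule:
        if shift_start <= day < shift_start + timedelta(weeks=1):
            return engineer
    return None


# Hypothetical team of three, six weekly shifts starting on a Monday.
schedule = build_rotation(["ana", "bo", "chen"], date(2024, 1, 1), weeks=6)
print(on_call_for(schedule, date(2024, 1, 10)))
```

With exactly one name per shift, there is always a single owner for interrupts, and everyone else can stay on project work until their own shift comes around.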
Dealing With Page Overload
Address the factors below, which drive pager load:
- Production
  - existing bugs
  - introduction of new bugs
  - the speed with which newly introduced bugs are identified
  - the speed with which bugs are mitigated and removed from production
- Alerting
  - alerting thresholds that trigger a paging alert
  - introduction of new paging alerts
- Human processes
  - the rigor of fixes and follow-up on bugs
  - the quality of data collected about paging alerts
  - the attention paid to pager load trends
  - human-actuated changes to production
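Paying attention to pager load trends requires data about each page. The sketch below assumes pages are recorded as `(timestamp, alert_name)` tuples (a hypothetical format) and counts them per ISO week so that a rising trend becomes visible.

```python
from collections import Counter
from datetime import datetime


def pages_per_week(pages):
    """Count pages per ISO (year, week) from (timestamp, alert_name) records."""
    counts = Counter()
    for fired_at, _alert in pages:
        iso = fired_at.isocalendar()  # (ISO year, ISO week, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(sorted(counts.items()))


# Hypothetical page log; the alert names are made up for the example.
page_log = [
    (datetime(2024, 1, 2, 3, 15), "db_down"),
    (datetime(2024, 1, 3, 14, 0), "disk_80pct"),
    (datetime(2024, 1, 9, 1, 30), "disk_80pct"),
    (datetime(2024, 1, 10, 2, 45), "disk_80pct"),
    (datetime(2024, 1, 11, 23, 5), "api_latency"),
]
print(pages_per_week(page_log))
```

Comparing the weekly counts over time is one simple way to notice that load is growing before it becomes an overload.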
Reducing Page Overload
- Clean up noisy alerts
- Implement severity levels
- Automate what can be automated.
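The first two points above can be sketched in a few lines. This is only an illustration: the three severity names, the channel names returned by `route`, and the thresholds in `noisy_alerts` are assumptions, and a real setup would live in the alerting tool's own configuration.

```python
from collections import Counter
from enum import Enum


class Severity(Enum):
    PAGE = "page"      # needs a human right now
    TICKET = "ticket"  # needs a human during working hours
    LOG = "log"        # recorded only, nobody is notified


def route(severity):
    """Map a severity to a (hypothetical) delivery channel."""
    channels = {
        Severity.PAGE: "pager",
        Severity.TICKET: "ticket-queue",
        Severity.LOG: "log-only",
    }
    return channels[severity]


def noisy_alerts(history, min_fires=5, max_actionable_ratio=0.2):
    """Return names of alerts that fired often but rarely needed action.

    history: iterable of (alert_name, was_actionable) pairs.
    """
    fires = Counter()
    actionable = Counter()
    for name, was_actionable in history:
        fires[name] += 1
        if was_actionable:
            actionable[name] += 1
    return sorted(
        name
        for name, n in fires.items()
        if n >= min_fires and actionable[name] / n <= max_actionable_ratio
    )


# Hypothetical alert history for the example.
history = (
    [("disk_80pct", False)] * 6
    + [("db_down", True)] * 5
    + [("cpu_spike", False)] * 2
)
print(noisy_alerts(history))
print(route(Severity.PAGE))
```

Alerts flagged as noisy are candidates for cleanup: delete them, retune their thresholds, or downgrade them from `PAGE` to `TICKET` or `LOG` so they stop interrupting the on-call engineer.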