We'll use an example application to describe how to define SLIs and SLOs, including an overview of its architecture, a how-to for developing SLOs, and suggestions for implementing them. We'll also cover how to identify critical user journeys (CUJs) and offer recommendations for implementing the metrics used as SLIs and SLO targets.
Writing postmortems after incidents and outages is an essential part of Google's SRE culture. Postmortems are blameless and widely shared internally, and they allow us as an organization to maximize the insights from failures. We touch on how postmortems are written and used at Google, as well as how they can inform decisions and drive improved reliability. We also show how you can get started with...