Humans are essential to the development and reliability of our technical systems, but we often forget them. This talk will share the benefits of prioritizing human systems and how to get there. We spend all day thinking about our technical systems, but we often neglect the needs of our human systems. Ana and Julie will walk attendees through the principles of system reliability and how to not...
Over the years, a lot of research has been conducted and many books have been written on how to improve the resilience of our software. This talk will dive deep into the three key practices identified by the authors of Accelerate to improve reliability: Chaos Engineering, GameDays, and Disaster Recovery. We will discuss the key measures of tempo and stability, and how practicing Chaos...
Customer experience is the responsibility of the entire team. Many organizations leave reliability up to the SRE team; however, reliability should be built in from the very beginning. In this talk, Julie and Mandi will discuss what Service Level Objectives (SLOs) are, why they are important to the organization, and how to define and set them. Going beyond SLOs, attendees will learn what Chaos...
So you’ve had an incident. Restoring service is just the first step; your team should also be prepared to learn from incidents and outages. In this talk, you will learn some best practices around postmortems/post-incident reviews to help your team and organization see incidents as a learning opportunity and not just a disruption in service. In this workshop, attendees will: * Get an overview of...