Your service is unavailable, but it's not your fault! Your vendor is having an incident that is impacting your products. That's important information, but your customers don't really care. What can your team do before and during vendor incidents to better manage customer expectations?
Toil is a four-letter word. No one likes it, but it has to get done. You create some scripts for repetitive tasks for your team. That’s just the first step in tackling toil and reducing interruptions. Provide everyone in your organization with access to your expertise in a safe, auditable way with Rundeck.
How do you plan for unplanned incidents? You practice with Chaos Engineering. Strong incident response doesn't just happen; you have to build the skills and train your team. Practicing for major incidents gives your team insight into how your applications will behave when something goes wrong, as well as how the team will interact to solve problems. Combining your Incident Response practices...
Customer experience is the responsibility of the entire team. Many organizations leave reliability up to the SRE team; however, reliability should be built in from the very beginning. In this talk, Julie and Mandi will discuss what Service Level Objectives are, why they are important to the organization, and how to define and set them. Going beyond SLOs, attendees will learn what Chaos...
In the course of your day as an SRE, your knowledge and expertise are in high demand. You can’t do every task every person in your org needs from you without the help of comprehensive automation. Automation can be tricky. Some systems aren’t built with automation in mind, but assume that a human being will be there to keep an eye on things and fix errors on the fly, and we can’t be everywhere...