How do you reconcile the ideals of blamelessness with a demand for blame? When is accountability actually required? We'll navigate these challenges by explaining: how to empathize with blameful people - we'll look at how their goals align with yours, even if their methods are archaic; how to skilfully respond to a demand for blame - blameful people's goals can be achieved blamelessly -...
Most engineers respond to messages or emails from an SRE or security engineer with disdain. They often see the work of these teams as another hurdle to getting code out the door and a tax on their productivity. We know they’re wrong. We need to spread the SRE mindset and approach to all engineering teams and pivot their thinking towards “How can I build a solution that is resilient, secure, and...
In this session, Uma Mukkara will talk about how Site Reliability Engineers can use Chaos Engineering to continuously validate service-level objectives (SLOs) and thereby improve the resilience of the systems they operate.
Incident response is overwhelming. So where do you start? There's a lot of advice out there, but much of it is theory that doesn't take reality into account. So how do you get a process in place that actually works and scales? In this session, FireHydrant CEO and Co-Founder, Robert Ross, will share stories (good and bad) from his experience as an SRE and the 5 pragmatic tips he’s learned...
Service level indicators are quantitative measures of a service, which, in turn, are measured against SLOs. This is not the talk you think it is. As engineers, we have our own SLIs, which are Survival Level Indicators, that measure and define if we are okay or not okay at a job. What happens when the rockstar engineer, who performs essential tasks A and B, hasn't taken vacation in 9 months? Over time,...
In the “good old days”, Ops/IT teams were responsible for handling issues when applications crash. In the world of microservices, however, developers are required to take on a bigger role in identifying and fixing issues in production, sometimes without having proper tools, privileges or even training. So how can we empower developers to troubleshoot efficiently and independently? Join us as...
Today's software developers, DevOps teams, SREs, and SysAdmins are familiar with the concept of public-key cryptography for gaining access to remote resources, but using public and private keys has its limitations and can be difficult to scale and manage. Consider using short-term certificates instead of keys for SSH access, and rest easy! This talk will go over the pros and cons of using keys vs...
The transition to more complex systems is accelerating, and chaos engineering has proved to be a valuable option in our toolbox for handling this complexity. But the speed at which we're developing and deploying makes it hard to keep up through manual chaos experiments, so we turn to automation. In this session, we'll look at how automated chaos experiments help us cover a more extensive...