Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome to Incidents:
the customer empathy workshop you never wanted.
I'm Ryan McDonald, responder advocate for FireHydrant. I am
in the delightful position of getting to chat with FireHydrant's customers and
the broader community about all aspects of incident management and
response. It's a total hoot. So thanks to both
FireHydrant and to Conf42 for the opportunity to be here.
That's nice, I imagine you think, but how does someone get such a silly
job? Incidents have been a persistent part of my
professional life. Regardless of the domain, they seem to
follow me. Consequently, I ended up falling in
love with them. I started my professional life as an Outward Bound
instructor, leading mountaineering, rafting and climbing
expeditions in the western United States for anywhere from
22 to 30 days. And as you can imagine,
we had incidents. Thankfully, they were infrequent,
but when we did have them, they tended to be pretty large affairs, life and
limb emergencies, people being lost, search and rescue skills
being required. So kind of a big deal.
It was great. I loved that job and I loved that role.
A quick joke. What's the difference between an Outward Bound instructor
and a large Domino's pizza?
A large Domino's pizza can feed a family.
Consequently, I caved to capitalistic urges and
ended up getting into tech. From my first
experience in a software outage as an intern,
I remember being both intrigued and,
I must confess, a little bit confused and entertained.
A deploy was botched and people are running around
trying to roll it back.
Meanwhile, I'm off on the side saying, so the website
is down, but nobody's dying, no one's
bleeding out, right? Transitioning from the
notion of lost teenagers in the woods, compound fractures, or
helicopter evacs to this world was something that seemed
fascinating, both lowering the stakes and increasing
the complexity and sort of the interest there. This
interest continued through my time as an engineer in various
process roles, and formally or informally,
I ended up involved in or responsible for the incident management
training programs or other aspects of incident process.
I finally doubled down on incidents and ended up
as a founding member of the incident command group
at Twilio. My time at Twilio was amazing and
stressful and high impact and great, and over the
course of that, I had the need for an incident management
tool, and I was lucky enough to meet the folks here at
FireHydrant, and as a lot of you know,
the time and place that you need an incident management tool oftentimes is
pretty stressful. So I was at Twilio for a while and
then eventually left. Took a little bit of time off and
FireHydrant reached out and asked if I would be interested in a customer
facing role, which is not something that I had really considered. That
was not on my career roadmap or bingo card.
But working in customer success as a customer success engineer
directly with customers has been an enlightening process.
I've had the privilege and the opportunity to dive in with
a ton of different organizations and learn so, so much
with that. I'm excited to share some of that learning with you
all. So we'll go ahead and dive in.
We'll start with a story that might seem familiar
to some of you.
So your day begins and
you wake up. It's feeling like one of those above average kind of
days, right? Coffee is just like hitting you just right.
Everything seems to be lining up.
There's a strong possibility of this being like
a serious flow state kind of thing. Like a real destroy-the-to-do-list
kind of scenario.
So sit down, crack open your IDE,
and get into your first task.
Not too much later, your phone buzzes.
PagerDuty informs you that your product has
other plans for your day. Remind yourself
it's an above average day. Things are going well. How bad
can this be? Crack your knuckles,
jump in and start to jump through the hoops that are required
to verify what exactly is happening with
your systems. You also promise yourself that you'll actually
check into those transient alerts from two weeks ago. But lo
and behold, after a little bit of digging, you realize this is a live
one. This is not only blowing up your pager, it's also impacting
your customers. Remembering things are
good. It's a good day, things are lined up. You go ahead and kick off
the incident management process and start investigating the
issue. You're getting real serious.
Out of nowhere, some rando from customer support comes literally
screaming into your chat. And you've probably
heard these questions before, right? What's the
impact like? Just getting unnecessarily
deep into something that you've only just started on
and at this point your chill is destroyed.
Your day feels like it's trending towards dumpster fire. What started
with such promise is proving to be just another on
call shift that you're not going to be enthused about.
Fin. And scene. If this sounds familiar,
take a moment. Please enjoy this relaxing GIF.
Many of us with time in seat will hear this story and think
about process changes first, right? An industrial-strength
RACI, right, to clarify responsibilities.
Maybe the introduction of a new role, some kind of incident commander
type to wrangle stakeholders or serve as a firewall.
While one would be justified to consider these
kinds of changes, I would like to propose
an alternate take. I would argue that the above
story is at least partly due to a deficit
in customer empathy.
And if you're not familiar with the term, this is the action of understanding,
being aware of, being sensitive to, and vicariously experiencing
the feelings of your customers. So that's
kind of a mouthful. And it does include words like feelings which
obviously feel like they fall into that kind of nebulous,
squishy side of things in our land. And why
am I talking about this? In the course of incident management,
shouldn't we be digging into technical solutions to high-stakes,
ARR-compromising incidents? What can
customer empathy possibly bring to the table when we start
to put that into focus in the course of our incident management
process? I would
argue quite a few things. I think there's a handful of
outcomes that we can land. So when we begin to focus on customer empathy,
I've observed the following outcomes, including improving
your customers' experience during an incident, better working
relationships with other responder organizations,
and then gaining a deeper learning right from those incidents,
and better processes and product improvements from your incidents.
So I'd like to share with you all some concrete actions that have helped me
drive these outcomes in the organizations that I've been part of.
While none of these are mind bending,
I do find that when framed through the lens of customer empathy, we'll end
up landing in a few places that might not be typical or intuitive
when you're thinking about learning from an incident.
Let's dig in. So first,
missed expectations are the basis for basically
every relationship issue that's ever been. So naturally,
it shouldn't be any different with our customers or with incidents.
The first thing that we can do is we can get more exposure
to the product. So take some time and actually
use your product and encourage that others use the
product as the customers might. This doesn't have
to be a deep dive. You don't need to become a power user of your
product, but just digging in and trying to get a sense for the basic
workflows of your product can add a ton of value.
And we'll get into this more later as to what that value actually
is. But if you don't have the option to use the product
and or there are some other alternative
routes to get that kind of experience of the day to
day usage of the tool, you can ride along with
folks in the customer facing orgs, right? So whether that's customer
support, customer success, user research, there are a ton
of options. All of those folks are going to have a sense
for how your product is being used. Common use
cases, and then also common friction points.
Whatever avenue it is that you choose to take this,
it really is all about just gaining that fluency in
your tool. This can be invaluable when it
comes time to serve as that translation layer
from whoever it is that's reporting that issues are taking place
to trying to figure out what's actually happening.
It can also really help during an incident when
you're trying to more acutely describe what
that impact is to a broader audience. So collecting
and displaying customer impact prominently during an incident
is the next thing that I found adds a ton of value. And this
is one of those things that it's nice to say, but it can be very
difficult to prioritize in the course of an incident. And I've
found that trying to bifurcate efforts
initially helps. Right? So the first person is digging into what maybe the
issue looks like from the back end, and the other is trying to clarify what
that experience would be like if that were down for
customers. Building space into your incident response
processes to collect that customer impact and then display it can also
take a huge load off of responders throughout the
course of an incident. Stakeholders and other responders,
oftentimes this is the first question that they ask. Right. So that's
sort of the no brainer side of things. But there is also this
idea that if we keep this prominently placed in
front of everyone, that it can help frame the experience.
So as we add more people to the incident, the fact
that customers are experiencing pain is front and center.
This isn't a technical exercise. This isn't some kind
of logical problem. The people who pay
us are experiencing issues, and here's what those are
like. And I think centering and grounding responders in that
idea can help make the experience more relatable.
It also can help increase the urgency. Right. It can be easy
for these sometimes to just turn into long sojourns,
right. Trying to understand a technical problem instead of
thinking about mitigation, right, which is our end goal.
Oftentimes the goal in incident response is not to resolve the issue
or even understand it, but just to stop the bleeding, stop the
pain for our customers.
One other thing that we can do to help us understand customer expectations is
to consider something like the field of chaos engineering,
right. And what kind of value we can bring from looking at our
application when we inject faults, when things are going
poorly. I'll even go so far, sometimes as to encourage people,
if you have downtime during an incident, to use your application in a
degraded state. Other folks can
speak more deeply to implementing programs like this.
But really using your tool when it's not at its best
is a great way to help build and develop customer empathy.
So by understanding our customers' expectations and
what their experience is like, it can help us drive urgency and
ensure that we stay focused, making incidents potentially suck
less for customers. And obviously
mitigation, like trying to drive that notion of mitigation
is a part of that. But really, when you get down
to the dollars and cents of it, this can directly impact business
metrics. Net retention and ARR of
customers is something that can suffer if
customers don't feel like you understand what they're going through on
those bad days and are responding accordingly. Right?
Upsells can depend upon that sentiment. And really,
at the end of the day, as a customer, it's hard to stick
with an organization that you don't like. Right? And feeling
like your responders
understand what's happening is a great way to ensure
that customers are feeling heard and understood and that all of
those comms really can land for them.
Awesome. So our next point is to not just collaborate,
but partner with your support organization.
And to start with, customer support is hard.
In the story earlier, I painted customer support as a boogeyman
of sorts. They emerged out of the ether to pull us from that delightful
flow state, and that's not an entirely fair characterization.
Customer support is exceptionally difficult. Before we dive
in, I've got a hard hitting analysis, the kind
that you come to these conferences for. Right.
I have captured an actual request from a customer facing
team during the early stages of an incident.
Get ready.
All right. Okay, so obviously,
the initial take here, this is a little heavy handed, right?
It's a little aggressive, right? Thankfully, with the
power of technology, we can break this down into its component parts.
There's nuance here if we really get into the details.
If we look at this frame by frame, what initially comes across as anger
or maybe even aggression, we realize quickly devolves
into desperation. The fact of
the matter is that customer support, customer success,
or other account executives are under immense pressure during
incidents. They are the middleman in
a bum deal where they have very little to no control and
have to serve as a sponge for customer angst.
So let's dig in a little bit here and see what we can do
to make some of these interactions a little bit more productive.
First, by proactively
building rapport with support, you can
help smooth over a lot of these things. If you remember earlier, I encouraged the
idea of ride alongs with customer support. Surprise.
Little do you know those are rapport building activities, right?
Digging in and getting interested about their domain can
go a long way towards helping to build that initial relationship.
So not only that, but there's a lot of informal
opportunities, engaging with the support organization during larger events,
inviting them to happy hours, off sites if possible, all of those kinds
of things. Another way that you can think about
how to build sort of that reciprocal vibe with
support is that you can figure out ways
to leverage them inside of your company, outside of their
support job. So for example, are there
ways that support could be leveraged for internal or even customer
facing trainings? As an example, support
at SendGrid ran our product orientation.
All new employees were required to go through this set
of orientations where these people would do a deep dive
into how and why the app is doing
what it's doing. And by doing that early
and setting up support as kind of a tent pole inside
of the organization that really deeply understood not only
the tech, but the customer experience,
it just gave them this opportunity to be more
available to everyone else. So we were able to build a ton of rapport
by putting those folks in a situation where they could flex their skills
just in a different context. And then lastly,
I can't speak for everyone in support, but support is
a great way to get into tech with fairly
limited background in technology.
Consequently, a lot of these folks are excited and hungry
for mentors. Those of us that have been in the field for a while can
provide an amazing resource. Just sitting down and grabbing lunch with people,
right? Chatting about what their goals are, where they want to go
with their career. So anywhere in that
spectrum really are all great options for
building rapport with support.
So once you've taken the time to get to know support coworkers,
the next thing that you can do, especially in the context of an incident,
is to create really clean interfaces and clear expectations
for communications with support,
including support and stakeholders.
So by setting expectations in your process for when you
will try to have descriptions of the impact to customers,
you can help avoid those frustrated confrontations
like we've been talking about. Support's primary goal is
to manage the expectation of customers, and input from engineers
is a huge currency in that, right? And helping them really describe
and empathize with customers and the experience that they're having.
So the other thing too to consider is once you
set those expectations, sometimes they're hard to meet,
right? So don't beat yourself up over it, but instead
communicate, right? There's nothing worse than having a vacuum in
the middle of an incident. And so don't
put support folks or customer facing groups in the position of having to
come bother you or dig into a space where you're
doing some kind of technical investigation. Right. Communicate as
much. Let them know like, hey, we don't have something yet, but we will,
or we'll hope to in 30 minutes, and we'll check in
sooner if we've got something. Another added benefit
of these types of regular communication
cadences is that the broader stakeholder group
can feel supported as well. So execs
love feeling like they're in the loop, and obviously the
careful dance is to keep them in the loop, but just far enough
away. And these regular communication cadences can help them build confidence
and avoid dropping into the middle of some
kind of deep technical issue.
All right, and then last here, when it comes
to partnering with your support organization, and I alluded
to this earlier a little bit, but try and let support
behind the curtain more during incidents, I think it's easy to
think about support as a stakeholder, right, or someone who
simply needs information.
Consequently, they can end up feeling like they're playing second fiddle to
actual, like the responding engineers or other responders.
So by bringing support closer into the fold
and engaging them in the process, there can be a bunch of win wins.
The only caveat here that I think is worth calling out is oftentimes
the models of measurement of productivity
in support can be pretty substantially different than in engineering.
And oftentimes that looks like something like how many
tickets are being passed through your queue in a
given period of time. So I think it's worth advocating both with
and for support, once you understand what that model looks like,
to ensure that they have the freedom and the flexibility to
engage strictly in an incident and not be distracted
by other work. So really it's advocating for
them inside of your organization to have them brought in and
describing how valuable and how useful they can be.
So once you've done that and you've helped them achieve
that level of focus so that they can show up as their
best selves during that incident,
what can you do? How does support actually fit into the
response? Right. And I think there's a number
of different ways, not only handling tickets that are coming
in and ensuring that the correct communications are going
out to those folks, those impacted customers,
but support can also provide additional evidence through
direct testing. These are folks that are experts at using
the app from a customer's perspective. So by bringing them in,
you can have them test different cases, right? Which adds data
and adds information to sort of your quiver of information that you
can draw on throughout the course of an incident.
Additionally, they can pull in additional information from
the impacts from customers. So direct customer reports sometimes
can help you avoid red herrings and focus your investigation
efforts. Lastly, support can also
help in testing. So as you begin to roll changes out to mitigate
issues, getting them in there in lower environments
and having them play, or behind feature flags, however it is that you all have
things set up, they can be a person to go out and verify,
right? Which can take some energy and some of
the attention off of your responding engineers' plates.
There is actually a bonus outcome from this increased interaction
and increased rapport that's being built with support.
They can actually be a secret hiring bench. If you work
in a high-growth company, there's no end to that company's
appetite for a whole variety of roles,
including program project, product managers,
incident commanders, customer success managers,
even potentially engineering, right? Like more junior engineers,
a sufficiently senior support person can bring
so much domain and organizational knowledge to roles
like these that no outside hire could ever dream of
supplying. I have so many friends that have started in support
and are now adding a truly mind-blowing amount of value in
an organization that they came up in and grew up in.
So intentionally nurturing your support folks creates
a pool of candidates that can lead to your next great hire.
So by partnering with your support organization,
it'll feel so much better when you meet those folks
on bad days, right? In those incidents.
So everyone knows, right?
The worst time to meet someone is in a crisis situation.
So figuring out the nuances of how your teams work together and what
does and doesn't work, push that. Try and start that stuff
before you're in the middle of an issue. The other
added benefit here is when folks feel like they're collaborating
and working as a team, you can have far more aha moments.
At SendGrid, for example, our deeply engaged support
team would come to us with findings that helped
accelerate our investigation efforts.
It happened frequently, so give yourself the
opportunity to take advantage of these collaborative aha
moments of having a well oiled team instead of an
adversarial group.
And the last point here,
broaden your incident retrospectives. Retrospectives
for incidents can sometimes be thought of as strictly technical
exercises. If you are digging in and
really a root cause analysis or some technical finding
is your goal, you may be leaving quite a bit on the table
so here are some ways that we can help broaden
that idea of what is a retrospective for and how do you
use it. The first thing is making space for
all of the responders, right? So after you've engaged
customer support as more of a responder role,
bringing them in to these post incident reviews or incident
retrospectives can be a great way to ensure that
you understand what their concerns are, what their experience
was like, and in turn, what kind of feedback they were getting from
customers. Right? And all of these things can feed
into both your processes and your product in
ways that you might not expect.
If getting those folks into an incident retrospective
is challenging, which it is. Scheduling retrospectives
is difficult. It's hard to even get the core group of responders.
Oftentimes in an organization that's fairly busy,
you can always consider doing a quick check in with them
or an async check in even, just to get a sense for what their
experience was like, and then bring that experience
as a proxy to the group of broader responders
during the actual retrospective activities.
Next item, incident retrospectives,
plural. Right? And this can be a
bit of a hot take for some folks, but for sufficiently
large incidents, plan in advance
on having multiple meetings or multiple sessions, whether those
are meetings or async exercises to cover different themes
in the course of the response. So, for example,
having a technical or architectural deep dive, right, like trying
to understand the root cause, if that's a thing that
you're into, but then also
have a session to think about the broader process. Right?
How did you interface with the broader organization, right.
Or potentially another session focused expressly around customer
communication? What did that cadence and that pattern look like
for folks?
Incidents, they're not optional right? Here in these complex
domains that we live and work. So taking the time to
really dig deep and learn about different ways that things are
or are not working can be key during these
nonoptional investments.
So the last item here is to capture all
of your improvements, not just the technical ones.
I think it can be easy to focus on action items to
make the application better or to avoid these types of incidents
from happening again. But at the end of the
day, there are a whole bunch of things that you can take away from
an incident that are far beyond that, right?
So, for example, how do we think about our
cadence of communication with customers? Right?
Are there opportunities to improve our products so that
if something like this were to happen again, that we could degrade
gracefully instead of failing outright? How are our error
messages? Did things make sense from a customer perspective during
the course of the incident?
Making sure that that messaging is really good and clean and clear
and crisp can be helpful, not only for customers, right, for them
feeling confident that something is happening on
the other side of the screen there when something fails.
But also, how can we ensure that that is a
good signal to the folks who might be raising that to people internally who
are responding to this incident?
So all of these things, right, whether they're small changes to the
application, changes to our process, changes to our architecture
or broader organizational structures, all of these things
can get loaded into various backlogs, right? And can be invaluable
during prioritization exercises. So engaging
folks like your PMs and TPMs, those folks that serve as kind of
the glue of a lot of your organization, and ensuring that
they have a clear understanding for what happened and what
findings you had that came out of those retrospective
exercises, can be exceptionally useful.
So by broadening our incident retrospectives,
we can have deeper learning, better process,
and better product improvements. From that,
I like to think about incident retrospectives as a flywheel for improvement.
By considering more perspectives, especially those that are closer to
customers, you're opening up that aperture, capturing improvements
to processes and your products that otherwise might never really have been considered.
All right, so we have a
handful of actions and some outcomes we'd expect.
By understanding the customer's expectation, we can
create more satisfying incident experiences.
By partnering with your support organization, we can
have more aligned and better responses, right, that feel
better, that function better. And then lastly,
by cracking open your retrospectives, you can enable better
process and product prioritization, capitalizing on valuable
and potentially expensive learning opportunities.
I like to think about customer empathy as an organizational lubricant,
right? Using incidents to drive this mentality
and then better leveraging your support organization can
add all manner of value. So give
a few of these actions a try and
feel free to hit me up. Let me know how it goes. Here are my
socials, and I'd love to chat with you all on Discord on any of
this stuff. So thank you again for the opportunity to chat
with you all, and looking forward to next time.