Transcript
This transcript was autogenerated. To make changes, submit a PR.
My name is Dylan, and I am co-founder and CEO of a company
called Sleuth, and this talk is about delegating engineering toil
to the. You know,
it's been a long day. You've all had a conference session. I want
you to visualize a beautiful and perfect world.
A world where developers get to spend their time on
producing engineering value, right?
Not wasting their time, not working on busy work, not doing these
things, having the focus time that they need to actually ship the things
that are going to make a difference for your customers and your organization.
Now, can this world exist? Perhaps. But there's this thing,
there's this thing called toil. Now, generally speaking,
in the English language, toil means to work extremely
hard or incessantly, to do exhausting
physical labor. But more specifically, in our engineering
world, in the software engineering world, we mean something slightly different.
What do we mean by developer experience toil?
Well, these are the repetitive tasks that your dev
team needs to do to get things done, right?
They're repetitive tasks that aren't necessarily unimportant,
but they are tasks that, again, you have to do over and
over again. You have to do it correctly. And generally speaking,
at some point it's wasting your time. It's not giving your customers that
end value. It's something that you likely can do
without. And how do you know if you've got developer experience toil?
There's a couple of categories, a couple of questions you can ask
to understand if you do. The first is whether you hear, you know, "our
team always does XYZ." In other words, like,
I'm a developer, I need to remember to transition this issue
into this bucket, and I need to remember when it goes out that I need
to ping the PM and let him know. This other thing, it's a manual task.
I have to do it every time, but it's me. I have to
do it. I have to change context, change flow. Next is manual steps
in your workflow, things like, I'm going to do a deploy,
and at some point before I merge things into production,
I should go figure out all the people that were involved in the change.
I should find them, mention them in Slack, ask them, is this okay
to do? That's a manual step. That's context switching. It's toil.
I have to do it over and over and over again. And another giant
category is syncing state between different systems. So again,
your Jira backlog, the place that is never in touch with
reality, right, because people forget to transition
state. Or you've got to go into Slack and mention people,
or you've got to update the wiki with your release notes, or whatever it is.
You're pushing state from one place to the other.
Are your customers getting benefit from that? Likely not. Are your
developers getting frustrated? Likely so. All right.
My argument is that automations remove toil.
So when we talk about how we even just move our software
industry forward, it is through automations.
Automation is this lovely, wonderful thing. When we do something 20 times and we realize,
hey, this is a repetitive task, we can delegate that off to robots
and that can be super exciting. Don't just take
my word for it. I mean, we've got a timeline here of examples
of automations just transforming our industry. So circa 2001:
automated unit and synthetic tests. I worked at Atlassian
back in the day, and right around then we were writing heaps and heaps
of tests, but they were running on my laptop. They didn't necessarily
gate my changes. We decided to move to
automated execution of that, and that changed the game.
Suddenly we could run them in a consistent manner. We could
break them up and execute them. It really changed how we could work.
Fast forward a little further. In 2005:
ephemeral infrastructure defined as code. Puppet and Chef eventually
evolved into things like Terraform. Again, that
maybe even got rid of an entire role. I knew
ops people whose job was to understand the drift between this server
and that server, and knew engineers who'd lost days of
time debugging issues that were specific to a
server. So you move to infrastructure as code, automate that,
and suddenly we've changed the game again. And then fast forward
a little further. The DevOps pipelines with no manual steps.
I mean, huge game changer, right? DevOps revolution right there in
a can, being able to deploy and make it a non event.
This is how we move our industry forward.
All right, so maybe I've convinced you that automations have
a lot of potential for change.
Let's talk about what the traits of good
automations in relation to developer experience workflow look
like. Because it's not just any old automation. I mean,
obviously, if it's helping you, great, go for it. But I will argue that there
are some very key characteristics that allow you to level
things up. And actually, let me make one more point. If I
go back to this slide here, I put these things in a sort of staircase
because I think it's important to understand that the automations that
we have developed continue to build upon each other.
I would argue that today we can do far more exciting and powerful things for
developer experience workflow because of all of these things that have come before,
and it's truly now that we can start to unlock
this higher level of automation. All right,
so talking about traits,
the number one trait of a good developer experience automation is
that it's going to help drive your culture. Now I feel like when
I've talked to people about this one, initially the reaction is you're
talking about tooling. How is tooling culture? Well,
tooling helps define culture, right?
It helps build these things like guardrails. It says if we
are going to always do a thing, it helps define how we work because we
agree as people that we're going to work in a certain way and then the
automation can enforce those norms. For a very simple
example, if I'm going to say there needs to be an issue
key in the title of every pull request so that people have the context about
why we're making a change, they have the context to understand
for the review cycle and that sort of thing, that's an agreement amongst
a developer team and you can enforce that with an automation. And then rather
than having somebody be the bad guy, you have basically the automation
being the bad guy. And it's not really a bad guy because you said we
want to do this and it's there to remind you of these things. So a
good automation enforces these norms and
it also allows you to try new things. So if you want to try something
different again, you can say let's agree as a group,
let's put in place some automation and then the automation is
there to support you and build the
layers that you're going to build upon as a team. Okay,
next. An excellent automation trait is that automations should
be low lift. Now I love these little XKCD things
because I think they got it super exactly right in all instances.
As an engineer, I am guilty of having done this a million times,
which is to say I'm working on a problem. It's a problem to solve things
for customer value. I realize, hey, I could
add an automation here and make my life way better. And what I think is
going to happen is I'm going to spend a little time on the automation and
then I'm really going to get into the flow and get to do things.
What happens almost nine times out of ten is that
I start working on the automation. It's far more complicated than I thought. It takes
more time. I've run into a bug, it's some ongoing development and
now I have no time to actually do the real work behind the scenes.
So you can imagine if that's the world that every automation
lives in. It's hard to make progress because you never quite know
the depth of that automation. Whereas if you have low lift automations,
you can treat them as an experiment and basically
iterate on them a lot quicker. The other thing I would say too,
is that the comic on the left is
also true that sometimes there's things that you do, but it doesn't take a lot
of time, it's just a context switch. And if you were to ask yourself if
this is going to take me a week to automate this, when am I going
to get paid back for that? And the answer might be never. But you add
twelve of those up and they're going to add up over time. So if
you can make your automation super low lift, you can get a lot
of benefit and you can start to attack some of that death by a thousand
paper cuts. Which leads me to the next topic.
The idea of low lift automations means that you can really
do a paradigm shift. And this is honestly the crux of,
I believe, the way that modern teams are working: everybody is trying
to achieve this idea of continuous improvement,
continuous learning, and it's not rocket science to
realize that. It's just straight up science, right?
You need to be able to perform an experiment. You have a hypothesis,
you run an experiment, you measure. If the hypothesis was
wrong, you try again. But if your experiment is going to take forever,
you're only going to get one or two shots on goal. But if you can
do these things in a tight loop and you can really measure these things,
you can continuously iterate towards a better and better process.
So a good automation is an experiment.
It's your team saying, what if we did X? And you're going to
measure and you're going to check it out and you're going to try.
Actually, I thought there was a different slide coming.
But another trait that I think you will find in
this day and age that is huge is that you have
to be able to use the tools that your teams are using already
today in these automations. So if you think back to that slide
where we had the stairstep of how automations have sort of
built upon each other over time, we're at this place now
where we're all using these cloud-based tools, right? We're using some sort of Git
repository with, like, code review. We're using
an issue tracker somewhere, we're using a CI/CD system.
My guess would be that 90% of you are using some subset of
these tools that are up on the screen right here. Now, what that means is
if we want to make an impactful change to the way that your
development teams are working, you need to be able to automate
across all of this because things start in Jira and then they move into a
pull request, and then they move into a production environment via
CI/CD, and then they move into PagerDuty when you've messed up.
And you need to be able to talk to all of these systems to really
take best practice workflows and implement them
in your services. So let's
talk about this in terms of a story. We talked about the traits of
a good automation. Now, I have a number of different stories,
but I like this one the best. So for our startup,
Sleuth... Question?
Is that right?
Okay, well, maybe I
meant to say four then. Yeah, sorry about that.
Telling a story about something that I think fulfills these categories.
Now, we cheated a little bit in the sense that Sleuth is built to do
a lot of these things and to basically bring the low lift side of
this. But there's a number of different best
practices that are out there in the world. One of these is to say,
I'm going to make a change and I'm going to deploy that change to a
pre production environment. And I want to have a culture on
my team of people identifying that their change
has hit a pre production environment. Take a hot minute,
do a smoke test, check that it works in that pre production environment
the way that we want it to, before we go ahead and
merge that into a production environment. Right. That's a reasonably common workflow,
but you can imagine that that's a little difficult. Right. We have to merge a
thing. We have to know that CI/CD has deployed that to a specific environment.
I need to know who it is that I need to mention, and then I
need to hopefully collect that information in an environment where the people are
working, so something like Slack. And then I need to trigger
a CI/CD pipeline in some other system to promote things
to production. A complicated automation,
but you can see how that's going to drive reliability. It's going to drive
accountability, it's going to promote my
team doing smoke tests and those sorts of things.
So it's a great example of taking all
of those traits of an automation and building them into something that,
if we can adopt it really quickly, lets us understand how that impacts our
flow. And it is something that really defines the culture of our
team. It's holding up so many pillars of
what we want to do. And in our case, actually, we used that as an
experiment, and we decided we needed some nuance to that. We started
with just straight up approvals, and then we said, if it's
no big deal and it's a small change, after 10 minutes, if everything's
okay based off of the monitoring that's coming in from other systems,
let's auto promote it. And then we went, oh,
how about if we have a label on a pull request that's like "quick fix"?
It doesn't even stop at all. It just goes straight out. But because
we could experiment with that automation and move really quickly,
we could see how it would fit the flow that we were, arguably, trying to
attack ourselves.
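The approval nuance described above (explicit approval, auto-promotion of small changes after a healthy 10 minutes, and a "quick fix" label that skips the wait) can be sketched as a small decision function. This is only an illustration, not how Sleuth implements it: the `StagingDeploy` fields, the `quick-fix` label name, and the five-file threshold for "small" are all assumptions made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical values; a real team would pick its own.
QUICK_FIX_LABEL = "quick-fix"
SMALL_CHANGE_MAX_FILES = 5
SOAK_MINUTES = 10


@dataclass
class StagingDeploy:
    """Snapshot of a change sitting in the pre-production environment."""
    labels: set[str]
    changed_files: int
    minutes_in_staging: float
    monitors_healthy: bool
    approved: bool


def promotion_decision(deploy: StagingDeploy) -> str:
    """Return "promote" or "wait" for a change in pre-production."""
    # A quick-fix label means the change does not stop at all.
    if QUICK_FIX_LABEL in deploy.labels:
        return "promote"
    # Someone smoke-tested the change and approved it (for example in chat).
    if deploy.approved:
        return "promote"
    # Small change, enough soak time, and monitoring looks fine: auto promote.
    is_small = deploy.changed_files <= SMALL_CHANGE_MAX_FILES
    has_soaked = deploy.minutes_in_staging >= SOAK_MINUTES
    if is_small and has_soaked and deploy.monitors_healthy:
        return "promote"
    return "wait"
```

Keeping the rules in one small function is what makes the experiment cheap: changing the soak time or dropping the label rule is a one-line edit.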
Okay, so that's a little bit about the why,
like the theoretical, like how we should build automations to be really impactful.
Let's talk a little bit about what teams are doing today. So the
good news is you all have been doing these things for the last
10 or 15 years, so we have a ton of best practices out there,
and teams have adopted these things, and they tend
to fall into a bunch of different buckets, four major buckets,
basically: guardrails, notifications,
actions, and workflows. So let's walk through
these, and I'll tell you a little bit about what they mean, and I'll give
you an example of each. Okay. First off,
guardrails, we were kind of talking about this a little bit before. Think of
a guardrail as defining the boundary that your team agrees not
to cross. So when you say we always or
we never, for example, we never merge a pull request when we're
in an incident, right. It's a reasonable thing. Lots of teams
do that. Or we never open a pull request without an issue key.
Right. It's saying, these are the guardrails. As an organization,
as a team, we want to live up to a certain level of excellence,
and we won't cross these guardrails. These are types of
automations that tend to be somewhat binary,
right. They're going to either keep you from doing a thing or keep you doing
a thing. And a great example of this
is batch size. Right? So for those who are
familiar with the State of DevOps report and DORA metrics,
deployment frequency is something that DevOps teams are trying to
maximize. And the way that you maximize that
is by driving your batch size down. You want to make the smallest amount of
change that's going to have an impact, but have a very
small blast radius in case something goes wrong.
And as a team, you can agree, hey, we want to try and keep our
batch size down. So you could add an automation that says a pull request can't
have more than X changed files, where X makes sense for
your team. And you can see how that's going to say, as a group,
if I've opened one that's too big, I need to split it. I need to
go back. I need to just decide how am I going to
stay within these boundaries. And of course, with any of these sorts of things,
maybe you've reformatted all of your code or run some sort of new linter,
and you decide, this is the time where we're going to ignore this check.
But again, that's the exception, not the rule. And that allows
you to understand, how often are we exceeding this thing?
Is this a rule that makes sense for our team?
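Two of the guardrails mentioned in the talk, an issue key in every pull request title and a cap on changed files, can be sketched as simple pass/fail checks. This is a minimal sketch under assumptions: the Jira-style key pattern, the `skip-batch-size-check` override label, and the default limit of 20 files are invented for illustration.

```python
import re

# Assumed Jira-style key such as "PROJ-123"; adjust to your tracker.
ISSUE_KEY_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")

# Hypothetical label a team could use for the agreed exceptions,
# such as a repo-wide reformat or a new linter run.
BATCH_SIZE_OVERRIDE_LABEL = "skip-batch-size-check"


def check_issue_key(pr_title: str) -> bool:
    """Guardrail: we never open a pull request without an issue key."""
    return ISSUE_KEY_PATTERN.search(pr_title) is not None


def check_batch_size(changed_files: int, labels: set, max_files: int = 20) -> bool:
    """Guardrail: a pull request can't have more than max_files changed files.

    The override label records the exception explicitly, so the team can
    later count how often the rule was bypassed.
    """
    if BATCH_SIZE_OVERRIDE_LABEL in labels:
        return True
    return changed_files <= max_files
```

Both checks are binary on purpose: the result can be reported back to the pull request as a passing or failing status.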
All right, next up is notifications.
My guess is, like, 100% of you have a cell phone, so you
probably know what notifications are. They bring visibility
and attention to critical information. But critically,
a good notification, just like on your phone,
is context sensitive. It hits you at the
right time, it hits you with the right information,
and it's hitting the right people. Right. Because I'm sure you all have notification
fatigue. I'm one of those people who can't have 400
little red dots on my desktop. I know some of you can. I don't understand
how you do it, but I can't.
But you can use notifications not just
to keep people aware of what's going on, but you can
nudge behavior in the right direction. So one
of the examples that I like is that a PR must not
be open for more than a certain number of days. A draft PR.
Right. So if you want to have this culture of keeping your work flowing
and making sure that you're not spending a ton of time on this dead branch
or whatever, you can open a draft pull request. And maybe you say, we don't
want that to be open for more than ten days. Well, the notification comes along
and says, it's been eight days for this thing being open.
Perhaps you should either move this into real work so that you can start to
get this reviewed and get this into your flow, or move on to something else
and close this draft PR. You're nudging the behavior
in the direction that you want people to move.
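The draft PR nudge can be sketched as a function that stays silent until the warning threshold and then returns a message. The ten-day limit and eight-day warning come from the example above; the function name, the message wording, and the idea that a scheduler calls this once a day are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def draft_pr_nudge(
    opened_at: datetime,
    now: datetime,
    limit_days: int = 10,
    warn_days: int = 8,
) -> Optional[str]:
    """Return a nudge message for a draft PR, or None if it is too early."""
    age_days = (now - opened_at) // timedelta(days=1)  # whole days open
    if age_days < warn_days:
        return None  # no notification: avoid adding to notification fatigue
    if age_days < limit_days:
        return (
            f"This draft PR has been open for {age_days} days. "
            f"The team goal is {limit_days}: mark it ready for review or close it."
        )
    return (
        f"This draft PR has been open for {age_days} days, "
        f"over the team goal of {limit_days}."
    )
```

The same shape works for the review goal: swap days for hours and notify at the seven hour mark of a ten hour target.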
Another great example of this is goals, where you say,
I want everybody to be reviewing a PR that's been opened within, say,
like 10 hours or something like that. You identify and notify
people at the seven hour mark. Well, cool. I've got three hours to
hit this thing that we all agreed on. Notify them again an hour before.
You can nudge their behavior with a notification in
the direction that you all want people to be
behaving. All right, actions. I don't
know. I've heard people call this a different thing. I call it actions,
but basically it's an if-this-then-that. So I detect
a condition in some sort of system. I am confident
enough in the signal of that condition that I want to change the state
in another system as a result. Right. So maybe
I notice an incident in PagerDuty and I
want to lock all of the pull requests in GitHub. Right. It's an
if this, then that. So again, attacking that
thing from way back at the start of this talk where we're saying one of
the types of toil is keeping these systems in sync, so
you can have something that you trust instigate
changing that state rather than having somebody have to toil after that. One
that is really, really popular with a lot of teams, that we see people use
all the time, is the issue tracker stuff. So like
I said, I don't think any of you would think I'm crazy to say that
Jira is where things go to be out of sync.
So you do a deploy,
you get that deploy into your production environment, you automatically
transition an issue into deployed to production. And why is this cool?
Well, you've got support engineers who are looking to get back to your customer
and let them know that an issue was fixed. Great. They can see that the
pull request was merged, but maybe at your organization it could be
anywhere between two hours and seven days before it
actually gets into the customer's hands. So you're using this to keep work
in sync so that you can have visibility, so that you can better serve your
people.
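The deploy-to-issue-tracker action can be sketched as an if-this-then-that handler: if a deploy reaches production, then transition every issue referenced by the pull requests in that deploy. This is a sketch under assumptions: issue keys are taken from PR titles with a Jira-style pattern, and `transition` is a hypothetical callback standing in for whatever call your tracker's API actually requires.

```python
import re
from typing import Callable, Iterable, List

ISSUE_KEY_PATTERN = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")  # assumed key format
TARGET_STATUS = "Deployed to production"  # assumed workflow status name


def issues_in_deploy(pr_titles: Iterable[str]) -> List[str]:
    """Collect unique issue keys from PR titles, preserving first-seen order."""
    keys: List[str] = []
    for title in pr_titles:
        for key in ISSUE_KEY_PATTERN.findall(title):
            if key not in keys:
                keys.append(key)
    return keys


def on_deploy(
    environment: str,
    pr_titles: Iterable[str],
    transition: Callable[[str, str], None],
) -> List[str]:
    """If the deploy hit production, transition every referenced issue."""
    if environment != "production":
        return []  # condition not met: change nothing in the other system
    keys = issues_in_deploy(pr_titles)
    for key in keys:
        transition(key, TARGET_STATUS)
    return keys
```

In use, `transition` would wrap the tracker's client; in a test it can simply append to a list.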
Finally, we have workflows. And workflows are just what you
think. It's how you get work from concept all the way
through to launch. It can be something really small
that guides the way that you're doing work, or it can be something really big,
like I was mentioning with the Slack-based approvals. Right. It's a
fairly complicated flow, but it's saying this is how we work.
This is the flow with which we are going to take work from one place
to the other. Now, automations that tend to be workflow automations are
often a combination of the other ones. Like it's saying I'm going to have some
sort of notification, I'm going to have some sort of, if this, then that
I'm going to have some guardrails that check the thing and I'm going to bring
it all together into a larger process that we can adopt
that's going to help the flow of our team.
One of the favorite ones that I have is
that idea of auto locking a project when your environment
is in an incident. Now I've done this
with teams that I've run for probably ten years or
something like that, but I've done it in ways that were inconsistent
and not super effective.
Right. Like you have the SRE team say, hey, I'm going to throw a Slack
notification in there and be like, everybody don't deploy. We're in the middle of
an incident. And, well, what's the first thing that a developer does? I
didn't read Slack and hit merge. Right. Now you've got more change on
top. Or better yet, oh, this one I've done a thousand times, which is
you think, oh, everything's on fire. It's like all the CPUs
are at like 90%. It's a total mess. We need to get a handle on
this, and somebody does a deploy and nukes your old infrastructure and spins
up new infrastructure. Now it's not 90%, it's like 160% and it's
really on fire. Now it's like hard down, right?
So the cost of not doing this right every time
and getting it into a place where developers live is high.
So being able to take this and say I'm going
to use PagerDuty as the source of truth for nastiness. I am going
to put a merge block in GitHub so that developers, where they
live, they're going to go in there and they're going to try and click that
button and it's going to say no, right? And yeah, I'm going to add some
notifications in there as well. And then when the thing gets resolved
I'm going to make it all better. Right. And that's just how we work.
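The incident lock workflow combines an action (incident state drives merge state), a guardrail (merges are refused), and notifications. A minimal in-memory sketch follows; the class and method names are hypothetical, and in practice the two handlers would be driven by incident webhooks while `merges_blocked` would feed a required status check on pull requests.

```python
from typing import Callable


class IncidentMergeLock:
    """Block merges while any incident is open; unblock when all resolve."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self._open_incidents = set()  # ids of incidents currently open
        self._notify = notify  # e.g. posts to the team's chat channel

    @property
    def merges_blocked(self) -> bool:
        return bool(self._open_incidents)

    def on_incident_triggered(self, incident_id: str) -> None:
        was_blocked = self.merges_blocked
        self._open_incidents.add(incident_id)
        if not was_blocked:
            # Only announce the transition, not every additional incident.
            self._notify("Incident open: merges are blocked.")

    def on_incident_resolved(self, incident_id: str) -> None:
        was_blocked = self.merges_blocked
        self._open_incidents.discard(incident_id)
        if was_blocked and not self.merges_blocked:
            self._notify("All incidents resolved: merges are unblocked.")
```

Tracking a set of incident ids rather than a single flag means a second overlapping incident cannot be unlocked by the first one resolving.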
So it's a good example of how you take a
somewhat simple situation, bring together a lot of different tools
and make that work in a way that's going to help your team and help
your customers. Okay. So hopefully
with all of this you can sense a theme and it's a pretty simple
theme. You need to measure, experiment,
measure, and repeat, and you just need to do that over
and over and over again. And automations,
they're just like performance issues. Maybe you've
worked with a junior dev who you ask to fix some sort of performance issue.
And they look at the code and they say, I think that it's
inefficient in this area, right? And your red flags are flying off
the shelf because you realize that's not how you understand a
performance problem. You measure it and then you look and you go,
oh, it looks like that's the area that's slow. And you try something and
you push that change out and you measure it again and you go, turns out
I didn't get it. And you do it again, right? And then you say,
turns out I did get it. And it was not the thing that the junior
developer thought was potentially slow. It's the same with
automations. It's the same with the software development flow.
We just so happen to be in a place now, we're
on that staircase of automations, where we can start to
do these things. These were really hard to do in the past;
it's a lot easier to do now. And we have a lot of best practices
and things in place that we can utilize.
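The measure, experiment, measure loop can be sketched as a before-and-after comparison of one metric around the date an automation was switched on. This is a rough illustration only: the choice of the median, the metric (say, hours from PR open to first review), and the function name are assumptions, and a real comparison would need enough samples to mean anything.

```python
from statistics import median
from typing import Dict, Sequence


def automation_efficacy(before: Sequence[float], after: Sequence[float]) -> Dict[str, float]:
    """Compare a metric sampled before and after enabling an automation.

    Returns both medians and the percentage change. For a metric where
    lower is better, a negative change means the experiment helped.
    """
    if not before or not after:
        raise ValueError("need at least one sample on each side")
    before_median = median(before)
    after_median = median(after)
    if before_median == 0:
        raise ValueError("baseline median is zero; percent change is undefined")
    change_pct = (after_median - before_median) / before_median * 100
    return {
        "before_median": before_median,
        "after_median": after_median,
        "change_pct": change_pct,
    }
```

If the change is flat or goes the wrong way, that is the signal to tweak the automation or turn it off and try a different hypothesis.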
So it wouldn't be a sponsored talk if I didn't shill my
own product at some point. So if you like the way that I'm talking about
these things, go check out our Sleuth automations marketplace.
Obviously I'm passionate about these things. And so
we've built a product that works in very much that same way. It's covering all
of those traits. We make it one click so that you can treat it as
an experiment. We'll give you the efficacy of the thing based on
what you used to be doing and now what you're doing here. And probably most
importantly, it's a catalog. It's a catalog of what is the art
of the possible. And I will remind everybody that not every
automation is made for every team. It might not be the right automation for
your team. You can try it and see how it fits, see how it feels.
I would guarantee that there will be something in there.
There will be like three to five that work for you and probably 25
that don't. But yeah, take a look,
check it out. And as
we move into this golden age of automation, maybe that original vision
that we had of developers actually getting to work and code and
not having to have distractions and do obnoxious
toil, we can get there. We can get a lot closer than
we ever could get by embracing automations, the right
type of automations and treating them as a continuous learning and
experimentation framework.
That's pretty much all I got. If you want to go check out more,
you can go download our book. You can check out the marketplace. Like I say,
you don't have to be a Sleuth user to
see that. It's just visible for everybody. So you can go browse and see if
there's anything that works for you. And if you like it, give us a try
or chat to us and tell us why you don't. Either way, that works for
us. All right, well, thank you all so much. I really appreciate it.