Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi,
I'm Yushai with LinearB and today I'm going to talk
about cycle time. I'm going to talk about what is cycle time
and how do you use that to measure and help your dev teams improve.
I'm going to talk about why cycle time should be
the first metric you should be looking at when you're just beginning
to introduce metrics and measurements to your dev process.
And finally, I'm going to talk about why just measuring and
collecting information and metrics and dashboards is a good first step,
but is not enough to really push your dev team
and your dev process to its best. I'm going to show the
next steps beyond metrics and how you can use them to really improve
your dev team. LinearB is a startup that's
all about improving dev teams and improving the dev process.
We deliver software delivery intelligence by
providing dev teams with the right information at
the right time in the right place so that they can improve without
effort. So let's start talking about the
problem or the goal of measuring the engineering process.
Why do we need to measure things? So I think it's pretty
common knowledge that you can't really improve
anything if you're not able to measure it, if you're not able to compare:
how was I doing or how was the team doing yesterday?
How are we doing now? What has changed?
Once you have some baseline and you have some numbers
that you can use to describe some
model of your dev process, you can start talking about improving,
about making these numbers look better and reflecting
the improvement that your team is making by adopting better process,
by removing obstacles and
blockages during the process. So measuring is
important, but how do we measure the work of developers
and development teams without really
falling into traps? Like measuring the wrong things,
like creating a culture problem where people
feel they're being measured for the raw output.
We've all seen previous cases where
you measure the wrong thing. You get people optimizing for
things like code lines or pull requests
or comments in the code. That's not the thing you want to get.
And it creates a really bad culture of Big
Brother: I'm being stack ranked against the other developers. And
we all know that development is an artistic or creative
process where measuring raw output
is probably not going to get you what you want.
So on one hand, you want to start measuring things.
On the other hand, you want to make sure you're measuring
the right thing. And the right thing in development is
the team and the process. Instead of focusing on
individual metrics, instead of focusing on technical
metrics like code lines, like commits,
things that are easy to game and that don't really get you any benefit,
you should be measuring the process. You should be looking at how is the team
working? Can we find places where
the process is not efficient? Can we shine
a light on things that kill productivity?
And I'm showing here three of the top productivity killers
in dev teams. One of them is context switches.
Can we eliminate or reduce needless
context switching where people have to switch between tasks,
load some new context into their brains,
then drop that context for doing something else, then go
back. There's a huge cognitive load and
a real productivity tax. When you switch between tasks
and when you finish working on something, then start
looking at a new thing and then go back to your initial or original
task, you're going to spend a lot of time. Some research puts that
at least 20 minutes just to get back into something that you
may have been working on earlier this morning or yesterday.
So context switching is a well known
pain in the development process. There is the
issue of work in progress, and it's
kind of a dual to context switching, where people
like to start things. There's always a lot of things to do.
Dev teams are always busy, and it feels
good to start things, to get started on an item.
Typically that's the interesting part of working on an issue
or a fix of a bug or some new feature,
whereas pushing something to completion requires
collaboration with other people to get code reviews, to eventually
merge your code, and to deploy it. So a
lot of teams eventually end up with a lot of things
happening together, a lot of work in progress, and it really takes
discipline and mindfulness to be
able to keep the work in progress low
and focus on finishing things before starting new things.
So a lot of work in progress will cause context switching,
will increase risk of delivery, and is
another well known pain around dev
team productivity. And finally, there's what
we call dead value. You worked on an item.
It could be a bug fix or a new feature. You had
your pull request reviewed, you made some changes, you finally
got this approved and you merged the code. But now this
is waiting to be deployed. In some places this could be hours,
days, weeks, even months at the high
end of the scale. This is value
that has been created, but it's sitting there until it's deployed and
actually helping customers actually live in production.
This is dead value and you're not learning anything.
The team is not learning anything. The company is not learning anything by
having that code sit there in your master
or development branch but not deployed.
Then there is your risk when deploying things: the more
code that is waiting to be deployed, the higher the risk.
So being able to ship code all the way
into production rapidly is, again, a well
known source of increased productivity, and
having dead value in your code base is a well
known pain. So how do we measure the
process and the way the team works to eliminate these and
similar productivity killers? Notice that I'm not focusing at
all on individual developers or on
output. This is about looking at the process and highlighting
inefficiencies, highlighting opportunities
to improve. So the single metric
that I'm proposing that you start with
if you're not measuring anything today, or that you add to your metrics,
if you already are measuring things, is cycle time.
Cycle time is a pretty well known metric. It's got
some recent attention in the
past few years. You can look at the Accelerate
book or look at the DORA metrics. In
different variations, cycle time has become a
pretty well known standard for modern measurement
of modern dev teams and dev processes.
So what is cycle time? Cycle time focuses on
how long does it take the team to deliver a piece of work end
to end, starting with when we first start to code
against an issue.
It could be fixing a bug, introducing a
new feature, doing some non functional work,
but we begin to code it. Typically this lives as a
branch or a set of branches that progress,
and the code base, and eventually the
product and the deployed services, are
modified through a change in the code.
Then there is some form of code review,
typically a pull request or a merge request that gets reviewed
by peers in my team.
After some back and forth, it gets approved,
it gets merged to
the main code base, and eventually gets deployed
to serve its purpose in production settings.
The time it takes from the beginning to the end of this entire process
is what we call cycle time. And when you measure that across
all of your bits and pieces of work, all of your branches and
pull requests, you get some idea
of how quickly the team is able to turn around value.
And it turns out that a lot of research, in the
Accelerate book and elsewhere, shows that there
is a very high correlation between consistently
reducing and having short cycle time and
well performing and elite performing teams. Let me
drill down a bit into the four main segments of
the cycle time and why each of them is important
to measure and to understand as a separate part of cycle time.
So the first piece is coding. Typically a
developer begins to work on a new piece of
work by creating a branch and starting to code
the changes and the additions required to serve
the purpose of this change. They could be
fixing some existing code or refactoring. They could be adding new
code, adding tests, whatever is needed to get this change
into the code base. This is typically the
work of a single person. Again, there's no always
in code development, but typically this is a single person working
on a slice or a piece of functionality
that's being added to the code base.
When that is done, and when the developer feels this is ready for review,
the developer basically creates a pull request or a merge request
or a similar process, and the coding phase is
over. So coding measures the time it takes for me to
begin working on something until it's ready for review.
The next phase is pickup, where my pull request
is waiting for someone to take a look at it.
Pickup is basically dead time: there's
no value happening in this segment. It's all about
waiting for someone to begin reviewing. So once
someone actually picks up the pull request and begins to
create some review, begins to review my code, provide some
comments, that is when pickup ends and the review process begins.
So pickup is how long does it take for the team to
pick up a piece of change that I have requested
review on and begin to actually review it?
Then the review phase is about the
entire back and forth dance between multiple
people, at least the original coder
and one reviewer, in many cases more
than one reviewer, commenting, making additional changes
to the code and eventually reaching a
decision that the change is good enough and can be merged to the
code base. So the review phase ends when the
code is merged back to the code base and the change is
accepted. So review captures the time it takes
for the team to actually look at
a piece of change to the code base, discuss it
either asynchronously or by other means,
and reach a conclusion that this is accepted,
and actually merge it back to the code base. Finally,
the deploy phase talks about how long does
it take the team to take this piece of code that is already
back in the code base, the code base has already changed. How long
does it take the team to actually get this to work in production,
to actually serve its purpose by
improving the users' or customers'
experience, by improving some other non functional goal,
by fixing a bug, et cetera.
And once we have completed that, the entire cycle
time for this piece of change can
now be measured from end to end. And when we look at this across
all of the changes the team is pushing, you get a measure
that really captures the ability of the
team to quickly deliver work by
having pieces of work take less time. End to end, you get
less context switches because you're not
going to have as many things in play at the same
time, you're going to have the team focus on starting
and finishing things quickly.
You will get much less work in progress and
reduced context switches. And by having
smaller bits that are deployed, you're going to reduce your
team's risk around deploying code. And of course,
dead value. If the deploy time is short, then dead value in my code
base is going to be minimal.
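To make the four segments concrete, here is a minimal sketch of how one change's cycle time could be split up, assuming you can collect five timestamps per change. The class name, field names, and dates are made up for illustration and are not LinearB's data model. Note that these are calendar-time differences, not hours of actual work.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ChangeTimeline:
    """Timestamps for one piece of work (hypothetical field names)."""
    first_commit: datetime   # coding starts
    pr_opened: datetime      # coding ends, pickup starts
    first_review: datetime   # pickup ends, review starts
    merged: datetime         # review ends, deploy starts
    deployed: datetime       # deploy ends


def cycle_time_segments(c: ChangeTimeline) -> dict:
    """Split one change's cycle time into the four segments (calendar time)."""
    return {
        "coding": c.pr_opened - c.first_commit,
        "pickup": c.first_review - c.pr_opened,
        "review": c.merged - c.first_review,
        "deploy": c.deployed - c.merged,
    }


def cycle_time(c: ChangeTimeline) -> timedelta:
    """End-to-end cycle time: the sum of the four segments."""
    return sum(cycle_time_segments(c).values(), timedelta())


# Illustrative timestamps only.
change = ChangeTimeline(
    first_commit=datetime(2024, 1, 8, 9, 0),
    pr_opened=datetime(2024, 1, 8, 15, 0),
    first_review=datetime(2024, 1, 9, 10, 0),
    merged=datetime(2024, 1, 9, 16, 0),
    deployed=datetime(2024, 1, 10, 9, 0),
)
segments = cycle_time_segments(change)
total = cycle_time(change)

for name, duration in segments.items():
    print(f"{name}: {duration}")
print(f"cycle time: {total}")
```

In this made-up example the change only had 6 hours each of coding and review, but it waited 19 hours for pickup and 17 hours for deploy, which is exactly the kind of thing the per-segment view is meant to surface.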
So having talked about the four segments of cycle
time, let me drill down a bit and talk about how
do we actually improve, how do we improve on each of these segments?
And they each have different dynamics, which is why it's interesting to
measure them separately as part of the combined
cycle time, and why improving
each of these segments leads to better outcomes for
the team. And it's only worth measuring if
you can improve by improving the measure. So,
coding, again, this is how long it takes a developer
from starting to work on something until it's ready for
review. And what we typically see is that
the time it takes to code something blows
up for
some very well known reasons.
The first is not having good requirements.
It's very common. If the requirements are not great,
I begin to work on something, I start coding and then
I hit a wall where some edge cases are not well
defined, the requirements are not clear. What should really happen
in each of these cases? I need to go back to product or to whoever
owns requirements and negotiate some clarifications.
This is going to take longer. Like by definition, it's going to increase
the time it takes me to code this. Notice that we're not measuring
actual work time spent, we're measuring calendar time
or clock time. So if I started to work,
I wrote some code for 2 hours and then I had to start
talking with product for better requirements. That is going to take maybe
a few more hours, maybe a day or two. Coding time is going to explode.
So having great requirements upfront, being able
to maybe begin my work by checking
out the requirements and really reading through and understanding that
they're fleshed out enough, is a great way to
get coding time down and will also
naturally reduce wasted
work, where I've started to code something and it's not the right direction
and the requirements are changing because they're not fleshed out
as they should have been.
These are all things we can gain by having better requirements upfront.
The other obvious way to reduce coding
time and cycle time in general is to cut down the item
of work that I can start and finish working on. So instead
of doing a huge change, a huge feature,
let's just take one thing, one small thing that we can code
review, merge and deploy as a standalone
slice of what we need. If we can do that,
we have chopped down our work. Our pull requests are going to
be smaller, and everything, coding obviously,
but also the rest of the phases, is going to be much shorter
because it's much easier to push through a small change,
get it reviewed quickly, get it merged, and smaller
risk to deploy it than it is to take a huge thing.
So if you're working on something and you can
chop it down to smaller items that can be individually
delivered all the way, then that is almost always
a win for the team.
Much better predictability about being able to deliver these
things, much lower risk and a faster cycle
altogether. And finally,
the third contributor to long coding time is
areas in my code base, which are very hard and difficult,
where every change that I'm making that touches those
areas is going to take longer because
everything breaks because it's brittle. So by
refactoring places in the code that are very difficult to
touch, I can improve coding time for the next items
and pieces of code and changes to my code base that need to happen.
So sometimes by being proactive and refactoring code, we can
reduce coding time for the next
tasks that we have. So coding again,
if we are able to push that down, we are typically going to end up
with smaller pieces of work and probably
with a better process for requirements. Otherwise it's
going to be very hard to drive coding time down.
The next segment is pickup, and pickup is all about communication
and coordination in the team. Remember, nothing happens in
pickup. It's all about a pull request or a merge request
waiting to get picked up by someone in the team.
So by improving the way we communicate around it,
so the people know that they need to look at my change
and look at my code, and by
having some processes in the team to coordinate around this, so that
no one forgets to review code, so that I don't have to
nag others to look at my code, so I don't forget that
my code is waiting for someone. All of these measures
are going to help push things through faster and by
providing feedback on my changes quickly,
I'm able to act while it's still fresh in my mind.
If I get your comments on my PR within,
I don't know, an hour or two after writing my code, I'm in a much
better place to address this quickly and efficiently than
I am if I get these comments tomorrow or in three days after the
weekend. The other thing that can help with
pickup is to think about context switching in a deliberate
way. Typically, people are coding. Most of my peers
in the dev team are going to be coding and busy with their own tasks,
and then code review is going to be a context switch for them.
By being deliberate about when do I context switch to help review other
people's changes, we can be much more
efficient. For example, if I'm beginning my
day by taking some code reviews instead
of jumping into my
PRs and my code changes,
I'm just starting my day so I can go
and do some reviews without paying tax of a context switch if I'm
getting back from lunch, it's the same story.
If I just finished a meeting. And again,
these are all context switches that already happened. I can be deliberate
and piggyback on those to
take some code reviews while
not incurring a context switch tax just for the review.
So by being deliberate and by being
aware of the importance of driving pickup
time down, teams can really
learn to be efficient and get a great turnaround
and help each other finish things by
starting and jumping on reviews quickly instead
of having them languish and need to nag and be
the task that I'm always pushing to the end of the day and then I
don't have time and so on. The next
segment is the review process, and the review process is
perhaps the most complex or elaborate
segment here because it typically involves two
or more people. It's multiple
stages where people review the code, have some comments,
then the original coder needs to get back to this and make some changes
or respond.
And the way to drive review
time down to have all this dance of
people coordination, communication and
context switching all happen in a shorter time frame and in
a more predictable way, is first of all to instill
a culture that getting something done is
worth much more than starting something. So by helping
and by really focusing on finishing reviews that I'm part of,
either as the PR owner or one of the reviewers,
or even jumping in to help a review
get finished and get some agreement on what's
needed to be able to merge this by focusing on
done instead of starting things, people in
the team can really help reviews start, be
effective and finish quickly. Sometimes there is a bottleneck.
Some teams have dedicated reviewers or dedicated
reviewers for parts of the code, and a long review
time as measured as part of cycle time will
indicate that I have a bottleneck. I don't have enough reviewers,
and there's a lot of reviews waiting for a few people. So by deliberately
adding more reviewers, by reducing or
relaxing the requirements to review,
I can make sure that review time remains short and there's not
too much onus of reviews on just
a few people in the team. And finally,
in some cases, the review process for a pull request
is going to be long because the original code in that
change is done the wrong way.
It's not well designed. Maybe the
choices made by the developer need to be reversed or revisited.
So by improving the design,
maybe having an explicit design phase when
coding or getting some review before writing all the code.
In some cases, at least in those cases where the actual code
changes need to be heavily edited,
doing that upfront, getting more feedback on my directions
and the way I'm suggesting to solve
the problem or to make the code change is more
effective in the coding phase than it is in the review phase
because it's going to be less wasted work and more informed work
while I'm coding. So these are the main contributors to
review time. And by driving review time down,
we can get to these better outcomes of having more
emphasis on done versus started,
better code to begin with, and avoiding bottlenecks
around reviewers. Finally,
the deploy phase. This is a pretty well known pain
and also a well known gain where modern teams
strive to have streamlined deployment,
where there's zero or close to zero time and effort needed
from when a piece of code change has made it and has
been merged to the code base until it's deployed
and is live in production. It's obviously not easy or possible in
all scenarios, but it's
a pretty common understanding that the shorter the time is and the smaller the
effort is to deploy code, we're going to be in a better shape.
So things that improve our deploy are
obviously CI and CD systems, having reliable
tests, comprehensive tests that give us confidence to
be able to deploy code that has been
merged, and of course to have smaller
work items, smaller PRs, smaller changes that are focused
on just one thing, much easier to test, much easier to deploy,
much easier to roll back,
lower the risk of deploying. And that typically leads to
faster deploys because we're willing to take on
more deploys when we know the risk is smaller.
So I've looked at the four segments of cycle time,
and how each segment has different behaviors
and different dynamics that lead to longer times and
different ways to address so that we can shorten each of the segments
and end up with a short cycle time across the
board. So some
of the benefits that elite teams get when they really reduce cycle
time consistently across all of the code changes that they're
driving. So these are like
all over. I'm going to talk about a few. So first
of all, there is the issue of predictability.
Regardless of the way you manage your development work, this could be scrum
or kanban or a variety of other methods.
By having shorter cycles, smaller pieces
of work that start and finish more quickly instead
of longer, and more
work in progress. Where each item takes longer, you get better
predictability. Your ability to
say or estimate when this will land becomes much better.
When your items are smaller and they're shorter
in their cycle, you have a much more
efficient and short learning cycle. If you are able to deploy those changes,
you're making a small change, you get it deployed quickly.
You can now learn from how that behaves in production.
Did it give you the right benefit or deliver
the goal that you needed? How are customers responding to this if this is
a new feature or capability?
Did this actually deliver the CPU load reduction
or the database load reduction that you were aiming
for? The faster you get this out and actually running in
production on real data with real usage,
the faster you will be able to learn and make another change,
or make the next change that relies on the first one.
So the learning cycle is an obvious win with
shorter cycle time all the way to production.
By definition, you will have improved the way
your team works, communicates and coordinates. With
now everyone being remote or hybrid, communication
has become even more difficult. It's no
longer a question of swiveling in the chair and hollering to someone,
"Can you take a look at my code?" So you
cannot really improve cycle time if your team communicates
or coordinates inefficiently. So this is going to force
the way you communicate to be much better.
You will have reduced the work in progress. You have
less items in play and more items delivered,
which typically reduces risk and reduces again,
context switching. You have less things in play.
You need to switch context between less things.
And even when context switching, this is within a
shorter time frame. So things are still fresh in your mind.
If I'm able to write my code, get some comments,
respond to these comments, make some changes, and eventually finish
my work in a day. This is so much better in terms
of my cognitive load and being able to get
back to things than if the same process is spread or
the same net time of work is spread across two or three days.
It's the same effort, but then I have to remember and go back to
things that I've already left behind, and that will add up time and
cognitive load. And then by having much smaller
chunks of work, which is almost a requirement for great cycle
time, you will reduce the delivery risk.
Every item is a well defined change,
very small, much lower risk to deliver,
easy to revert in case you need to. Compare
that to having huge changes accrued
with a lot of dead value in my code base until I eventually deploy.
Something that is going to be a huge risk to deploy
will require much more elaborate testing
to be sure that this change does not break anything, and a much
harder thing to revert once deployed.
So these are all just some of the benefits
that elite teams get when really driving cycle time down.
And I've been talking with some of our customers.
We have some customers that are able to drive cycle time down to a
day or even less for a typical change,
and really are reaping the benefits of having this
predictable, short, small items approach
to delivering code changes.
So here are some of the numbers that we are seeing;
this is from LinearB's customers.
These numbers are here to show that, yes, it is
very, very possible to improve cycle time dramatically by
just paying attention, by measuring it with the right tools,
by thinking about the four segments and
applying improvements across each segment as a result
of what we are seeing in our measurements. If we're seeing that
our pickup time is high,
we can invest in improving our communication or creating
a schedule to review code twice a day across the team.
If our coding time is high, we can look at the
sizes of our PRs, the sizes of our code changes and our
requirements, and so on, like I've described earlier. So by measuring
and then paying attention to what I'm getting across the
segments, our customers are able to
really rapidly
improve the average cycle time by
50%, even by up to 75%,
which is, that's like a four x shorter cycle time. Instead of typically
taking, I don't know, four days, your work can
start and finish, your typical items can finish in a day,
which is, this is a huge gain for productivity,
for developer well being, and for
predictability across.
How quickly can I deliver something?
That kind of insight really becomes easier when your
items are smaller and begin and end within
a day, compared to a week or four or five days and so on.
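The step from "75% shorter" to "four x" is simple arithmetic, and a tiny sketch makes it explicit. The function name is made up for illustration; the percentages and the four-day starting point are the ones mentioned in the talk.

```python
def speedup_factor(reduction_pct: float) -> float:
    """Turn a percentage reduction in cycle time into an 'N x shorter' factor."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction must be at least 0 and below 100 percent")
    remaining_fraction = 1 - reduction_pct / 100
    return 1 / remaining_fraction


# A 50% reduction halves cycle time; a 75% reduction leaves a quarter of it.
half = speedup_factor(50)
quarter = speedup_factor(75)

# A typical four-day item after a 75% reduction.
days_after = 4 * (1 - 75 / 100)

print(half, quarter, days_after)
```

So a 75% reduction leaves one quarter of the original time, which is why a four-day item ends up finishing in about a day.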
So I've talked about measuring the dev
process. I've talked about focusing on the process
rather than individuals, and by focusing on
cycle time and its segments, how we
can start learning about the main productivity killers like context
switching and work in progress and dead value.
But I'm going to say that, yeah, measuring is a great first
step. You need to start measuring to be able to improve.
It's almost like otherwise you're flying blind. And it's very hard
for dev teams to consistently improve
without measurements. Measurement
is a great first step, but it's not enough. And that
is something that here at LinearB we've learned by working
with our customers,
hundreds of dev teams,
and we've seen that there is the first step that you take.
You begin to measure, you begin to improve based on these measurements. You've seen the
numbers, this gets you so far,
but there is so much more you can do when you go beyond measurements.
So, just measuring typically gives
you some numbers that go into a dashboard.
Again, great basis to start improving, but why is it not enough?
Because by the time you have a
measurement that highlights a problem, for example,
if I have a measurement in a dashboard that someone visits every two or
three weeks, and it shows that my pickup time is high,
our team typically takes two or three days
to begin reviewing a PR.
By that time, it's already too late for those PRs, and it's a lagging
indicator of where my problem is.
Another reason why measurements alone are not enough
to really help the team improve is that the problems
and the delays are not evenly spread. It's not all PRs that will take
two, three days. Some of the PRs are going to
be very quickly addressed and reviewed and cruise through
quickly, and then have a short cycle time, and some
other part of the PRs or some other piece of my
code changes are going to be delayed, are going to be forgotten, are going
to be roadblocked. So like anything else in
life, this is an 80/20 case or a 90/10. And being able
to know where my problems are is very important.
So having a metric is great. But now, to really solve
the problem and improve,
I need something that shines a light on the specific PR,
the specific code changes that are stuck, that are going
to have a long cycle time and are going to affect the overall
average or median or whatever metric I'm looking at.
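As a rough illustration of why the aggregate number alone does not tell you where to look, here is a small sketch with made-up per-PR cycle times. The PR names, the hours, and the "more than twice the median" rule are all assumptions chosen for the example, not a recommended threshold.

```python
from statistics import mean, median

# Hypothetical cycle times in hours, one entry per pull request.
cycle_hours = {
    "PR-101": 6,
    "PR-102": 9,
    "PR-103": 5,
    "PR-104": 8,
    "PR-105": 7,
    "PR-106": 10,
    "PR-107": 70,
    "PR-108": 96,
}


def slow_changes(hours_by_pr: dict, factor: float = 2.0) -> list:
    """Return the PRs whose cycle time exceeds `factor` times the median."""
    typical = median(list(hours_by_pr.values()))
    return sorted(pr for pr, hours in hours_by_pr.items() if hours > factor * typical)


typical = median(list(cycle_hours.values()))
average = mean(list(cycle_hours.values()))
stuck = slow_changes(cycle_hours)

print(f"median: {typical}h, mean: {average}h")
print("needs attention:", stuck)
```

In this made-up data most PRs finish within a working day, and two of them pull the average far above the median. The list of specific PRs is the part a team can actually act on.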
So by moving from a metric to an insight,
and then highlighting the specific items
that need attention. We can start talking about taking action
on specific PRs, on specific code changes,
not just retrospective action on a measurement
or on a metric. And at the end of this,
at the very high end of solving these problems,
you get to automation. And I'm showing here in this slide just
two examples of how we
help dev teams improve by moving from metrics
and measurements to proactive action and automation.
For example, if a specific PR is
waiting to be reviewed, pickup time begins to accrue.
We don't have to wait
two weeks and look at the metric to know that we have a problem.
Once this PR has been waiting for some
predetermined threshold of time, we can now alert the team
and say, hey, this PR looks stuck. This PR is waiting for someone to
review. It's already been open for, I don't know, 5 hours, 10 hours.
Whatever your threshold is, you can now
take action and improve this
specific PR, improve the eventual cycle
time across all of your work.
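A minimal sketch of that kind of threshold check might look like the following. The `PullRequest` class, its fields, the sample PRs, and the 5-hour threshold are hypothetical and only illustrate the idea; this is not LinearB's implementation, and delivering the message to a chat channel is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PullRequest:
    """One open pull request (hypothetical fields)."""
    title: str
    opened_at: datetime
    first_review_at: Optional[datetime] = None  # None means nobody picked it up yet


def prs_waiting_too_long(prs: List[PullRequest], now: datetime,
                         threshold: timedelta) -> List[PullRequest]:
    """Return PRs with no review yet that have waited longer than the threshold."""
    return [
        pr for pr in prs
        if pr.first_review_at is None and now - pr.opened_at > threshold
    ]


def format_alert(pr: PullRequest, now: datetime) -> str:
    """Build the message a team might see in its chat channel."""
    waited_hours = (now - pr.opened_at) / timedelta(hours=1)
    return f"PR '{pr.title}' has been waiting {waited_hours:.0f}h for a review"


# Illustrative data only.
now = datetime(2024, 1, 8, 17, 0)
open_prs = [
    PullRequest("Fix login timeout", opened_at=datetime(2024, 1, 8, 9, 0)),
    PullRequest("Add export button", opened_at=datetime(2024, 1, 8, 15, 0)),
    PullRequest("Refactor billing", opened_at=datetime(2024, 1, 8, 8, 0),
                first_review_at=datetime(2024, 1, 8, 10, 0)),
]

stuck = prs_waiting_too_long(open_prs, now, threshold=timedelta(hours=5))
alerts = [format_alert(pr, now) for pr in stuck]
for line in alerts:
    print(line)
```

Run on a schedule, a check like this flags the one PR that has been waiting 8 hours without a reviewer while pickup time is still accruing, instead of reporting it weeks later as part of an average.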
So by highlighting where a problem is beginning
to happen, you can let the team address this in
real time. Curb the problem,
curb the growth of pickup or
review time or any other parts of the cycle,
and eventually your metric will improve
even more. But by solving and automating the way you solve
the specific PRs, you're not waiting to see a metric
and then think about what went wrong. We can tell you what went wrong or
what is starting to go wrong in a specific item,
which could be a branch or a pull request, a very specific
piece of code change that your team is working on.
So by looking in real time at
what is actually in play right now, and at what begins to look like it's
stuck or delayed, you can highlight that.
You give that context to the team in
the right place, not in a dashboard, but in a Slack alert or similar,
so this goes to where the
team is already living and communicating, and in
real time. Now the team can respond and just organically
start reviewing that PR or fix the problem and
move along.
We see that in our customer data, and this improves cycle
time by yet another huge jump. So like the
first jump is when you start measuring things, and then the next
jump, which really puts you into or gives you a chance to go
into the elite land, is by automating and
finding and fixing the problems when they just begin to happen
on specific PRs, on specific code changes with
very focused alerts and very focused automation
to solve those or remove those roadblocks
when they just begin to happen.
So to summarize we've talked about cycle
time, why this is a great measure to look at the dev
process, and why if you're not measuring anything,
that's where you should start.
Culturally, it focuses on the process, not on
finger pointing at anyone or stack ranking developers. It's about
can we find and empower the team to remove
inefficiencies in the process? Focusing on context switching
on work in progress, on dead value in the code base.
We've looked at the four segments across
cycle time. We're focusing on
the coding segment where the developer is
working on the code, then the pickup segment where the
team or the change is waiting for someone to begin reviewing
it. Then the review process, which is about people
collaborating to get the code change to
a state where it can be merged and actually merging it and finally deploying
it into production. All of these segments together.
If we can drive the time it takes for a single piece of
work to go through each of these segments, if we can drive that down,
we will have improved our cycle time, improved our
productivity, removed cognitive load and context switching.
And then after looking at measurements,
I've shown the next step beyond measurements and metrics, which is
driving insights and automation that
really helps the team focus on the specific changes,
the specific items that begin to look stuck,
the roadblocks that are beginning to happen, and address
them in real time instead of waiting for what the metric will
say in two weeks or three weeks and doing some retrospective
thinking. And I've shown how by
taking these steps, dev teams are able to
really slice and slash down their
cycle time by up to four x improvement
in their cycle time in a very short time by adopting
a measurement tool and an automation tool.
I'm inviting everyone listening to this to
join our Dev Interrupted community. We have a very
lively discord community with over 1500 dev leaders
discussing anything and everything that is interesting to development
leaders. We're obviously hiring
aggressively, so I'm welcoming everybody to take a
look and find your dream position at LinearB. And
LinearB is free for dev teams, so you're more than
welcome to jump on and begin
measuring your cycle time. Begin improving by measuring your cycle
time by introducing automation to your dev process
and really go all the way to how elite
dev teams work. Thank you.