Transcript
This is Joel Tosi. We'll be talking about metrics that matter. There's the
slides, all that fun stuff, all the social media stuff.
Let's jump into it. All right, some quick
background so you know where I'm coming from with this.
I do a lot of work helping organizations. And of course, just as
I'm sure you all are asked, people ask me all
the time for metrics. For a while there, I was
the guy that kept on saying, those metrics are bad. You can't
just look at cycle time. You can't just look at burn down charts. You can't
just look at CPU percentages. And I kept on saying,
those are bad metrics. You need to do better. And then I realized
I couldn't just be the person that said all the metrics were bad.
I actually had to have some ideas on what was better. So anyway,
that's where I'm coming from. I was tired of being the person that said all
metrics were bad, and I wanted to be the person that said, here's some better
ideas. Let's see what we can do about it. Quick takeaway
for you. Look, you can't ignore metrics, as much as we might want to,
as much as we might want to just say, make it easier, make it better.
People want metrics. They want to know if they're making good investments.
They want to be able to prove that ideas are working. So we need
to be able to guide better metrics for organizations.
When we can't measure what
we should, we measure what we can. And then, of course,
we optimize for the wrong thing. So let's do better.
All right, first thought for you, and then
we'll jump into some fancy math. We're talking about orienting and grouping. I'll put this
up for you real quick, just so we're on the same page. This is a quick
idea of a value stream. On the left, the customer wants something,
a business hypothesis. On the right, the customer gets it. Everybody's happy.
In the middle, you can see some ideas around variability.
When ideas are cheap, for example, before there has
been any code written, deployed, or supported,
we probably want to exploit variability. We want to
take more ideas and find out if the ideas should
be invested in before it's too late.
Once we commit to an idea, we want to minimize variability and deliver quickly.
We'll get into those ideas a little bit more later around variability.
What I want us to think about, though, right off the bat, is when we're
thinking about metrics, we have to be careful that we don't suboptimize the system,
that we don't optimize one aspect at the cost of the
whole. For example, if we're focusing on how quickly we can
deploy servers on that very far side of the value stream,
the deployment and operations, if we're focusing only on
the quickness of deployment, but not on the whole value stream,
we could be suboptimizing the delivery. So again,
think about these things in context and we'll get back into more of this later
on. Now for some quick groupings. For the
organizations I work with, I want them to be able to know
where they're at and to be able to think about what could be better.
So I came up with these groupings. Right off the bat, let's talk about simple metrics.
This is where a lot of organizations, if they don't have metrics at all,
this could be a good place to start. Simple things that
organizations can start capturing, easy to count, easy to collect.
How many defects do we have? How many teams are doing automated
deployments? What's our rate of delivery?
Very interesting. I'm sorry, not very interesting. Very simple metrics
that are very isolated in that value stream. Now,
again, if you don't have any metrics, this gives you someplace
to start. Are they interesting over long periods of time?
Probably not. They don't tell you a whole story. But if you don't have anything,
knowing your defect rate would be a good start.
Assuming you're already there and you want to get to a better space,
you could look at this idea of directional metrics. So if you think about
simple metrics and we add a period of
time, a time horizon in them, then we can start talking about directional metrics.
Is our defect rate going down over time? Are we increasing our
code coverage? Is there a percentage reduction in defects?
Is our cycle time decreasing? So again, what we could say here is
given some investment or some focus,
we've invested in infrastructure automation,
has that actually reduced our cycle time? We've invested
in test driven development or better test automation,
has that had a significant reduction in our defects? So again,
if we can start looking at causation and correlation over time,
that's where I start seeing these directional metrics come in.
Again, if you're in the simple space, getting to directional is better. If you're in directional,
you should be looking at what might be even better. Before we
get to what's better in that space: I love this book from Don Reinertsen,
The Principles of Product Development Flow. If you haven't read it, I highly recommend it.
What I love about Don Reinertsen's book, among many things,
is this quote: when cycle times are long, innovation happens
so late that it becomes imitation. If our cycle times for our
teams are months and quarters or years,
it's hard to say, why aren't you being innovative? You can't be
innovative. You learn too late what the next things might
be, and your ability to adapt to the market is out the window.
So again, I put that up there for
context, that cycle times are interesting in the context
of making better products. And so if we think about this last grouping here,
impactful or economic metrics,
these metrics actually require intentionality. So not just cycle time
reduction, but cycle time reduction for a delivery that actually mattered.
Now, if you take that earlier value stream we talked about, where there was
a little bit of product discovery and product framing and then delivery
and operations, reduction of cycle time for a delivery that mattered
is interesting because it actually starts going across that whole value stream. It talks
about finding products that are interesting and then being able to deliver them
in an efficient and effective manner.
Systemic cost reductions, lowering the cost of deployments
or testing or overall running of businesses,
stopping bad ideas, reducing queues, reducing toil inside of an
organization. These are impactful and economic metrics.
And these are more interesting than just
finding out if we have fewer defects. These are more interesting than
just counting how many story points a team is delivering.
This is actually saying, is the work we're doing making a difference?
And I find this to be super interesting. It's also very challenging for
a lot of organizations, because to actually measure items that are
impactful or economic, you have to agree, as an organization,
on what these items actually mean. You'd have to agree what a delivery
that mattered actually is. You'd actually have to agree what
systemic cost reductions are out there. You'd have to agree upon what
you do with bad ideas, and not just fall into sunk cost fallacies. You'd actually
have to agree upon why queues are bad and why toil is
bad. So you have to have this higher level agreement to actually get to
this point. Again, if you have nothing, simple is good. If you have simple,
directional is better. If you have directional,
impactful is a good place to get to. That being said,
how would you actually know if you made a change, that these metrics
were improving? How do you separate signal from noise?
To actually move from simple and directional to impactful requires
new thinking. And so this is where we actually need some math,
this dapper young gentleman here, Walter Shewhart. If you've heard of Shewhart
charts, or process behavior charts, this is what we're talking about: control charts.
Walter Shewhart came up with a way, using statistics, to distinguish
between common cause variation and special
cause variation. Put more succinctly, if you make a change,
you would use Shewhart charts to say if your change actually made a
difference. Here's where it comes from, right? The way you
actually deliver value is a system. How you go about building,
deploying your products is actually a system. That's how you
work internally. If you do nothing at all, a stable system
will continue to deliver within a given range. Right? You might have
a delivery every two weeks. You might have
a certain number of defects, you might have certain market share.
If you change nothing at all within a certain range,
you will get repeatable results. So our goal is
not to react to noise. Just because we invested in infrastructure
automation, are we actually seeing a reduction in toil?
Are we actually seeing more stable environments? Right? So how do we actually know
what to expect? Let's talk about how to do these.
Here's a quick example. Imagine you
had a new product release. This could be product,
this could be tech, this could be deployments, this could be defects. But in this
example, we're just using a product where we're trying to make sales.
And your sales day to day kind of look like this. Eight on the first
day, six the second day, ten the third day, six the fourth day. So you
can kind of see how this plays out, right? If we were to graph that
on a time graph, with time
being the x axis and conversions being the y axis, it would look a little
bit like this right here, the red line in the middle being the average over
the period of time. Now, imagine if
we had that. That was our number of conversions per day. And on
day eleven, we had 14 conversions. Now, obviously,
we would say whatever we did on day eleven was awesome. We should do that
again. Whatever we released, the team should be celebrated
and get raises. But hold on. On day twelve, there were only four
sales. I guess we have to let that team go
because they're just not performing as well as we thought they would. But on day
13, it goes to eleven, and you can see how this goes, right? So if
we didn't use any kind of analysis,
and we said on day eleven, we released a new version of the
product, we might celebrate for no reason.
How would we know if it actually made a difference or not?
This is where you go about using these Shewhart charts. Now, the math
is relatively straightforward. You can kind of see in the top here around 15,
there's a yellow dotted line, and then there's actually a dotted line at zero
to figure out the upper and lower bounds of the
stable system. So, in essence, any values within
the upper and lower bound are going to happen naturally through common
cause variation, not special cause. The way we calculate those
upper and lower bounds: we find the average moving range.
So take the delta between day one and day two, between day
two and day three, between day three and day four, and so on,
and divide the total of those deltas by
the number of deltas. So we sum the
deltas between day one and day two, day two and day three, day three
and day four, and so on, and divide
by nine, because there are nine deltas. Once we have
that value, we multiply it by 2.66.
Now, we could go into why it's 2.66; you can read the book yourself.
It's just for ease of math. Look, the story becomes the same: once
we take that average moving range and multiply it by 2.66,
we add it to the existing average, the red line, and we
subtract it from the red line to get the upper and lower bounds.
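To make that calculation concrete, here's a minimal sketch in Python. The first four daily sales numbers match the example above; days five through ten are placeholders I made up for illustration, so the exact limits will differ a little from the chart in the talk.

```python
# Minimal sketch of natural process limits for an XmR (process behavior) chart.
# Days 1-4 match the example above; days 5-10 are illustrative placeholders.
daily_sales = [8, 6, 10, 6, 8, 5, 9, 6, 9, 7]

average = sum(daily_sales) / len(daily_sales)  # the red line

# Moving ranges: absolute deltas between consecutive days (nine deltas for ten points).
moving_ranges = [abs(b - a) for a, b in zip(daily_sales, daily_sales[1:])]
average_moving_range = sum(moving_ranges) / len(moving_ranges)

# 2.66 is Wheeler's scaling constant for XmR charts.
upper_limit = average + 2.66 * average_moving_range
lower_limit = max(0.0, average - 2.66 * average_moving_range)  # clip: no negative sales

print(f"limits: {lower_limit:.1f} to {upper_limit:.1f}")

def is_special_cause(value):
    """A point outside the limits signals special cause variation."""
    return value > upper_limit or value < lower_limit

for day, sales in [(11, 14), (12, 4), (13, 11)]:
    verdict = "special cause" if is_special_cause(sales) else "common cause (noise)"
    print(f"day {day}: {sales} sales -> {verdict}")
```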
Now that we have the upper and lower bounds, you can see the upper bound
is a little bit above 15. The lower bound would actually be negative, but we'll
say zero because you're not going to have negative sales. Now we know
this system should produce between zero and roughly 15 sales on
any given day, and that would be normal variation.
We could see that on day eleven, when we launched the new product with 14
sales, it actually didn't matter. And on day twelve,
with four sales, it didn't matter. On day 13, it didn't matter.
That's common variation. The change we made did not
matter. Kind of sad. But what's interesting
about this is we can't celebrate changes
that don't make a difference. And so we should use process control charts to actually
say, are the things we're doing making a difference and can we
back it up statistically? Key takeaways
for you with this idea of Shewhart charts. Be intentional with what you're measuring,
right? Know what you're measuring and if it's making a difference or not.
More frequent data points obviously make this easier. If you're trying
to look at are we increasing the stability of our
environments and you only get data points once a week,
it's going to take a while for you to actually be able to predict stability.
If you're trying to look at, say, defects, it's the same thing. If you're looking
at product releases and you're trying to see if a new feature makes a
difference, but you only check once a month, you're going to need more
time to get enough data to make it easier. So you
need to figure out how to get more frequent data. Again,
this process, process control charts, Shewhart charts, process behavior charts,
control charts, whatever you'd like to call them, works for product
releases, process releases, and tech releases as well.
It's just math. It works. All right,
know your actual problem, just like we talked about there.
Now, I quickly went through some groupings,
simple, directional,
impactful metrics. We talked about how to know
if the changes we're making in these metrics are actually provable.
I want to leave you with some ideas of what I think are actually
more interesting metrics than just cycle time, or even cycle
time that matters, or reducing toil. Here's where my energy is
most recently. This is a huge
one for me, this idea of predictability versus variability.
If you remember back in that first slide with the value stream, we talked
about exploiting variability on the left side and minimizing
variability on the right side. Many organizations
want more predictability, but they don't monitor
their variability. This kind of ties in a little bit to the
previous area where we talked about process control charts
and Shewhart charts. What I want you to think about is in many organizations,
they have high variability in the delivery side of
the value stream, in the build, test,
deploy release side of the value stream. What this looks
like, this high variability, is if you think about
when you go to test your product releases, you go to
test your next release: is the test setup always predictable?
Is the execution? If you run the tests over and over again, are the results
the same? Is the setup of the data easy
and predictable? Is the access consistent? And so what
we see with a lot of organizations is that the tests are unpredictable
in the value stream. For product delivery,
they're unpredictable because sometimes the lower environments are
up, sometimes they're down, sometimes the dependencies
are available, sometimes they're not. Sometimes the firewalls are
blocking things in lower environments, sometimes they're not. Sometimes the data is ready,
sometimes it's not. Sometimes the data is changed. Now,
when we have high variability on the right side of the value
stream, in the delivery side of the execution,
and people are asking for more predictability,
when will it be done? How long will it take?
The problem is not getting more predictability. The problem is getting
less variability. And this is something that I work
with organizations over and over on, and you can do it as well.
And I would hope that you do it as well. Help people realize
how much variability they have. And when you have high variability,
predictability is out the window. You're all
obviously very good at math. If 80%
of the time your code works
as expected, and the next 80% of
the time all the tests pass as expected, and then
the next 80% of the time the build works as expected,
and then 80% of the time the deployment works as expected, and then
80% of the time the environments work as
expected, if you have those five events
chained together, the overall predictability is
not 80%, it is 0.8 to the fifth power,
which is about 0.33. You're down to roughly
a third. Does the work ever get through that whole chain
of events without having problems? So if
you want to be predictable and you're doing your best guesses,
knowing that only about one out of three times, and often significantly
less than that, the work is actually going to
get through successfully, the problem isn't how do you get more predictability.
The problem isn't to add more process to become more predictable. The root
of the problem is the variability. So we have to address variability.
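To put numbers on that chain, here's a tiny sketch; the five stages and the 80% figures are the hypothetical ones from the example above, not measurements from any real pipeline.

```python
# Hypothetical per-stage success rates from the example above: 80% each.
stages = {
    "code works as expected": 0.80,
    "tests pass as expected": 0.80,
    "build works as expected": 0.80,
    "deployment works as expected": 0.80,
    "environments are up": 0.80,
}

# Chained events multiply: end-to-end predictability is the product.
end_to_end = 1.0
for probability in stages.values():
    end_to_end *= probability

print(f"End-to-end success rate: {end_to_end:.3f}")  # 0.8 ** 5 = 0.328, roughly one in three
```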
Variability also leads to large queue times,
people and teams waiting. This is
very expensive. So again, think about
are we worried about predictability or are we worried about variability
and make sure we're solving the right problem.
Another area I work with extensively and do a lot of research lately
on is this idea of cognitive load. Many,
many teams, especially now that everybody is DevOps and DevSecOps
and everybody does everything,
the sheer number of contexts that teams are grappling
with is through the roof. You can see it in this example
here, not exactly a great piece of code through
nobody's fault of their own. The problem is the team is working with too
many contexts. There's too much
work happening inside here where the code and the architecture actually isn't even
quite set up right. And so teams have too much cognitive load. The repos
and the code bases get large. You can see bad couplings across
teams and deployments, and this is a cognitive
load problem. And so then the question becomes, how do we reduce
cognitive load for teams? It could be
rearchitecting. That tends to be what needs to happen quite
a bit. It could be replatforming. But again, the idea is that
if we start looking at the cognitive load on teams, measuring cognitive load
on teams, and figuring out ways of reducing it,
this simplifies the work that teams do significantly.
It makes the work flow more smoothly, it makes everybody's job
less stressful, and we just end up with a better space
to be in. So again, I'm looking now at cognitive load on
teams and how do I capture it, and how do I help teams lower their
cognitive load? And that just makes everybody's day
better. I also look at
information lead time quite a bit. This is really interesting
for me because the people closest to the work should decide
the work to do and how to do the work, right? So in the very
center of the bubble here, I've done this with a few organizations where we'll put
this up visually and we'll see when a problem comes in
or an idea comes in, or the team themselves observes
something that's interesting, how far up do they need to
go to get an answer to a question: can we do this?
Would this be a better idea? Is this an option instead?
And so if you think about this, if the team needs to ask
the manager for approval, and the manager needs to ask the business for approval,
and the business has to ask the execs for approval to make a change that the
team identified, then we actually have an information lead time
problem. The team doesn't have enough
information to make a decision close to the identification of the problem.
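As one sketch of how you might capture that, assuming you log a timestamp when the team identifies a question and another when the decision comes back (the field names and numbers here are my own illustration, not from the talk):

```python
from datetime import datetime
from statistics import median

# Hypothetical decision log: when the team identified a question,
# when the decision came back, and how many organizational levels it traveled.
decisions = [
    {"identified": datetime(2024, 3, 1, 9, 0), "decided": datetime(2024, 3, 8, 15, 0), "levels_up": 3},
    {"identified": datetime(2024, 3, 4, 10, 0), "decided": datetime(2024, 3, 5, 11, 0), "levels_up": 1},
    {"identified": datetime(2024, 3, 6, 14, 0), "decided": datetime(2024, 3, 20, 9, 0), "levels_up": 4},
]

lead_times = [d["decided"] - d["identified"] for d in decisions]

print("Median information lead time:", median(lead_times))
print("Average levels a question travels:",
      sum(d["levels_up"] for d in decisions) / len(decisions))
```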
Now, what can you do about this? This isn't about like,
you need to be able to get the execs to answer questions faster or
the business needs to have more autonomy. The root of
this problem, and the way we solve information lead
time, is providing context further down into the circles, closer to
the work. If you have an information lead time problem,
getting better context closer to the team, closer to where the work is
done, is the solution. It sounds very
easy and it is very difficult to do.
We have to get the right context from the executives to the
business around why investments are being made a certain way. It also
gets into prioritization aspects. The business has to give context to management
around the returns they're looking for and
why they're identifying these opportunities to focus on right now. And the
managers have to be able to bridge that gap into the team,
and the team has to want that information and know what to do with
it. They have to want that ownership of the product and the problem space.
So again, information lead time way more interesting
to me than counting cycle time.
And lastly, I spend a lot of my time looking at social
learning. Now, what I like about social learning is
not only is it just kind of better for the team, right, not only
is it just better because we have a more skilled team,
we have also discovered that,
by supporting social learning, we actually lower
the dependency upon individual team members. And so, to be very
clear, what I mean by social learning is the team that is
working on products learns together. Now,
this doesn't mean that just the engineers
learn one thing and just the testers learn one thing. It means the whole team
learns together across skills and across contexts.
It might sound silly, but I can't tell you the number of times where
a nontechnical person, a business analyst maybe,
would say, like, I don't know why the code looks like that.
Can you explain it to me? And then through explaining it to the business analyst,
they ask a question about the product that then helps the engineer.
And conversely, if an engineer is working with some kind of
deployment and they're trying to explain things to a test engineer,
and the test engineer says, but how do I test it? Like this.
You have this nice bridge of understanding, and this
idea of social learning just amplifies a team's ability to get work
done. So I really like this idea of social learning. There's this
item down here at the bottom. You can kind of see it says diffusion index.
Diffusion index is a metric that I actually look at with learning
teams, and it looks at: what's
the gap between the highest performer and the lowest performer?
I'm sorry, that's the wrong way of putting it: the highest skilled and the lowest
skilled on a team. And so what we mean by that is
teams and people self assess. And we look at the gap between
the people that self assess their skills are the highest versus the people that
assess their skills the lowest. And what we found is
that when we shrink that gap between
the perceived highest skilled and the perceived lowest skilled across
a team, across skills, we
tend to have less reliance on
a single person to make a decision. We tend to have less reliance on
a single person to do a certain facet of work. And so, all of a
sudden, those silos that exist within a team start to shrink.
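As a hedged sketch of one way such a gap could be computed, assuming team members self-assess each skill on a 1-to-5 scale (the scale, the names, and the per-skill gap calculation are my assumptions, not a definition from the talk):

```python
# Hypothetical self-assessments, 1 (novice) to 5 (expert), per person per skill.
self_assessments = {
    "alice": {"backend": 5, "testing": 4, "deployment": 5, "product": 2},
    "bob":   {"backend": 2, "testing": 5, "deployment": 1, "product": 3},
    "carol": {"backend": 1, "testing": 2, "deployment": 1, "product": 5},
}

# One simple reading of the gap: per skill, the spread between the highest and
# lowest self-assessed score. Big gaps hint at silos and single-person
# dependencies; shrinking them over time is the goal.
skills = sorted({skill for scores in self_assessments.values() for skill in scores})
for skill in skills:
    scores = [person_scores[skill] for person_scores in self_assessments.values()]
    print(f"{skill:>10}: gap = {max(scores) - min(scores)}")
```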
So, again, I love this idea of social learning. I love this idea of diffusion
index. I love this idea of measuring the gaps within perceived
skills within a team and looking at how do we address those gaps?
Because, again, once we increase the capabilities of teams, we increase the
capabilities of organizations, and now, all of a sudden, work is easier.
People are less stressed out. We're not working as many hours because there
isn't one person waiting on one person to make a decision,
waiting on another person to make a decision. Work gets more
enjoyable, work flows more smoothly, people are
less stressed out. And that is a wonderful thing. So I
gave you these groupings of metrics.
We went through some math, talking about how we
know whether the changes we're making actually make sense. And then I
ended with where my interest is lately. And really look,
metrics are always going to improve. Metrics are always going to get better. So always
be thinking about what might be more interesting to you and to your organization
and to your team. If you're looking for books, I love these books.
The first one is by a good friend, Mark Graban:
Measures of Success: React Less, Lead Better, Improve More.
Mark is a wonderful person in the lean community,
looking at statistical analysis and statistical controls: how do you actually get continuous
improvement in teams? I've learned a ton from Mark.
The Shewhart charts and the examples come exactly
from this book: Understanding Variation: The Key to Managing Chaos by
Donald Wheeler. If you're wondering why 2.66
is the multiplier, it's explained in the book. If you're wondering, well, what happens
if the charts are nonlinear, if they're exponential or parabolic,
this book will get into how to handle those types of situations.
Again, we're looking for the story, and we're just looking at how do we create
those upper and lower bounds to find out and separate signal from noise.
And the last book there, The Principles of Product Development Flow by Don Reinertsen.
Again, I can't recommend it enough. A good economics book,
a good way of just looking at U-curve optimizations and
other types of metrics that are probably more interesting than just counting defects,
counting deploys, or monitoring CPU uptime.
Lots of good stuff inside there as well.
So to recap, you can help everybody get
better metrics. Understand where you're at, how you can improve, and always
think about the questions you're trying to answer and
think about what might be other ways of getting there. Be careful
with metrics. Make sure you see the same reality as
the people you're sharing data with. Sometimes people don't see the same reality.
So we need to talk to data, not talk to emotion.
Make sure we see the same reality and are going to a
better place. Make sure if you're making changes, make sure
your changes actually matter. And fundamentally, maybe you
ask yourself, are we actually learning anything, or are our metrics
just reinforcing what we already think and believe?
That's what I got. I'll be on discord if you
want to chat. Love to hear what questions you have. Thanks for having
me. The slides are at that link. The slides are also
with Conf42. Love to hear what other
metrics you all have and what's working for you. Thanks much.