Avoiding Goodhart's Law - Use SLO's as Tools Not Cudgels

Abstract

The concepts of SLI, SLO and Error Budget are there to balance risk (rates of change) and reward (business contentment). Using such metrics as red lines to punish teams, or to force acceptance of risk by the business, is missing the point. My experiences with SLAs in service contracts for hospitals inform this conversation. They show that SLIs, SLOs and Error Budgets work better as a basis for conversations about the stress an application can withstand, and they point to the three dimensions the measures should cover.

This session takes Goodhart's law from economic policy as a frame for reconsidering SLIs and SLOs, and offers a few hints for approaching the negotiation meetings. Leave this session inspired to approach your SLO negotiations in the best possible way.

Summary

You can enable your DevOps for reliability with ChaosNative. The concepts of SLI, SLO and error budget are there to balance risk and reward, with the risk being around the acceptable rate of change. Using such metrics to punish teams for exceeding budgets is a path to failure.
Marco Coulter is an ex-CTO who has worked for one of the top 50 international banks. Seeing technology from every side, as an operator, a developer, an analyst, a vendor, a buyer, and a CTO, gives him a unique view on technology. Today's session comes in three chapters.
Marco Coulter: Gaming the system means manipulating the rules meant to protect a system. In a prior life back in Australia, he worked for a service provider that supported all of the hospitals in a state. To avoid Goodhart's law, we need to focus on the outcome, not on the measure as the target.
SLIs, or service level indicators, are the numbers, and they work better when they are percentiles. SLOs should capture the performance and availability levels that, if barely met, would keep your typical customer happy. Defining these ahead of time is critical.
To capture the overall environment, the nested SLIs supporting the customer experience should cover each of the three dimensions. From our code, we want functional code that does not fail, and additional measures and write-downs of technical debt. Note the SLA goal also allows some wiggle room with the SLO.
As we add the infrastructure dimension, things get more complicated. You will be dealing with the full stack and often multiple stacks in multiple locations. What you're looking for is the infrastructure's ability to support load and deliver predictable latency. Hopefully that gives developers a sense of balance.
We expect the bulk to occur normally within 30 seconds, well within the technical capabilities of the infrastructure. It just needs to be enough to keep the customer happy. You don't want to set it so high that you're overspending and under-innovating.
The third dimension is the business, or if you're a nonprofit or a government body, the customer experience. As we add the business dimension, it can become difficult to measure the full experience. You need to consider all three dimensions for success.
Negotiating is a key skill for any SRE. Preparing to engage is about gathering information. Also identify a facilitator. Facilitation is a very specific skill.
The negotiating meeting has a general flow, right? You have your warm up, and here you set the scope for the discussion, the application, the dimensions, the business value. Now it's time to schedule the meetings and get on with your negotiations.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Are you an SRE? A developer? A quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with ChaosNative. Create your free account at ChaosNative Litmus Cloud.
G'day, and welcome to Avoiding Goodhart's Law: Using SLOs as Tools, Not Cudgels. Before I start sharing the PowerPoint, let me just cover what we're going to talk about. See, the concepts of SLI, SLO and error budget are there to balance risk and reward: risk around the acceptable rate of change, and reward being the business success and customer contentment. Using such metrics to punish teams for exceeding budgets, or to force acceptance of change within the business, is a path to failure. And this session is going to give you some hints for success.
But first, maybe I should introduce myself. I'm Marco Coulter. I am an ex-CTO who has worked for one of the top 50 international banks. I've supported data centers for hospitals and service providers. I've worked for some of the industry's largest vendors. I've lived in three countries and managed teams across 13 countries. I also spent five years as an industry analyst. I ran the data science team at 451 Research, which has since been acquired by Standard & Poor's. Seeing technology from every side, as an operator, a developer, an analyst, a vendor, a buyer, and a CTO, gives me a unique view on technology. You can read some of my writing or interviews in the publications on the left here, or on my website, tech-whisperer.com.
So enough about me. It's good to have targets, right? Think of Robin Hood: the story where he places a child against a tree, loads up an arrow, and aims. Now, with an apple as the target on the child's head, this is the story of a skilled archer. But without the target, without the apple, it's the story of a crazy, dangerous guy shooting arrows at children.
So it's good to have targets, as long as you use them correctly. And that's what we're going to talk about. Today's session comes in three chapters. I will talk about how I experienced Goodhart's law before I even knew it existed. Then we will think about SLIs in a better way, across dimensions. And then finally, I'm going to throw in a few hints about negotiating your SLIs, and give you some links to further reading. So let's get going.
I'll get to Goodhart's law in a moment, but first I want to share a story of my experience with you. Depending on your personality, you will either relax and enjoy my story, or you might already be searching for Goodhart on Wikipedia. That sort of thing, searching for the answer first, is gaming the system. Some folks see gaming the system as the smart play. Others equate the phrase to cheating. I'm more in the second group. For me, gaming the system means manipulating the rules meant to protect a system, to instead manipulate the system towards a desired outcome.
In a prior life back in Australia, I worked for a service provider that supported all of the hospitals in a state. In hospitals, nurses take lab samples and send them to the labs. The samples are processed, and the results get transmitted back to the patient record, where the nurses back in the ward can immediately look them up. Pretty simple, right? By the way, the wards are up on the 16th floor or somewhere high up in the hospital buildings, while the labs are generally in the basement of another building on the campus. So there's some physical distance between the two.
Technically, it looked a little like this. The messages from the lab's Unix system would be sent into message queues, and the queues would be read by the lab update process, which would feed the lab updates into the mainframe system holding the patient records. Now, everything allegedly spoke a common HL7 standard, so there was never going to be any problems. All different vendors involved. I think you see where this is going.
The support of the HL7 standard was not perfect. Malformed messages created by proprietary software along the way would get stuck in the queue. We would then get phone calls from hospitals saying that they had to go to manual procedures. Now, the backup procedure, if the patient record was not getting updated, was for the nurse to physically run down from the ward to the labs to get the results and bring them back to the ward. That was not optimal, as the patient's health was at risk both from the delay and from the nurse being absent.
To take care of this, we agreed an SLA: if the message queues got higher than 100, the service provider that I worked for had to refund money. And that should address things, right? We were looking at the thing that we thought was broken. So I coded a bash monitor script so that when the queue length approached 100, alerts would go off. Monitor icons would turn from green to yellow to red. As technicians, we were focusing on the measure as the target, the goal. We even built capacity plans around making sure the queue processing got all the power it needed.
Now you might think, well, that's great, Marco. Top-notch result: lab results get back to the ward in time, right? The only problem was that we would get these pesky phone calls from nurses in the wards saying the system sucked. They were always having to run down and manually collect results. But the message queues were empty. The problem was that transactions were often timing out before hitting the message queue. We hadn't seen the whole picture. We were managing the capacity plan, in fact the whole application, to the metric, not to the outcome.
Now, years later, I learned that this behavior had a name. Yes, finally, we're going to get to Goodhart. Goodhart was an economist in the UK, and in 1975 he stated (I'm just going to read this): "Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes." It's kind of wordy, I guess. He was English and a politician, and wordy is what you get.
Anyway, here is what he meant. Basically, Goodhart's law says that when a measure becomes a target, it ceases to be a good measure. You see, queue length was a good measure of queue function. But we were managing to that as a target for success, building capacity plans and so on based on the queue, instead of on the successful laboratory transactions in patients' records.
So how do we avoid Goodhart's law? Well, we needed to let the SLIs be measures and then use the SLOs as the goal. So what should we measure? The key is to stand in other people's shoes and see everything from a few different angles. Hence three dimensions. Remember that SRE is really about balancing the risk of unavailability against rapid innovation and efficient operation. And we embrace that risk by giving it a value through well-defined and governed service level indicators, objectives and agreements. As you are identifying SLIs, you need measures that see the whole picture in the three key dimensions. So let's step through a simple example of each of these dimensions based on that hospital environment. The examples will not give specific SLIs that you can apply to your environment. They're not meant to. The example is intended to share the thought process.
So first, let's quickly review the SLI, SLO, SLA model so that we're talking about the same thing. SLIs, or service level indicators, are the numbers, and they work better when they are percentiles. Avoid averages; you miss things. In mature environments, SLIs will be nested. They will combine from SLIs that sit against the code and the technology, up to SLIs that sit next to the customer.
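To illustrate why averages miss things, here is a minimal Python sketch. The latency numbers are invented for illustration (they are not from the hospital system), and `percentile` is a simple nearest-rank helper written for this example, not a function from any monitoring tool.

```python
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    # Ceiling division in integer math avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical data: 95 updates take 12 seconds, 5 take five minutes.
latencies_s = [12] * 95 + [300] * 5

mean_s = statistics.mean(latencies_s)
p50_s = percentile(latencies_s, 50)
p95_s = percentile(latencies_s, 95)
p99_s = percentile(latencies_s, 99)

print(f"mean={mean_s:.1f}s p50={p50_s}s p95={p95_s}s p99={p99_s}s")
```

With this made-up data the mean comes out at 26.4 seconds, which describes no real transaction: most updates finish in 12 seconds, and the five-minute stragglers only become visible at the 99th percentile.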
SLIs are a defined quantitative measure, a metric. You will then set limits on the SLIs, say an upper bound or maybe upper and lower, and that gives us the SLO, the service level objective. SLOs should capture the performance and availability levels that, if barely met, would keep your typical customer happy. They are generally a target (failures are less than x) or a range (responses will be between x and y). Generally, I like to translate the SLOs into periodic budgets, of between x and y over this period of time, that I can track weekly. But that weekly timing might depend on your release cycles and so on. Maybe you can extend it to a month; maybe you need it every single day. The SLAs define what actions are acceptable once the budget gets used up. Defining these ahead of time is critical.
There's one additional thing. Full outages tend to happen less these days, so the focus needs to be on slowdowns as well. So rather than traditional uptime availability, I try to focus on the customer domain and their experience, and then use successful customer requests as a general goal instead. Now, to capture the overall environment, those nested SLIs supporting the customer experience, supporting the CX ones, should cover each of the three dimensions.
So first, let's start with the code. From our code, we want functional code that does not fail, and additional measures and write-downs of technical debt. So, you'll be dealing along the way with multiple languages, and you want to make sure that the metrics will work across them generically. Where you can, avoid metrics that would only apply to a specific coding language. It's too much extra handling. You're creating work for yourself. I guess you're creating tech debt, in a sense, even in the measure. Also, it will not be limited to applications you built in house. Sometimes you're going to have to deal with third-party applications like SAP and the ABAP language, or SaaS applications like Salesforce.
And maybe what you're managing is environmental, like, you know, your reaction times in PagerDuty or something. And then you're not just watching the code transaction errors; you are looking at the configurations and looking for configuration errors again. So some of these nested SLIs around the code might indicate YAML code accuracy or something like that, if you're running a Kubernetes environment. You want to avoid silos of data here. You don't want the SRE team working off one source of metrics while the development team works off another. You need a single source of SLI truth as part of deciding what you will base the SLOs on.
So for the sample indicator in healthcare, let's embed a few clarifications. Our first step would be to focus on well-formed updates. That specifies the transaction that we're looking at. We want to update the patient record and acknowledge completion successfully. That acknowledgement of completion specifies the reaction. And in this case, we agreed to use an APM tool, an application performance monitoring tool, and that specifies the source of the data, that single place of truth. Now, I'm familiar with AppDynamics for APM, but you could just as easily use Datadog or New Relic. In fact, as we are only concerned about code here, you might prefer observability offerings like Honeycomb. You know, this observability segment, with IBM having acquired Instana, is getting very interesting right now.
Now, some SLI books recommend good-over-bad ratios for SLIs. If you control all the code, that can work. But in this example, I'm avoiding the fail ratio, as there were too many elements out of our control. The HL7 update transactions were coming out of purchased proprietary software running on the lab hardware. We had no way of fixing that code. We had to wait for patches to come in, so we were not going to be held accountable for those failures. Also, the queuing systems were third-party software from a different vendor, so we couldn't tweak them to respond better to malformed entries, to not get stuck. And we could not be certain that we would reach a point where the HL7 outputs were always well formed. So the basic code SLI would need to be focused on well-formed updates getting to the code that we were writing and controlling. So that's an SLI down at the bottom of the nesting, but it's just an example.
So for the code SLO, we are already assuming well-formed HL7 records, so we can set this fairly high. Again, we're being clear about the transaction, the reaction, and the source. And it's often easy to set this goal too high. Remember, people like to say 100% for SLOs, but that means there will be no experimentation, no innovation, no risk taking. So it needs to be just over the level that will keep customers happy. Too high, and opportunity costs are being wasted.
Okay, so for the code SLA, we need to apply an outcome. Note the SLA goal also allows some wiggle room with the SLO. We added a time range here, and the SLO must be met over a sliding range of 28 days. The SLA should specify what happens when the SLA is missed. Does one department owe the other department a refund? If it's a service provider relationship, it might be something like that: just pay us in cash. But perhaps as the SLA is getting missed, something gets locked down. We don't allow changes for the next couple of weeks while we sort out all this nonsense. Or maybe the software release cycle gets automatically frozen for 28 days. It is good to agree those things before you hit the heat of the moment, when you're in the middle of something going wrong and your coders are like, "Yeah, I think I've got it. I think I can fix this immediately," but your nurses are still phoning up and going, "The results are not in the record. What the hell is going on?" So the SLA is the part that is negotiated.
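The relationship between an SLO measured over a sliding 28-day range and a looser SLA goal can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 99.5% SLO and 99% SLA thresholds, the daily counts, and all function names are hypothetical, since the talk does not give the actual figures.

```python
WINDOW_DAYS = 28

def window_totals(daily, window_days=WINDOW_DAYS):
    """Sum (good, total) update counts over the most recent window_days entries."""
    recent = daily[-window_days:]
    good = sum(g for g, _ in recent)
    total = sum(t for _, t in recent)
    return good, total

def window_status(daily, slo=0.995, sla=0.99):
    """Classify the sliding window against the SLO and the looser SLA goal."""
    good, total = window_totals(daily)
    ratio = good / total if total else 1.0
    if ratio >= slo:
        return "ok"
    if ratio >= sla:
        return "slo_missed"    # inside the wiggle room between SLO and SLA
    return "sla_breached"      # the pre-agreed action (freeze, refund) applies

def budget_remaining(daily, slo=0.995):
    """Failed updates the window can still absorb before the SLO is missed."""
    good, total = window_totals(daily)
    return (1 - slo) * total - (total - good)

# Hypothetical history: 27 clean days of 1000 well-formed updates, then one bad day.
clean = [(1000, 1000)] * 27
print(window_status(clean + [(900, 1000)]))   # 100 failures
print(window_status(clean + [(800, 1000)]))   # 200 failures
print(window_status(clean + [(700, 1000)]))   # 300 failures
```

In this sketch the three cases land in "ok", "slo_missed" and "sla_breached" respectively. The middle state is the wiggle room: the team knows it has missed its objective before any contractual consequence is triggered.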
Now, in a perfect world, the SLA is defined by the business or customer, but in reality it is a conversation. Normally I would not put technical phrases like "well-formed HL7" in the SLA; it would be a customer outcome. But we'll come to that a little bit later in this session when I talk about negotiations.
Okay, so you're going to have some SLIs and SLOs around code. Hopefully that gives developers a sense of balance, something to measure opportunities to add features and innovate against clearing technical debt and against business impact.
Now, code runs on infrastructure, so that can have customer experience impacts as well. You have availability concepts of how to support updates to the infrastructure in the same way that you want to update your code: updating operating systems, moving to different cloud or network providers, adding new locations to better support remote customers. These risks to availability and performance need to be balanced as well. So, excuse me, this is our new dimension. As we add the infrastructure dimension, things get more complicated. You will be dealing with the full stack, and often multiple stacks in multiple locations. In our hospitals, we had pretty much everything, from Windows client applications to Unix labs and MQ systems and mainframe systems. And they're all scattered across a state that is physically one third the size of mainland USA, in Western Australia, and by the way, not close to any cloud providers. So networks mattered. Some of the nested SLIs around infrastructure could be inherited SLIs from your cloud, network or service providers. Like for code, you want to avoid silos of data here; it's best to have a single source of SLI truth for infrastructure. What you're looking for is the infrastructure's ability to support load and deliver predictable latency. Now, we need to include some more things here: the impact of all the infrastructure components in the SLI and SLO part of our nesting process.
So we look at the total transaction time as a way of doing that. This would certainly have nested SLIs for each piece of the puzzle: an SLI for the lab's update leaving the lab hardware; an SLI for the message queue, adding and leaving (remember, that was the first one that I described at the start of this session); an SLI for the lab's update arriving at the patient record system; an SLI for traversing the networks; an SLI for adding it into the patient record. And you might even have an SLI for the database inserts on the patient record system. You can get too crazy with SLIs. So define SLIs at system boundaries or team boundaries, so that there can be extensive ownership there. The strength of system boundaries is that they're less likely to change, but it might be too detailed. The strength of team boundaries is that you can assign responsibility more easily.
Now, for SLOs on infrastructure, you may want to express the SLOs in the shape of performance curves. Here, we expect the bulk to occur normally within 30 seconds, well within the technical capabilities of the infrastructure. Right? It just needs to be enough to keep the customer happy. You don't want to set it so high that you're overspending and under-innovating. So, for example, where there's a high system load, you will see that we have a long tail here, out to about five minutes at the top of the curve. As we move to the SLA, the negotiations with the customer, you see a big jump. We're only committing to the five-minute time with them. And this came about after conversations with the ward nurses. This is a key aspect of what I'm talking about here: it came once we'd worked out, after we'd screwed up on the message queue, that we weren't making the nurses happy. And nurses are very clear when they're unhappy, by the way.
And I went out and talked to the nurses. I went out and watched what they did in the wards, and I went down to the lab systems to see what these people were actually doing, so that we could build genuine measures for them of what they actually cared about. So we asked them about time, and we expected them to be like most customers, seeking instant response times. But their view was different. They know it takes time for the samples they take to be delivered physically from the wards to the labs. So they had a view of overall processing time. For them, the time frame was about beating the time it took a nurse to run from the ward to the lab when the system was down. Now, that took about ten minutes. So when we offered five minutes, they were happy. That was an important lesson, and it allowed us to avoid unnecessary infrastructure costs and other things. It doesn't always have to be as fast as possible, just as fast as necessary. Of course, when I worked for banks on stock trading systems, it was different. The processing time was a competitive differentiator for the traders, and "as fast as possible, and never mind the cost" was the approach.
The dimensions of code and infrastructure are not the full picture, however. So let's talk about the third dimension, the business and customer experience. I saved the best for last. This third dimension is the business, or if you're a nonprofit or a government body, the customer experience. This is about the revenue and/or service production capabilities of the application. As we add the business dimension, it can become difficult to measure the full experience. You will want to get out to the customer interface, and that may require business integration or mobile platform agents for availability. And to track predictability of response times, you may want to add synthetic testing tools into your environment.
I'm going to keep it a little simpler for our hospital example; we're just looking at the doctor and nurse experience. Now, we built and owned the patient record application, the mainframe piece. So we knew we could add in our own specific measure for that piece. And from observing the behaviors in the wards, we worked out that the nurses had an instinctive expectation of when the labs would come back. They would start looking at the record. They'd open up the patient record and go, "Is the update there?" And if it wasn't there, they would come back and try again in a few minutes. Repeated record lookups were our sign that we weren't meeting those instinctive expectations for the people in the ward, and soon they would be calling us to complain. So we coded a repeat counter into our patient record application.
Now, why did we set up "beyond 10 seconds" in there as part of the measure? Well, we had just "within five minutes" at first, and then we had a problem. We kept missing the target even when the records were processing fine. You see, one or two of the nurses would not wait at all. They would just sit there hitting enter again and again and again. So we added the "beyond 10 seconds" as well as the "within five minutes" to get around those crazy impatient ones, so that we weren't getting beaten up for their behavior.
Now, the SLO here is a little different, as we want a low number or zero as the outcome. You might have expected a tiny percentage here, but the SLO includes all the malformed transactions coming out of that crappy lab system. So we needed to be realistic with them. In fact, this SLO sort of nests everything else in the system. And again, the SLA had reaction room against the SLO. The eight-hour timeframe came from the nurses, as they thought in terms of their shifts: you know, what happens in the time that I'm here in the ward.
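The repeat counter described above can be sketched in Python as follows. This is one possible reading of the rule, not the original mainframe code: a lookup counts as a repeat when the same user opens the same patient record more than 10 seconds but no more than five minutes after their previous lookup. The event format and all names are assumptions made for the example.

```python
MIN_GAP_S = 10        # shorter gaps are treated as someone just hitting Enter again
MAX_GAP_S = 5 * 60    # a second look within five minutes suggests a missing result

def count_repeat_lookups(events):
    """Count repeated record lookups in a list of (timestamp_s, user_id, patient_id).

    Assumption: every lookup resets the clock for that user and record, so a
    burst of rapid key presses never counts as a repeat.
    """
    last_seen = {}
    repeats = 0
    for ts, user, patient in sorted(events):
        key = (user, patient)
        previous = last_seen.get(key)
        if previous is not None and MIN_GAP_S < ts - previous <= MAX_GAP_S:
            repeats += 1
        last_seen[key] = ts
    return repeats

# Hypothetical lookups during one shift (timestamps in seconds from shift start).
shift_events = [
    (0, "nurse1", "patientA"),
    (2, "nurse1", "patientA"),     # impatient re-press, ignored
    (4, "nurse1", "patientA"),     # impatient re-press, ignored
    (184, "nurse1", "patientA"),   # came back three minutes later: a repeat
    (1000, "nurse1", "patientA"),  # more than five minutes later: not a repeat
    (50, "nurse2", "patientA"),    # first lookup by a different nurse
]
print(count_repeat_lookups(shift_events))
```

Summing this count over an eight-hour shift gives a number that should stay low or at zero, which matches the shape of the SLO described in the talk.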
So with this three-way approach, we were really starting to work with the customer towards their success, instead of managing towards a metric or a contract or a specific piece of technology. Now both parties were using the SLA as a tool instead of a cudgel to beat each other up with. Of course, there were many more SLIs and SLOs and SLAs in reality. With a service provider, it's generally a fairly thick contract. But remember, my goal here with the examples wasn't to match reality, but to step you through the thought process. You need to consider all three dimensions for success. The SLAs are not there to beat each other up. They are there to capture the mutual understanding. You reach the mutual understanding through negotiation. SLIs, SLOs, SLAs and error budgets are the tools to support negotiations.
Now, negotiating is a key skill for any SRE. There are some great books out there, although they can be a little contradictory sometimes: getting to yes versus getting to no. And many are targeted at sales forces and closing a deal. Personally, I'm a win-win negotiator. I want everybody to leave feeling like they won. That's not always possible. So here are a few quick thoughts based on my experience around negotiation. And I'll say it again: negotiating is a key skill in SRE. You may think that you just need technology skills, and no, there is more to being an SRE than that.
Now, "know thyself" is not a new idea. It was carved into the Temple of Apollo in Greece around the fifth century BC. And knowing yourself is the best place to start. Consider your level of maturity as an SRE team and as an environment. How much can you control? What risk can you absorb, and the business survive, and you keep your job? Is that risk spread evenly throughout the year, or do you have peak periods, like a Black Friday or a Super Bowl or a New Year's Eve? Are you in a period of significant transformation as an enterprise? Will things be the same in twelve months, or be so different as to make today's SLOs meaningless? Use all this to gather your needs. You probably have a feeling for what expectations the business will have. Hopefully. What will you need to deliver that expectation? Could you accept tougher SLAs if you could grow your team or purchase supporting tools? Now, when consulting, I try to brainstorm this to identify where my outer boundaries are: what will be unacceptable, or what will be too easy.
Preparing to engage is about gathering information. What you want to do is build a strategic model from your information. There is a people factor here as well. So gather opinions about the people that you will be negotiating with. What are their goals? What is their attitude to risk versus innovation? Even subtle things, like what time of day they are more open to ideas or in a better mood, because you're going to schedule your facilitations around that. Also identify a facilitator. If this is going to be you, then read up and practice ahead of time. Facilitation is a very specific skill. Consider bringing someone in. You can bring in a contractor or a service partner who does this for a living. Or it can be great to bring in a leader from another part of the organization who you know is a natural facilitator, who can park their own ego and needs and draw input from everybody in the room or in the meeting. And actually, that can give that volunteer a career profile within the company as well. It'll let them see other sides of the company, and the other sides of the company see them. So it can be a win-win. I said I like win-wins.
Now it's time to schedule the meetings and get on with your negotiations. The negotiating meeting has a general flow, right? You have your warm-up, and here you set the scope for the discussion: the application, the dimensions, the business value, and what they will get out of participating. Be brief.
They're all experts in some aspect here, so you don't want to turn over every little stone yet. Just get everybody to talk a little bit. Ask them to spend one or two minutes describing their aspect. The nominated facilitator should politely close down anybody who starts to exceed a brief introduction. And that's why it's good to have an outsider, so that it doesn't create resentment in the room when people get controlled.
Then you hit your test drive, and that's where you present what some of the indicators under consideration could be. Give one specific example of an SLI, an SLO and an SLA to clarify for them again. It's the same way that, although pretty much everybody attending this conference would know the basics of SLIs, SLOs and SLAs, I took one SLI to step through, to make sure that you and I were working off a common understanding of them. And you're testing the water when you do this in the negotiation. So try to make your example SLIs something that could live on in the final agreement, something real.
Now you want to assess. You've test-driven something with them, so now they have something to talk about. Assess the business value. Is this the best place to start? Will pursuing this scope give return on investment, if you're a revenue-based organization? Do you need to balance innovation and risk on this application? What actions will be effective for missed SLAs? Will you just freeze changes for a while to return to stability, and if so, for how long? An important part of this phase is extracting and capturing assumptions. Clarifying assumptions is why footnotes on SLAs can sometimes be longer than the actual SLA. Don't be surprised if you end up that way, with the definitions and footnotes explaining things in the SLA being more detailed than the SLA itself.
Then we reach the point of proposing. So now you have more information from everyone, you've updated your test drive, and you can make a new proposal.
Now, a few hints here. Predictability is often more important than speed: the higher the variance in response times, the more user experience is negatively affected. If they know it's going to take 60 seconds every time, they can, I don't know... Can you make a cup of tea or coffee in 60 seconds? Probably not, but they know that that's coming. But if most of the time it's taking 15 seconds, and then every once in a while it takes five minutes, that's a negative experience. So avoid spreads that are greater than six standard deviations of the metric. Assuming you have the metrics tracked already, greater than six standard deviations is an indication of low process capability. It's not good.
Now you've proposed and discussed it, so now you recur. You assess the new proposal like you did for the test drive. Expect to iterate or recur through these stages, as this is where the real negotiating occurs. You may need to schedule follow-up meetings. For each meeting, always take a few minutes to revisit the warm-up, restate the scope and goal, recap the conversations to date, and try to acknowledge something from each person during that warm-up piece. Again, during the warm-up, you're trying to get them in the room feeling like they belong and that they are a participant, because you want them participating and feeling respected.
Then of course, the wonderful thing: eventually you reach the agree stage. This is the final presentation of a finished SLI, SLO and SLA for sign-off. That doesn't have to be a physical signature, but it is worth saying that you will need everyone to commit. As an example, you need everyone to confirm by email. They will need to commit on the record. If they're reluctant, it means you missed something during the assess phase. Return to that and try again. This is the process of the negotiating meeting or meetings. Sometimes this might only take ten minutes, and sometimes it might take three or four meetings.
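One way to read the six-standard-deviations hint is through the process capability index Cp from statistical process control, which compares the width of the agreed range with six standard deviations of the measured metric. That reading is an interpretation, not something the talk spells out, and the limits and sample response times below are hypothetical.

```python
import statistics

def process_capability(samples, lower, upper):
    """Cp = (upper - lower) / (6 * sigma); below 1 the spread outgrows the range."""
    sigma = statistics.pstdev(samples)
    if sigma == 0:
        return float("inf")  # perfectly predictable responses
    return (upper - lower) / (6 * sigma)

# Hypothetical response times in seconds, judged against a 0 to 300 second range.
steady = [60, 61, 59, 60, 62, 58, 60, 61, 59, 60]   # slow but predictable
spiky = [15] * 9 + [300]                            # usually fast, sometimes awful

for name, samples in (("steady", steady), ("spiky", spiky)):
    cp = process_capability(samples, lower=0, upper=300)
    verdict = "capable" if cp >= 1 else "low capability"
    print(f"{name}: Cp={cp:.2f} ({verdict})")
```

In this sketch the steady 60-second service comfortably passes, while the service that usually answers in 15 seconds but occasionally takes five minutes falls below 1, even though its typical response is faster.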
Okay, so thanks for coming on that journey with me. Here's what I hope you take away from this session. Learn from my experience. Don't manage to the metrics. Focus on the outcomes: the full transaction, the complete process, the overall experience. Don't use service levels to beat each other up. Use them to become preemptive. Use them so that you can offer more services ahead of time. And when you build out your service levels, remember to assess them against the three dimensions. Are you seeing the full picture? Is there some critical aspect that's being overlooked? One of the most critical things around customer experience that I have learned is predictability. The higher the variance in response times, the more user experience is negatively affected. High variance is also an indicator, as I say, of a low-capability process. So keep your eyes on the transactions that are outliers. Outliers annoy the crap out of us, and they annoy the crap out of users as well. So finally, realize that if you want to be great at SRE, you will need negotiation skills. Negotiation is useful for life, and it's useful for SRE.
So, as I promised, here are some links to reading on similar topics. For those people who've swapped screens, this is the time to switch back to the screen and take a snapshot of the links here. You can catch up with most of my thoughts on my website, and if you found the session interesting, then please feel free to connect with me on LinkedIn or Twitter. I may have more interesting thoughts tomorrow. You never know, so it might be worth a try. This is not my first time with the session, so I love feedback. Feedback would be great. So what caught your attention? What was important that I missed? Because I'm sure there's something that I've left out. But with that, I want to thank Conf42 SRE for hosting this session. I want to thank you for your time today, and I will see you in the Discord channels. You can reach out to me there as well if you wish.
I hope the rest of the event and the rest of your day is fantastic.

Conf42 Site Reliability Engineering 2021 - Online

September 30 2021

Marco Coulter

Technical Evangelist @ Tech-Whisperer.com
