Transcript
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with ChaosNative. Create your free account at ChaosNative Litmus Cloud today.
My name is Dave McAllister and I'm here to talk about Murphy's laws for observability. This is going to cover a couple of interesting concepts: both how we look at observability, and how Murphy's laws, which we're all familiar with, apply to what is driving our need for this new paradigm of observability.
I'd like to thank you for joining me today to listen to me talk about observability, and I'd like to thank Conf42 for giving me the chance to get up here and talk about this. But first, I'm Dave McAllister.
I'm the open source technology evangelist at NGINX, part of F5, and my role is to help people understand both how to get involved in the NGINX open source projects, of which there are actually a lot now, as well as how to best make use of the open source aspects. I am an open source geek. I started with Linux at version 0.93. I've also been a standards wonk, and as you can probably tell, I'm perfectly willing to talk about almost anything at the drop of a hat. Here's my LinkedIn: you can find me as Dave Mack (davemc), and I'd love to hear from you. However, nobody is
just their job role. So let me share a couple of other interesting data points,
or interesting to me at least. One, I'm owned by three cats, maybe four, because we now have a little kitten who has moved into the backyard that we're trying to figure out how to tame. But I am owned by cats, which means I am absolutely used to being ignored. I have also spent ten years as a soccer ref (football for those of you in Europe and other sensible places in the world). So I'm also used
to people disagreeing with me. So feel free to do either one
of those things, but I'm hoping that you won't spend too much time ignoring me.
So let's start with Murphy. Murphy's law is very simple.
Whatever can go wrong will go wrong. And this is something we're
all familiar with. We constantly see things that we think are going to go right, and all of a sudden something has changed around that. But when we add the first corollary to this, "at the worst possible time," then life starts getting interesting. It's not just enough that something
has gone wrong, it's always when it makes a major
difference. And so we need to start looking at how we can sort
of mitigate some of this impact of what Murphy
is doing. And part of that comes into this whole
concept around observability. There are lots of Murphy's law categories, and in fact I'm introducing one now: Murphy's laws for observability. But there are things like Murphy's technology laws, or Murphy's military laws. Among the military laws, one of my favorite ones is: if you need to find an officer, take a nap. By the way, that works just as well for VPs in high tech. Or from the technology laws: logic is a way of arriving at an incorrect conclusion with absolute certainty. But you'll find them on love, on cooking, on cars. And there are spin-offs such as axioms and corollaries, reaching even into our humor: the Seventy Maxims of Maximally Effective Mercenaries. So Murphy has a
big impact on all sorts of things. And people are constantly
coming up with new things that make sense for a
Murphy's law approach. But let's jump into
it. Murphy's law for observability number one: if you perceive that there are four possible ways in which a procedure can go wrong and circumvent these, a fifth way, unprepared for, will promptly develop. So far I have yet to hear a better description of the life of an SRE. Our jobs are to try to both mitigate what could happen as well as be prepared for that fifth way to show up. So when we start looking at this, this becomes the necessary point for making sense of our environments. We need to know what's going on at all times. And not just the "oh look, the lights are blinking" approach.
We need to be able to look and see what's going on and how it's
impacting our users, our systems and our environments at
any given time. And that leads us to this concept of
observability. Observability is a hot topic in
the SRE world, in fact, in almost all of the technology world
here. And observability is really all around data.
It's the deep sources of data that let us see what's going
on inside of our systems, inside of our applications,
all the way from the ground, all the way up to the
user viewpoint and user journey experience going through
this overall system here. Generally speaking, you'll hear it as metrics (do I have a problem?), traces (where is the problem?), and logs (why is this problem happening?). So: detect, troubleshoot, or root cause analysis.
However, observability is not limited to those classes
of data. Observability can and should make use
of any data that's necessary for us to be able to
understand and infer the operation of our
underlying system. And in fact,
observability is about those deeper sources, the new sources
of data, and the data that tie our environment together,
that lets us understand at each point in time what's happening here.
Don't limit yourself just because you have a source of
data. Be ready to look at more sources of data here.
And interestingly enough, observability really is a proxy for
customer happiness. If we can understand what's going on and understand
the driving influence on our user base, then we can
actually use observability data to help us understand their experience.
Down at the bottom: observability has been around for a while. The engineering definition is defining the exposure of state variables in such a way as to allow inference of internal behavior. We've expanded that: our
internal behavior now encompasses a lot of different points,
and we need to be able to also correlate across those.
And that leads us to Murphy's observability
number two here. Every solution breeds new problems.
So now that we've got this new data, we now have a whole new class
of issues that are coming into play here. There's also
the underlying concepts of where this data is coming
from. Why do we have all this data? Well, this is the Cynefin framework. The Cynefin framework is a way of approaching and looking at the transition of an activity over time. And we start with simple: we used to have monolithic systems,
and we used to have single source languages,
and we used to be able to look at the blinking lights and say,
oh look, things are working, so things must be fine here.
But now we've gone into a cloud environment.
A lot of things have moved to public and private clouds here,
which means we now have elastic and ephemeral behavior. Things change. Things may not be there when we go to look for them. Therefore, failures don't exactly repeat. And because we now have microservices, a service that is as small as necessary, pulled together through a loose communications mechanism, we find that debugging is no longer as capable as it traditionally was. Therefore, traditional monitoring can't save us anymore. What we've now done is something that you would never do in math class: we've changed two variables at the same time. We've added the complicated world of microservices (where they run, how they run, how many of the services are running at any given time) to the elastic, ephemeral behavior of cloud environments or orchestrated environments, which gives us a chaotic model. It's not there; in fact, we already planned that it wasn't going to be there very quickly. That has led us to these complex environments. A complex environment means we need to be able to probe deeper, we need to be able to sense better, and we need to be able to respond in ways that may not be as clear as they used to be in a monolithic world.
This is massively important and it is
the driving change for what is creating this buzz
around observability. And that leads us to
Murphy's observability number three: you can never run out of ways that things can go wrong. And the octopus riding a unicycle, juggling balls, is a perfect example of this. We've got lots of things going on. There's lots of balls in the air at any given moment. We're now keeping
track of not only the virtual environments,
the communication environments, we're keeping track of our orchestration environments.
Kubernetes as an example for that. We're keeping track of
the applications and the application pathways can change every
single time that a transaction crosses those pathways.
And that gives us this thing, observability, which lets us start monitoring for those things called unknown unknowns. So when we know something could happen or
we're worried about it, we know to watch for it and we probably understand
what caused it. This can be running out of disk space or
running out of memory. We can watch for those things and we kind of know
what we're doing here. We can also be looking at
things that we are aware could happen, but we don't necessarily understand
why they happened. These can be outside influences. When we get into the unknown categories,
things can happen that we are not aware could happen,
but when they happen, we can immediately understand, oh, that's why
that happened. And that's an unknown known. But when we move into
that last category of unknown unknowns,
we're not even aware something could occur. And when it occurs,
we don't understand why it occurred. And so observability gives
us the ability to do that forensic exercise. Now let's just
move back in time, in a sense, to see what was going on, and basically infer, understand, and deduce what happened at that given moment. Once we've done this, we can now move this into the next category. We are now aware that something could occur, so maybe we move it into a known category. It may still be a known unknown. We may not understand why it occurred,
but we now know to watch for it because it
has occurred. That means it could occur. And once we've done the forensic
exercise and the resolution, we actually want to make sure that we don't
run into the same category of having to start over again. And so,
observability gives us the ability to move from the unknown
unknowns into the known unknowns, and even into the known knowns.
Try saying that really fast, three or four times.
So, sounds really great.
But Murphy's number four tells us nothing is as easy as
it looks. And that's because we're
building in two different ways here. This is a microservices architecture.
In fact, this is an ecommerce architecture. It's got checkout services,
it's got the Internet coming into a front end, it's looking at cart
services, it's emailing things out here. It could even be doing recommendations.
There's lots of things. If you've ever touched any of the major ecommerce
environments, you're probably seeing a front page for a single product that's made up of somewhere in the neighborhood of 43 microservices.
Every one of those microservices connects to probably somewhere
between four to eight additional microservices,
and that's for a single transaction.
Now imagine that you are scaling that, that you suddenly
have 100,000 transactions going into your system at one point
in time, and that 43 plus each
of those additional pieces here now has to scale to manage
the volume. We'll talk a little bit more about scale here.
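To get a feel for that fan-out, here is a quick back-of-the-envelope sketch. The 43-service figure and the four-to-eight fan-out come from the example above; the load figure and the "each call fans out once" simplification are assumptions made just for the arithmetic.

```python
# Back-of-the-envelope fan-out. The 43 services and 4-8 downstream calls come
# from the talk's example; the 100,000-transaction load is illustrative.
front_page_services = 43
fanout_low, fanout_high = 4, 8
transactions = 100_000

calls_low = front_page_services * (1 + fanout_low)    # each service plus its downstream calls
calls_high = front_page_services * (1 + fanout_high)

print(f"Service calls per transaction: {calls_low}..{calls_high}")
print(f"Service calls for {transactions:,} transactions: "
      f"{calls_low * transactions:,}..{calls_high * transactions:,}")
```

Even with that simplification, a single page view turns into hundreds of service calls, and the burst of traffic turns into tens of millions.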
The headache is that every time we look at one of these things,
we need to know where in the cycle we are when
the problem occurred. That becomes incredibly
important information. Nobody really can
grasp the entire architecture in
a gestalt, in a single picture viewpoint.
And so we need to have our capabilities, our services
and our tools help us understand that.
And that's what's led to things like service maps,
where we can see how the services connect, where the
transactions go to, and what's happening in each independent
transaction. We can also start looking at what the
metrics are telling us. Metrics are the piece that lets us
know when something has gone wrong. And so
metrics are incredibly important here. And then, even at the transaction level, we can look at something called RED (rate, errors, duration), one of my favorite monitoring patterns of all time. And that will actually help us understand what the user's experience is. Keep in mind that we do have this concept where users are unique individuals.
They really only care about this transaction, the one
they're looking at right now, how long it took and whether it was successful or
failed. And so RED gives us the ability to look at that in a concrete overview, so we can look at the aggregate model and then drill into it should something show up. So for instance, my RED view here is showing a 25% error rate. I would love to know what's causing the 25% error rate. I would love to know who it's impacting. My service map, if it's smart, can actually show me where things are not going through. That's the little red dot that
you're seeing here, but I now understand the flow of the transaction
and where its stoppage points are. So metrics tell me something is not looking right. Traces show me where something might not be looking right.
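To make that concrete, here is a minimal sketch of the three RED signals computed over a batch of requests; the Request class, field names, and sample values are illustrative, not taken from any particular tool.

```python
# Minimal sketch of the RED method (Rate, Errors, Duration) for one service
# over one time window. The Request class and sample values are illustrative.
from dataclasses import dataclass

@dataclass
class Request:
    duration_ms: float
    ok: bool

def red_summary(requests: list[Request], window_seconds: float) -> dict:
    rate = len(requests) / window_seconds                          # Rate: requests per second
    error_rate = sum(not r.ok for r in requests) / len(requests)   # Errors: fraction that failed
    durations = sorted(r.duration_ms for r in requests)
    p95 = durations[int(0.95 * (len(durations) - 1))]              # Duration: tail latency, not just the mean
    return {"rps": rate, "error_rate": error_rate, "p95_ms": p95}

# Eight requests in a two-second window, two of them slow and failing.
window = [Request(120, True), Request(90, True), Request(2100, False), Request(110, True),
          Request(95, True), Request(1800, False), Request(130, True), Request(105, True)]
print(red_summary(window, window_seconds=2.0))
```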
Then there is that added complexity. Now, we've talked a little bit about this already: cloud-based elasticity.
When we see a single service, that single service is not necessarily
a single instance.
That service could be multiplied times the
number of elastic pieces needed to meet the
scale. And because it's no longer necessary
to scale the entire thing ("here's my monolith, I'm running out of space, so here are my next two monoliths"), it's now just: scale the service that is having problems.
And so our scaling becomes
different. Our scaling is not random, but our scaling
does come into play here around making sure
that the right pieces are scaled the right time. With scaling
up comes scaling down, and so things can disappear
or reappear based on workloads. We're now moving
into this thing called ephemeral behavior. This is where we get into
serverless and serverless functions. And so you can look at things such as AWS Lambdas or Google Cloud Functions,
but these things are now designed to not be there.
And when we start looking at the ephemeral
capabilities here, the serverless capabilities,
it's not unusual for a warm-start AWS Lambda to be about
30 milliseconds and for the complete
execution time of the lambda to be about 1.2 seconds.
And so we can see a lot of serverless behavior that's not there anymore by the time you look for it; it's not there, because we're now also in multiple environments, multiple virtual machines, multiple containers, pods, worker nodes. We can also have these two concepts called drift and skew. And we'll get a little bit more into drift and skew in a moment. But imagine that
we've got to bring all these things together to be able to correlate
them. Timestamps are the way that we try to correlate them
the best. We can also look at transaction ids or trace
ids and so forth. But to know what's happening at a given moment of time,
we have to be able to align on that time. So we need to
be able to make sure that we understand how drift is happening between systems and how the systems are getting skewed over a time period.
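As a rough illustration of that alignment, here is a small sketch that merges events from two hosts onto one timeline after correcting for a known clock offset; the hosts, offsets, field names, and events are all made up for the example.

```python
# Merge events from two hosts into one timeline, correcting for known clock
# offsets before sorting. Offsets, hosts, and events are illustrative only.
CLOCK_OFFSET_MS = {"web-1": 0.0, "db-1": -35.0}  # db-1's clock runs 35 ms ahead

events = [
    {"host": "web-1", "trace_id": "abc123", "ts_ms": 1_000.0, "msg": "request received"},
    {"host": "db-1",  "trace_id": "abc123", "ts_ms": 1_030.0, "msg": "query started"},
    {"host": "db-1",  "trace_id": "abc123", "ts_ms": 1_090.0, "msg": "query finished"},
    {"host": "web-1", "trace_id": "abc123", "ts_ms": 1_120.0, "msg": "response sent"},
]

def corrected(e):
    # Shift each event onto a common clock before comparing timestamps.
    return e["ts_ms"] + CLOCK_OFFSET_MS[e["host"]]

timeline = sorted((e for e in events if e["trace_id"] == "abc123"), key=corrected)
for e in timeline:
    print(f'{corrected(e):8.1f} ms  {e["host"]:5}  {e["msg"]}')
```

Without the offset correction, the order of the database events relative to the web events can come out wrong, which is exactly the drift and skew problem described above.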
So, okay, as if life weren't complex enough: things get worse under pressure. That's Murphy's number five. And that's because we're
scaling and our scale is massive these days.
Yes, we do tend to start off small, but I'll also tell you one
of the things is that testing for production is not the
same as running in production. I don't care how you engineer it. Testing for
production will catch a lot of problems.
But keep in mind, whatever can go
wrong will go wrong and it will show up when you hit scale.
So in this environment, I'm looking at 2247
instances for this, and I can't
watch 2247
instances. And so I need to be able to look at this from a tooling
basis here. I'm looking at this nice little picture
viewpoint. I can tell you that I've got some hotspots; those are the little sort of reddish dots inside of here. I can drill into any dot. The data is there to tell me what's going on
when I choose to drill into it. But in the meantime,
I've also got to be able to take a look and see all the different
things that are happening. But thinking
about the scale here, this is a very simple picture and it's only
one viewpoint. This is simply the
scale for Kubernetes. We have Kubernetes objects,
secrets, namespaces, nodes, ingress points. We have pod churn and pods versus nodes. Inside of here, we actually have the containers now inside of pods. So our scale
is multidimensional and our scale unfortunately
does not decrease. So this is one
piece of the picture of what our scale looks like.
This is not the underlying virtual environments.
This is not the application environments built on microservices. And it's not necessarily the communication
environments. This is just the Kubernetes led
scale environment,
which leads us to number six here.
If it's not in the computer, it doesn't exist.
And I love dealing with this one in
some ways, because this one is one that most times most people will nod their head yes to: if you didn't keep track of it, it never existed in the first place. And so why does bad data happen to good computers?
Well, one of the things you'll hear, particularly when you have
as much data as we're now throwing at you, is this thing called sampling.
And sampling is very useful here. And you can have lots of sampling,
you can have lots of capabilities for cutting down
the amount of data that you are receiving
or keeping. But here's an example.
The first one is a sampled environment. The second one is a non sampled
environment. They are the same environments. They're running reasonably
close to the same. They are hitting about the same hot points at
points in time. However, the first one is doing
a traditional head-based sampling approach: I'm going to grab a sample someplace inside of here. And it came back and told me my latency, based on the tracing effort here, was one to two seconds. Piece of cake. However, 3.7 seconds is considered to be where people will abandon their shopping carts, where people will abandon their pages and go someplace else. When we look at the non-sampled data, we actually discover that our 95th percentile traces are running somewhere between 29 to 40 seconds. We have unhappy customers, at least one, in
this particular case. And so when we look at this,
there's two things we also look at. The first one: the sampling. Sampling didn't show us much; I think it showed us one error during that sampling period, because again, when you've sampled, you've only got what you grabbed. No sampling shows me that I've got lots of errors showing up here, some of them significant. Before we get into it, however, you're going, "well, okay, so I don't sample my metrics, so I know that there are errors." But what do you do then? You now know there was an error, but you didn't keep the data. How do you keep track of what's going on? "Oh, well, okay, if I saw an error, then I saved the data." Think back to that unknown unknown. We don't even know necessarily what was an error until we get a chance to figure out post facto that there was an error, and then we need to
be able to go back into it to figure out what's going on. So sampling
is useful but problematic.
Keep track of where you are and make sure that you get the data you
need and the results you need at all those times.
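Here is a small sketch of how head-based sampling can miss the rare slow requests and errors almost entirely; the latency distribution, error rate, and one-in-a-hundred sample rate are made-up values for illustration, not numbers from the environments shown.

```python
# Illustration of how head-based sampling can hide rare slow requests and
# errors. The latency distribution, error rate, and sample rate are made up.
import random

random.seed(7)
requests = []
for _ in range(10_000):
    if random.random() < 0.005:      # 0.5% of requests are very slow and fail
        requests.append({"latency_s": random.uniform(29.0, 40.0), "ok": False})
    else:
        requests.append({"latency_s": random.uniform(0.5, 2.0), "ok": True})

# Head-based sampling: the keep/drop decision is made up front, before we know
# whether the trace will turn out to be slow or broken.
sampled = [r for r in requests if random.random() < 0.01]

def summary(rs):
    slow = sum(r["latency_s"] > 3.7 for r in rs)   # 3.7 s: the abandonment threshold above
    errors = sum(not r["ok"] for r in rs)
    return (f"{len(rs):6} traces, {slow:3} slow, {errors:3} errors, "
            f"max {max(r['latency_s'] for r in rs):5.1f} s")

print("all traces:    ", summary(requests))
print("sampled traces:", summary(sampled))
```

Run it and the sampled view will usually contain few or none of the slow, failing traces, which is exactly the "one to two seconds, piece of cake" picture from the sampled environment above.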
Because, as Murphy's number seven tells us, availability is a function of time, and the speed and resolution of your data impact the insights you get. So again, I need to
discuss something that's a little bit problematic,
but pretty much straightforward, and that's this concept of accuracy
and precision. Quite often in technology,
we tend to use those terms interchangeably.
They aren't. So accuracy is that
the measurement is correct, that we correctly measured the results.
Precision means it's consistent with all of the other
measurements. So consider that you're target shooting with a bow and
arrow, and you shoot six arrows, and one of
them nails the bullseye and the other five are
randomly scattered from ring three to
ring five, maybe even outside the rings here. My God,
you were accurate, but you weren't precise.
And so which of those measurements was accurate is a challenge here. Precision means it was consistent. So take those same six arrows and group them within a two-inch circle, all in the outer ring.
Amazingly precise, completely not accurate.
Observability needs both. It needs accuracy and precision.
But again, that aggregation and analysis can skew
this behavior. Remember back when we talked about drift and skew? When we look at this, this is how you can actually miss the target. For this here, I've taken pretty much one second at a time, requests coming in per second, and I think I've got ten
of them here. My ten second average is 13.9 requests
per second coming in here. My 95th percentile over that 10 seconds is 27.5. For the first five seconds, you can see, the average is 16 and the 95th percentile is 29; for the second five, 11 and 19. However, if all you looked at were the aggregations, you would have missed the fact that there is one of these that actually crossed my trigger threshold, that one of them went up to about 32; 31, 32, I can't remember the exact numbers off the top of my head. We need to be able to look at every single data
point, not just the aggregations in here,
particularly when we're looking at alert capabilities.
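A tiny worked example of how an average can hide the data point that matters; the per-second values and the alert threshold below are illustrative, chosen only so the arithmetic lines up with the numbers above.

```python
# Per-second request rates over a ten-second window. The values and the
# threshold are illustrative; the point is that the mean hides the spike.
per_second_rps = [12, 14, 16, 10, 13, 11, 9, 32, 12, 10]
THRESHOLD = 30  # hypothetical alert trigger

mean_rps = sum(per_second_rps) / len(per_second_rps)
peak_rps = max(per_second_rps)

print(f"10-second average: {mean_rps:.1f} rps")   # 13.9: looks completely healthy
print(f"peak second:       {peak_rps} rps")       # 32: crossed the trigger
print("breached seconds:  ",
      [i for i, v in enumerate(per_second_rps) if v > THRESHOLD])
```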
Yes, you need to be able to tailor your alerts. You don't want to be
thrashed to death. But keep in mind that when
you see an alert, you need to be able to use either AI/ML
technology or your own knowledge to determine
how critical that alert may be and get it to
the right place. Part of the issue is that our precision, our resolution, actually impacts the data that we're seeing, in terms of that precision I'm talking about here. So if you're picking something that's being sampled, or not sampled (maybe "sampled" is a bad word), something that's being chosen every second, then the second that you're seeing is somewhere between those two. If you report on a second basis here, you're not actually quite sure where in that second that data point is. When we actually have data
now being produced or actually transmitted,
telemetry wise, in the nano and picosecond
ranges here, suddenly this becomes a very large issue. So keep
in mind that aggregation is not your final
point. It's incredibly useful for your visualizations when we've got lots of data. Data is only as useful as you can aggregate it, analyze it,
visualize it, and respond to it.
So Murphy's number eight, if it can go wrong,
it will.
This one, this one has burned me a few times as well. And to keep track of that, we now
have this larger, complex picture of the
technology. We now have front end users, and we
have web applications that are now living in the front end. We have back end
systems, and the back end systems are made up of lots of different pieces.
We have supply chain issues, we have packaged apps connected to microservices. We have hybrid environments with
on prem, with cloud. We have networks all over the place here,
and then we have containers and orchestrations.
Fortunately, we've got the data to allow us to
figure out what's going on with each of these pieces. And so,
synthetics: this helps us test our environment against a known pathway so that we can see if we're improving or getting worse. When we look at this, remember, the user only
cares about his or her personal experience.
Then we have user monitoring, real user monitoring, which lets us
track a user's experience going through the system.
We have endpoint monitoring, where we know where they're coming from, a mobile device
or an IoT device in a car going through a cave, or from a desktop, as well as being able to look at
all of the different things that make up that underlying environment. But then
we need to be able to aggregate it, analyze,
visualize, and respond in any of the ways
that are necessary. So here we come into dashboards,
and here we start looking at application performance monitoring. How is
the application performing? We look at the infrastructure monitoring, we look
at incident response, the alerting structures here, we look at
code profiling, what's happening inside of our code here.
And in all of these cases, we're still dependent on looking into that data set for the final thing, for that root cause analysis, which is still probably going to be a log environment. Crossing all of this is
network performance, and so we have to have network performance monitoring that
goes from end to end so that we can truly understand the
user environment. So, Murphy's number nine: whenever you set out to do something,
something else is going to have to be done first.
I can't tell you the number of trips I've made to the local hardware store
because I started a project and then suddenly realized that one,
I didn't have something or two, the thing I thought I
had wasn't any good anymore. In particular, I can tell you PVC glue and plumber's putty are my two nightmarish conditions that I always end up running to the hardware
store to get. So when
we look at this, one of the things that's happened is that we have changed.
We had observability 1.0. These topics have been
around: collecting logs has been around, collecting traces has been around, collecting metrics has been around. The problem is that we need to
be able to correlate them, and each of them was being handled through a separate
agent into a separate back end. Fortunately, now we have
approaches, observability 2.0, which does that correlation. And it's heavily driven by this thing called OpenTelemetry. OpenTelemetry is the next version of both OpenTracing and OpenCensus, two open source projects that merged together to create a unified environment to produce the data necessary for observability. And if you want to get involved, the OpenTelemetry community on GitHub will get you off to a great start. It'll introduce you to the concepts, the whole works. It's the place you want to consider starting. But I also want
to talk a little bit about one specific piece. OpenTelemetry covers traces, metrics, and is in the process of covering logs. So it's bringing those three classes of data together. But it didn't want to disrupt existing practices, and so we had observability 1.0 with their separate back ends and their separate agents.
The collector architecture allows us to tie those together.
You can bring it in in whatever protocol you want.
You can actually process it inside the collector: should you decide to sample, should you decide to apply machine learning, any of those things can be done inside the collector. Keep in mind that as you add things to the collector, it becomes more heavyweight. And then you can pass the data out into anything you want. So you can bring it in via the OpenTelemetry protocol and pass it out as a Jaeger protocol or as a Prometheus protocol. You can bring Prometheus in and send it both to a tracing environment as well as to a metrics environment, or keep the logs in. Plus, you can bring in Fluentd and bring all those pieces together.
So the collector architecture allows me to be incredibly flexible
about the types of data I collect, as well as the methods by
which I collect them. This means if you've already got solutions in place, it's very easy to move to OpenTelemetry without disrupting your current work.
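As a rough sketch of what feeding a collector looks like from the application side, here is a minimal Python example using the OpenTelemetry SDK to export spans over OTLP. It assumes the opentelemetry-sdk and opentelemetry-exporter-otlp packages are installed and that a collector is listening on localhost:4317; the service name, endpoint, and span names are illustrative choices, not something from the talk.

```python
# Minimal sketch: an application emitting spans over OTLP to a local collector.
# Assumes the opentelemetry-sdk and opentelemetry-exporter-otlp packages and a
# collector listening on localhost:4317; names and endpoint are illustrative.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter

provider = TracerProvider(resource=Resource.create({"service.name": "checkout"}))
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

tracer = trace.get_tracer(__name__)
with tracer.start_as_current_span("checkout") as span:
    span.set_attribute("cart.items", 3)  # the collector decides where this data goes next
```

The application only ever speaks OTLP to the collector; the collector's own configuration decides whether those spans go on to Jaeger, Prometheus-style metrics pipelines, logs back ends, or all of them.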
So I want to cover a couple of axioms. We've covered nine of the
Murphy's laws, but let's start with this one.
This is the Ashley-Perry statistical axiom:
Numbers are tools. They are not rules.
And quite often we tend to treat numbers as
rules, and that's dangerous. Again,
think back to that prediction accuracy. But basically, we tend to use things as predictive behavior. And honestly,
yeah, sometimes you just want to know what's coming inside of here. The problem is
that prediction is only as good as the data. Precision and accuracy.
Flashback to that. Did I measure the thing? Is the measurement correct,
and are the measurements all in the right alignment?
So, are they both precise and accurate?
But we find this most heavily used for things like historical
versus sudden change, where we can look back and say,
okay, on Mondays in the last four weeks,
the median says it should be here; therefore, we're out of range. Or we can look at it and say, hey, wait a minute, this thing's suddenly gone wrong. And all of a sudden, we've seen a jump in transaction requests. The latency has gone up fivefold.
So this starts giving us the ability to look at some of those things.
If your trend is stationary, either a standard, predictable line or a flat line, yeah, you're probably pretty safe.
But when you start looking at predictive behavior, you have to
be ready to expect false positives and false negatives,
and so things will not necessarily be absolutely
precise. Extrapolation is
better closer to the point of contact. The farther
out you go, the less likely that you can successfully predict that.
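As a loose sketch of that historical-versus-sudden-change idea, here is a small example that compares the latest value against the median of the same weekday over the previous weeks; the four-week window, the 1.5x threshold, and the data are all illustrative assumptions.

```python
# Sketch: flag a value as out of range against the median of recent history.
# The four-week window, 1.5x threshold, and sample values are illustrative.
import statistics

monday_latencies_ms = [210, 195, 225, 205]   # same weekday, last four weeks
latest_ms = 1020                             # today's reading

baseline = statistics.median(monday_latencies_ms)
if latest_ms > 1.5 * baseline:
    print(f"out of range: {latest_ms} ms vs median {baseline} ms "
          f"({latest_ms / baseline:.1f}x the baseline)")
```

A check like this will still produce false positives and false negatives, which is exactly the caveat above: numbers here are tools for drawing attention, not rules.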
Baker's law: misery no longer loves company, it now insists on it.
In our SRE worlds, in our DevOps environments
here, everybody is responsible for things running correctly.
For that, this is now a shared issue. It's not "oh, operations needs to fix that." It's where the devs get involved. And this ties really closely to observability,
because we need to be able to exchange information as
well as not repeat forensic steps. Observability gives us
all this data. Each of the forensic steps, because we are in correlation
mode, means people don't have to go back and rediscover things.
So having all the data gives us that capability,
as well as having the ability to share
the previous environments at any given time.
Make sure that your use of observability brings in the capability of sharing the data, not just
a result, but sharing the data, its context
and its correlation.
Hill's commentaries: there are four of them here, but I love the
fourth one. If it doesn't matter, it does not matter.
So if something breaks and nobody cares, we don't
care either. You know, literally, if the machine isn't running and we don't get any complaints, then probably nobody even knows the machine is not running. The problem is my corollary here: it doesn't matter, it does not matter, until it does. Flashback to our unknown unknowns environment
here. Flashback to that concept of we don't know when
things are going to go wrong, we don't know why necessarily
they're going to go wrong. And in fact, the only thing we can guarantee is that something, sooner or later, is going to go wrong. It doesn't matter until it does, and when that happens, you need
to have all of the data,
all of the observability data: not data that's been sampled, not data that's been filtered, not data that's been bandwidth-limited.
Make sure that you have all of the data here. Observability gives you the ability
to have all that data. Figure out how you can best make use of
it. And finally, Murphy's law number ten.
All's well that ends. And with that,
I'd like to thank you for listening here. Again, I'm Dave
Mack on LinkedIn and I would love to
hear your thoughts and ideas around Murphy's laws
on observability. If you've got a law that you believe applies
to observability, please share it with me. Or honestly,
if you think that I need to expand on something, or you don't agree with me, I'd love to hear from you as well. So with that, thanks again for listening, and enjoy the rest of the show.